Validating XML against DTD

I have a number (over 250) of XML files which I need to check against a DTD file to make sure they comply.

What's the best way of doing this programmatically, preferably in Java?
 
You could have a look at the SAX parser, which can validate against DTDs. Call setValidating(true) on the SAXParserFactory to enable validation.

Or if you're using J2SE 5.0 have a look at the classes in javax.xml.validation. These allow an object that implements the Source interface to be validated against a Schema. I used this in an application and it seemed to work well.

Hope that helps.
 
Code:
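// Parse the XML file into a DOM tree.
DocumentBuilder parser = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document document = parser.parse(new File("pathToXML.xml"));

// Create a factory for W3C XML Schema (XSD) documents.
SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);

// Load the DTD as if it were a schema; newSchema is the call that fails.
Source schemaFile = new StreamSource(new File("pathToDTD.dtd"));
Schema schema = factory.newSchema(schemaFile);

// Validate the DOM tree against the loaded schema.
Validator validator = schema.newValidator();
validator.validate(new DOMSource(document));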

I'm trying to use the code above, but I keep getting this error from factory.newSchema(schemaFile): "The markup in the document preceding the root element must be well-formed."

I think it's because it's a DTD file and not an XSD file.

The only thing I can see to change is the W3C_XML_SCHEMA_NS_URI but apparently the only other option I have is RELAXNG_NS_URI, which isn't right.

Any help would be nice. Thanks.
 
Looking at the docs for SchemaFactory, it does seem that W3C XML Schema is the only schema language supported out of the box. You could try the isSchemaLanguageSupported method on SchemaFactory to see whether DTD is supported, but I'd guess not. My mistake.
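If you want to check rather than guess, something like the sketch below should do it. The class name SchemaLanguageCheck and the supports helper are made up for illustration; XMLConstants.XML_DTD_NS_URI is the constant JAXP defines for DTDs. What it prints depends on the JAXP implementation on your classpath, so treat the result as a property of your setup.

```java
import javax.xml.XMLConstants;
import javax.xml.validation.SchemaFactory;

// Hypothetical helper class, not part of any library.
class SchemaLanguageCheck {

    // Asks a W3C XML Schema factory whether it also understands the given schema language.
    static boolean supports(String schemaLanguage) {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.isSchemaLanguageSupported(schemaLanguage);
    }

    public static void main(String[] args) {
        System.out.println("XSD supported: " + supports(XMLConstants.W3C_XML_SCHEMA_NS_URI));
        System.out.println("DTD supported: " + supports(XMLConstants.XML_DTD_NS_URI));
    }
}
```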

The SAX parser might be worth a look. From what I remember, you have to set the parser to validate using the setValidating method on the SAXParserFactory class. You then set up a SAX handler to trap the required events, including validation errors. Finally, the document is parsed (using the parse method, I think) and the methods on the handler class are fired when events occur. At least that's the way I remember it.
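Here is a rough sketch of that approach, not tested against your files. It assumes each XML file has a DOCTYPE declaration pointing at the DTD, since that is what a validating parser checks against. DtdValidator, CollectingHandler and validate are names I made up; the JAXP and SAX calls are the standard ones.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name; validates files against the DTD named in their DOCTYPE.
class DtdValidator {

    // Records every problem the parser reports instead of ignoring it.
    static class CollectingHandler extends DefaultHandler {
        final List<String> problems = new ArrayList<>();

        private String describe(String kind, SAXParseException e) {
            return kind + " at line " + e.getLineNumber() + ": " + e.getMessage();
        }

        @Override
        public void warning(SAXParseException e) {
            problems.add(describe("warning", e));
        }

        // DTD validity violations arrive here; parsing continues afterwards.
        @Override
        public void error(SAXParseException e) {
            problems.add(describe("error", e));
        }

        // Well-formedness failures arrive here; parsing cannot continue.
        @Override
        public void fatalError(SAXParseException e) throws SAXException {
            problems.add(describe("fatal", e));
            throw e;
        }
    }

    // Returns the list of problems found; an empty list means the file is valid.
    static List<String> validate(File xml)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // turn on DTD validation
        SAXParser parser = factory.newSAXParser();
        CollectingHandler handler = new CollectingHandler();
        try {
            parser.parse(xml, handler);
        } catch (SAXParseException e) {
            // Already recorded by fatalError; nothing more to do.
        }
        return handler.problems;
    }

    // Pass the XML file paths as command-line arguments.
    public static void main(String[] args) throws Exception {
        for (String path : args) {
            List<String> problems = validate(new File(path));
            System.out.println(path + ": " + (problems.isEmpty() ? "valid" : problems));
        }
    }
}
```

For a few hundred files you could pass all the paths on the command line, or swap the loop in main for a directory listing.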

The following link might also be some help:

http://www.xml.com/lpt/a/2005/07/06/jaxp.html

It describes various parsers and their validation methods.
 