XML -> MySQL

Soldato
Joined
7 Apr 2004
Posts
4,212
Hi,

I have a large XML file that conforms to a well defined schema that I need to import in a mysql database.

The database is fairly complex, consisting of several tables with various keys and what not and additionally I will need some logic during the import process. For example I will need to compare the dates on an XML node and say if there is already a database entry matching this node, then only insert if the date element is newer than the one in the database. So I will have to query the database before each insert to obtain keys/IDs etc.

My current idea is to write some Python to traverse the XML document and basically build up objects for everything and then query the database with the required logic and do the SQL inserts/updates.

Guess I just want to check if this is the best way to do the import, where some logic checks / key calculations etc are required during the import process.

Maybe there's a better method? I know MySQL has some kind of XML import functionality, but I don't think it's applicable here as I won't be able to just blindly import everything in the XML document.
 
Wasn't arguing in favour of vb.net/c# just that I have experience with parsing XML in these languages and the conversion from XML to dataset is phenomenally easy, as would be putting such a dataset into an SQL database.

You could probably do that in about 10-20 lines of code.

I have no idea how easy it is in Python, but the idea of writing code to parse the xml and then build objects around that seems a little over the top when it is already available in an existing frameworks (and isn't that the whole point of frameworks?). Having said that, if you feel you could do it quickly in Python, that's fine - but if that was the case I thought the OP wouldn't really be asking the question.

What do these frameworks provide out of interest? I assume it boils down to a nice way of querying and iterating over the XML. I just seem to be writing a LOT of code to pull the XML out and populating a data structure / object to work with. I'm basically using XPath queries which are nice in Python but to pull out a single node takes about 4-5 lines akin to (and I have maybe 100 nodes I need to extract):

Code:
result = xpath.eval("nodeNameX")
if len(result) != 0:
    mydataobj.nodeNameX = result.content...
else:
    # display node specific warning/error

............

sql = mydataobj.makeSQL()
# simple one line database insert here

So the idea being that the data object will have all relevant XML data as member variables and can apply logic and make a nice SQL insert statement, it seems messy and verbose coding though.

And I will end up with a few hundred lines of code like that because each node I extract has different conditions e.g it may have children I need to iterate over or it may have attributes I need to pull out. So ultimately I need a few lines of code for each XML node and to populate an object.

The SQL side of things should be quite simple as I can just add a function to generate SQL insert code for each object that i've populated. The bit that seems verbose is the pulling everything out of XML and getting it into a usable format for the logic to be applied. Maybe there is no clean approach of doing this though.

The MSSQL approach isn't an option sadly, also I will need to automate it this on a regular basis so a script is a better approach in the long run, all be it much more painful :(
 
Last edited:
Back
Top Bottom