XML -> MySQL

Soldato
Joined
7 Apr 2004
Posts
4,212
Hi,

I have a large XML file that conforms to a well defined schema that I need to import in a mysql database.

The database is fairly complex, consisting of several tables with various keys and what not and additionally I will need some logic during the import process. For example I will need to compare the dates on an XML node and say if there is already a database entry matching this node, then only insert if the date element is newer than the one in the database. So I will have to query the database before each insert to obtain keys/IDs etc.

My current idea is to write some Python to traverse the XML document and basically build up objects for everything and then query the database with the required logic and do the SQL inserts/updates.

Guess I just want to check if this is the best way to do the import, where some logic checks / key calculations etc are required during the import process.

Maybe there's a better method? I know MySQL has some kind of XML import functionality, but I don't think it's applicable here as I won't be able to just blindly import everything in the XML document.
 
If you can get your hands on SSIS, I think it can use an XML source adaptor to import the data and then you would be able to connect to the MySQL DB and populate the tables.
 
basically, your idea is fine.

We can all argue about what language to use but if you are happy with python then use that.


If you can get your hands on SSIS, I think it can use an XML source adaptor to import the data and then you would be able to connect to the MySQL DB and populate the tables.

Isn't SSIS part of MSSQL? I don't do anything with microsoft stuff but surely he would need a licenced copy of MSSQL to use that? (genuine question)
 
Last edited:
Isn't SSIS part of MSSQL? I don't do anything with microsoft stuff but surely he would need a licenced copy of MSSQL to use that? (genuine question)

It is part of MSSQL, but it's a great tool for all sorts of data conversion and can have all sorts of source and destination options. If there was a MSSQL server handy then it could be a good option.
 
We can all argue about what language to use but if you are happy with python then use that.

Wasn't arguing in favour of vb.net/c# just that I have experience with parsing XML in these languages and the conversion from XML to dataset is phenomenally easy, as would be putting such a dataset into an SQL database.

You could probably do that in about 10-20 lines of code.

I have no idea how easy it is in Python, but the idea of writing code to parse the xml and then build objects around that seems a little over the top when it is already available in an existing frameworks (and isn't that the whole point of frameworks?). Having said that, if you feel you could do it quickly in Python, that's fine - but if that was the case I thought the OP wouldn't really be asking the question.
 
Wasn't arguing in favour of vb.net/c# just that I have experience with parsing XML in these languages and the conversion from XML to dataset is phenomenally easy, as would be putting such a dataset into an SQL database.

You could probably do that in about 10-20 lines of code.

I have no idea how easy it is in Python, but the idea of writing code to parse the xml and then build objects around that seems a little over the top when it is already available in an existing frameworks (and isn't that the whole point of frameworks?). Having said that, if you feel you could do it quickly in Python, that's fine - but if that was the case I thought the OP wouldn't really be asking the question.

What do these frameworks provide out of interest? I assume it boils down to a nice way of querying and iterating over the XML. I just seem to be writing a LOT of code to pull the XML out and populating a data structure / object to work with. I'm basically using XPath queries which are nice in Python but to pull out a single node takes about 4-5 lines akin to (and I have maybe 100 nodes I need to extract):

Code:
result = xpath.eval("nodeNameX")
if len(result) != 0:
    mydataobj.nodeNameX = result.content...
else:
    # display node specific warning/error

............

sql = mydataobj.makeSQL()
# simple one line database insert here

So the idea being that the data object will have all relevant XML data as member variables and can apply logic and make a nice SQL insert statement, it seems messy and verbose coding though.

And I will end up with a few hundred lines of code like that because each node I extract has different conditions e.g it may have children I need to iterate over or it may have attributes I need to pull out. So ultimately I need a few lines of code for each XML node and to populate an object.

The SQL side of things should be quite simple as I can just add a function to generate SQL insert code for each object that i've populated. The bit that seems verbose is the pulling everything out of XML and getting it into a usable format for the logic to be applied. Maybe there is no clean approach of doing this though.

The MSSQL approach isn't an option sadly, also I will need to automate it this on a regular basis so a script is a better approach in the long run, all be it much more painful :(
 
Last edited:
Well, example, taken from the Microsoft site -

Code:
    Dim myXMLfile As String = "C:\MySchema.xml"
    Dim ds As New DataSet()
    ' Create new FileStream with which to read the schema.
    Dim fsReadXml As New System.IO.FileStream _
        myXMLfile, System.IO.FileMode.Open)
    Try
        ds.ReadXml(fsReadXml)
    Catch ex As Exception
        MessageBox.Show(ex.ToString())
    Finally
        fsReadXml.Close()
    End Try
End Sub

That will read XML from a file into a dataset..

Your dataset would then contain relationships, tables etc and tables contain columns etc, as with any bog standard .NET dataset.

What I was saying about LINQ is a little different - you can take the XML in raw format and run queries on it as you would on a database (albeit with slightly different syntax). The joy of LINQ is that you can query just about anything.. objects, arrays, XML, REST service... so where as previously you would be forced to parse your XML into some data structure in order that you could iterate through and check various conditions, or place your XML into a database and query there, now you can query the XML directly. (OK - you could use XPath queries but they are horrid!)

As I said - I'm not being some sort of evangelist for .NET here - at the end of the day you work with the language/framework you know best and the language that will get your job done in the quickest time (particularly if you are up against time and/or budget constraints). However, having said that, if you are dealing with this sort of thing on a regular basis then, IMO, LINQ is definitely one of the things you should be looking at.


Have a look at this
- gives a good basic overview of querying XML with LINQ.
 
Back
Top Bottom