Populating a database with data - performance question

Soldato
Joined
1 Mar 2003
Posts
5,508
Location
Cotham, Bristol
Ok I have about 400 csv type files (about 50MB in total, also they're remote i.e. reading each one from a URL), I read all of these files into memory, populate a couple of MySQL temporary tables from this data, then populate 8 permanent table from these temporary tables.

Across all 8 tables there are roughly 192k rows.

All of this takes my Python code about 9.5 minutes to do when the database is empty, and 7 minutes when it's just doing updates. I realise this is quite a high level of detail, but does that sound like reasonable performance?
 
Last edited:
Soldato
OP
Joined
1 Mar 2003
Posts
5,508
Location
Cotham, Bristol
Thanks all, need no, want maybe :p

I have been working on it a while (I've cut 10 minutes off) and I'm just trying to trim down the time as much as I can. I already have timings in there, most of the time is inevitably spent on insert/select statements on the large tables. The following for example takes 100 seconds or so to insert 120k rows into an empty table

Code:
          sqlString = """INSERT INTO CHANNELPROGRAMME (
                             CHANNELID, PROGRAMMEID, EPISODEID,
                             RPEAT, NEWSERIES, PREMIERE, CHOICE, SUBTITLES,
                             DEAFSIGNED, STARRATING, PROGDATE, STARTTIME, ENDTIME,
                             DURATION)
                         SELECT
                             T.CHANNELID, P.PROGRAMMEID, E.EPISODEID, T.RPEAT, T.NEWSERIES,
                             T.PREMIERE, T.CHOICE, T.SUBTITLES, T.DEAFSIGNED, T.STARRATING,
                             T.PROGDATE, T.STARTTIME, T.ENDTIME, T.DURATION
                         FROM
                             TVTEMPTABLE T
                             INNER JOIN PROGRAMME P ON P.PROGTITLE=T.PROGTITLE
                             INNER JOIN CHANNELS CH ON CH.CHANNELID=T.CHANNELID
                             INNER JOIN EPISODE E ON E.SUBTITLE=T.SUBTITLE AND E.EPISODE=T.EPISODE"""

And I have already indexed the columns, so I'm not sure there's any more 'fat' to trim.
 
Soldato
OP
Joined
1 Mar 2003
Posts
5,508
Location
Cotham, Bristol
Thanks all, I haven't done anything about the indexes in the end as I moved the script to my always on work PC which is significantly newer than my laptop. It now runs in about 3 minutes, so I'm happy to leave it as is :)
 
Back
Top Bottom