More efficient way of doing this?

Soldato
Joined
1 Mar 2003
Posts
5,508
Location
Cotham, Bristol
I'm writing a python script which takes csv data from radiotimes (updated daily) and inserts it into a database.

Every channel has it's own csv file, for example 92.dat is the data for BBC 1. There are 386 channels.

Now I've designed my database schema like so

Code:
CHANNELS
-----------------
CHANNELID | CHANNELNAME


GENRE
-----------------
GENREID | GENRENAME


EPISODE
-----------------
EPISODEID | PROGRAMMEID | SUBTITLE | EPISODE | DESCRIPTION


PROGRAMME
-----------------
PROGRAMMEID | GENREID | PROGTITLE | YEAR | DIRECTOR | PERFORMERS | FILM | WIDESCREEN | BNW | STARRATING | CERTIFICATE | DESCRIPTION


CHANNELPROGRAMMES
-----------------
CHANNELPROGRAMMEID | CHANNELID | PROGRAMMEID | EPISODEID | REPEAT | NEWSERIES | PREMIERE | CHOICE | SUBTITLES | DEAFSIGNED | DATE | STARTTIME | ENDTIME | DURATION

I've written the python code, it is extremely slow when it comes to inserting all of the program entries into the channel programmes table. I'm not sure if there's any way to make it quicker.

The basic process I'm following is

Code:
1. Tidy up previous days data
2. Download channels lookup table (csv data in the format 92|BBC 1)
3. Initialise list of channel files into progFiles dictionary (eg. 92.dat is the data for BBC1)
   ---Update channels table
      ---For Loop through channel list, insert channels if new, update channels if exists but changed
4. For loop through list of channel files (386 .dat files)
   ---Update database
      ---Open current csv dat file and load each line into a python dictionary (csv.DictReader)
      ---Append each entry to a new channelPrograms list
      ---For every line (record) in the channelPrograms list
         ---Update Genres table with new genres if necessary (each genre is unique)
         ---Update Programmes table for every new Programme (each programme is unique i.e. Lost)
         ---Update Episode table for every new episode of a programme (each episode is unique i.e. Lost episode 1, series 1)
         ---Delete entries from channel programmes table where date < today
         ---Insert record into channel programmes table (no unique records, i.e. a programme can be repeated)
 
MySQL, tbh I think the first load to database might take quite some time, whereas the subsequent daily updates might not as I plan to not insert new lines if that programme already exists. It will however still have to read every line of data to check if that's the case, so it's still a long process.
 
Back
Top Bottom