I'm writing a python script which takes csv data from radiotimes (updated daily) and inserts it into a database.
Every channel has it's own csv file, for example 92.dat is the data for BBC 1. There are 386 channels.
Now I've designed my database schema like so
I've written the python code, it is extremely slow when it comes to inserting all of the program entries into the channel programmes table. I'm not sure if there's any way to make it quicker.
The basic process I'm following is
Every channel has it's own csv file, for example 92.dat is the data for BBC 1. There are 386 channels.
Now I've designed my database schema like so
Code:
CHANNELS
-----------------
CHANNELID | CHANNELNAME
GENRE
-----------------
GENREID | GENRENAME
EPISODE
-----------------
EPISODEID | PROGRAMMEID | SUBTITLE | EPISODE | DESCRIPTION
PROGRAMME
-----------------
PROGRAMMEID | GENREID | PROGTITLE | YEAR | DIRECTOR | PERFORMERS | FILM | WIDESCREEN | BNW | STARRATING | CERTIFICATE | DESCRIPTION
CHANNELPROGRAMMES
-----------------
CHANNELPROGRAMMEID | CHANNELID | PROGRAMMEID | EPISODEID | REPEAT | NEWSERIES | PREMIERE | CHOICE | SUBTITLES | DEAFSIGNED | DATE | STARTTIME | ENDTIME | DURATION
I've written the python code, it is extremely slow when it comes to inserting all of the program entries into the channel programmes table. I'm not sure if there's any way to make it quicker.
The basic process I'm following is
Code:
1. Tidy up previous days data
2. Download channels lookup table (csv data in the format 92|BBC 1)
3. Initialise list of channel files into progFiles dictionary (eg. 92.dat is the data for BBC1)
---Update channels table
---For Loop through channel list, insert channels if new, update channels if exists but changed
4. For loop through list of channel files (386 .dat files)
---Update database
---Open current csv dat file and load each line into a python dictionary (csv.DictReader)
---Append each entry to a new channelPrograms list
---For every line (record) in the channelPrograms list
---Update Genres table with new genres if necessary (each genre is unique)
---Update Programmes table for every new Programme (each programme is unique i.e. Lost)
---Update Episode table for every new episode of a programme (each episode is unique i.e. Lost episode 1, series 1)
---Delete entries from channel programmes table where date < today
---Insert record into channel programmes table (no unique records, i.e. a programme can be repeated)