Help with database table design - it's been a while!!!!

Soldato
Joined
1 Mar 2003
Posts
5,508
Location
Cotham, Bristol
Ok theory is this should be quite a simple design, but I'd like to check with those in the know. I have some XML data I'm wanting to insert into a MySQL database.

There are two types of xml file, channels.xml and xx.xml (where xx is the channel id)

channels.xml has entries like this (it's basically a listing of all the tv channels available and their id)

PHP:
<CHANNEL>
   <CHANNELNAME>Five USA</CHANNELNAME>
   <CHANNELID>2008</CHANNELID>
</CHANNEL>

So the corresponding xml file for this channel would be 2008.xml, which has entries like so

PHP:
<PROGRAMME>
   <CERTIFICATE></CERTIFICATE>
   <DURATION>360</DURATION>
   <STARTTIME>06:00</STARTTIME>
   <DATE>27/09/2010</DATE>
   <SUBTITLE></SUBTITLE>
   <FILM>false</FILM>
   <PERFORMERS></PERFORMERS>
   <WIDESCREEN>false</WIDESCREEN>
   <ENDTIME>12:00</ENDTIME>
   <SUBTITLES>false</SUBTITLES>
   <REPEAT>false</REPEAT>
   <EPISODE></EPISODE>
   <PREMIERE>false</PREMIERE>
   <BNW>false</BNW>
   <STARRATING></STARRATING>
   <PROGTITLE>Teleshopping</PROGTITLE>
   <DIRECTOR></DIRECTOR>
   <YEAR></YEAR>
   <DEAFSIGNED>false</DEAFSIGNED>
   <CHOICE>false</CHOICE>
   <NEWSERIES>false</NEWSERIES>
   <DESCRIPTION>A chance to buy goods from the comfort of home.</DESCRIPTION>
   <GENRE>Consumer</GENRE>
</PROGRAMME>

So based on this information I plan to have the following tables

CHANNELS

CHANNELID | CHANNELNAME

GENRE

GENREID | GENRENAME


PROGRAMMES

PROGRAMMEID | CHANNELID | GENREID | PROGTITLE | SUBTITLE | EPISODE | YEAR | DIRECTOR | PERFORMERS | PREMIERE | FILM | REPEAT | SUBTITLES | WIDESCREEN | NEWSERIES | DEAFSIGNED | BNW | STARRATING | CERTIFICATE | DESCRIPTION | CHOICE | DATE | STARTTIME | ENDTIME | DURATION

Should I normalise out into further tables or does that sound ok?
 
Last edited:
Normalise further, there probably need to be an episode table, and also (if your having a repeat field) a separate table for each "episode Airing", a separate table for Genre.
 
ok thanks, I spotted the genre and amended. So so far we have

CHANNELS

CHANNELID | CHANNELNAME

GENRE

GENREID | GENRENAME

PROGRAMME

PROGRAMMEID | GENREID | PROGTITLE | SUBTITLE | EPISODE | YEAR | DIRECTOR | PERFORMERS | PREMIERE | FILM | REPEAT | SUBTITLES | WIDESCREEN | NEWSERIES | DEAFSIGNED | BNW | STARRATING | CERTIFICATE | DESCRIPTION | CHOICE

CHANNELPROGRAMMES

CHANNELPROGRAMMEID | CHANNELID | PROGRAMMEID | DATE | STARTTIME | ENDTIME | DURATION

That way every programme has it's own entry and in theory shouldn't be duplicated in the table, whereas the channel programmes table can have multiple entries for the same programme at different start times? Make sense?
 
Repeat should be in the the Channel Programmes also deafsigned? , Also the star rating probably need to be stored in another table (or at least calculated using another one)

You could also split out (Peformers) (Director) and (Episode) if you wanted to get detailed.
 
Last edited:
hmmm could do couldn't I, although I'm not sure what I would do when you have actors of the same name who are actually different people, there wouldn't be any way to know.
 
hmm, where have you ripped your data from, is it possible to perhaps get more info?

If you could get the id they use perhaps?

Or is this all the info available, if so you would have to skip that yeah.
 
Last edited:
radiotimes data, there's a publicly available csv feed of data that gets updated every morning. Think I'll leave this level of detail out, it's not really necessary.
 
Just one thing to note - if the database is going to be containing a lot of data, I would not recommend going any where near third normal form.

Cross table joins are a very inefficient way of grabbing information from the database and will cause massive scalability issues if you ever need the system to grow. Performance will suffer at all levels having to do joins versus simple selects.

De-normalising your data can often help. (e.g. make Genre included in Programme, rather than a separate Genre table, the same with channels)
 
well the channel programmes table will get emptied out every day as there's no sense in keeping previous days programmes.

The programme table however will continue to grow
 
Just one thing to note - if the database is going to be containing a lot of data, I would not recommend going any where near third normal form.

Cross table joins are a very inefficient way of grabbing information from the database and will cause massive scalability issues if you ever need the system to grow. Performance will suffer at all levels having to do joins versus simple selects.

De-normalising your data can often help. (e.g. make Genre included in Programme, rather than a separate Genre table, the same with channels)

While what you say is true and joins are expensive the data should be stored in correctly normalised tables and perhaps a "flat Table" should be created containing all the information "compiled" from the various other tables. What if, for example you wanted to rename a genre.

Data integrity should be kept in a consistent and structured state regardless of "implementation" worries. The amount of data for this project is probably not going to be enough to make the mysql performance and issue anyway. Mine still works fine and i'm joining a table with 1mil rows with one with 400,000 that then self joins around 20 times with no performance issues, well not really :P ^_^
 
Back
Top Bottom