Database table help

hargi · 29 Nov 2017 at 14:47

Looking for some help here.

I have 4 models
Image, Section, Page, Post

An image can belong to only one of the Section,Page or Post.
A Post could have multiple images.

First though was save a foreign_key id in the Image table however this wont work as a post can have an Id of 1 as can a Page

Second go was with help from people was to create a linktable with a look up
Tags
Section = 1
Page = 2
Post =3
Image = 4
___
ItemLookup
|id|parent_id|parent_tag|child_id|child_tag|

This way we can then add things at a later date live videos etc.

However have been told this is wrong.

New though is in the Image table add an parent_id and parent_type id fields. This is the only other solution I can see at the moment.

Thoughts?
Thanks

Randomface74 · 29 Nov 2017 at 15:09

What you did would work.
Your other suggestion would also work.

I can think of other ways to do it, but the real question is why did someone tell you it's wrong - and do you have to care what they think?

hargi · 29 Nov 2017 at 15:28

Thanks.
I was writing the code to return the child items with that solution when I was told that there is a better way.

PHP:
Because I have to do a query on 3 fields in the lookup table
eg

PHP:

'select' => 'child_item_id', 'where' => 'parent_item_id = :parent_item_id AND parent_tag = :parent_tag AND child_tag = :child_item_tag',

This will make things slow ?

In this instance I have to care about this opinion as this is a senior telling me

Randomface74 · 29 Nov 2017 at 15:43

It should be part of the senior's job to mentor you - not tell you you're wrong and to go ask OCUK lol. That person should be helping you.

The performance question depends...
- if you have a small number of records, who cares
- you would definitely need an index on your lookup table
- you're filtering on ints, so performance should be alright
- if you're finding the child item id and then using that as a parameter in another query, you can combine those queries into one

If your system is massive and performance really matters, I'd have three tables, like:

SectionImages
Id | SectionId | etc

So then your query would be:

SELECT Id, SectionId, etc
FROM SectionImages
WHERE SectionId = n

And make a covering index, filtering on SectionId.

There's a bit of a question about what's in your images table, is it blob data, paths, urls?

hargi · 29 Nov 2017 at 16:46

Thanks,

In my images table I am storing id, filename, original_filename, extension, alt_text, caption etc (from that information I can then build a path to where the file lives on the server)

The other reason was if you look at the database then it doesn't read nice. eg what is parent_item_tag 3 for example?

Yes I am doing one query to then use in another

PHP:

$childrenIds = ItemLinkage::find('all', array(
                'select' => 'child_item_id',
                'where' => 'parent_item_id = :parent_item_id
                      AND parent_item_tag = :parent_item_rag
                      AND child_item_type_id = :child_item_type_id'.

$childrenIds = array($childrenIds);

$images = Image::find('all', array(where => 'id in' =: $childrenIds);

They are big on OOP and best practices, extendability etc. This is for a very small project but because I know how they think I am trying to make it how I would for a larger one.

I see your solution with the extra 3 tables but then Im guessing I would get back what makes a SectionImage different to a PostImage whene the tables are identical except for the name and the xxx_id. And if we add another module eg notes etc we would have to add a NotesImages table as well

Randomface74 · 29 Nov 2017 at 18:51

If you think you're going to add more modules, then prioritise that.

If these are likely to be the only three you have, separate tables has the following advantages:
- foreign keys can be used
- performance is optimal (don't forget to make indexes, make sure you know what a covering index is for)

You could do it with images in one table and 3 link tables.

Images
Id | etc

Sections
Id | etc

Section_Images
Id | SectionId | ImageId

Performance isn't as good, but it's not awful.

Randomface74 · 29 Nov 2017 at 18:59

Code:

Regarding your 2 queries.

SELECT child_item_id 
FROM ItemLinkage
WHERE parent_item_id = @parent_item_id
AND parent_item_tag = @parent_item_tag
AND child_item_type_id = @child_item_type_id

SELECT *
FROM Images
WHERE Id IN ($childrenIds)

There's an overhead for each query and you're not using the results of the first query for anything else, so you'd combine as follows:

SELECT *
FROM Images
WHERE Id IN (
    SELECT child_item_id 
    FROM ItemLinkage
    WHERE parent_item_id = @parent_item_id
    AND parent_item_tag = @parent_item_tag
    AND child_item_type_id = @child_item_type_id
)

But if you went with my first idea it would be

SELECT *
FROM SectionImages
WHERE SectionId = @parent_item_id

And my second idea would be

SELECT *
FROM Images
WHERE Id IN (
    SELECT ImageId
    FROM Section_Images
    WHERE SectionId = @parent_item_id
)

Make sure at some point you read about the difference between IN and EXISTS if you haven't already.

AHarvey · 29 Nov 2017 at 22:07

If it was me, I'd have a table for each and use guid's as unique identifiers for each along with any other information pertaining to that item.

I'd then have a 5th table that just pulls them all together.

Image || Section || Page || Post
GUID || GUID || NULL || NULL
GUID || NULL || NULL || GUID

You'd just then use queries or joins to pull what information you needed

SELECT * FROM PAGE WHERE IMAGE_KEY = 'GUID'

You'd now see each page that image was on.

This is pretty much how we do it at work and we're talking billions of rows across 50+ databases

hargi · 30 Nov 2017 at 10:13

Thanks both of you.

I am unsure which to choose as all will work.

Can I ask @AHarvey will I get a Normalisation argument thrown back at me with your solution ?
I'm not questioning your way as I agree it will work no problem but we have a purist here.

AHarvey · 30 Nov 2017 at 10:30

hargi said:
Thanks both of you.

I am unsure which to choose as all will work.

Can I ask @AHarvey will I get a Normalisation argument thrown back at me with your solution ?
I'm not questioning your way as I agree it will work no problem but we have a purist here.

Good question and not something I was aware of as I'm not a dba or dev, just an analyst who works with the databases/tables day in day out.

Having a google brings back: "The goal of normalization in simple terms is just this: to store each fact in exactly one place. When you put each fact in only one place, you always know where to go to read it or write it."

I would imagine having a single table for each thing and then a single linking table would be the definition of Normalisation. If you need to see section info you got to the section table etc. I'm sure someone with more dba experience could could back and say either way.

Randomface74 · 30 Nov 2017 at 10:59

All of the suggestions here are denormalised. There's no duplication of data in any.

There is an argument against AHarvey's suggestion in that there's lots of duplication of the value NULL. Imagine you want to select GUID from any of those columns, you first have to discard all of the null values. I don't know how costly that is, but it isn't free.

Often when an app gets truly big you end up having to denormalise for performance anyway. The point is that joining tables has a cost - so yes you want less tables, but it's not less tables in your database, it's less tables in your query. This not only improves performance, but reduces the chance of blocking and deadlocks - which are nightmares to deal with.

Randomface74 · 30 Nov 2017 at 11:21

btw it's totally normal for developers to struggle with databases. Generally someone at the midrange level won't understand much more than creating tables, inserting and selecting.

The developers who make it to senior level should be able to design and optimise the database - and justify the decisions they make.

I recommend this book to any developer, but it's mandatory for anyone who wants a promotion from me:
https://www.amazon.co.uk/SQL-Server-Query-Performance-Tuning/dp/1430267437/

This is great for a developer's career as they become someone who can do things that most developers can't do - which is very valuable to a business.

The developers you work with sound a bit odd - chances are if you read this book you'll know more than they do - at the very least you'll be able to stick up for yourself.

AHarvey · 30 Nov 2017 at 12:38

billysielu said:
There is an argument against AHarvey's suggestion in that there's lots of duplication of the value NULL. Imagine you want to select GUID from any of those columns, you first have to discard all of the null values. I don't know how costly that is, but it isn't free.

Which I suppose could be counteracted by further splitting that table into several, one for images & pages, another for images & post and the last for images and images & section.

Then you'll have overhead of joining tables in queries, i guess it comes down to which overhead is smaller and more efficient. I'm 4 months into my sql position and loving it even though I often feel like the stupidest person in the room

Thanks for the book suggestion, I'm not a dev yet but have some experience in coding, going to see if I can get it tonight

3 years old, anything newer or is the info still good?

Randomface74 · 30 Nov 2017 at 12:56

I think the link I gave you is the latest version.

The version I read was a few years older, but still totally relevant today, I don't think this subject changes much.

hargi · 30 Nov 2017 at 14:01

Thank you, Will check out that book looks like it could be useful.
Yes basically from what I see I have to argue my reason for whatever I write. Even though I have never seen any of his code or him do any coding.

AHarvey · 30 Nov 2017 at 17:26

hargi said:
Thank you, Will check out that book looks like it could be useful.
Yes basically from what I see I have to argue my reason for whatever I write. Even though I have never seen any of his code or him do any coding.

Maybe that's the point.

Make your mind up on what you want to do, point out the negatives and positives of your choice compared to other options.

He wants to see you've put your thought into it.

Ergates · 30 Nov 2017 at 21:05

billysielu said:
The developers who make it to senior level should be able to design and optimise the database - and justify the decisions they make.

Caveat to that: Database developers should be able to do that. As a career long data guy, I've met plenty of talented and experienced Java (for example) developers who couldn't design even a simple database for ****.

Ergates · 30 Nov 2017 at 21:53

Can I ask @AHarvey will I get a Normalisation argument thrown back at me with your solution ?

Nothing wrong with a bit of denormalisation in an operational database. Normalisation is normally used for efficiency of space* and simplicity of maintenance. Operational databases are often denormalised to a certain degree to support maximum performance of the queries that are expected. Which, in a round about way, leads me to my point: The structure of an operational database/data mart, should be driven by how it will be queried**. You've not stated as much, but it seems likely that when using this database you'll be going in with a section/page/post ID and retrieving the relevant images [?], and it seems *unlikely* that you'd be [regularly] trying to retrieve all the image details for a given page AND a given section (etc) at the same time. With that in mind billysielu's suggestion involving 3 separate tables seems best. Think of each of the 3 different queries (images for a page, images for a post, images for a section) as different operations with different (but co-located) databases underneath them.

If it was a many to many relationship (each page/section/post can have many images, each image can be on many pages/sections/posts) then bridge tables would be the way to go.

* Space here isn't just a factor of storage but also a factor of I/O when you're shunting data about.

** Though, obviously, this is a balancing act with efficiency of storage and efficiency of load.

gpuerrilla · 1 Dec 2017 at 00:46

If you are half decent at normalising then you will be fine when it comes to DB design. I think 4NF is going above and beyond for most small scale projects. I remember this being a contextual part in our lectures some 17 years ago.. but I digress.

Ergates · 1 Dec 2017 at 10:07

If you are half decent at normalising then you will be fine when it comes to DB design

I'd politely suggest that there is a little more to good db design than just normalisation. Having said that, I will accept that for a lot of cases, having a db design that is just about OK is probably sufficient.