T-SQL - Column is NULL, or False. Which to use?

Psy-Blade · 1 May 2013 at 11:22

Hi Guys,

Just part way through writing some t-sql and I've created a bit column 'Approved' I auto import data into this column with a NULL value for approved.

Just wondering if this is proper in this case, as my WHERE statements then have 'approved IS NULL', or should I be importing as False and coding appropriately?

It'll work either way I know, but just one of those questions that's bugging me.

Thanks in advance guys.

Pho · 1 May 2013 at 11:53

I'd say import it as false. A null value for approved is surely the same as false is it not (e.g., it's not approved)? Do you actually need it to be null for any reason?

ZombieFan · 1 May 2013 at 11:56

Well, that depends

NULL should indicate that the value is unknown. So if you know that they are Approved then you should us 1, if you know they are Not Approved you should use 0. If you don't know if they are approved or not then you should use NULL.

If you will always know if they are approved or not (i.e. if everyone/thing defaults to Not Approved until Approved) then you should set the column to non-nullable and set the value to 0.

Shad · 2 May 2013 at 08:01

ZombieFan said:
Well, that depends

NULL should indicate that the value is unknown. So if you know that they are Approved then you should us 1, if you know they are Not Approved you should use 0. If you don't know if they are approved or not then you should use NULL.

If you will always know if they are approved or not (i.e. if everyone/thing defaults to Not Approved until Approved) then you should set the column to non-nullable and set the value to 0.

Exactly this. It's a tri-state attribute, although not all data access layers will treat it as such. You could change your WHERE criteria to look for approved = 1 or approved <> 1, which might simplify things for you.

Gornall · 2 May 2013 at 08:29

WHERE criteria to look for approved = 1 or approved <> 1

Beware if you do have Nulls then using this would not bring them up at all.

You would need and "Or Is Null" criteria if you want them to come through to view.

Dj_Jestar · 2 May 2013 at 15:58

null indicates no value, not unknown value. Semantic (and pedantic) difference.

tom_e · 2 May 2013 at 16:01

I'd set it as true/false just makes more sense plus NULL's can be a bit more difficult to deal with depending on what you're planning on doing with the data.

Psy-Blade · 3 May 2013 at 10:58

Thanks for all the replies guys.

I was swaying towards making everything either FALSE or TRUE and, to be honest, setting the column to non-nullable completely slipped my mind when I was posting this. That makes the most sense to me.

As I said, I just needed some justification for either method as I know it would have worked either way. Something to keep in mind for future

Imy · 3 May 2013 at 11:15

Just whatever you do, stay consistent across all your tables!

Weazle · 3 May 2013 at 11:32

For flag field I normally use integers, 0 for FALSE and 1 for TRUE.

Dependent on DB engine there are BOOLEAN datatypes which are TRUE/FALSE (sometimes 0/1) or you can use BIT (0/1).

One of the neat things about some DB engines is when using a BIT or INTEGER datatype is using them in if/while statements in code, they can treat 0 as FALSE and any value above that is TRUE. Also, some coding languages use 0/1 as TRUE/FALSE as well which makes it quite handy without conversions.

eg.
_tempvar = 1;
if (_tempvar) {
-- TRUE
} else {
-- FALSE
}

dasyad · 3 May 2013 at 16:56

BIT wasteful to store just 0's or 1's as int's

aln · 3 May 2013 at 20:32

You sure you want to go with just true/false?

Personally I'd assume you'd at least want not-processed, rejected, approved. You could do that with null, true, false, or -1, 0, 1, etc.

Dj_Jestar · 3 May 2013 at 21:14

Or .. just to be completely crazy.. "pending", "rejected" and "approved"

sniffy · 3 May 2013 at 21:16

aln said:
You could do that with null, true, false

Thank you Jestar.

aln · 3 May 2013 at 21:50

Dj_Jestar said:
Or .. just to be completely crazy.. "pending", "rejected" and "approved"

While it probably largely doesn't matter, using strings isn't exactly the most efficient use of a database, which is why I didn't suggest it. But yeah, that'll work as well. Then again, I haven't actually got any clue about T-SQL.

sniffy · 4 May 2013 at 00:39

aln said:
While it probably largely doesn't matter, using strings isn't exactly the most efficient use of a database, which is why I didn't suggest it.

True but the performance difference is marginal for most real-world systems. Null's meaning is totally ambiguous to anyone other than the person who wrote the code that relies it. Even worse someone might assume that it means no value. You could map a constant in the code to the value but the database in isolation holds misleading data. Unless speed is a top issue I believe clarity is always king. This is true even if you don't work in a team.

Dj_Jestar · 4 May 2013 at 03:13

As above. Premature optimisation. Descriptive values until performance becomes an issue. Do not assume it to ever be the other way around. Why?

In short: it won't ever become a problem.

In long: it won't until you are hitting millions of transactions per hour and you will have countless far better optimisations to undertake long before this becomes an issue.

aln · 4 May 2013 at 11:30

My understanding of premature optimisation is generally refering to technical debt, for example jumping on the noSQL bandwagon for wins that probably aren't there, which is going to take your typical rdbms team of devs a significant amount of effort. Doing something which is typically easy to do in your typical path of development doesn't really fall into the same criteria.

You're right that it probably won't matter in the real world, I admitted as much in my previous post, but I don't entirly agree that it's unambiguous. The null would only exist before the system actually attempted to process anything, thus having no makes some sense. Histoically I'd go with the numbers though, which can be commented in databases like MySQL but that's me living within someone elses convention anyway.

Regardless, my point wasn't really to defend any singular method of storing the data, it was to back up the argument that you probably want 3 states as opposed to 2, which is the more important issue. I don't care how you store those dates, but I will care whether or not I can tell if somethings failing, or if somethings never happening. Obviously it depends on specifics though.

Dj_Jestar · 4 May 2013 at 14:21

That's one form of PO, yes. That's not what I think would be called technical debt, though. Using NoSQL or whatever flavour of the month "just because that's what the cool kids are doing" should have its own name like "fanboism" or something. Perhaps "CV Driven Development." Technical debt is when you accrue bad design for the sake of producing a product quickly/cheaply. It (the current design) will hinder development in the future, but allow you to push the current stuff under development out quicker. There are two types of TD - the stuff you know is bad, but you want to get something out the door ASAP, and then there is the stuff you just don't know about at the time and won't discover until later. Typically the latter is when there is a disconnect between the business and the developer - the develop doesn't know what is to come, so makes a design decision that may not suit the upcoming work.

Premature optimisation is very specifically about worrying about small performance gains where you sacrifice readability/simplicity for the sake of a few CPU cycles or such, and/or when you are trying to solve a problem you don't even have. What it says on the tin: it is about prematurely optimising.

But yeah, using strings is not going to make any measurable difference to the responsiveness of the system. The only time that kind of optimisation comes into it is when you are rapidly accessing/indexing millions of rows per hour. Besides which, the conversion/comparison to a string (i.e. the readable bit of descriptive text for your users to read - using "null, 1, 0" or other is going to mean nothing to them) will soak up most if not all of the gains anyway.

aln · 4 May 2013 at 15:39

Dj_Jestar said:
That's one form of PO, yes. That's not what I think would be called technical debt, though. Using NoSQL or whatever flavour of the month "just because that's what the cool kids are doing" should have its own name like "fanboism" or something. Perhaps "CV Driven Development." Technical debt is when you accrue bad design for the sake of producing a product quickly/cheaply. It (the current design) will hinder development in the future, but allow you to push the current stuff under development out quicker. There are two types of TD - the stuff you know is bad, but you want to get something out the door ASAP, and then there is the stuff you just don't know about at the time and won't discover until later. Typically the latter is when there is a disconnect between the business and the developer - the develop doesn't know what is to come, so makes a design decision that may not suit the upcoming work.

I don't know, excusing my poor choice of words, I would suggest CV Driven Development (that's a great term) and PO both leads to the same bad decisions which will be refered to as technical debt at some point in the future, of course this doesn't encompass the entire scope of TD, nor does TD cover the wasted development time, but we'll generally end up in the same place while we support flavour of the month that nobody really understands. In reality if it didn't take longer and doesn't become technical debt, it's actually not much of an issue in my opinion.

That said, I don't participate in CVDD and I often wonder if I haven't done the wrong thing not padding my CV because most job adverts seems to be based around such fads. I often see jobs requires 4 technologies I've never actually used because I was busy doing my job, but the guy who sats across from me constantly using new tech and comitting broken code every day probably is 'qualified'. =D

Dj_Jestar said:
But yeah, using strings is not going to make any measurable difference to the responsiveness of the system. The only time that kind of optimisation comes into it is when you are rapidly accessing/indexing millions of rows per hour. Besides which, the conversion/comparison to a string (i.e. the readable bit of descriptive text for your users to read - using "null, 1, 0" or other is going to mean nothing to them) will soak up most if not all of the gains anyway.

While that's true, it'll likely be done on a different piece of the stack that is traditionally easier to scale.