SQL Help!

PMKeates · 28 Sep 2012 at 17:15

Hi,

I would like to update a table but am not sure of the most effecient way to do it for this scenario:

What I want to do is set column 4 = II where it's shaded grey.

The logic behind it is to update records where for a set combination of column 2 and column 3 there are zero records with column 4 = II.

shine · 28 Sep 2012 at 17:23

Something like this should work:

Code:

update table_name
set column4= "II"
where column2 in ("b", "d")
and column3 in ("x","y","z")
and column4 != "II"

Obviously create a backup of your data first!

HungryHippos · 28 Sep 2012 at 19:30

I would first ensure I could run a sanity check with a SELECT statement:

Code:

SELECT
Column1,
Column2,
Column3,
Column4
FROM TableName
WHERE Column2 in ("b", "d")
AND Column3 in ("x","y","z")
AND Column4 != "II"

After confirming that, you can run the update. I would wrap it in a BEGIN TRANSACTION though, and after confirming the change either COMMIT or ROLLBACK depending on if the change looked successful or not.

Haircut · 29 Sep 2012 at 07:34

The two posts so far don't do what the OP has asked.
It specifies

update records where for a set combination of column 2 and column 3 there are zero records with column 4 = II

But the statements shown will take into account all records for the set combinations of column 2 and 3 where column 4 is not II, which is a subtly different thing.

I think we need to know exactly what the combinations are as well as the OP isn't clear on this.
From the wording I assume that it's just the value of column 4 that is the key, i.e. whatever is in columns 2 and 3, then if, for the combination of values, there is no entry with a II in column 4 then do the update.
The posted statements seem to assume that only the Cartesian product of (b, d) and (alpha, beta, sigma) are to be taken into account.

A bit more clarity on this is needed, but if all combinations are OK then the SQL would look like this:

Code:

select *
from my_table t
where not exists
(select 1
 from my_table t2
 where t.column2 = t2.column2
 and t.column3 = t2.column3
 and t2.column4 = 'II'
)

Obviously what I have posted will identify the records, but the update statement should be trivial to create from there.

topdog@OC · 29 Sep 2012 at 13:53

I believe in the case where there is a pair of records, one with column4 = 'I' and one with column4 = 'II' that you would end up updating the 'I' one incorrectly.

OP can confirm, but correct me if I'm wrong, there is only meant to be one record out of any combination of column2 and column3 where column4 should be set to 'II', or else just exclude those completely if column4 for any single one is already set to 'II'.

topdog@OC · 29 Sep 2012 at 14:04

Edit: deleting alternate solution, doesn't appear to be necessary as haircut's seems to have the correct logic in it - I misread one of its clauses first time.

HungryHippos · 29 Sep 2012 at 14:13

In reality it might be simpler to construct a few update statements to cover all of the possibilities, the shaded grey rows above only offer 3 possibilities, and writing update statements for these individually might be simpler than composing a more complex logic.

Haircut · 29 Sep 2012 at 18:12

topdog@OC said:
I believe in the case where there is a pair of records, one with column4 = 'I' and one with column4 = 'II' that you would end up updating the 'I' one incorrectly.

OP can confirm, but correct me if I'm wrong, there is only meant to be one record out of any combination of column2 and column3 where column4 should be set to 'II', or else just exclude those completely if column4 for any single one is already set to 'II'.

If you're referring to my query, then it wouldn't do that.
The subquery will pick up anything where there is a II in column4 for a particular set of column2 and column3 values.
The where not exists bit ensures that only the groups that don't conform to this will get picked up, i.e. the ones where there is no record of II in column4 for a particular combination of column2 and column3 values.

topdog@OC · 29 Sep 2012 at 20:23

Edit: Just tried it out - you're right, for some reason when I first saw your query I suspected it would include row 3 accidentally. Sorry

Randomface74 · 29 Sep 2012 at 20:34

Code:

UPDATE [TableName] SET [column 4] = "II" 
WHERE [column 2] = @Column2 AND [column 3] = @Column3
AND NOT EXISTS (SELECT [column 1] FROM [TableName] WHERE [column 2] = @Column2 AND [column 3] = @Column3 AND [column 4] = "II")

PMKeates · 30 Sep 2012 at 12:20

Hi all

Thanks for the feedback!

To clarify, there are 100m rows in this table, with around 50k unique values in column 2, 10k unique values in column 3 (though not all are present for every column 2 value, as in the example, and there's no pattern) and a dozen unique values for column 4! The table is keyed on column 1, and there can be valid duplications of column 2 + 3 + 4 - the column 1 key joins to another table with further information that allows you to resolve the duplicates, but that's not important for now. This is a 20 year old database structure with tens of billions of records in total (hence why it's still around!), so don't blame me for it

That all unfortunately means the queries with IN ('x', 'y') etc. are out of the question - I really need the logicality of it rather than the absolute according to the table I showed in the opening post.

Haircut said:
From the wording I assume that it's just the value of column 4 that is the key, i.e. whatever is in columns 2 and 3, then if, for the combination of values, there is no entry with a II in column 4 then do the update.

Absolutely correct - the values in column 2 and 3 are effectively irrelevant, it's just their combination that's important - if for that combination there is no column 4 value equal to 'II', then set column 4 equal to 'II' where it is 'I' for that combination.

Looking at your SQL, could I simply set column 4 = 'II' where column 4 = 'I' and the column 1 from your select query?

Cheers!

Haircut · 30 Sep 2012 at 13:08

PMKeates said:
Looking at your SQL, could I simply set column 4 = 'II' where column 4 = 'I' and the column 1 from your select query?

Pretty much, the update statement would be

Code:

update my_table
set column4 = 'II'
from my_table t
where not exists
(select 1
 from my_table t2
 where t.column2 = t2.column2
 and t.column3 = t2.column3
 and t2.column4 = 'II'
)