Best method for constant data collection.

Any thoughts on the best way (in terms of SQL Server streamlining) to handle this little issue?

Got a program that currently sends the following to the SQL Server every 30 seconds:

1,###,GetDate()
2,###,GetDate()
3,###,GetDate()

Where ### is a number between 000 and 999.

This sequence of numbers can be sent from about 50 machines around the UK, and every 30 seconds it's always 3 records.

In short this means that at the end of, say, today, I'll have:

6 records per machine every minute (300 records),
360 records per machine every hour (18,000 records),
8640 records per machine every day (432,000 records).

And all this data needs to be saved month to month which is about 259,200 records per machine or a nice beefy 12,960,000 records for all machines per month!!!
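As a rough sketch of the current layout (table and column names here are my assumptions, not the actual schema), each 30-second batch lands as three rows:

```sql
-- Illustrative only: one row per (sequence number, reading) pair
CREATE TABLE HistoricData (
    MachineID INT      NOT NULL,  -- which of the ~50 machines
    Seq       TINYINT  NOT NULL,  -- 1, 2 or 3
    Reading   SMALLINT NOT NULL,  -- the ### value, 000-999
    LoggedAt  DATETIME NOT NULL
);

-- Every 30 seconds each machine sends three inserts:
INSERT INTO HistoricData (MachineID, Seq, Reading, LoggedAt) VALUES (1, 1, 123, GETDATE());
INSERT INTO HistoricData (MachineID, Seq, Reading, LoggedAt) VALUES (1, 2, 456, GETDATE());
INSERT INTO HistoricData (MachineID, Seq, Reading, LoggedAt) VALUES (1, 3, 789, GETDATE());
```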

What I was therefore thinking was to reduce this number to a third and have a field like:

1=###|2=###|3=###, GetDate()

Because GetDate() would be the same for all 3 records anyway, and the 3 individual values could be stored in one field that I can Split() at a later date to recover the historic data.

Rather than a:

SELECT * FROM HistoricData WHERE [Field2]=###

I could have:

SELECT * FROM HistoricData WHERE [NewField] LIKE '%2=###%'

Would saving the data my new proposed way have much of an impact on the Server or should we be looking at any other ways of saving this data?
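A minimal sketch of that proposed concatenated layout (names assumed); worth noting that a leading-wildcard LIKE stops SQL Server using an index seek on the column, so every row gets scanned:

```sql
-- Illustrative only: one row per 30-second batch, readings packed into a string
CREATE TABLE HistoricDataPacked (
    MachineID INT          NOT NULL,
    Readings  NVARCHAR(20) NOT NULL,  -- e.g. '1=123|2=456|3=789'
    LoggedAt  DATETIME     NOT NULL
);

-- The leading '%' means an index on Readings cannot be seeked,
-- so this query scans the whole table:
SELECT * FROM HistoricDataPacked WHERE Readings LIKE '%2=456%';
```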
 
I'm pretty sure running a LIKE operation would be slower in the long run, even with fewer records.

Is there any way you could archive the records to a different table on a monthly basis? That way you only have a maximum of 13m records to deal with and can index appropriately.
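A hedged sketch of that kind of monthly archive job (table names are illustrative):

```sql
-- Move rows older than a month into an archive table, then
-- remove them from the live table, inside one transaction.
BEGIN TRANSACTION;

INSERT INTO HistoricDataArchive
SELECT * FROM HistoricData
WHERE LoggedAt < DATEADD(month, -1, GETDATE());

DELETE FROM HistoricData
WHERE LoggedAt < DATEADD(month, -1, GETDATE());

COMMIT TRANSACTION;
```

Run on a monthly schedule, the live table then stays at roughly 13 million rows and can be indexed accordingly.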
 
Well yes, I'll definitely be archiving all the records, either into another table or via some automated procedure that extracts the necessary records per machine and saves them as an Excel file.

But the LIKE query, while I appreciate it will be slower, would probably only be run every couple of hours on an ad-hoc basis when a report is generated.

What bothers me most is that I'd have either several million records with 4 fields per row (3 INT fields and 1 DateTime), or a third of the rows with 3 fields per row (1 INT for a UID, 1 NVARCHAR(20) and 1 DateTime). The latter uses slightly more processing power to store the NVARCHARs, and indexing on that column could be a slight issue.

At present there's 50 machines around the UK, but we're generally installing about 2-3 machines a month so this figure is likely to double by this time next year.
 
Would it not be best to split it up to a separate field for each bit of data:

UID, DateTime, Msg1, Msg2, Msg3

Then you can do:

SELECT * FROM HistoricData WHERE [Msg2] = '###'
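A sketch of that one-row-per-batch layout (names assumed); each Msg column stays an INT, so a plain equality predicate can use an index seek:

```sql
-- Illustrative only: one row per 30-second batch, three INT columns
CREATE TABLE HistoricDataWide (
    MachineID INT      NOT NULL,
    LoggedAt  DATETIME NOT NULL,
    Msg1      SMALLINT NOT NULL,
    Msg2      SMALLINT NOT NULL,
    Msg3      SMALLINT NOT NULL
);

CREATE INDEX IX_HistoricDataWide_Msg2 ON HistoricDataWide (Msg2);

-- Equality search on an indexed INT column, no wildcard scan:
SELECT * FROM HistoricDataWide WHERE Msg2 = 456;
```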
 

Well that was actually the original question, that's what I have at present, but each month I'm going to have nearly 13,000,000 records.

Was just wondering whether, by consolidating the 3 fields into one, there'd be an impact from the 3 INT fields being changed into one NVARCHAR field (when it comes to indexing).
 
Going by what you just said, if you consolidate 3 fields into 1 you will still have the same number of rows as you had originally, but now your searching will be slower?

You want to consolidate your 3 rows into 1 row with 3 fields.

Rows:
1,###4,GetDate()
2,###5,GetDate()
3,###6,GetDate()

becomes

###4, ###5, ###6, getdate()
 

Nope I won't, I'll have a third as many rows.

1,###4,GetDate()
2,###5,GetDate()
3,###6,GetDate()

becomes

1=###4|2=###5|3=###6, GetDate()

When it comes to reporting, I can Split() on the pipe ( | ) and/or use a LIKE '%1=###4|%' to search within the string.


*edit*

Although, as you've also said, using the "###4, ###5, ###6, getdate()" method will also reduce the row count to a third, and should improve SQL indexing too, as each of those fields would just contain an INT rather than the NVARCHAR.

Didn't even see that option!! Cheers :)
 
That's what I said, only I had the "UID" bit as well. Maybe I was getting confused by your "UID" field; I was guessing it was required, but it may well not be.
 