SQL Select only most recent records

Soldato
Joined
25 Mar 2004
Posts
16,007
Location
Fareham
Hi All,

Sorry for the long post but I feel it needs all the info present!

Having a spot of bother with my SQL query, Basically I have two tables:

Users - contains all user general information, primary Key is the msExchMailboxGuid value. There are 59830 vaues in this table.
UserStats - contains user stats such as mailbox size, total number of items etc. Links to the Users table via the MailboxGuid value. There may be multiple values for each User which are differentiated by the date column. There may also be no values at all for a user.

More info on what columns I have are below in my query, all columns are listed.

What I want to do

For each record in the Users table, I want to retrieve the most recent date value from the UserStats table where the guid matches. If there is no value returned I want to return NULL values for the columns.

So in the end I would expect to see 59830 values returned by my query, one row for each msExchMailboxGuid value in the Users table. This would contain the most recent values from the UserStats table in the event that user has any values to return, and if there are no values to return it would still have the record but it would contain NULL values.

Query so far

This is what I have come up with so far, doesn't seem to work properly though, I get 54961 results instead of the expected 59830 records:

SELECT u.objectGuid, u.msExchMailboxGuid, u.CompanyName, u.ResellerOU, u.DisplayName, u.MBXServer, u.MBXSG, u.MBXDB, u.MBXWarningLimit, u.MBXSendLimit,
u.MBXSendReceiveLimit, u.extensionAttribute10, u.legacyExchangeDN, u.UserPrincipalName, u.Mail, u.lastLogonTimestamp, u.createTimeStamp,
u.modifyTimeStamp, u.altRecipient, u.altRecipientBL, u.Deleted, u.DeletedDate, s.MailboxGuid, s.Date, s.AssociatedItemCount, s.DeletedItemCount, s.ItemCount,
s.LastLoggedOnUserAccount, s.LastLogonTime, s.StorageLimitStatus, s.TotalDeletedItemSize, s.TotalItemSize
FROM Users AS u LEFT OUTER JOIN
UserStats AS s ON u.msExchMailboxGuid = s.MailboxGuid
WHERE (s.Date =
(SELECT MAX(Date) AS Max_Date
FROM UserStats
WHERE (MailboxGuid = u.msExchMailboxGuid)))

Example

To keep this simple, take this as an example with reduced number of columns:

Users Table:

msExchMailboxGuid, DisplayName
2bc361a2-d681-4e2e-816d-c297575f9954,Joe Bloggs
2bc6af6f-96fd-47a1-b97d-4b50b4862e12,Green Fingers
2bca5973-1b2f-42c2-930b-ec1f00530b78,Red Mamba

UserStats Table:

MailboxGuid, Date, ItemCount
2bc361a2-d681-4e2e-816d-c297575f9954,2011-07-09 18:00:00.000,1200
2bc361a2-d681-4e2e-816d-c297575f9954,2011-07-08 18:00:00.000,1100
2bc361a2-d681-4e2e-816d-c297575f9954,2011-07-07 18:00:00.000,1000
2bc6af6f-96fd-47a1-b97d-4b50b4862e12,2011-07-09 18:00:00.000,500
2bc6af6f-96fd-47a1-b97d-4b50b4862e12,2011-07-08 18:00:00.000,400

The first two records in the Users table have multiple results, the last one has none, this is what I would like to return, note the last record I want NULL values for:

msExchMailboxGuid, DisplayName, Date, ItemCount
2bc361a2-d681-4e2e-816d-c297575f9954,Joe Bloggs,,2011-07-09 18:00:00.000,1200
2bc6af6f-96fd-47a1-b97d-4b50b4862e12,Green Fingers,2011-07-09 18:00:00.000,500
2bca5973-1b2f-42c2-930b-ec1f00530b78,Red Mamba,NULL,NULL

Many thanks!
 
select u.*, us.*
from users u
left join
(
select mailboxguid, max(date) as date
from userstats us
group by mailboxguid
) us_filter
on u.msexchmailboxguid = us_filter.mailboxguid
left join userstats us
on us_filter.mailboxguid = us.mailboxguid
and us_filter.date = us.date

From that you can put into your row any columns from either users or userstats (instead of what I put), and it will contain every record from users, and data from userstats if it exists otherwise nulls, from the matching row with the most recent date.

The only downfall of this approach is if it's possible for userstats to contain the same date/time value for the same mailbox as then you'd end up with duplicate records in the results. If there's a unique ID field from userstats you could use instead to join us_filter to userstats, that would solve it (and you'd just have to make a logical decision as to which one you want, probably the max(unique_id) field in that case).

I would hope that mailboxguid and date from userstats are an indexed item, though for 50k records it will still probably perform well enough regardless for now.

You can add a WHERE clause at the end for any additional filtering you want to do on the result set.
 
Thanks topdog!

Seems to run a little quicker than my current query which now works with the addition of the OR s.date is NULL entry in the where query:

SELECT u.objectGuid, u.msExchMailboxGuid, u.CompanyName, u.ResellerOU, u.DisplayName, u.MBXServer, u.MBXSG, u.MBXDB, u.MBXWarningLimit, u.MBXSendLimit, u.MBXSendReceiveLimit, u.extensionAttribute10, u.legacyExchangeDN, u.UserPrincipalName, u.Mail, u.lastLogonTimestamp, u.createTimeStamp, u.modifyTimeStamp, u.altRecipient, u.altRecipientBL, u.Deleted, u.DeletedDate, s.MailboxGuid, s.Date, s.AssociatedItemCount, s.DeletedItemCount, s.ItemCount, s.LastLoggedOnUserAccount, s.LastLogonTime, s.StorageLimitStatus, s.TotalDeletedItemSize, s.TotalItemSize
FROM Users AS u LEFT OUTER JOIN
UserStats AS s ON u.msExchMailboxGuid = s.MailboxGuid
WHERE (s.Date = (SELECT MAX(Date) AS Max_Date FROM UserStats WHERE (MailboxGuid = u.msExchMailboxGuid))
OR s.date is NULL)

So I am kind of understanding what you have done, the main thing is this works (just your query but I took it into the Query Designer so I could get a better understanding of it):

SELECT u.objectGuid, u.msExchMailboxGuid, u.CompanyName, u.ResellerOU, u.DisplayName, u.MBXServer, u.MBXSG, u.MBXDB, u.MBXWarningLimit, u.MBXSendLimit,
u.MBXSendReceiveLimit, u.extensionAttribute10, u.legacyExchangeDN, u.UserPrincipalName, u.Mail, u.lastLogonTimestamp, u.createTimeStamp,
u.modifyTimeStamp, u.altRecipient, u.altRecipientBL, u.Deleted, u.DeletedDate, us.MailboxGuid, us.Date, us.AssociatedItemCount, us.DeletedItemCount,
us.ItemCount, us.LastLoggedOnUserAccount, us.LastLogonTime, us.StorageLimitStatus, us.TotalDeletedItemSize, us.TotalItemSize
FROM Users AS u LEFT OUTER JOIN
(SELECT MailboxGuid, MAX(Date) AS date
FROM UserStats AS us
GROUP BY MailboxGuid) AS us_filter ON u.msExchMailboxGuid = us_filter.MailboxGuid LEFT OUTER JOIN
UserStats AS us ON us_filter.MailboxGuid = us.MailboxGuid AND us_filter.date = us.Date

I added some indexes on the advice of the sql server profiler:

use [ServerName]
go

CREATE CLUSTERED INDEX [_dta_index_UserStats_c_5_309576141__K1] ON [dbo].[UserStats]
(
[MailboxGuid] ASC
)WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
go

CREATE NONCLUSTERED INDEX [_dta_index_UserStats_5_309576141__K1_2] ON [dbo].[UserStats]
(
[MailboxGuid] ASC
)
INCLUDE ( [Date]) WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
go

CREATE STATISTICS [_dta_stat_373576369_1] ON [dbo].[Users]([objectGuid])
go

Query now takes about 2 seconds to return all the results, will keep an eye on this as the userstats table grows in size might need to consider archiving the data each day into another table instead of adding to the main table.
 
Last edited:
If that's 2 seconds total including fetch, then quite fair. If it's 2 seconds on execution alone, something is probably bogging it down, I'd be expecting a number around or under 0.5s for execution on an uncached query like this, depending on server and IO.
 
The results grid says it took 00:00:02 to execute the query and returned all 59838 records (number has gone up a little due to new users being added). I'm not sure if this is considered the execution time or the fetch time?

Is this bad or good? SQL server is a VM with 4GB of RAM and 2 CPU's.

Server doesn't seem to be under load, it literally just has these 2 tables running at the moment in the database and i'm just using this for testing at the minute. CPU usage floating around 0-1% and physical memory usage around 73%. Most of the memory usage seems to be used by sqlservr.exe process which is using just under 2GB of RAM, though I would not say it was out of the ordinary for SQL to use as much memory as it can really.
 
Back
Top Bottom