SQL Select only most recent records

Soldato
Joined
25 Mar 2004
Posts
16,007
Location
Fareham
Hi All,

Sorry for the long post but I feel it needs all the info present!

Having a spot of bother with my SQL query, Basically I have two tables:

Users - contains all user general information, primary Key is the msExchMailboxGuid value. There are 59830 vaues in this table.
UserStats - contains user stats such as mailbox size, total number of items etc. Links to the Users table via the MailboxGuid value. There may be multiple values for each User which are differentiated by the date column. There may also be no values at all for a user.

More info on what columns I have are below in my query, all columns are listed.

What I want to do

For each record in the Users table, I want to retrieve the most recent date value from the UserStats table where the guid matches. If there is no value returned I want to return NULL values for the columns.

So in the end I would expect to see 59830 values returned by my query, one row for each msExchMailboxGuid value in the Users table. This would contain the most recent values from the UserStats table in the event that user has any values to return, and if there are no values to return it would still have the record but it would contain NULL values.

Query so far

This is what I have come up with so far, doesn't seem to work properly though, I get 54961 results instead of the expected 59830 records:

SELECT u.objectGuid, u.msExchMailboxGuid, u.CompanyName, u.ResellerOU, u.DisplayName, u.MBXServer, u.MBXSG, u.MBXDB, u.MBXWarningLimit, u.MBXSendLimit,
u.MBXSendReceiveLimit, u.extensionAttribute10, u.legacyExchangeDN, u.UserPrincipalName, u.Mail, u.lastLogonTimestamp, u.createTimeStamp,
u.modifyTimeStamp, u.altRecipient, u.altRecipientBL, u.Deleted, u.DeletedDate, s.MailboxGuid, s.Date, s.AssociatedItemCount, s.DeletedItemCount, s.ItemCount,
s.LastLoggedOnUserAccount, s.LastLogonTime, s.StorageLimitStatus, s.TotalDeletedItemSize, s.TotalItemSize
FROM Users AS u LEFT OUTER JOIN
UserStats AS s ON u.msExchMailboxGuid = s.MailboxGuid
WHERE (s.Date =
(SELECT MAX(Date) AS Max_Date
FROM UserStats
WHERE (MailboxGuid = u.msExchMailboxGuid)))

Example

To keep this simple, take this as an example with reduced number of columns:

Users Table:

msExchMailboxGuid, DisplayName
2bc361a2-d681-4e2e-816d-c297575f9954,Joe Bloggs
2bc6af6f-96fd-47a1-b97d-4b50b4862e12,Green Fingers
2bca5973-1b2f-42c2-930b-ec1f00530b78,Red Mamba

UserStats Table:

MailboxGuid, Date, ItemCount
2bc361a2-d681-4e2e-816d-c297575f9954,2011-07-09 18:00:00.000,1200
2bc361a2-d681-4e2e-816d-c297575f9954,2011-07-08 18:00:00.000,1100
2bc361a2-d681-4e2e-816d-c297575f9954,2011-07-07 18:00:00.000,1000
2bc6af6f-96fd-47a1-b97d-4b50b4862e12,2011-07-09 18:00:00.000,500
2bc6af6f-96fd-47a1-b97d-4b50b4862e12,2011-07-08 18:00:00.000,400

The first two records in the Users table have multiple results, the last one has none, this is what I would like to return, note the last record I want NULL values for:

msExchMailboxGuid, DisplayName, Date, ItemCount
2bc361a2-d681-4e2e-816d-c297575f9954,Joe Bloggs,,2011-07-09 18:00:00.000,1200
2bc6af6f-96fd-47a1-b97d-4b50b4862e12,Green Fingers,2011-07-09 18:00:00.000,500
2bca5973-1b2f-42c2-930b-ec1f00530b78,Red Mamba,NULL,NULL

Many thanks!
 
Thanks topdog!

Seems to run a little quicker than my current query which now works with the addition of the OR s.date is NULL entry in the where query:

SELECT u.objectGuid, u.msExchMailboxGuid, u.CompanyName, u.ResellerOU, u.DisplayName, u.MBXServer, u.MBXSG, u.MBXDB, u.MBXWarningLimit, u.MBXSendLimit, u.MBXSendReceiveLimit, u.extensionAttribute10, u.legacyExchangeDN, u.UserPrincipalName, u.Mail, u.lastLogonTimestamp, u.createTimeStamp, u.modifyTimeStamp, u.altRecipient, u.altRecipientBL, u.Deleted, u.DeletedDate, s.MailboxGuid, s.Date, s.AssociatedItemCount, s.DeletedItemCount, s.ItemCount, s.LastLoggedOnUserAccount, s.LastLogonTime, s.StorageLimitStatus, s.TotalDeletedItemSize, s.TotalItemSize
FROM Users AS u LEFT OUTER JOIN
UserStats AS s ON u.msExchMailboxGuid = s.MailboxGuid
WHERE (s.Date = (SELECT MAX(Date) AS Max_Date FROM UserStats WHERE (MailboxGuid = u.msExchMailboxGuid))
OR s.date is NULL)

So I am kind of understanding what you have done, the main thing is this works (just your query but I took it into the Query Designer so I could get a better understanding of it):

SELECT u.objectGuid, u.msExchMailboxGuid, u.CompanyName, u.ResellerOU, u.DisplayName, u.MBXServer, u.MBXSG, u.MBXDB, u.MBXWarningLimit, u.MBXSendLimit,
u.MBXSendReceiveLimit, u.extensionAttribute10, u.legacyExchangeDN, u.UserPrincipalName, u.Mail, u.lastLogonTimestamp, u.createTimeStamp,
u.modifyTimeStamp, u.altRecipient, u.altRecipientBL, u.Deleted, u.DeletedDate, us.MailboxGuid, us.Date, us.AssociatedItemCount, us.DeletedItemCount,
us.ItemCount, us.LastLoggedOnUserAccount, us.LastLogonTime, us.StorageLimitStatus, us.TotalDeletedItemSize, us.TotalItemSize
FROM Users AS u LEFT OUTER JOIN
(SELECT MailboxGuid, MAX(Date) AS date
FROM UserStats AS us
GROUP BY MailboxGuid) AS us_filter ON u.msExchMailboxGuid = us_filter.MailboxGuid LEFT OUTER JOIN
UserStats AS us ON us_filter.MailboxGuid = us.MailboxGuid AND us_filter.date = us.Date

I added some indexes on the advice of the sql server profiler:

use [ServerName]
go

CREATE CLUSTERED INDEX [_dta_index_UserStats_c_5_309576141__K1] ON [dbo].[UserStats]
(
[MailboxGuid] ASC
)WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
go

CREATE NONCLUSTERED INDEX [_dta_index_UserStats_5_309576141__K1_2] ON [dbo].[UserStats]
(
[MailboxGuid] ASC
)
INCLUDE ( [Date]) WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
go

CREATE STATISTICS [_dta_stat_373576369_1] ON [dbo].[Users]([objectGuid])
go

Query now takes about 2 seconds to return all the results, will keep an eye on this as the userstats table grows in size might need to consider archiving the data each day into another table instead of adding to the main table.
 
Last edited:
The results grid says it took 00:00:02 to execute the query and returned all 59838 records (number has gone up a little due to new users being added). I'm not sure if this is considered the execution time or the fetch time?

Is this bad or good? SQL server is a VM with 4GB of RAM and 2 CPU's.

Server doesn't seem to be under load, it literally just has these 2 tables running at the moment in the database and i'm just using this for testing at the minute. CPU usage floating around 0-1% and physical memory usage around 73%. Most of the memory usage seems to be used by sqlservr.exe process which is using just under 2GB of RAM, though I would not say it was out of the ordinary for SQL to use as much memory as it can really.
 
Back
Top Bottom