SQL Server - Indexes

Can someone confirm a few things for me:

1) Each index always contains the primary key information (so if the table is a linker table, I don't need to 'include' the primary key column)?

2) If my WHERE clause includes three columns and I have three indexes (one per column), will SQL Server (where it thinks appropriate) use all three indexes to find the primary keys that appear in all of them and then do a lookup to fetch the remaining columns? Or will it just use the one index it thinks will be most helpful and then do a lookup?

We've got a large table (a ledger) which has a number of foreign keys. Even though its primary concern is quantity in a location, the data in this table is viewed (and filtered) in many different ways. At the moment we have 10 indexes: some on individual columns, some composite, and a mixture with and without included columns. Just wondering if the individual indexes are pointless (99% of queries will have two or more columns in the WHERE clause).
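To make that concrete, a typical pattern looks something like the below (column names are made up for illustration, not our real schema):

SELECT Quantity
FROM dbo.Ledger
WHERE ProductID = 42
  AND LocationID = 7
  AND LedgerDate >= '20100101'

-- can the optimiser combine three single-column indexes like these...
CREATE NONCLUSTERED INDEX ix_Ledger_ProductID  ON dbo.Ledger (ProductID)
CREATE NONCLUSTERED INDEX ix_Ledger_LocationID ON dbo.Ledger (LocationID)
CREATE NONCLUSTERED INDEX ix_Ledger_Date       ON dbo.Ledger (LedgerDate)

-- ...or does it need one composite index covering the whole WHERE clause?
CREATE NONCLUSTERED INDEX ix_Ledger_Product_Location_Date
    ON dbo.Ledger (ProductID, LocationID, LedgerDate)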

Cheers!
 
Try looking at the "actual execution plan" in Management Studio. It'll give you a good idea of which indexes you need to create (based on which parts of the query take the longest), and it will show you which indexes are being used for the various parts of your queries too. For example, if you see a 'table scan' in there, it suggests you're missing a key/index, and adding one should speed up that section of the query.

I came across this the other day to help you find missing indexes (skip to the code at the very bottom). You can of course use SQL Server's Profiler tool to analyse a workload from your database (i.e. all the queries executed over the course of, say, a day) and have it suggest which indexes it thinks you're missing.
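If it's useful, the usual starting point is the missing-index DMVs that SQL Server maintains from 2005 onwards - something along these lines (just a sketch, and not necessarily the same code as the link above):

SELECT d.statement AS table_name,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns,
       s.user_seeks,
       s.avg_user_impact
FROM sys.dm_db_missing_index_details d
JOIN sys.dm_db_missing_index_groups g ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats s ON s.group_handle = g.index_group_handle
ORDER BY s.user_seeks * s.avg_user_impact DESC   -- rough "most wanted first" ordering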

I'm fairly sure you need to create an index containing all the columns you want, i.e. for this query:

SELECT ID, Name FROM Users WHERE Company = 7

You would need to create a single index containing columns ID, Name and Company.


I'm not an expert with them and tend to hope for the best and test the results with the execution plan so I could be completely wrong :p.
 
Which version of SQL Server is it?
Later versions will work out the required columns for themselves when left to it.
Do you use Stored Procedures, or is the SQL built up in the program and then passed through? (Stored Procedures are good - they give SQL Server a chance to optimise things!)
 
I'm fairly sure you need to create an index containing all the columns you want, i.e. for this query:

SELECT ID, Name FROM Users WHERE Company = 7

You would need to create a single index containing columns ID, Name and Company.

erm - no. Just an index on Company would do.
But if your WHERE clause was on Company and another column, then it would be worth adding a composite index on Company and that column.
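In index terms (using the Users example above, purely as a sketch):

-- covers WHERE Company = 7 on its own
CREATE NONCLUSTERED INDEX ix_Users_Company ON dbo.Users (Company)

-- if you regularly filter on Company plus another column (say Name),
-- a composite key is the one to add instead
CREATE NONCLUSTERED INDEX ix_Users_Company_Name ON dbo.Users (Company, Name)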
 
I loaded a table with 1,000,000 test rows:

Using an index just on Company (and the primary key one):
[execution plan screenshot]


Using an index on Company, including ID and Name (CREATE NONCLUSTERED INDEX ix_Company ON [dbo].[Users] ([Company]) INCLUDE ([ID],[Name])):
[execution plan screenshot]



Makes a difference!
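For anyone who wants to try it themselves, something roughly like this will generate a million rows to play with (the table definition here is just a guess at a minimal Users structure):

CREATE TABLE dbo.Users (
    ID int IDENTITY(1,1) PRIMARY KEY,   -- clustered primary key
    Name nvarchar(50) NOT NULL,
    Company int NOT NULL
)

INSERT INTO dbo.Users (Name, Company)
SELECT TOP (1000000)
       'User ' + CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS varchar(10)),
       ABS(CHECKSUM(NEWID())) % 1000    -- spread the rows over ~1000 companies
FROM sys.all_objects a CROSS JOIN sys.all_objects b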
 
What if you don't include ID? Is it still just an index seek (assuming ID is primary key)?

We use stored procs for complex reports such as the one I'm having issues with. The problem is the execution plan does not make sense. It says one part of the stored proc is taking 5% of the time, but when I comment it out the query runs in 1 second instead of 7. Also, when I run that query by itself it's fast, but it uses different indexes than when it's run as part of the larger stored proc (because I reference the data in several different ways, I'm loading it into a table variable).

We are running 2005, moving to 2008 shortly.
 
If I remove ID from the included columns, the index still gets used for a seek (I assume the ID is added to the index automatically as it's the primary key). If I remove Name from the included columns it essentially falls back to behaving like a plain index on Company and goes back to parallelism (assuming you have it enabled).
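So in practice the narrower definition below behaves the same for that query, because the clustered key (ID) already travels with every nonclustered index (sketch only, same assumed Users table as above):

CREATE NONCLUSTERED INDEX ix_Company_Name
    ON dbo.Users (Company)
    INCLUDE (Name)   -- no need to INCLUDE (ID): the clustered PK comes along for free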

Could you post up your procedure and show the execution plan it's using? If you could do a DB dump of some sample data plus the proc that would be really handy, but I imagine it'll be a pain if it's complex.
 
Can't really do all that I'm afraid - company data (and a lot of it - 1m+ rows). What you've done (thank you!) has answered the first of my questions...

What I can do (tomorrow) is show you the query in question and the two different execution plans... the goal is to work out whether SQL Server can use multiple indexes to pin down the target rows, or whether it just has to pick the best one for the job and then work through all the data (forcing me to make the 'one' a composite index made up of multiple columns to ensure a small result set).

Thanks again.
 
Sounds good. I'll be waiting :o.

Failing that, if it's not too sensitive you could post the structure of the table (which you can script out from Management Studio with no data) and the SP, and I can fill it with random data myself.

Redgate's Data Generator is great.
 
All sorted! And it's very bizarre....

Basically I had a table which I was referring to a lot (to cut down the result sets of other queries). To make this faster I thought, 'let's load it into a table variable and then just do a DISTINCT on the various columns when I need it'. The table is only 1,800 rows, so what harm can it do? Well apparently, a lot.

The query was taking 20 seconds to run. By taking the table variable and creating three more table variables (containing the distinct values), the query now takes 3 seconds. And now the execution plan looks a lot more sensible (cost is high where I expect it to be).

The only clue I had to make the changes I did was one icon on the execution plan: a distinct on the client ID that was expected to return 3 rows but actually returned 1.3m rows. The cost for this was only 1%; the only reason I looked at it was that the arrow coming off it was large (indicating large results etc).

Strange. Very strange! Will be thinking twice when setting up table variables in future.
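For anyone curious, the change was roughly this shape (names made up - the real tables are company data):

-- before: one table variable, with DISTINCT applied every time it was used
DECLARE @Filter TABLE (ClientID int, LocationID int, ProductID int)
INSERT INTO @Filter (ClientID, LocationID, ProductID)
SELECT ClientID, LocationID, ProductID
FROM dbo.SomeFilterTable          -- stand-in for the real 1,800-row table

-- after: the distinct values pre-computed once into their own table variables
DECLARE @Clients   TABLE (ClientID int)
DECLARE @Locations TABLE (LocationID int)
DECLARE @Products  TABLE (ProductID int)

INSERT INTO @Clients   SELECT DISTINCT ClientID   FROM @Filter
INSERT INTO @Locations SELECT DISTINCT LocationID FROM @Filter
INSERT INTO @Products  SELECT DISTINCT ProductID  FROM @Filter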
 
That does sound pretty odd, glad that it's all sorted then :). Did you add a primary key to the table variable? That might slow things down.

You aren't using cursors or anything in the code either, are you? They tend to slow things down loads.

I'm trying to do a similar thing at work at the moment; our system involves lots of large calculations with hundreds of thousands of rows and trying to optimise them is somewhat challenging. Mind you, I think our biggest problem is when the results are sent to .NET and then transformed into another data representation (I didn't originally write this system :p) which has probably outgrown its capacity.
 
I did have primary keys, but when I removed them everything was the same (the cost to insert with the primary key was the same as the cost to insert into the table etc). Also no cursors... just three table variables (making them inline slowed the queries down again?) that are then all joined together to provide the result set. I really thought that doing a DISTINCT would help reduce the workload... learn something new every day :)
 