Search
 

Technical Articles

Subscribe

Browse through the latest and greatest technical articles on differing SQL Server topics.

 

Current Articles | Categories | Search

433 Views | 1 Comments

A Small Discussion on Covering Indexes

By Randy Dyess

While discussing indexes during my classes or during presentations, I am often asked what a covering index is and why should you use one. This article will attempt to answer those questions by explaining the benefits of covering indexes in both SQL Server 2000 and SQL Server 2005. Note: The indexes shown in the examples in this article are not intended to be production ready indexes. I did not look at the cardinality of the columns and the usage of the columns to determine the exact ordering of the columns, nor did I look at existing indexes to see if those indexes could be modified. These examples are used only to show the points of the article.

What is a covering index?

A covering index is a nonclustered index that contains all the columns needed to satisfy a query. This means that columns included in the SELECT clause, the WHERE clause, any JOIN clauses, and columns used in ORDER BY, GROUPING, HAVING, etc., all need to be part of the index key, or, if you are dealing with SQL Server 2005 the columns need to be either in the index key or included with the index. When dealing with multiple tables in larger queries, covering indexes will need to be built to cover the columns referenced from each table. If you look at the small example query below to determine which columns need to be in a covering index, you can see that the query references the Name, GroupName, and Modified Date columns. In order to create a covering index for this query, you will need to include all these columns as part of the index key or using the new ability of SQL Server 2005 to include columns. SELECT [Name], GroupName FROM HumanResources.Department WHERE ModifiedDate < GETDATE() The script for creating this covering index in SQL Server 2000 or SQL Server 2005 by including all columns in the index key would be something like this: CREATE NONCLUSTERED INDEX ncl_Department_covering ON HumanResources.Department(ModifiedDate,[Name],GroupName); Alternatively, when you are working with SQL Server 2005 you can create the index on one or more columns and include the rest of the columns needed for the covering index. CREATE NONCLUSTERED INDEX ncl_Department_covering ON HumanResources.Department(ModifiedDate) INCLUDE ([Name],GroupName); If you dig into the makeup and performance of each index above you will often find out that including columns outside of the index key saves not only on space. As the included columns are not stored on every level of the B-Tree, included columns improve overall system performance by reducing the amount of disk I/O needed to access the index data and by reducing the amount of I/O needed to maintain the index. As you can tell, creating a covering index for a simple query as the one shown in the example would be very easy to accomplish. What happens as you move into the real world and your queries become more and more complex? Let us say that you wanted to return all the information on each employee that is found in the AdventureWorks database. The resulting query would look similar to the one below:

SELECT
e.Title AS ‘Employment Title’
,c.Title AS ‘Title’
,c.FirstName AS ‘First Name’
,c.MiddleName AS ‘Middle Name’
,c.LastName AS ‘Last Name’
,c.Suffix AS ‘Suffix’
,c.EmailAddress AS ‘Email Address’
,c.Phone AS ‘Phone Number’
,a.AddressLine1 AS ‘Address Line 1’
,a.AddressLine2 AS ‘Address Line 1’
,a.PostalCode AS ‘Postal Code’
,sp.StateProvinceCode AS ‘State’
,cr.Name AS ‘Country’
,e.BirthDate AS ‘Employee Birthdate’
,e.MaritalStatus AS ‘Employee Marital Status’
,e.Gender AS ‘Employee Gender’
,e.HireDate AS ‘Employee Hiredate’
,e.VacationHours AS ‘Number of Vacation Hours Left’
,e.SickLeaveHours AS ‘Number of Sick Leave Hours Left’
,c1.Title AS ‘Manager’’s Title’
,c1.FirstName AS ‘Manager’’s First Name’
,c1.LastName AS ‘Manager’’s Last Name’
,c1.EmailAddress AS ‘Manager’’s Email Address’
,c1.Phone AS ‘Manager’’s Phone Number’
,ph.Rate AS ‘Current Pay Rate’
,d.[Name] AS ‘Current Department Name’
,d.GroupName AS ‘Current Department Group’
,edh.StartDate AS ‘Starting Date with Current Department’
FROM HumanResources.Employee e
INNER JOIN Person.Contact c
ON e.ContactID = c.ContactID
INNER JOIN HumanResources.EmployeeAddress ea
ON e.EmployeeID = ea.EmployeeID
INNER JOIN Person.Address a
ON ea.AddressID = a.AddressID
INNER JOIN Person.StateProvince sp
ON a.StateProvinceID = sp.StateProvinceID
INNER JOIN Person.CountryRegion cr
ON sp.CountryRegionCode = cr.CountryRegionCode
INNER JOIN HumanResources.EmployeePayHistory ph
ON e.EmployeeID = ph.EmployeeID
INNER JOIN HumanResources.EmployeeDepartmentHistory edh
ON e.EmployeeID = edh.EmployeeID
INNER JOIN HumanResources.Department d
ON edh.DepartmentID = d.DepartmentID
INNER JOIN HumanResources.Employee e1
ON e.ManagerID = e1.ManagerID
INNER JOIN Person.Contact c1
ON e1.ContactID = c1.ContactID
ORDER BY e.EmployeeID

As you can see, this is a little more complex than the first example. To try to build a covering index on the HumanResources.Employee table would force you to include all of the following columns: Title, BirthDate, MaritalStatus, Gender, HireDate, VacationHours, SickLeaveHours, EmployeeID, ManagerID, and ContactID. CREATE NONCLUSTERED INDEX ncl_Employee_covering ON HumanResources.Employee(Title, BirthDate, MaritalStatus, Gender, HireDate, VacationHours, SickLeaveHours, EmployeeID, ManagerID, ContactID) Or CREATE NONCLUSTERED INDEX ncl_Employee_covering ON HumanResources.Employee(Title, BirthDate) INCLUDE (MaritalStatus, Gender, HireDate, VacationHours, SickLeaveHours, EmployeeID, ManagerID, ContactID) Plus you would have to build covering indexes on the Person.Address, Person.CountryRegion, Person.StateProvince, HumanResources.EmployeePayHistory, HumanResources.EmployeeDepartmentHistory, HumanResources.Department, and Person.Contact tables as well. CREATE NONCLUSTERED INDEX ncl_Contact_covering ON Person.Contact(Title,FirstName,MiddleName,LastName,Suffix,EmailAddress ,Phone,ContactID) CREATE NONCLUSTERED INDEX ncl_EmployeeAddress_covering ON HumanResources.EmployeeAddress(EmployeeID,AddressID) CREATE NONCLUSTERED INDEX ncl_Address_covering ON Person.Address(AddressID,StateProvinceID,AddressLine1,AddressLine2,Post alCode) CREATE NONCLUSTERED INDEX ncl_StateProvince_covering ON Person.StateProvince(StateProvinceID,StateProvinceCode,CountryRegionCo de) CREATE NONCLUSTERED INDEX ncl_CountryRegion_covering ON Person.CountryRegion(Name,CountryRegionCode) CREATE NONCLUSTERED INDEX ncl_EmployeePayHistory_covering ON HumanResources.EmployeePayHistory(Rate,EmployeeID) CREATE NONCLUSTERED INDEX ncl_EmployeeDepartmentHistory_covering ON HumanResources.EmployeeDepartmentHistory(EmployeeID,StartDate,Dep artmentID) CREATE NONCLUSTERED INDEX ncl_Department_covering ON HumanResources.Department([Name],GroupName,DepartmentID)

Note: Before anyone asks, creating all these indexes as they are written above does nothing to improve the performance of the example query. This is because the optimizer is able to use the clustered indexes already on the tables to execute the query in an optimal manner.

What are the drawbacks of trying to build covering indexes?

I skirted the issue earlier in this article about the main drawbacks of covering indexes. Covering indexes are hard to create for large complex queries as the number of columns that need to be covered often require an index that is too large (index keys are restricted to 900 bytes), decrease overall system performance during index maintenance (both index maintenance performed by the database administrator and automatic maintenance performed by SQL Server itself during INSERT, UPDATE, and DELETE statements), and increase the size of the index. This requires more space for the database and reduces the available amount of memory for queries by requiring that more index pages be read into memory during the indexes’ access. Sounds like a lot of issues, that when added together, make covering indexes not worth the trouble. However, covering indexes are very useful in solving one of the biggest reasons queries perform poorly: bookmark lookups. Note: If you are not familiar with the concept of bookmark lookups, you can review what they are in one of my earlier articles on SQLServerCentral.Com: http://www.sqlservercentral.com/columnists/RDyess/bookmarklookups. asp Yes, the new feature as found in SQL Server 2005 of including additional columns in indexes is vastly more useful than trying to create true covering indexes, but when you add the additional columns to an index you are basically creating a covering index, and so I treat the two ways the same. On a personal note I would much rather create an index with included columns than an index with a large amount of index keys, due to the reduced storage space since included columns are not stored at all levels of the B-Tree, only the leaf level.

How does a covering index help?

So why go through the trouble of creating covering indexes if they increase the space needed for the index, require system resources to maintain, and are basically hard to create against complex queries? The reduction in disk I/O that is gained by eliminating bookmark lookups is worth investigating covering indexes. To show this let us take a small query that causes bookmark lookups. This query will create a table with 100,000 rows of data and will then add a typical clustered index to the table. Once the table is ready, the query will return a single row of data from the table so we can look at the execution plan of the query. Note: SQL Server 2005 has removed the bookmark lookup operator and now either executes a nested loop if the query goes against a clustered index or a RID (row identifier) lookup if the table is a heap. Personally, I think this is what had always happened. Microsoft is just now exposing what went on behind the scenes.

USE AdventureWorks;
—Create large table for example
IF OBJECT_ID(‘tTestBookmark’) IS NOT NULL
DROP TABLE tTestBookmark
GO
CREATE TABLE tTestBookmark
(
intID INT
,intCol1 INT
,strCol1 VARCHAR(10)
);
—Loop through and add 100,000 rows
DECLARE @intLoop INTEGER
SET @intLoop = 1
WHILE @intLoop < 100001
BEGIN
INSERT INTO tTestBookmark VALUES (@intLoop,@intLoop,
@intLoop)
SET @intLoop = @intLoop + 1
END
CREATE CLUSTERED INDEX cl_tTestBookmark ON tTestBookmark(intID)
SET STATISTICS PROFILE ON;
SET STATISTICS IO ON;
SELECT strCol1
FROM tTestBookmark
WHERE intCol1 = 3

Looking at the execution plan for the query we notice that the query is performing a clustered index scan while creating 347 logical reads. SELECT [strCol1] FROM [tTestBookmark] WHERE [intCol1]=@1 |—Clustered Index Scan(OBJECT:([AdventureWorks].[dbo].[tTestBookmark].[cl_tTestBookmark ]), WHERE:([AdventureWorks].[dbo].[tTestBookmark].[intCol1]=CONVERT_IMPL ICIT(int,[@1],0))) Table ‘tTestBookmark’. Scan count 1, logical reads 347, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob readahead reads 0. Typically, when developers or database administrators see a table scan or clustered index scan occurring the first thing they do is to try to create a nonclustered index. CREATE NONCLUSTERED INDEX ncl_tTestBookmark ON tTestBookmark(intCol1) Looking at the logical reads, you will notice the vast improvement made by this simple act of creating an index. Table ‘tTestBookmark’. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob readahead reads 0. Looking at the execution plan, you notice that the clustered index scan is gone and in its place is the new form of bookmark lookup. SELECT [strCol1] FROM [tTestBookmark] WHERE [intCol1]=@1 |—Nested Loops(Inner Join, OUTER REFERENCES:([Uniq1002], [AdventureWorks].[dbo].[tTestBookmark].[intID])) |—Index Seek(OBJECT:([AdventureWorks].[dbo].[tTestBookmark].[ncl_tTestBookmar k]), SEEK:([AdventureWorks].[dbo].[tTestBookmark].[intCol1]=(3)) ORDERED FORWARD) |—Clustered Index Seek(OBJECT:([AdventureWorks].[dbo].[tTestBookmark].[cl_tTestBookmark] ), SEEK:([AdventureWorks].[dbo].[tTestBookmark].[intID]=[AdventureWorks].[d bo].[tTestBookmark].[intID] AND [Uniq1002]=[Uniq1002]) LOOKUP ORDERED FORWARD) If you notice, the execution plan now contains a Nested Loop when utilized and Index Seek on the [ncl_tTestBookmark] index and a Clustered Index Seek on the [cl_tTestBookmark] index. This is the new replacement for the Bookmark Lookup operator against a clustered index. To get rid of the new bookmark lookup mechanism, add the missing column to the index. You can find this column by looking at the execution plan line for the Clustered Index Seek under the DefinedValues column: [AdventureWorks].[dbo].[tTestBookmark].[strCol1] Drop the existing index and recreate it by including the second column: DROP INDEX tTestBookmark.ncl_tTestBookmark; CREATE NONCLUSTERED INDEX ncl_tTestBookmark ON tTestBookmark(intCol1) INCLUDE (strCol1) Now execute the query again: SELECT strCol1 FROM tTestBookmark WHERE intCol1 = 3 Looking at the new execution plan you will notice that the bookmark lookup mechanism has been replaced with a simple Index Seek: SELECT [strCol1] FROM [tTestBookmark] WHERE [intCol1]=@1 |—Index Seek(OBJECT:([AdventureWorks].[dbo].[tTestBookmark].[ncl_tTestBookmar k]), SEEK:([AdventureWorks].[dbo].[tTestBookmark].[intCol1]=CONVERT_IMPLIC IT(int,[@1],0)) ORDERED FORWARD) And we have reduced our number of logical pages from 4 to 2. Table ‘tTestBookmark’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob readahead reads 0. If you want to see an even more dramatic change, try rewriting the query into a range scan: SELECT strCol1 FROM tTestBookmark WHERE intCol1 BETWEEN 3000 AND 4000 When you execute the query against the one column nonclustered index you will find that the optimizer does not even use the index. Instead, it uses the clustered index: SELECT [strCol1] FROM [tTestBookmark] WHERE [intCol1]>=@1 AND [intCol1]<=@2 |—Clustered Index Scan(OBJECT:([AdventureWorks].[dbo].[tTestBookmark].[cl_tTestBookmark ]), WHERE:([AdventureWorks].[dbo].[tTestBookmark].[intCol1]>=(3000) AND [AdventureWorks].[dbo].[tTestBookmark].[intCol1]<=(4000))) Table ‘tTestBookmark’. Scan count 1, logical reads 347, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob readahead reads 0. The query against the two column nonclustered index performs as expected: SELECT [strCol1] FROM [tTestBookmark] WHERE [intCol1]>=@1 AND [intCol1]<=@2 |—Index Seek(OBJECT:([AdventureWorks].[dbo].[tTestBookmark].[ncl_tTestBookmar k]), SEEK:([AdventureWorks].[dbo].[tTestBookmark].[intCol1] >= CONVERT_IMPLICIT(int,[@1],0) AND [AdventureWorks].[dbo].[tTestBookmark].[intCol1] <= CONVERT_IMPLICIT(int,[@2],0)) ORDERED FORWARD) Table ‘tTestBookmark’. Scan count 1, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob readahead reads 0. A difference of 341 logical reads in my case.

Summary

Covering queries may be difficult to achieve for many of us due to the complex nature of most of our real-world queries, but with the new INCLUDE option for index creation they are something every developer and database administrator needs to try to build. For those who are worried about the impact of wide index keys and want something to go by, I would stick to included columns as there is really no quick and fast rule for deciding on how many columns to have in an index. Over time you will learn the nature of your database environment and will know when you should create a composite index with three or four columns or should simply include the additional columns. However, whichever method you end up using (probably both), covering indexes can go a long way in stopping one of those “hidden” query performance killers: bookmark lookups.

Author Bio

Randy has a large variety of experiences dealing with SQL Server over the past several years and has worked with environments with Terabytes of data and environments that had over a 1000 databases with only a few megabytes of data in each database. Currently Randy is a mentor with Solid Quality Learning and the founder of Dyess Consulting Inc. He specializes in training and consulting with a focus on Transact-SQL development, SQL Server performance tuning, and database security. Randy is the author of TransactSQL Language Reference Guide and numerous magazine and newsletter articles pertaining to SQL Server security and optimization issues and has spoken at various international and national conferences. Randy is also the founder and principle author of www.TransactSQL.Com. Randy is a member of the Board of Directors for PASS and is the Director of Programs for the North Texas SQL Server Users Group.

Comments
By Rushabh Mehta -

Great article!

You must be logged in to post a comment. You can login here

Please login to view this content.

Register as a PASS member today for free.



Register
Forgot Password?

PASS Community Summit 2009