Total Pageviews

Tuesday, April 9, 2013

Non-clustered indexes per table – how many indexes are too many?


There is a strong recommendation that “Don’t create too many indexes, because it degrades performance due to overhead of DML operations”. Having indexes on a table is necessary and on the other hand it may have a tradeoff – performance gain versus overhead of maintaining. All indexes need optimization (rebuild, reorganize or statistics update) on a regular basis. So if the overall performance gain is worth more than the cost of maintaining indexes we receive from the queries that uses the index, then having indexes is fully justified.

MSDN Says “As with most performance optimization techniques, there are tradeoffs. For example, with more indexes, SELECT queries will potentially run faster. However, DML (INSERT, UPDATE, DELETE and MERGE) operations will slow down significantly because more indexes must be maintained with each operation. Therefore, if queries are mostly SELECT statements, more indexes can be helpful. If the application performs many DML operations, we should be conservative with the number of indexes we create.”

Non-clustered indexes per table:
Starting from SQL Server 2008, a table can have 999 non-clustered indexes where as in SQL Server 2005, a table can have 249 non-clustered indexes. So from SQL 2008 there is a significant enhancement done in Database Engine level. In SQL 2012, one more new feature has been added– columnstore index, though it is not useful in OLTP implementation (sorry folks columnstore index is read only).

Index column limitation:
(a)    An index key (length of all index columns) can’t exceed 900 bytes and there will be a maximum of 16 columns in an index key.
(b)   From SQL Server 2005, we can include non-key columns in a non-clustered index to avoid the limitation of a maximum of 16 key columns and 900 bytes limitation.
(c)    If the table contains one or more XML indexes, the clustering key of a table is limited to 15 columns because the XML column is added to the clustering key of the primary XML index.
(d)   LOB data type (image/Text) can’t be part of an index key.

Good practices while creating indexes:
Creating useful indexes is one of the most important ways to achieve better query performance. Thoughtful indexes help find data with fewer disk I/O operations, less CPU cycles, small memory footprint and less system resource usage. The following are a few good practices around indexes:

(a)    Create a separate file group for indexes and place the file in a faster disk such as RAID10.
(b)   Based on DML operation, adjust Fill Factor from 80 to 90 for example. Low Fill Factor value introduces memory pressure, extensive I/O operation while High Fill Factor value causes page split thus high fragmentation.
(c)    Multi-column indexes are very useful; it creates multi-column statistics, but watch for “Index Depth”, more than 3 of this value decreases search efficiency.
(d)   Never add the clustered key with a non-clustered index or with the INCLUDE option.
(e)   While creating multi-column indexes, keep most unique value columns (high selective) to the left and try not to create a wider column index.
(f)     Be very careful about duplicate indexes and drop it without any hesitation.
(g)    Overlapping indexes - try to consolidate multiple indexes into few such as one or two
(h)   Review index usage on a regular basis and adjust (or drop) non-optimal indexes.

Index Rebuild/Reorganize know-how: 
1.       Always use DROP_EXISING=ON while rebuilding/adjusting an existing index.
2.       Examine fragmentation overtime; and
If the “avg_fragmentation_in_percent” value derived from “sys.dm_db_index_physical_stats” then rebuild/reorganize as follows:
(a) 5% and < = 30% - ALTER INDEX REORGANIZE
(b) > 30% - ALTER INDEX REBUILD
3.       Indexes with less than128 extents (1,024 pages =8mb) may not be good candidates for index rebuild.
4.       Clustered index cannot rebuild ONLINE when the underlying table contains LOB data type.
5.       A non-clustered index can rebuild ONLINE if it does not contain columns of LOB data type even though the underlying table does.
6.       Reorganizing (REORGANIZE) an index is always executed online and lesser resources are used.
7.       Reorganize does not update statistics but only compacts the index pages.
8.       Rebuilding a unique or non-unique clustered index does not rebuild the non-clustered indexes.
9.       When ALL is specified, all indexes on the table are dropped and rebuilt in a single transaction.

Non-clustered indexes per Table impact:
In the following test we will be reviewing the impact of having a number of indexes in a table. We will be performing this test:

1.       On a Heap without any index.
2.       On a clustered index with couple of indexes.

We will use the following work flow to perform a simple test:

1.       We will run SQL Profiler Trace.
2.       Create a table (which is still a heap).
3.       Insert 10,000 records.
4.       DROP the table and recreate it with clustered index, create five non-clustered indexes.
5.       Compare the profiler and DMV output about duration, page split and fragmentation.

The above test can also be performed using Microsoft “OSTress.exe” by creating multiple concurrent connections by executing smaller DML statements to simulate production like behavior. Either way, the final results will be same.

DMV Query:

To check table and index level changes:
SELECT  OBJECT_NAME(ios.object_id, ios.database_id) as table_name,
              ios.index_id ,
        si.name AS index_name,
        ios.leaf_insert_count +
        ios.leaf_update_count +
        ios.leaf_delete_count AS leaf_changes,
        ios.leaf_allocation_count AS leaf_page_splits,
        ios.nonleaf_insert_count +
        ios.nonleaf_update_count +
        ios.nonleaf_delete_count AS nonleaf_changes,
        ios.nonleaf_allocation_count AS nonleaf_page_splits,
        (ios.range_scan_count + ios.leaf_insert_count
            + ios.leaf_delete_count + ios.leaf_update_count
            + ios.leaf_page_merge_count + ios.singleton_lookup_count
           ) total_changes
FROM    sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) ios
        JOIN sys.objects so ON so.object_id = ios.object_id
        JOIN sys.indexes si ON si.object_id = ios.object_id
                               AND si.index_id = ios.index_id
        JOIN sys.schemas ss ON so.schema_id = ss.schema_id
WHERE   OBJECTPROPERTY(ios.object_id, 'IsUserTable') = 1
ORDER BY leaf_changes DESC

To check index fragmentation:
SELECT  a.index_id ,
        b.name AS [object_name],
        CONVERT(NUMERIC(5,2),a.avg_fragmentation_in_percent) pct_avg_fragmentation
FROM    sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, NULL) AS a
        JOIN sys.indexes AS b ON a.object_id = b.object_id
                                 AND a.index_id = b.index_id;


Let’s get started:

1.       Create the database and table:

CREATE DATABASE TestDB
GO
USE [TestDB]
GO

CREATE TABLE [tblLarge](
       [xID] [int] IDENTITY(1,1) NOT NULL,
       [sName1] [varchar](10) NULL,
       [sName2] [varchar](13) NULL,
       [sName3] [varchar](36) NULL,
       [sIdentifier] [char](2) NULL,
       [dDOB] [date] NULL,
       [nWage] [numeric](12, 2) NULL,
       [sLicense] [char](7) NULL,
       [bGender] [bit] NULL
) ON [PRIMARY]
GO


2.       Start the SQL Profiler Trace and we can filter the trace for the SPID which we are using to execute query.
3.       Run the following INSERT statement in a loop to insert 10,000 records.

DECLARE @n INT = 0
BEGIN TRAN
WHILE @n <= 10000
      BEGIN
            INSERT  INTO tblLarge
                    ( sName1 ,
                      sName2 ,
                      sName3 ,
                      sIdentifier ,
                      dDOB ,
                      nWage ,
                      sLicense ,
                      bGender
                    )
            VALUES  ( LEFT(CAST(NEWID() AS VARCHAR(36)), 8) ,
                      LEFT(CAST(NEWID() AS VARCHAR(36)), 13) ,
                      CAST(NEWID() AS VARCHAR(36)) ,
                      LEFT(CAST(NEWID() AS VARCHAR(36)), 2) ,
                      DATEADD(dd, -RAND() * 20000, GETDATE()) ,
                      ( RAND() * 1000 ) ,
                      SUBSTRING(CAST(NEWID() AS VARCHAR(36)), 6, 7) ,
                      COALESCE(ISNUMERIC(LEFT(CAST(NEWID() AS VARCHAR(36)),1)),0)
                    )
            SET @n = @n + 1
      END
COMMIT TRAN

4.       Drop the existing table. Then create the same table, along with a clustered index and five non-clustered indexes.

ALTER TABLE tblLarge ADD CONSTRAINT PK_tblLarge
PRIMARY KEY CLUSTERED (xID) WITH (FILLFACTOR=90)

CREATE NONCLUSTERED INDEX IX_tblLarge_sIdentifier ON tblLarge (sIdentifier)
INCLUDE (sName1, sName2 ) WITH (FILLFACTOR=90)

CREATE NONCLUSTERED INDEX IX_tblLarge_sName1 ON tblLarge (sName1)
INCLUDE (sIdentifier, sName2) WITH (FILLFACTOR=90)

CREATE NONCLUSTERED INDEX IX_tblLarge_sName2 ON tblLarge (sName2)
INCLUDE (dDOB, nWage ) WITH (FILLFACTOR=90)

CREATE NONCLUSTERED INDEX IX_tblLarge_sName3 ON tblLarge (sName3)
INCLUDE (sIdentifier, nWage, sLicense) WITH (FILLFACTOR=90)

CREATE NONCLUSTERED INDEX IX_tblLarge_dDOB ON tblLarge (dDOB)
INCLUDE (sIdentifier, sName1, sName2, sName3 ) WITH (FILLFACTOR=90)

5.       Run the INSERT statement again and collect DMV and SQL Profiler output. Below are all the captured outputs:

Figure #1: Inserting 10000 records in a Heap
  
Figure #2: DMV output when inserting 10,000 records in a clustered table and when there are five non-clustered indexes
 
Figure #3: SQL Profiler traces output


Points from test:
From our above simple test we may conclude the following:
1.       When a table is HEAP, INSERT is fast.
2.       The INSERT statement is much more expensive when having indexes in terms of CPU, Duration, Read and Writes.
3.       When we have clustered and non-clustered indexes, due to INSERT, page splits occurs in leaf and non-leaf level.
4.       Fragmentation on all non-clustered indexes increases rapidly. However, fragmentations on clustered indexes are low.

Summary:
As we saw from our test that having indexes does have operational costs, but it does not make sense to stop creating useful indexes. If indexes are necessary then we should create it judiciously while ensuring all the best practices.   

How many indexes are too many?  As many as required.

References:
Learn from Guru: Paul S. Randal

Expert Performance Indexing for SQL Server 2012 (aprèss book)
Jason Strate and Ted Krueger

Specify Fill Factor for an Index

ALTER INDEX (Transact-SQL)


No comments:

Post a Comment