Total Pageviews

Friday, April 1, 2016

Compressing everything in a SQL Server Database

Data compression is one of the most appreciated features of SQL Server. However, many of us overlook the benefits of data compression in terms of storage and query performance. If applied correctly, we could have save up to 30% disk space and speed up the query by at least 10 times. Although, there will be a penalty of 2% to 5% of increased CPU usage while compressing or decompressing pages, the gain is far more significant than adding some negligible processor usage.

Data Compression Setting:
Data compression is a feature of Enterprise Edition only and there is no any specific or particular server or database wide setting that can be turned on to implement the feature either server wide or on a per database basis. I think that the folks at Microsoft should consider implementing database compression feature settings at a database level so that we can use this feature more effectively.

What to compress?
Data compression needs to be applied individually on a per table or index basis and it can be applied to a heap table, clustered index, non-clustered index, indexed view, columnstore table and an index. For example, a clustered index table might have four non-clustered indexes and one nonclustered columnstore index. Thus you can have “PAGE” level compression on the clustered index, “ROW” level compression on the nonclustered indexes, “PAGE” level compression on the nonclustered indexes and “COLUMNSTORE_ARCHIVE” on a nonclustered columnstore index.

Figure#1: Different data compression settings on a table:

Some Good Points from MSDN:
1.      Enabling compression can cause query plans to change because the data is stored using a different number of pages and number of rows per page.
2.      Compression does not affect backup and restore.
3.      Compression does not affect log shipping.
4.      Compression has some affect and consideration on replication.
5.      Data compression is incompatible with sparse columns.
6.      When data is exported, even in native format, the data is output in the uncompressed row format.
7.      When data is imported, if the target table has been enabled for compression, the data is converted by the storage engine into compressed row format.
8.      When a clustered index is dropped, the corresponding heap partitions retain their data compression setting unless the partitioning scheme is modified. If the partitioning scheme is changed, all partitions are rebuilt to an uncompressed state.
9.      Columnstore tables and indexes are always stored with columnstore compression. To reduce the size of columnstore data further, configuring an additional compression called archival compression can be used. It may slow down the query execution and introduce high resource utilization.
10.  Compression is not available for system tables.
11.  A clustered index or Heap table can have different types of data compression levels.
12.  Regular index settings/options can be used with clustered or nonclustered indexes.
13.  Clustered or nonclustered columnstore indexes have fewer index settings.

Script to check data compression level:
Following is a handy script to check what has already been compressed and what has not:

/************************************************************
 * Check data compression level
 ************************************************************/ 
SELECT SCHEMA_NAME(o.schema_id) + '.' + OBJECT_NAME(o.object_id) AS table_name,
       o.type_desc,
       i.name                   AS index_name,
       p.data_compression_desc  AS compression_type,
       i.type_desc              AS storage_type
FROM   sys.partitions p
       INNER JOIN sys.objects o
            ON  p.object_id = o.object_id
       JOIN sys.indexes i
            ON  p.object_id = i.object_id
            AND i.index_id = p.index_id
-- WHERE  p.data_compression > 0
--     AND OBJECT_NAME(o.object_id) = 'Address1'

You can also get the similar information by using my free tool.

Figure#2: Different data compression settings on a table:
 
Some T-SQL Script to compress data in a database:

(a)    Compress all HEAP in a database:

/************************************************************
 * Data compression on HEAP
 ************************************************************/
SELECT 'ALTER TABLE [' + SCHEMA_NAME(o.schema_id) + '].[' + o.[name]
       + '] REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = PAGE)'
FROM   sys.objects o
       JOIN sys.indexes i
            ON  o.[object_id] = i.[object_id]
WHERE  o.[type_desc] = 'USER_TABLE'
       AND i.type_desc IN ('HEAP')

(b)   Compress all CLUSTERED index tables:

/************************************************************
 * Data compression on a CLUSTERED index table and underlying indexes
 ************************************************************/
SELECT 'ALTER INDEX ALL ON [' + SCHEMA_NAME(o.schema_id) + '].[' + o.[name]
       + '] REBUILD WITH (DATA_COMPRESSION = PAGE)'
FROM   sys.objects o
       JOIN sys.indexes i
            ON  o.[object_id] = i.[object_id]
WHERE  o.[type_desc] = 'USER_TABLE'
       AND i.type_desc IN ('CLUSTERED')
  
(c)    Compress all NONCLUSTERED indexes:

/************************************************************
 * Data compression on nonClustered index
 ************************************************************/
SELECT 'ALTER INDEX [' + i.[name] +']'+ ' ON [' + SCHEMA_NAME(o.schema_id) + '].[' + o.[name]
       + '] REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = PAGE)'
FROM   sys.objects o
       JOIN sys.indexes i
            ON  o.[object_id] = i.[object_id]
WHERE  o.[type_desc] = 'USER_TABLE'
       AND i.type_desc IN ('NONCLUSTERED')
  
(d)   Compress all CLUSTERED COLUMNSTORE tables:

/************************************************************
 * Data compression on Clustered columnstore index
 ************************************************************/
SELECT 'ALTER TABLE [' + SCHEMA_NAME(o.schema_id) + '].[' + o.[name]
       + '] REBUILD PARTITION = ALL WITH (MAXDOP = 16, DATA_COMPRESSION = COLUMNSTORE_ARCHIVE)'
FROM   sys.objects o
       JOIN sys.indexes i
            ON  o.[object_id] = i.[object_id]
WHERE  o.[type_desc] = 'USER_TABLE'
       AND i.type_desc IN ('CLUSTERED COLUMNSTORE')

(e)   Compress all NONCLUSTERED COLUMNSTORE indexes:

/************************************************************
 * Data compression on nonClustered columnstore index
 ************************************************************/
SELECT 'ALTER INDEX [' + i.[name] +']'+ ' ON [' + SCHEMA_NAME(o.schema_id) + '].[' + o.[name]
       + '] REBUILD PARTITION = ALL WITH (MAXDOP = 16, DATA_COMPRESSION = COLUMNSTORE_ARCHIVE)'
FROM   sys.objects o
       JOIN sys.indexes i
            ON  o.[object_id] = i.[object_id]
WHERE  o.[type_desc] = 'USER_TABLE'
       AND i.type_desc IN ('NONCLUSTERED COLUMNSTORE')

(f)     Data compression with different index settings:

/****************************************************************
 * Data compression on all nonclustered indexes
 *****************************************************************/
DECLARE @table_name AS VARCHAR(256)
DECLARE @index_name AS VARCHAR(256)
DECLARE @schema_name AS VARCHAR(256)
DECLARE @type_desc AS VARCHAR(50)

DECLARE CurCompress CURSOR
FOR
    SELECT SCHEMA_NAME(o.[schema_id])  AS [schema_name],
           o.[name]                    AS [table_name],
           i.[name]                    AS [index_name],
           i.[type_desc]
    FROM   sys.objects o
           JOIN sys.indexes i
                ON  o.[object_id] = i.[object_id]
    WHERE  o.[type_desc] = 'USER_TABLE'
    ORDER BY
           o.[name]
      
OPEN CurCompress
FETCH NEXT FROM CurCompress INTO @schema_name, @table_name, @index_name, @type_desc
WHILE @@fetch_status = 0
BEGIN
    IF @type_desc = 'NONCLUSTERED'
    BEGIN
        PRINT '/****************************************************************************************'
        PRINT '* Data compression for the table: ' + @table_name
        PRINT '****************************************************************************************/'
        PRINT 'ALTER INDEX [' + @index_name + '] ON [' + @schema_name + '].[' + @table_name + '] '
        PRINT 'REBUILD WITH ('
        PRINT '        FILLFACTOR = 95,'
        PRINT '        SORT_IN_TEMPDB = OFF,'
        PRINT '        STATISTICS_NORECOMPUTE = OFF,'
        PRINT '        ONLINE = OFF,'
        PRINT '        ALLOW_ROW_LOCKS = ON,'
        PRINT '        ALLOW_PAGE_LOCKS = ON,'
        PRINT '        MAXDOP = 16,'
        PRINT '        DATA_COMPRESSION = PAGE'
        PRINT ')'
        PRINT 'GO'
        PRINT 'PRINT ''Compression completed on - ' + @table_name + ''''
        PRINT ' '
    END

    IF @type_desc = 'NONCLUSTERED COLUMNSTORE'
    BEGIN
        PRINT '/****************************************************************************************'
        PRINT '* Data compression for the table: ' + @table_name
        PRINT '****************************************************************************************/'
        PRINT 'ALTER INDEX [' + @index_name + '] ON [' + @schema_name + '].[' + @table_name + '] '
        PRINT 'REBUILD WITH ('
        PRINT '        MAXDOP = 16,'
        PRINT '        DATA_COMPRESSION = COLUMNSTORE_ARCHIVE'
        PRINT ')'
        PRINT 'GO'
        PRINT 'PRINT ''Compression completed on - ' + @table_name + ''''
        PRINT ' '
    END

   
    FETCH NEXT FROM CurCompress INTO @schema_name, @table_name, @index_name, @type_desc
END
CLOSE CurCompress
DEALLOCATE CurCompress

References:

Data Compression

Data Compression: Strategy, Capacity Planning and Best Practices

ALTER INDEX (Transact-SQL)

SQL Server Data Compression – Beauty or Beast?


No comments:

Post a Comment