Sunday, September 22, 2013

NUMA - Memory and MAXDOP settings

It is common practice to throw more hardware at application performance issues rather than fixing the issues themselves. For short-lived and non-mission-critical applications, that makes sense. But for mission-critical applications, the benefit of upgraded hardware does not last long, and fixing the application issues becomes more obvious and important. At the same time, tuning the server configuration is also necessary for the database server to operate flawlessly.

Modern hardware, such as NUMA-based servers, has a tremendous capability to process application requests faster than the older SMP architecture. Microsoft SQL Server is fully capable of using the NUMA architecture and taking advantage of it. Microsoft has supported hardware NUMA since SQL Server 2000 SP4, and support for the NUMA architecture has been enhanced in each release.

NUMA and Memory Settings:
In a NUMA-based system, the memory settings (min server memory and max server memory) play an important role. It is generally best practice to configure memory so that the allocated memory is distributed evenly across all NUMA nodes. This helps each NUMA node operate independently without demanding memory from other nodes. Accessing memory on another NUMA node is called “remote memory access”, and accessing memory on the same NUMA node is called “local memory access”. Accessing a different node for memory introduces latency.

To get the best out of a NUMA system, the following settings are highly recommended:

1.      Lock Pages in Memory: The SQL Server service account needs the “Lock Pages in Memory” privilege in the Windows local security policy. This prevents Windows from paging out SQL Server memory.

2.      Max and Min Server Memory: Max and min server memory should be set to the same value, for two reasons:

(a)   This reduces the overhead (allocation and de-allocation) that SQL Server would otherwise incur by managing these values dynamically.
(b)   As memory calculations are usually derived from “max server memory”, the SQL Server engine will have better values for allocating physical memory evenly to each NUMA node. This reduces the “foreign memory” requirement that arises while processing data on one node.

3.      MAXDOP: On servers with NUMA configured, MAXDOP should not exceed the number of CPUs assigned to each NUMA node. That is, if a server has 4 NUMA nodes and each node consists of 4 CPUs, then MAXDOP should be 4 or less. This reduces the threading overhead that occurs when a single parallel query spans multiple NUMA nodes simultaneously.
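The memory and MAXDOP settings above can be applied with sp_configure. A minimal sketch for a hypothetical server with 4 CPUs per NUMA node; the memory value is illustrative, not a recommendation, and must be sized for the actual server:

```sql
-- Illustrative values only: size memory for the actual server,
-- leaving room for the OS and other processes.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Set min and max server memory (in MB) to the same value.
EXEC sp_configure 'min server memory (MB)', 57344;
EXEC sp_configure 'max server memory (MB)', 57344;

-- MAXDOP no higher than the number of CPUs per NUMA node (4 here).
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;
```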

Memory Allocation in each NUMA node:
To learn how much memory each node has received, PerfMon or sys.dm_os_performance_counters can be used. The following queries show the buffer allocation from an 8-node NUMA system.

DMV Query:
select  counter_name ,
        cntr_value * 8 / 1024 node_memory_mb ,
        instance_name
from    sys.dm_os_performance_counters
where   [object_name] like '%Buffer Node%'
        and counter_name like 'Total Pages%'
order by instance_name

-- Grand total across all nodes (the COMPUTE clause is deprecated
-- and was removed in SQL Server 2012, so sum separately):
select  sum(cntr_value * 8 / 1024) total_node_memory_mb
from    sys.dm_os_performance_counters
where   [object_name] like '%Buffer Node%'
        and counter_name like 'Total Pages%'


select  counter_name ,
        cntr_value * 8 / 1024  total_buffer_mb ,
        instance_name
from    sys.dm_os_performance_counters
where   [object_name] like '%Buffer Manager%'
        and counter_name like 'Total Pages%'
order by instance_name




CPU Usage in each NUMA node:


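Per-node CPU load can be approximated from the scheduler DMV. A sketch that groups the visible online schedulers (one per CPU) by their parent NUMA node:

```sql
-- Approximate per-NUMA-node CPU load: each VISIBLE ONLINE scheduler
-- maps to one CPU; load_factor indicates relative scheduler pressure.
SELECT  parent_node_id ,
        COUNT(*) AS cpu_count ,
        SUM(current_tasks_count) AS current_tasks ,
        AVG(load_factor) AS avg_load_factor
FROM    sys.dm_os_schedulers
WHERE   status = 'VISIBLE ONLINE'
GROUP BY parent_node_id
ORDER BY parent_node_id;
```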
Some disadvantages:
Although the NUMA architecture increases processing power, some usage patterns introduce latch contention on servers with 32 or more cores. In that case, database schema design, such as indexing, needs to be reviewed. A detailed guideline can be found in Microsoft’s technical paper “Diagnosing and Resolving Latch Contention on SQL Server”.

If, over time, the “Foreign Pages” counter is high for one or more nodes, it usually means that those nodes require more memory to perform a particular workload. While adding more memory might help, it is recommended to first see whether the query can be optimized, along with index changes.
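The foreign-page allocation per node can be watched with the same performance-counter DMV used earlier; a sketch against the Buffer Node object:

```sql
-- Memory a node is using from another node's local memory, per NUMA node.
SELECT  instance_name AS node ,
        cntr_value * 8 / 1024 AS foreign_memory_mb
FROM    sys.dm_os_performance_counters
WHERE   [object_name] LIKE '%Buffer Node%'
        AND counter_name LIKE 'Foreign pages%'
ORDER BY instance_name;
```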

Read More:
SQL Server and Large Pages Explained

Recommendations and guidelines for the "max degree of parallelism" configuration option in SQL Server

SQL Server, Buffer Node Object

Diagnosing and Resolving Latch Contention on SQL Server

How It Works: SQL Server (NUMA Local, Foreign and Away Memory Blocks)

Tuesday, September 3, 2013

Choosing an efficient clustered key - Other side of the moon

Choosing an efficient clustered key is a crucial factor in good database performance, yet it is often neglected at database schema design time, leading to poor performance. It also becomes difficult to resolve such issues once the database has grown to multiple terabytes in size; at that point even huge hardware will not reach the desired performance goal.

What is a Clustered Key?
“Clustered indexes sort and store the data rows in the table based on their key values. There can only be one clustered index per table, because the data rows themselves can only be sorted in one order.” To improve performance by reducing IO overhead, it is necessary to have a clustered key on almost all tables in a highly transactional database. Although numerous guidelines and good practices are available for understanding the need for an appropriate clustered key, the question is: how often do we consider those practices while choosing a clustered key?
 
It is generally advisable that a clustered key be narrow in length and built on a unique-valued column (such as a primary key). If the column is not unique, the database engine adds a 4-byte uniqueifier value to each duplicate row to make the key unique. This added value is internal, can’t be seen or accessed by the user, and carries some internal overhead. However, even more inefficiency occurs when the clustered key is wider than needed.
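The uniqueifier overhead can be observed indirectly. A sketch, assuming a hypothetical table dbo.tblDup whose clustered index (index_id 1) is non-unique: with the DETAILED scan mode, rows carrying a uniqueifier push the maximum record size above the minimum:

```sql
-- Duplicate key values carry a hidden 4-byte uniqueifier, so
-- max_record_size_in_bytes exceeds min_record_size_in_bytes
-- even when all visible columns are fixed-width.
SELECT  index_level ,
        min_record_size_in_bytes ,
        max_record_size_in_bytes
FROM    sys.dm_db_index_physical_stats(
            DB_ID(), OBJECT_ID('dbo.tblDup'), 1, NULL, 'DETAILED');
```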

Pitfalls of an inefficient clustered key:
1.      Fragmentation: rapidly introduces more fragmentation.
2.      Page splits: a huge number of page allocations and de-allocations happen.
3.      Space: requires more disk and memory, and IO cost will be high.
4.      CPU usage: high CPU due to excessive page splits.
5.      Slowness: query response time increases.
6.      Optimization: index optimization requires more time.

Good Clustered Key:
1.      A unique key column is the best candidate for a clustered key.
2.      An IDENTITY column is a good choice, as its values are sequential.
3.      A column that is used in JOIN clauses.
4.      A column used to retrieve data sequentially.
5.      A column frequently used in SORT (GROUP BY or ORDER BY) operations.
6.      A column frequently used in range scans (such as BETWEEN, >=, <=).
7.      A static column, such as EmployeeID or SSN.

Inefficient choices for a clustered key:
1.      Wide keys: multi-column keys, such as LastName + FirstName + MiddleName or Address1 + Address2 + Phone, and so on. “The key values from the clustered index are used by all non-clustered indexes as lookup keys. Any non-clustered indexes defined on the same table will be significantly larger because the non-clustered index entries contain the clustering key and also the key columns defined for that non-clustered index.”
2.      GUIDs: randomly generated unique values lead to the highest possible fragmentation. NEWSEQUENTIALID() can be used instead of NEWID() when creating GUIDs to reduce fragmentation in a table.
3.      Frequently changing data: a column whose value changes frequently is not a good choice for a clustered key.
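If a GUID key is unavoidable, note that NEWSEQUENTIALID() can only be used as a column default. A sketch (the table and constraint names are illustrative):

```sql
-- Sequential GUIDs keep new rows arriving at the end of the index,
-- greatly reducing page splits compared with random NEWID() values.
CREATE TABLE dbo.tblOrder (
    OrderID uniqueidentifier NOT NULL
        CONSTRAINT DF_tblOrder_OrderID DEFAULT NEWSEQUENTIALID() ,
    OrderDate date NULL ,
    CONSTRAINT PK_tblOrder PRIMARY KEY CLUSTERED (OrderID)
);
```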

Narrow vs. Wide Clustered Key Test:
Here we will observe how a wide clustered key introduces performance issues. In our example:
(a)   “xID” is the clustered key, which is a primary key and an IDENTITY column.
(b)   Later we will create a multi-column clustered key using “sName1”, “sName2” and “sName3”, which are varchar columns.
(c)    We will insert 100,000 rows for this test.
(d)   We will review fragmentation and page splits for each type of index.

DMV Query:
--To check table and index level changes:
SELECT  OBJECT_NAME(ios.object_id, ios.database_id) as table_name,
              ios.index_id ,
        si.name AS index_name,
        ios.leaf_insert_count +
        ios.leaf_update_count +
        ios.leaf_delete_count AS leaf_changes,
        ios.leaf_allocation_count AS leaf_page_splits,
        ios.nonleaf_insert_count +
        ios.nonleaf_update_count +
        ios.nonleaf_delete_count AS nonleaf_changes,
        ios.nonleaf_allocation_count AS nonleaf_page_splits,
        (ios.range_scan_count + ios.leaf_insert_count
            + ios.leaf_delete_count + ios.leaf_update_count
            + ios.leaf_page_merge_count + ios.singleton_lookup_count
           ) total_changes
FROM    sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) ios
        JOIN sys.objects so ON so.object_id = ios.object_id
        JOIN sys.indexes si ON si.object_id = ios.object_id
                               AND si.index_id = ios.index_id
        JOIN sys.schemas ss ON so.schema_id = ss.schema_id
WHERE   OBJECTPROPERTY(ios.object_id, 'IsUserTable') = 1
ORDER BY leaf_changes DESC

--To check index fragmentation:
SELECT  a.index_id ,
        b.name AS [object_name],
        CONVERT(NUMERIC(5,2),a.avg_fragmentation_in_percent) pct_avg_fragmentation
FROM    sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, NULL) AS a
        JOIN sys.indexes AS b ON a.object_id = b.object_id
                                 AND a.index_id = b.index_id;


Script to test:
CREATE DATABASE TestDB
GO
USE [TestDB]
GO

CREATE TABLE [tblLarge](
       [xID] [int] IDENTITY(1,1) NOT NULL,
       [sName1] [varchar](10) DEFAULT 'ABC' NOT NULL,
       [sName2] [varchar](13) DEFAULT 'ABC' NOT NULL,
       [sName3] [varchar](36) DEFAULT 'ABC' NOT NULL,
       [sIdentifier] [char](2) NULL,
       [dDOB] [date] NULL,
       [nWage] [numeric](12, 2) NULL,
       [sLicense] [char](7) NULL,
       [bGender] [bit] NULL
) ON [PRIMARY]
GO


-- Clustered key on xID
ALTER TABLE tblLarge ADD CONSTRAINT PK_tblLarge
PRIMARY KEY CLUSTERED (xID) WITH (FILLFACTOR=90)
GO

-- DROP constraint
ALTER TABLE [dbo].[tblLarge] DROP CONSTRAINT [PK_tblLarge]
GO

-- Multi-column clustered key
ALTER TABLE tblLarge ADD CONSTRAINT PK_tblLarge
PRIMARY KEY CLUSTERED (sName1, sName2, sName3) WITH (FILLFACTOR=90)
GO

-- Insert 100,000 records (GO 100000 repeats the batch 100,000 times)
INSERT  INTO tblLarge
        ( sName1 ,
          sName2 ,
          sName3 ,
          sIdentifier ,
          dDOB ,
          nWage ,
          sLicense ,
          bGender
        )
VALUES  ( LEFT(CAST(NEWID() AS VARCHAR(36)), 8) ,
          LEFT(CAST(NEWID() AS VARCHAR(36)), 13) ,
          CAST(NEWID() AS VARCHAR(36)) ,
          LEFT(CAST(NEWID() AS VARCHAR(36)), 2) ,
          DATEADD(dd, -RAND() * 20000, GETDATE()) ,
          ( RAND() * 1000 ) ,
          SUBSTRING(CAST(NEWID() AS VARCHAR(36)), 6, 7) ,
          COALESCE(ISNUMERIC(LEFT(CAST(NEWID() AS VARCHAR(36)), 1)), 0)
        )
GO 100000


Fragmentation Comparison:
As the figures below show, fragmentation and page splits increased dramatically when the wide key was used.

Figure#1: Narrow clustered key


Figure#2: Multi-column clustered key

Conclusion:
While using a wide or multi-column key for the clustered index is supported by SQL Server, we should not overlook the dangerous performance consequences that occur silently.


Reference:
Clustered Index Design Guidelines