Total Pageviews

Wednesday, February 20, 2013

Delimited string parsing – Hello T-SQL technique

In any database system, often string parsing is a necessity to satisfy certain application functionalities. In SQL Server, there are a number of built-in strings handling functions that are available to work with. However, staring with SQL 2005, the XML parsing mechanism has been enhanced and provides better handing in an intellectual and efficient way.

Below is a ‘|’ delimited string which we can use to start our research to split in a tabular format.

'sAuthorizedUser;Yes|dRequest;6/16/2012 1:37:08 PM|sSymbol;~!@#$%^&*(`)".\,|sHeader;Abnormal|sSortDirection;ASC|nSequence;3775301,3702915,13777470,5644123|sAction;Update|sType;Revisit|;Empty Variable|Empty Value;|;'

It is preferable to get the following output shown in the picture from the delimited string above.

There are a couple of traditional techniques around string parsing, which are usually searching the entire long string for a specific delimiter using a loop and by utilizing CHARINDEX, LEFT, LEN, SUBSTRING and so on and then putting the intermediate result in a table variable or in a temporary table.

Recently I have come up with a better way to parsing the delimited string using XML Parsers. The idea I have is from Brad Schulz's blog (http://bradsruminations.blogspot.ca). Take a look at his blog if you would like to go into more detail with SQL Server XML.

Traditional Technique:
Following script will do the job nicely, however it uses a temporary table which may not be suitable if it runs repeatedly in a production server.

/****************************************************
** Traditional string parsing technique
*****************************************************/ 
DECLARE @Delimiter CHAR(1)      
DECLARE @nidx INT       
DECLARE @sSlice VARCHAR(150)       
DECLARE @sVariable VARCHAR(50)
DECLARE @sValue VARCHAR(100)
DECLARE @sString VARCHAR(8000)

CREATE TABLE #temptbl
    (
      sVariable VARCHAR(50) ,
      sValue VARCHAR(100)
    )
       
SET @Delimiter = '|'
SET @nidx = 1       
SET @sString = 'sAuthorizedUser;Yes|dRequest;6/16/2012 1:37:08 PM|sSymbol;~!@#$%^&*(`)".\,|sHeader;Abnormal|sSortDirection;ASC|nSequence;3775301,3702915,13777470,5644123|sAction;Update|sType;Revisit|;Empty Variable|Empty Value;|;'

WHILE @nidx != 0
    BEGIN       
        SET @nidx = CHARINDEX(@Delimiter, @sString)       
        IF @nidx != 0
            SET @sSlice = LEFT(@sString, @nidx - 1)       
        ELSE
            SET @sSlice = @sString       
          
        IF ( LEN(@sSlice) > 0 )
            BEGIN
                SET @sVariable = SUBSTRING(@sSlice, 1, NULLIF(CHARINDEX(';', @sSlice) - 1, -1))
                SET @sValue = SUBSTRING(@sSlice, CHARINDEX(';', @sSlice) + 1, LEN(@sSlice))

                 -- put all parsed values in the temp table until the loop is finished.
                IF LEN(@sVariable) > 0
                    AND LEN(@sValue) > 0
                    INSERT  INTO #temptbl
                    VALUES  ( @sVariable, @sValue )

            END
        SET @sString = RIGHT(@sString, LEN(@sString) - @nidx)       
        IF LEN(@sString) = 0
            BREAK       
    END 

SELECT  sVariable ,
        sValue
FROM    #temptbl

IF OBJECT_ID('tempdb..#temptbl') IS NOT NULL
    DROP TABLE #temptbl

SQL Server XML Technique:
Following is the same parsing mechanism while using SQL XML parser. The beauty of this technique is that we can use a combination of characters as delimiters, such as ‘||’, |*|’, and so on and it will work flawlessly from SQL 2005 and upward.

/****************************************************
** T-SQL - XML string parsing technique
*****************************************************/
DECLARE @Delimiter CHAR(1)
DECLARE @sVariable VARCHAR(50)
DECLARE @sValue VARCHAR(100)
DECLARE @sString VARCHAR(8000)

SET @Delimiter = '|'
SET @sString = 'sAuthorizedUser;Yes|dRequest;6/16/2012 1:37:08 PM|sSymbol;~!@#$%^&*(`)".\,|sHeader;Abnormal|sSortDirection;ASC|nSequence;3775301,3702915,13777470,5644123|sAction;Update|sType;Revisit|;Empty Variable|Empty Value;|;'

SELECT dt.sVariable
     , dt.sValue
FROM
  (SELECT CASE
            WHEN charindex(';', items) = 0 THEN
              'unknown'
            ELSE
              substring(items, 1, charindex(';', items) - 1)
          END AS sVariable
        , substring(items, charindex(';', items) + 1, len(items)) AS sValue
   FROM
     (SELECT @sString scol) f
     CROSS APPLY (SELECT XMLEncoded = (SELECT [*] = f.scol
                                       FOR XML
                                         PATH (''))) f0
     CROSS APPLY (SELECT TagsList = '<x>' + replace(XMLEncoded, @Delimiter, '</x><x>') + '</x>') f1
     CROSS APPLY (SELECT XmlList = cast(TagsList AS XML)) f2
     CROSS APPLY XmlList.nodes('x') f3 (XmlNode)
     CROSS APPLY (SELECT items = XmlNode.value('.', 'VARCHAR(150)')) f4
   WHERE
     items > '') AS dt
WHERE
  dt.sVariable > ''
  AND dt.sValue > ''

Execution Plan Quality:
While the above XML technique parses the string correctly, one thing we need to keep in mind is that using XML method will not provide correct cardinality estimation. XML Reader is a Table Valued Function, therefore if we join the parsed table with a larger table it causes a plan quality issue and the performance may hinder.

Following is the execution plan which clearly indicates the cardinality estimation issue.


Plan Analysis with SQL Sentry Plan Explorer:
Execution Plan does not look good so far as it is suffering from cardinality estimation error.





Performance comparison Traditional vs. XML Parser:
Although the XML parser technique looks efficient and better design, it has some hidden gotchas. Let’s identify and compare both methods.

Indicator
Traditional Technique
XML Parser Technique
Tempdb Usage
Yes
Yes
Execution Plan
Multiple pieces
Single
Execution Plan Quality
Not helpful
Bad
Cardinality Estimation
Always accurate
Always wrong
Plan Cache bloating
Yes
No
Recompile
Yes
No
Usability
Support large number of records
Up to 100 records

Alternate solution - CLR integration:
If string parsing is necessary in OLTP implementation, then SQL Server provides another alternate method which is CLR integration. A CLR function can also be considered to see if it reduces/minimizes hidden performance gotchas.

Summary:
I hope you find this approach a useful one. It provides a new way to solve the complexity of parsing delimited string by applying newer T-SQL syntax.


Some Read:
Table-valued Function Showplan Operator
http://msdn.microsoft.com/en-us/library/ms190769(v=sql.90).aspx

CREATE FUNCTION (Transact-SQL)
http://msdn.microsoft.com/en-us/library/ms186755.aspx

Query Performance and multi-statement table valued functions
http://blogs.msdn.com/b/psssql/archive/2010/10/28/query-performance-and-multi-statement-table-valued-functions.aspx


Using CLR Integration in SQL Server 2005

No comments:

Post a Comment