Total Pageviews

Wednesday, March 6, 2013

Table-Valued Function (TVF) – hidden gotchas and possible alternatives

Both the Table-Valued Function (TVF) and Inline Table-Valued Function (ITVF) are user defined functions that return table data type.

TVF and ITVF can be used to achieve the functionality of parameterized views.  Which means a view does not support parameters in the search conditions specified in the WHERE clause. But user-defined functions do support parameters in the search conditions specified in the WHERE clause.

Inline Table-Valued Function:
To define TVF and ITVF you must follow their own set of rules. Below is a brief description about how an ITVF can be defined.

1.       The RETURNS clause contains only the keyword “table”. There is no need to define the format of a return variable, because it is set by the format of the result set of the SELECT statement in the RETURN clause.
2.       There is no function body delimited by BEGIN and END.
3.       The RETURN clause contains a single SELECT statement in parentheses.
4.       It accepts only constants or @local_variable arguments.

Disadvantages:
It is always preferable to use ITVF over TVF whenever feasible to get the best possible performance which we will see later. Following are some critical disadvantages we observed while using TVF.

1.       Cardinality estimate for TVF will be always wrong, as it produces a dynamic table at execution time thus no statistics.
2.       TVF requires intermediate results which will go into tempdb.
3.       Confusing execution plan and cost estimation.
4.       Consumes more CPU because lack of statistics.
5.       Causes more performance issues when used with LEFT JOIN.

Performance Comparison:
Here, we will be reviewing performance differences among TVF, ITVF and SELECT statement.

1.       Create a new database.
2.       Create a table and populate it with 100,000 records.
3.       Create clustered and covering indexes.
4.       Create a TVF which is “fn_myTVF” and create an ITVF which is “fn_myInlineTVF”.
5.       Write an equivalent query for the above.

In our test we will clear Buffer cache in each run and will collect STATISTICS IO and TIME output.

Let’s get started:

1.       Create a database “TestDB”
CREATE DATABASE TestDB
GO

USE TestDB
GO

2.       Create a table “tblLarge” and populate it with 100,000 records.
IF OBJECT_ID('tblLarge') IS NOT NULL
    DROP TABLE tblLarge
GO

CREATE TABLE tblLarge
    (
      xID INT IDENTITY(1, 1) ,
      sName1 VARCHAR(100) ,
      sName2 VARCHAR(1000) ,
      sName3 VARCHAR(400) ,
      sIdentifier CHAR(10) ,
      dDOB DATETIME ,
      nWage NUMERIC(20, 2) ,
      sLicense VARCHAR(25)
     )
GO

-- insert 100,000 dummy records
SET NOCOUNT ON
INSERT  INTO tblLarge
        ( sName1 ,
          sName2 ,
          sName3 ,
          sIdentifier ,
          dDOB ,
          nWage ,
          sLicense
        )
VALUES  ( LEFT(CAST(NEWID() AS VARCHAR(36)), RAND()*50) ,  -- sName1
          LEFT(CAST(NEWID() AS VARCHAR(36)), RAND()*60) ,  -- sName2
          LEFT(CAST(NEWID() AS VARCHAR(36)), RAND()*70) ,  -- sName2
          LEFT(CAST(NEWID() AS VARCHAR(36)), 2) ,          -- sIdentifier    
          DATEADD(dd, -RAND() * 20000, GETDATE()) ,        -- dDOB
          ( RAND() * 1000 ) ,                              -- nWage
          SUBSTRING(CAST(NEWID() AS VARCHAR(36)), 6, 7)    -- sLicense
         )
GO 100000

3.       Create indexes:
-- create a clustered index
ALTER TABLE dbo.tblLarge ADD CONSTRAINT PK_tblLarge PRIMARY KEY CLUSTERED (xID)
GO

CREATE NONCLUSTERED INDEX IX_tblLarge_sIdentifier ON dbo.tblLarge
       (sIdentifier)
INCLUDE (sName1, sName2, sName3) 
       WITH( STATISTICS_NORECOMPUTE = OFF)
GO

4.       Create the Table-Valued Function (TVF) “fn_myTVF”:
CREATE FUNCTION dbo.fn_myTVF
    (
      @sIdentifier VARCHAR(10)
    )
RETURNS @myTbl TABLE
    (
      xID INT NOT NULL ,
      sName1 VARCHAR(100) ,
      sName2 VARCHAR(1000) ,
      sName3 VARCHAR(400)
    )
AS
    BEGIN  
        INSERT  INTO @myTbl
                SELECT  xID ,
                        sName1 ,
                        sName2 ,
                        sName3
                FROM    tblLarge
                WHERE   sIdentifier = @sIdentifier
     
        RETURN 
    END
GO

5.       Create the Inline Table-Valued Function (ITVF) – “fn_myInlineTVF”:
CREATE FUNCTION dbo.fn_myInlineTVF
       (
         @sIdentifier VARCHAR(10)
       )
RETURNS TABLE
AS
RETURN
       ( SELECT xID ,
                sName1 ,
                sName2 ,
                sName3
         FROM   tblLarge
         WHERE  sIdentifier = @sIdentifier
       )
6.       Now, run the following query a few times and examine the Actual Execution Plan (CTRL+M)

SET STATISTICS IO ON
SET STATISTICS TIME ON

DBCC FREEPROCCACHE

DBCC DROPCLEANBUFFERS
SELECT a.xID, a.sName1, a.sName2, a.sName3 FROM dbo.fn_myTVF('AA') a

DBCC DROPCLEANBUFFERS
SELECT a.xID, a.sName1, a.sName2, a.sName3 FROM dbo.fn_myInlineTVF('AA') a

DBCC DROPCLEANBUFFERS
SELECT a.xID, a.sName1, a.sName2, a.sName3 FROM tblLarge a WHERE sIdentifier = 'AA'

Three execution plans:
It looks like the first Execution Plan which is “fn_myTVF” has better efficiency and is less expensive when compared with the other two!


Let’s hover over the mouse on the first node of the plan tree (SELECT Cost 0%), we will see that the estimated number of rows is 1 (one) and for which the estimated sub-tree cost is also low “0.0032863”, though in reality the query return 366 rows in our case.



Thus we can easily conclude that the cardinality estimation is completely wrong and so is the query cost.


STATISTICS TIME and IO output:
From this comparison, we can easily notice that the TVF version is a more expensive operation than the other two.

--- TVF
Table '#267ABA7A'. Scan count 1, logical reads 5, physical reads 0,
                           read-ahead reads 0, lob logical reads 0,
                           lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 32 ms,  elapsed time = 179 ms.

-- ITVF
Table 'tblLarge'. Scan count 1, logical reads 8, physical reads 3,
                           read-ahead reads 5, lob logical reads 0,
                           lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 134 ms.

--SELECT Statement
Table 'tblLarge'. Scan count 1, logical reads 8, physical reads 3,
                           read-ahead reads 5, lob logical reads 0,
                           lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 140 ms.


Conclusion:
If it is possible, Inline table-valued function should be used to avoid hidden performance gotchas. Alternatively, the TVF functionality may be merged into the main Stored Procedure or in the SQL batch.

1 comment: