Total Pageviews

Wednesday, July 3, 2013

Self-join incurs more I/O activities and increases locking overhead!

The Self-Join technique is commonly used to create a hierarchical tree set or finding duplicate records. It is also used to find previous or next values from a table by utilizing the inline table technique. Generally a self-join is a join in which a table is joined with itself. For example, when we need details about an employee and his manager where both employee and manager ID resides in the same table.

When we use self-join, it usually indicates that the table is not normalized. This may not be a problem for a small table but for a large and busy OLTP table with higher concurrency, this may lead to poor performance and degrade query response time.

The key point is that each self-joined table will be treated as a separate table while retrieving data from the disk, thus self-join incurs more I/O activities and increases locking overhead as it touches the same table twice or more. By adding appropriate indexes on JOIN and WHERE clause columns, it will reduce I/O activities and improve performance. It will be a good approach to avoid self-join whenever possible for heavy duty OLTP tables and queries.

Let’s perform a simple self-join test to see how the execution plan looks like. We can observe that as the retrieval number of records increases, each query behaves differently, such as requiring specific index and introducing parallel query.

Queries:
-- self-join one
select top (10)  a.xid, a.sIdentifier
from    tbllarge a
        inner join tbllarge b on a.xid = b.xid
where   a.sIdentifier ='A1'

-- self-join two
select top (100)  a.xid, a.sIdentifier
from    tbllarge a
        inner join tbllarge b on a.xid = b.xid
where   a.sIdentifier ='A1'

-- self-join three
select top (1000)  a.xid, a.sIdentifier
from    tbllarge a
        inner join tbllarge b on a.xid = b.xid
where   a.sIdentifier ='A1'

Execution Plan:

2 comments:

  1. Short and Informative..Excellent Job...!!!!!!

    ReplyDelete
  2. Very interesting, haven't thought much of the self join from a performance perspective, so good insight! A nice reference here also: sql self join

    ReplyDelete