Wednesday, December 7, 2016

Using SQL with MongoDB – the “Aqua Data Studio” from www.aquafold.com

MongoDB is one of the most popular and highest-ranked databases in the NoSQL world. It has a flexible architecture and a very fast query execution engine with support for in-memory computation. MongoDB stores data as JSON-style documents and uses a JavaScript-based query language.

Aqua Data Studio for MongoDB:
There are a number of commercial and non-commercial tools available for MongoDB. “Aqua Data Studio” from www.aquafold.com is one of the most versatile of them: it offers a great deal of flexibility and many intuitive features behind a very easy-to-use GUI, which helps with all kinds of database-related activities. It natively supports more than 28 different heterogeneous databases and runs on Windows, Mac and Linux, so it becomes easy to administer, analyze and visualize data from a single application.

SQL vs JavaScript in MongoDB:
Those who have gotten their hands dirty writing SQL, can’t imagine giving it up, and still want to use SQL with MongoDB will appreciate “Aqua Data Studio”. MongoDB relies on the JavaScript programming language and JSON both for data analysis and for administering database servers, so the learning curve is small; even so, many of us would rather use SQL-like syntax than JavaScript against MongoDB databases.

Using both SQL and JavaScript against MongoDB with Aqua Data Studio:
“Aqua Data Studio” gives us the flexibility to use either pure JavaScript-based queries or SQL-like syntax. Let’s say we have a MongoDB collection named “master_data_sql” in which we store various SQL Server events collected from hundreds of database servers. The next step is to analyze the collection to answer specific questions, either on a daily basis or ad hoc.

MongoDB Sample collection:
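
The original post shows a screenshot of the sample collection here. As a rough illustration only (the field names are taken from the queries below; all values are invented), a document in “master_data_sql” might look like this:

{
    "_id" : ObjectId("..."),
    "event_date" : "11/23/2016",
    "event_type" : "Login Failed",
    "host_server" : "SQLPROD01",
    "host_description" : "Payment Processing Server",
    "message" : "Login failed for user 'appuser'."
}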

Based on the above MongoDB schema, the following are some simple queries in both JavaScript and SQL that demonstrate the similarity. All queries were executed and tested against MongoDB v3.4 with “Aqua Data Studio v17.0”.

Query Comparison #1:
Find all entries where the “message” field contains ‘Login failed’.

MongoDB: db.master_data_sql.find({message: /Login failed/i })
SQL: SELECT * FROM master_data_sql WHERE message LIKE '%Login failed%'

Query Comparison #2:
Find all entries where the “event_type” field contains ‘Login Failed’ or ‘DBCC’.

MongoDB: db.master_data_sql.find({event_type: /Login Failed|DBCC/i })
SQL: SELECT * FROM master_data_sql WHERE event_type LIKE 'Login Failed' OR event_type LIKE 'DBCC'

Query Comparison #3:
Find all entries where the “event_type” field contains ‘Login Failed’ or ‘DBCC’, displaying only three columns: “event_date”, “event_type” and “message”.

MongoDB: db.master_data_sql.find({event_type: /Login Failed|DBCC/i},
                        {event_date:1, event_type:1, message:1,_id:0} )
SQL: SELECT event_date, event_type, message FROM master_data_sql WHERE  event_type LIKE 'Login Failed' OR event_type LIKE 'DBCC'

Query Comparison #4:
Find all entries where the “event_type” field contains ‘Deadlock’ or ‘DBCC’ and “event_date” equals ‘11/23/2016’, displaying only three columns: “event_date”, “event_type” and “message”.

MongoDB: db.master_data_sql.find({event_date:'11/23/2016', event_type: /Deadlock|DBCC/i},
                        {event_date:1, event_type:1, message:1,_id:0} )
SQL: SELECT event_date, event_type, message FROM master_data_sql
                WHERE event_date ='11/23/2016'
   AND (event_type LIKE 'Deadlock' OR event_type LIKE 'DBCC')
  

Query Comparison #5:
A group by example.

MongoDB:
db.master_data_sql.aggregate
([
   { $match: { event_type: {$in: ["Login Failed"]}, event_date: {$gte: "11/12/2016", $lte: "11/18/2016"} } },
   { $group: { _id: { event_date: "$event_date",  event_type: "$event_type" }, total: {$sum: 1} }},
   { $project: { event_date: "$_id.event_date", event_type: "$_id.event_type", total: "$total", _id: 0} },
   { $sort: { event_date: 1, total: -1  }}
])

SQL:
SELECT
        ms.event_date,
        ms.event_type,
        COUNT(event_type) AS total
FROM
        master_data_sql AS ms
GROUP BY
        ms.event_date,
        ms.event_type
HAVING
        (ms.event_date >= '11/12/2016' AND
        ms.event_date <= '11/18/2016') AND
        event_type IN ('Login Failed')
ORDER BY
        ms.event_date,
        ms.total DESC

Figure: Side-by-side output from Aqua Data Studio:


Query Comparison #6:
Another group by example.

MongoDB:
db.master_data_sql.aggregate
([
   { $match: {event_date:  "12/07/2016"} },
   { $group: {_id: { event_date: "$event_date", host_server: "$host_server",
                       host_description: "$host_description",
                       event_type: "$event_type" }, total: { $sum: 1 }} },
   { $project: { event_date: "$_id.event_date", host_server: "$_id.host_server",
                       host_description: "$_id.host_description",
                       event_type: "$_id.event_type", total: "$total", _id: 0 }},
   { $sort: {  total: -1, event_type: 1} }  
])

SQL:
SELECT
        ms.event_date ,
        ms.host_server,
        ms.host_description,
        ms.event_type,
        COUNT(ms.event_type) AS total
FROM
        logdb.master_data_sql ms
GROUP BY
        ms.event_date,
        ms.host_server,
        ms.host_description,
        ms.event_type
HAVING
        (ms.event_date = '12/07/2016')
ORDER BY
        ms.total DESC,
        ms.event_type ASC

Figure: Side-by-side output from Aqua Data Studio:


Dashboard for data visualization:

To learn more visit:
Aqua Data Studio: http://www.aquafold.com/


Thursday, August 4, 2016

PowerShell Way: Backing up numerous databases (Full, Differential, Transaction Log) on numerous SQL Server Instances concurrently and intelligently

Do you want to back up databases on multiple servers simultaneously from a central server, without deploying any code, efficiently and in an intuitive way? The purpose of this PowerShell development initiative is to back up hundreds of databases residing on hundreds of database servers, each with a different backup strategy.

Brief description of the Script:
The PowerShell function “Do_dbBackup” in the script is a workflow-type function, which lets it use PowerShell’s multi-threading capability to work on several servers in parallel. It requires two input parameters: a list of database servers containing server information along with various backup options, and an error output file for logging.
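
As a point of reference, here is a minimal sketch of what such a workflow can look like. This is not the author’s actual script; the backup call, the CSV column names and the database selection are illustrative assumptions only:

workflow Do_dbBackup
{
    param (
        [string]$ServerListPath,   # CSV list of instances and backup options
        [string]$ErrorLogFile      # file used for error logging
    )

    $servers = Import-Csv -Path $ServerListPath

    # each instance is processed on its own thread
    foreach -parallel ($server in $servers)
    {
        InlineScript
        {
            $s = $using:server
            try {
                # Backup-SqlDatabase ships with the SqlServer/SQLPS module
                Backup-SqlDatabase -ServerInstance $s.SQLInstance `
                                   -Database 'master' `
                                   -BackupFile (Join-Path $s.FullBakPath 'master.bak')
            }
            catch {
                Add-Content -Path $using:ErrorLogFile -Value $_.Exception.Message
            }
        }
    }
}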

Following are some highlights of this process:
1.      Performing Full or Differential backups.
2.      Dumping the backup files to a local or network shared folder.
3.      Creating the destination folder automatically if it does not exist.
4.      Deleting old backup files based on the retention period.
5.      Utilizing backup compression for Enterprise Edition.
6.      Error handling, logging and notification.
7.      Performing system database backups.
8.      Skipping a Differential or T-Log backup if there is no prior full backup of the database.

Parameters found in the configuration file are adjustable to fit a specific requirement. This parameter file can be used for backing up a Transaction Log as well.

How to use the Script?
The script assumes that the executor (i.e. the domain account) of this script has “sysadmin” permission on all target database servers and has read/write permission on the Windows drive, local folder or network shared folder.

The PowerShell script “DBBackupV2.ps1” contains the main function “Do_dbBackup” and it accepts two parameters as follows:

1.      $ServerListPath = list of database instances along with various options
2.      $ErrorLogFile = Error logging file

At the bottom of the script, the “Do_dbBackup” is called as follows:

$DBConfigFile = "D:\Work2016\PSScript2016\DBBackupV1\ServerList.txt"
$LogFile = 'D:\Work2016\PSScript2016\DBBackupV1\ErrorLog.txt'

Invoke-Command -ScriptBlock { Do_DBBackup -ServerListPath $DBConfigFile -ErrorLogFile $LogFile }

How to execute the Script?
The “DBBackupV2.ps1” or the “LogBackupV1.ps1” script can be executed by a Windows Scheduled Task, by a SQL Server Agent job, or manually. Here is a simple example of using a SQL Server Agent job to execute the script.
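
The original post shows a screenshot of the Agent job step here. A minimal, illustrative equivalent is an “Operating system (CmdExec)” job step with a command such as the following (the folder path matches the example above; adjust it to your environment):

powershell.exe -NoProfile -ExecutionPolicy Bypass -File "D:\Work2016\PSScript2016\DBBackupV1\DBBackupV2.ps1"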


Description of parameters:
The “ServerList.txt” is a CSV-type file which can hold several database instances and the corresponding configuration options for each server; a sample row is shown after the parameter list below. The header of the CSV is the following:

BackupYesNo, Description, SQLInstance, FullBakPath, DiffBakPath, TLogBakPath, SysBakPath, FullBakRetentionHour, DiffBakRetentionHour, TLogBakRetentionHour, SysBakRetentionHour, day1, day2, day3, day4, day5, day6, day7

·         The first row represents the header of each column.
·         The column “BackupYesNo” indicates whether a database backup will be performed or not. The value is “Yes” or “No”.
·         The second column, “Description”, is for the database server description, e.g. “Payment Processing Server”.
·         The third column is for the SQL Server instance name.
·         The FullBakPath, DiffBakPath, TLogBakPath and SysBakPath columns represent the destination folders for the backups.
·         FullBakRetentionHour, DiffBakRetentionHour, TLogBakRetentionHour and SysBakRetentionHour represent the respective backup retention periods in hours.
·         day1 through day7 represent the day name and backup type. The format should be similar to Sunday\FULL, Monday\Full, Tuesday\Full, Wednesday\DIFF, Thursday\Full, Friday\Full, Saturday\DIFF
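
As a rough illustration (every value below is made up; adjust instance names, paths and retention hours to your environment), one row of “ServerList.txt” might look like this:

BackupYesNo, Description, SQLInstance, FullBakPath, DiffBakPath, TLogBakPath, SysBakPath, FullBakRetentionHour, DiffBakRetentionHour, TLogBakRetentionHour, SysBakRetentionHour, day1, day2, day3, day4, day5, day6, day7
Yes, Payment Processing Server, SQLPROD01\INST1, \\BackupShare\Full, \\BackupShare\Diff, \\BackupShare\TLog, \\BackupShare\System, 168, 72, 24, 336, Sunday\FULL, Monday\Full, Tuesday\Full, Wednesday\DIFF, Thursday\Full, Friday\Full, Saturday\DIFF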

Transaction Log Backup:
The “Do_LogBackup” function in the “LogBackupV2.ps1” script uses the same configuration file; however, the job needs to be scheduled at regular intervals. The process also creates folders automatically if required and deletes old t-log backup files.

Download Link:

Script download link: http://bit.ly/2aSTbhz

The shared link contains two scripts: one for database backups and another for log backups. Download and extract all three files (the two scripts and the CSV configuration file) into a folder, then change the CSV file path and adjust the configuration parameters according to your needs.


Review this script, test it and enhance it to incorporate your own ideas and requirements.

Wednesday, April 13, 2016

Various sources of I/O issues

To achieve reasonably good performance from a SQL Server implementation, careful planning of the I/O subsystem and application of all the good practices is crucial. At the beginning of a database application’s life cycle, I/O issues are undetectable because the application runs against a smaller database. As the database grows, database server performance issues start to be noticed. Tackling these issues later down the road is definitely unproductive and a cumbersome process.

Latency Measurement:
If an application is designed optimally and the I/O subsystem is configured correctly, then Microsoft has a very good recommendation about I/O latency. These recommendations are widely accepted by industry experts and can be evaluated against OLTP or DSS implementations (Microsoft TechNet source). Please note that the acceptable level of I/O latency will vary slightly based on factors such as random vs. sequential access and I/O size (8K, 64K, 128K, etc.).

PerfMon Counter: Average Disk sec/Read & Average Disk sec/Write

Threshold Value     Data or Log     Type of workload
1 ms - 5 ms         Log             -
4 ms - 20 ms        Data            OLTP
30 ms or less       Data            DSS

I/O measurement and using PerfMon:
There are several performance counters and different techniques that can be used to measure I/O performance, latency and IOPS. Following are some widely used and trustworthy Windows PerfMon counters.

Measuring disk latency:
·         Average Disk sec/Read
·         Average Disk sec/Write
·         Average Disk sec/Transfer

Measuring disk throughputs:
·         Disk Read Bytes/sec
·         Disk Write Bytes/sec
·         Disk Bytes/sec

Measuring IOPS:
·         Disk Reads/sec
·         Disk Writes/sec
·         Disk Transfers/sec

Measuring whether an I/O request splits into multiple requests:
·         Split IO/Sec
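
Any of these counters can be sampled ad hoc with PowerShell’s built-in Get-Counter cmdlet. The sketch below is only illustrative; the counter set (LogicalDisk vs. PhysicalDisk), interval and sample count are arbitrary choices:

# sample the disk latency counters every 5 seconds, 12 times (about one minute)
$counters = '\LogicalDisk(*)\Avg. Disk sec/Read',
            '\LogicalDisk(*)\Avg. Disk sec/Write',
            '\LogicalDisk(*)\Avg. Disk sec/Transfer'

Get-Counter -Counter $counters -SampleInterval 5 -MaxSamples 12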

When is a disk overwhelmed? Usually, when disk throughput increases, latency increases more or less along with it. However, when disk throughput remains almost the same but latency keeps increasing as time passes, that indicates disk saturation or an I/O bottleneck.

Source of I/O Issues:
There are numerous reasons why a disk experiences a bottleneck on a SQL Server. Following are a handful of common factors:

·         Inadequate memory for the buffer pool.
·         Index fragmentation.
·         Outdated statistics.
·         Improper or non-optimal fill factor.
·         Not using data compression (enterprise edition only).
·         No isolation of Index, Data and Log files.
·         Structure of database schema such as indexes, row width, data types.
·         Not using set-based T-SQL techniques.
·         Using nested views.
·         Excessive sort operations such as ORDER BY and GROUP BY.
·         Using implicit transactions.
·         Using lengthy transactions.
·         Excessive use of NOLOCK hints.
·         Using cursors.
·         Lack of covering indexes.
·         Using a wide key for the clustered index.
·         Workload nature - READ oriented vs. WRITE oriented.
·         Non-optimal RAID configuration.
·         Volume alignment (also known as sector alignment).
·         NTFS Block size (also known as cluster size or Allocation Unit Size).
·         Suboptimal drivers or firmware used on the host HBAs or storage array.
·         Improper queue depth settings on HBAs.
·         Incorrectly configured multipath software and/or fiber switches.

Using and detecting I/O Issues using my tool:
In my tool, “I/O Response (ms)” represents the overall I/O latency on a SQL Server instance. The calculation includes all the drives where data files are placed. The T-SQL code used to calculate “I/O Response (ms)” in my tool was extracted from the SSMS “Activity Monitor”. There may be a variation of a fraction of a millisecond in the calculation, but it will give you the most critical current I/O and workload state.

Figure#1: Overall I/O Response (ms)

Under the “Database” tab, there is a “Database I/O” tab which calculates I/O operations derived from “sys.dm_io_virtual_file_stats”. This provides far more granular, drill-down information about the I/O occurring on the various data and log files. To use this feature, click the lightning bolt button a few times to activate it and view the current I/O performance data.
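
For reference, a rough sketch of this kind of per-file latency query against “sys.dm_io_virtual_file_stats” is shown below; it is not the exact statement used by the tool or by Activity Monitor:

/************************************************************
 * Per-file I/O latency from sys.dm_io_virtual_file_stats
 * (illustrative sketch only)
 ************************************************************/
SELECT DB_NAME(vfs.database_id)                              AS database_name,
       mf.physical_name,
       vfs.num_of_reads,
       vfs.num_of_writes,
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)   AS avg_read_ms,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0)  AS avg_write_ms
FROM   sys.dm_io_virtual_file_stats(NULL, NULL) vfs
       JOIN sys.master_files mf
            ON  vfs.database_id = mf.database_id
            AND vfs.file_id = mf.file_id
ORDER BY
       avg_read_ms DESC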

Figure#2: I/O Operation on Data and Log files

Read Gurus Articles:
Are I/O latencies killing your performance?

SQL Server Best Practices Article

Friday, April 1, 2016

Compressing everything in a SQL Server Database

Data compression is one of the most appreciated features of SQL Server. However, many of us overlook the benefits of data compression in terms of storage and query performance. If applied correctly, we could save up to 30% of disk space and speed up some queries by 10 times or more. Although there is a penalty of 2% to 5% increased CPU usage while compressing or decompressing pages, the gain is far more significant than that negligible additional processor usage.
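
Before compressing an object, you can estimate the likely space savings with the built-in procedure sp_estimate_data_compression_savings. A minimal example follows; the schema and table name are hypothetical:

/************************************************************
 * Estimate PAGE compression savings (illustrative object name)
 ************************************************************/
EXEC sp_estimate_data_compression_savings
     @schema_name      = 'dbo',
     @object_name      = 'SalesOrder',
     @index_id         = NULL,
     @partition_number = NULL,
     @data_compression = 'PAGE'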

Data Compression Setting:
Data compression is a feature of Enterprise Edition only, and there is no specific server-wide or database-wide setting that can be turned on to enable the feature for an entire server or database. I think that the folks at Microsoft should consider implementing a compression setting at the database level so that we can use this feature more effectively.

What to compress?
Data compression must be applied individually, on a per-table or per-index basis, and it can be applied to a heap table, a clustered index, a nonclustered index, an indexed view, and a columnstore table or index. For example, a clustered index table might have four nonclustered indexes and one nonclustered columnstore index. You could then have “PAGE” compression on the clustered index, “ROW” compression on some of the nonclustered indexes, “PAGE” compression on the others, and “COLUMNSTORE_ARCHIVE” on the nonclustered columnstore index.

Figure#1: Different data compression settings on a table:
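
As a further illustration of such a mix, the statements below show one possible combination; the table and index names are hypothetical:

/************************************************************
 * Mixing compression settings on one table (illustrative names)
 ************************************************************/
-- PAGE compression on the clustered index (the table itself)
ALTER TABLE [dbo].[SalesOrder] REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = PAGE)

-- ROW compression on one nonclustered index, PAGE on another
ALTER INDEX [IX_SalesOrder_CustomerID] ON [dbo].[SalesOrder] REBUILD WITH (DATA_COMPRESSION = ROW)
ALTER INDEX [IX_SalesOrder_OrderDate] ON [dbo].[SalesOrder] REBUILD WITH (DATA_COMPRESSION = PAGE)

-- COLUMNSTORE_ARCHIVE on the nonclustered columnstore index
ALTER INDEX [NCCI_SalesOrder] ON [dbo].[SalesOrder] REBUILD WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE)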

Some Good Points from MSDN:
1.      Enabling compression can cause query plans to change because the data is stored using a different number of pages and number of rows per page.
2.      Compression does not affect backup and restore.
3.      Compression does not affect log shipping.
4.      Compression has some effects and considerations for replication.
5.      Data compression is incompatible with sparse columns.
6.      When data is exported, even in native format, the data is output in the uncompressed row format.
7.      When data is imported, if the target table has been enabled for compression, the data is converted by the storage engine into compressed row format.
8.      When a clustered index is dropped, the corresponding heap partitions retain their data compression setting unless the partitioning scheme is modified. If the partitioning scheme is changed, all partitions are rebuilt to an uncompressed state.
9.      Columnstore tables and indexes are always stored with columnstore compression. To reduce the size of columnstore data further, an additional compression option called archival compression can be configured. It may slow down query execution and introduce higher resource utilization.
10.  Compression is not available for system tables.
11.  A clustered index or Heap table can have different types of data compression levels.
12.  Regular index settings/options can be used with clustered or nonclustered indexes.
13.  Clustered or nonclustered columnstore indexes have fewer index settings.

Script to check data compression level:
Following is a handy script to check what has already been compressed and what has not:

/************************************************************
 * Check data compression level
 ************************************************************/ 
SELECT SCHEMA_NAME(o.schema_id) + '.' + OBJECT_NAME(o.object_id) AS table_name,
       o.type_desc,
       i.name                   AS index_name,
       p.data_compression_desc  AS compression_type,
       i.type_desc              AS storage_type
FROM   sys.partitions p
       INNER JOIN sys.objects o
            ON  p.object_id = o.object_id
       JOIN sys.indexes i
            ON  p.object_id = i.object_id
            AND i.index_id = p.index_id
-- WHERE  p.data_compression > 0
--     AND OBJECT_NAME(o.object_id) = 'Address1'

You can also get similar information by using my free tool.

Figure#2: Different data compression settings on a table:
 
Some T-SQL Script to compress data in a database:

(a)    Compress all HEAP in a database:

/************************************************************
 * Data compression on HEAP
 ************************************************************/
SELECT 'ALTER TABLE [' + SCHEMA_NAME(o.schema_id) + '].[' + o.[name]
       + '] REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = PAGE)'
FROM   sys.objects o
       JOIN sys.indexes i
            ON  o.[object_id] = i.[object_id]
WHERE  o.[type_desc] = 'USER_TABLE'
       AND i.type_desc IN ('HEAP')

(b)   Compress all CLUSTERED index tables:

/************************************************************
 * Data compression on a CLUSTERED index table and underlying indexes
 ************************************************************/
SELECT 'ALTER INDEX ALL ON [' + SCHEMA_NAME(o.schema_id) + '].[' + o.[name]
       + '] REBUILD WITH (DATA_COMPRESSION = PAGE)'
FROM   sys.objects o
       JOIN sys.indexes i
            ON  o.[object_id] = i.[object_id]
WHERE  o.[type_desc] = 'USER_TABLE'
       AND i.type_desc IN ('CLUSTERED')
  
(c)    Compress all NONCLUSTERED indexes:

/************************************************************
 * Data compression on nonClustered index
 ************************************************************/
SELECT 'ALTER INDEX [' + i.[name] +']'+ ' ON [' + SCHEMA_NAME(o.schema_id) + '].[' + o.[name]
       + '] REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = PAGE)'
FROM   sys.objects o
       JOIN sys.indexes i
            ON  o.[object_id] = i.[object_id]
WHERE  o.[type_desc] = 'USER_TABLE'
       AND i.type_desc IN ('NONCLUSTERED')
  
(d)   Compress all CLUSTERED COLUMNSTORE tables:

/************************************************************
 * Data compression on Clustered columnstore index
 ************************************************************/
SELECT 'ALTER TABLE [' + SCHEMA_NAME(o.schema_id) + '].[' + o.[name]
       + '] REBUILD PARTITION = ALL WITH (MAXDOP = 16, DATA_COMPRESSION = COLUMNSTORE_ARCHIVE)'
FROM   sys.objects o
       JOIN sys.indexes i
            ON  o.[object_id] = i.[object_id]
WHERE  o.[type_desc] = 'USER_TABLE'
       AND i.type_desc IN ('CLUSTERED COLUMNSTORE')

(e)   Compress all NONCLUSTERED COLUMNSTORE indexes:

/************************************************************
 * Data compression on nonClustered columnstore index
 ************************************************************/
SELECT 'ALTER INDEX [' + i.[name] +']'+ ' ON [' + SCHEMA_NAME(o.schema_id) + '].[' + o.[name]
       + '] REBUILD PARTITION = ALL WITH (MAXDOP = 16, DATA_COMPRESSION = COLUMNSTORE_ARCHIVE)'
FROM   sys.objects o
       JOIN sys.indexes i
            ON  o.[object_id] = i.[object_id]
WHERE  o.[type_desc] = 'USER_TABLE'
       AND i.type_desc IN ('NONCLUSTERED COLUMNSTORE')

(f)     Data compression with different index settings:

/****************************************************************
 * Data compression on all nonclustered indexes
 *****************************************************************/
DECLARE @table_name AS VARCHAR(256)
DECLARE @index_name AS VARCHAR(256)
DECLARE @schema_name AS VARCHAR(256)
DECLARE @type_desc AS VARCHAR(50)

DECLARE CurCompress CURSOR
FOR
    SELECT SCHEMA_NAME(o.[schema_id])  AS [schema_name],
           o.[name]                    AS [table_name],
           i.[name]                    AS [index_name],
           i.[type_desc]
    FROM   sys.objects o
           JOIN sys.indexes i
                ON  o.[object_id] = i.[object_id]
    WHERE  o.[type_desc] = 'USER_TABLE'
    ORDER BY
           o.[name]
      
OPEN CurCompress
FETCH NEXT FROM CurCompress INTO @schema_name, @table_name, @index_name, @type_desc
WHILE @@fetch_status = 0
BEGIN
    IF @type_desc = 'NONCLUSTERED'
    BEGIN
        PRINT '/****************************************************************************************'
        PRINT '* Data compression for the table: ' + @table_name
        PRINT '****************************************************************************************/'
        PRINT 'ALTER INDEX [' + @index_name + '] ON [' + @schema_name + '].[' + @table_name + '] '
        PRINT 'REBUILD WITH ('
        PRINT '        FILLFACTOR = 95,'
        PRINT '        SORT_IN_TEMPDB = OFF,'
        PRINT '        STATISTICS_NORECOMPUTE = OFF,'
        PRINT '        ONLINE = OFF,'
        PRINT '        ALLOW_ROW_LOCKS = ON,'
        PRINT '        ALLOW_PAGE_LOCKS = ON,'
        PRINT '        MAXDOP = 16,'
        PRINT '        DATA_COMPRESSION = PAGE'
        PRINT ')'
        PRINT 'GO'
        PRINT 'PRINT ''Compression completed on - ' + @table_name + ''''
        PRINT ' '
    END

    IF @type_desc = 'NONCLUSTERED COLUMNSTORE'
    BEGIN
        PRINT '/****************************************************************************************'
        PRINT '* Data compression for the table: ' + @table_name
        PRINT '****************************************************************************************/'
        PRINT 'ALTER INDEX [' + @index_name + '] ON [' + @schema_name + '].[' + @table_name + '] '
        PRINT 'REBUILD WITH ('
        PRINT '        MAXDOP = 16,'
        PRINT '        DATA_COMPRESSION = COLUMNSTORE_ARCHIVE'
        PRINT ')'
        PRINT 'GO'
        PRINT 'PRINT ''Compression completed on - ' + @table_name + ''''
        PRINT ' '
    END

   
    FETCH NEXT FROM CurCompress INTO @schema_name, @table_name, @index_name, @type_desc
END
CLOSE CurCompress
DEALLOCATE CurCompress

References:

Data Compression

Data Compression: Strategy, Capacity Planning and Best Practices

ALTER INDEX (Transact-SQL)

SQL Server Data Compression – Beauty or Beast?