Monday, December 17, 2012

Controlling read phenomena in SQL



Dirty reads, Non-repeatable reads and phantoms are three read phenomena that can be controlled with different transaction isolation levels.

Dirty reads
When a data is changed by a transaction, another transaction will read the same data before committing the previous transaction. So the data read by second transaction are inconsistent and are called dirty reads.

Non-repeatable reads
Within a transaction, when same data read by two separate reads, different values will be returned if the data has been updated between two reads. This is called Non-repeatable reads. In another words, if a set of records read, have been updated between two or more reads, then it is Non-repeatable.

Phantom reads
Between several reads, if new records have been inserted that matches the criteria of the select statement, it is a phantom situation.

All the above situations can be controlled (allowed or prevented) by the transaction isolation levels. All the above reads are allowed with Read Uncommitted transaction isolation level.
With the Read Committed isolation level, Dirty reads are prevented and others are allowed. Non-repeatable reads are prevented with Repeatable Read isolation level and all the above are prevented with Serializable isolation level.

So the Read Uncommitted is the least level of isolation and Serializable gives the maximum isolation. However, maximum level of locks applied with the Serializable isolation level and no locks are applied with Read Uncommited isolation level.

When the maximum isolation, locks applied also maximum which will lead to blocking. To prevent this a new isolation level, Snapshot Isolation was introduced with SQL Server 2005 and later versions. This is based on keeping versions of records which gives the maximum isolation with minimum blocking.

Locking vs Blocking


Locking
This occurs when a session takes ownership of an object by acquiring a lock on the resource

Blocking
This happens when a session is trying to acquire a lock on a resource which is already locked by an another session. Since the resource is already locked, the session is blocked to acquire the resource until the previous session releases the lock.

If the above blocking process is also blocked by another process and that process is also blocked by above blocked process, It will lead to a dead-lock situation where once process need to be terminated by force.

Monday, July 16, 2012

Transactions - Bits & Pieces

SET XACT_ABORT

SQL Server uses transactions to group a set of SQL Statements. However if a Rolling back mechanism is not set up, eventhough some statments failed to execute, others would be commited. 

If SET XACT_ABORT is set to ON, all the statements are Rolledback. By default this is SET to OFF.

MARK
This is used to mark a transaction in the transaction log. This is usfull to restored a database to that mark. This similar to recover a database to a point in time.
When several databases are used in a single transaction, MARK option can be used to make all the databases to a consistant state.

SAVE POINT

Save points are used to usfull when transactions are rolled back. Transaction can be rolled back to defined save point

When there is no explisit transaction is defined, the statement
SAVE TRANSACTION will fail.
However it will work, if the transaction mode is set to IMPLICIT TRANSACTION

SET ANSI_DEFAULTS ON
When this is ON, IMPLICIT TRANSACTION mode is automatically ON

Wednesday, April 25, 2012

Index in a nutshell

In a databse table data stored in pages. One page is 8KB. Without an index these data pages are stored not in an order. Since there is no order of datapages accessing is slower as it required to scan through all the pages and records to find coorrect record.
To overcome this problem, an index can be created. Similar to data, indexs are too stored in pages. An index in a index page contains the index key value and the reference to the next page and at the leaf level, index key and the row locator to the actual data row or reference to the clustered key (if there is a clustered index).
Since an index refers to a row locator if there is no clustered index, still it required to scan all the records if a column is searched other than the column(s) in the index key.
If there is a clustered index, then the leaf levels of index refers to the clustered index key.
In clustered index, in the leaf levels itself contains the actual data pages and are in order. That is the differece than the non clustered index.

Tuesday, April 24, 2012

SQL 2012 LAG and LEAD

New TSQL functions LAG and LEAD are two amaizing features. Lets have look at a data set like below and see how these functions are important


Here it contains sales order details with customer, location, date and amount etc. To obtain the monthly total sales for 2012 follwing query can be used

SELECT MONTH(SODate) MonthOfYear ,SUM(OrderAmount) TotalSales
FROM dbo.SalesOrder
WHERE YEAR(SODATE) = 2012
GROUP BY MONTH(SODate)
ORDER BY MONTH(SODate)

So the results set will be like this


Now in each row we need to show total sales for the prevous month and the next month. In prevous versions of SQL server we need to join above results set again to obtain this results. But with the new two features LAG and LEAD this can be achived easily. Below is the query used.

SELECT MONTH(SODate) MonthOfYear ,SUM(OrderAmount) TotalSales
,LAG(SUM(OrderAmount),1) OVER( ORDER BY MONTH(SODate) ) PreviousMonth
,LEAD(SUM(OrderAmount),1) OVER( ORDER BY MONTH(SODate) ) NextMonth

FROM dbo.SalesOrder
WHERE YEAR(SODATE) = 2012
GROUP BY MONTH(SODate)
ORDER BY MONTH(SODate)

And the reslts set will be like this.

Monday, March 5, 2012

Script to find the database of a given procedure

Below script will search for all the databases and lists the name(s) of the database which the stored procedure exists.

----------------------------------------------------------------------
CREATE TABLE #DBNames (DBNo int IDENTITY(1,1), dbname varchar(200))

DECLARE @DBNo int, @MaxDBNo int, @SQLText varchar(2000)

INSERT INTO #DBNames(dbname)
SELECT name
FROM sys.databases

SELECT @MaxDBNo = MAX(DBNo), @DBNo = 1
FROM #DBNames
WHILE @DBNo < @MaxDBNo
BEGIN
SELECT @SQLText = 'IF EXISTS (SELECT ''' + dbname + ''',name FROM ' + dbname + '.sys.procedures WHERE name = ''usp_NameofTheProcedure'') SELECT ''' + dbname + ''''
FROM #DBNames
WHERE DBNo = @DBNo

EXEC(@SQLText)

SET @DBNo = @DBNo + 1
END

DROP TABLE #DBNames

----------------------------------------------------------------------

Monday, February 20, 2012

Better performance with partitioned views


Introduction

When there are tables with large number of records, performance is a concern. If a table with large number of records can be divided in to set of smaller tables, a better performance can be expected when accessing individual tables.
But since there are several tables, it is not possible to access all the tables as a single object from a application. To overcome this problem, partitioned view is a better solution.

Partitioned view
Partitioned view combines results sets taken from different tables with UNION ALL statement and a CHECK constraint should be used on all underlying tables to separate the range of data (on a selected column) between each table.

Let's see how this can be done with a simple example

Create two tables

CREATE TABLE [dbo].[Invoice_1](
[InvoiceID] [int] CONSTRAINT PK1 PRIMARY KEY NOT NULL,
[Description] [varchar] (200)
)
GO

CREATE TABLE [dbo].[Invoice_2](
[InvoiceID] [int] CONSTRAINT PK2 PRIMARY KEY NOT NULL,
[Description] [varchar] (200)
)
GO

Create the view

CREATE VIEW vw_Invoice
AS
SELECT [InvoiceID] ,[Description]

FROM [dbo].[Invoice_1]
UNION ALL
SELECT [InvoiceID] ,[Description]
FROM [dbo].[Invoice_2]

Insert two records to the two tables

INSERT INTO Invoice_1 ([InvoiceID],[Description])
VALUES (1,'LogicalPartitioning Test')

INSERT INTO Invoice_2 ([InvoiceID],[Description])
VALUES (1100,'LogicalPartitioning Test2')

Execute a SELECT statement against the view and see how the execution plan

SELECT *
FROM vw_Invoice
WHERE InvoiceID = 1

Even though the query returns only one records which is selected from the first table, the SQL server access both the tables.

Now add two constraints to the two tables and see how the execution plan changes

ALTER TABLE Invoice_1 ADD CONSTRAINT C1 CHECK (InvoiceID BETWEEN 1 AND 1000)
ALTER TABLE Invoice_2 ADD CONSTRAINT C2 CHECK (InvoiceID BETWEEN 1001 AND 2000)

Execute a SELECT statement again and see how the execution plan now

SELECT *
FROM vw_Invoice
WHERE InvoiceID = 1


Now this clearly shows that the execution plan uses only the first table when executing the same query. If the number of tables is higher, the benefit will be much higher.

Highlights
  • The ranges used in the constraint should not be overlapped between underlying tables
  • If the partitioned column is part of the PRIMARY KEY, data can directly be inserted in to the view