Introduction

In SQL Server the NOLOCK hint is very tempting, especially for SELECT statements where at least one of the queried tables is frequently updated. Using NOLOCK removes the queueing problems and the SELECT statement runs without waiting, but not without side effects.

Why do we need a hint at all?

The first question might be: why do we need to add a hint at all to get the statement running without additional waits? The answer lies in a basic database management system principle: ACID (atomicity, consistency, isolation and durability). The isolation property requires that other transactions may not see uncommitted modifications. Under its default locking behaviour, SQL Server takes exclusive locks for data modifications, and exclusive locks are incompatible with the shared locks that readers must acquire, so all reading operations must wait until the exclusive locks are released. This causes a SELECT statement to wait whenever the data it needs to fetch is locked exclusively.
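
If you want to see this blocking in practice, SQL Server exposes lock information through the sys.dm_tran_locks view. The following diagnostic query is an addition to the original text: it lists which sessions hold or wait for locks in the current database. An exclusive (X) lock with status GRANT next to a shared (S) request with status WAIT is exactly the situation described above.

-- List sessions holding or waiting for locks in the current database
SELECT request_session_id, resource_type, request_mode, request_status
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID();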

The NOLOCK hint (which means exactly the same as READUNCOMMITTED) bypasses the check for exclusive locks and does not take shared locks at all. This also means that the query sees data that is not yet committed to the database, which may result in a dirty read. The modifications are not isolated, so one of the main database principles is ignored. This is why NOLOCK-like behaviour isn't used by default.
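
As a side note, the same behaviour can also be requested for the whole session instead of a single table. The sketch below assumes the LockTest.OrderEntry table created in the next section; both reads behave identically and take no shared locks.

-- Table-level hint
SELECT COUNT(*) FROM LockTest.OrderEntry WITH (NOLOCK);

-- Session-level isolation level, equivalent to the hint above
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT COUNT(*) FROM LockTest.OrderEntry;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED; -- restore the default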

What it actually means

Let's take an example. First let's create necessary objects.

----------------------------------
-- Create test objects
----------------------------------
-- Schema
CREATE SCHEMA LockTest;
GO

-- OrderEntry table
IF OBJECT_ID ( 'LockTest.OrderEntry', 'U' ) IS NOT NULL 
DROP TABLE LockTest.OrderEntry;
GO

CREATE TABLE LockTest.OrderEntry (
   Id        int           not null identity(1,1) primary key,
   Amount    int           not null,
   UnitPrice decimal(18,2) not null, -- explicit scale; plain decimal defaults to (18,0) and would round the cents away
   Area      int           not null
);
GO

-- AddOrders procedure
IF OBJECT_ID ( 'LockTest.AddOrders', 'P' ) IS NOT NULL 
DROP PROCEDURE LockTest.AddOrders;
GO

CREATE PROCEDURE LockTest.AddOrders @OrderCount int AS
BEGIN
   DECLARE @counter int = 0;

   WHILE (@counter < @OrderCount) BEGIN
      INSERT INTO LockTest.OrderEntry
         (Amount, UnitPrice, Area) 
      VALUES 
         (ROUND(RAND()*100, 0), ROUND(RAND()*100, 2), ROUND(RAND()*10, 0));

      SET @counter = @counter + 1;
   END;
END;
GO        

So now we have a single table and a procedure to create some random data. Let's add something to the table.

-- Add some data to the table
BEGIN TRANSACTION;
EXEC LockTest.AddOrders @OrderCount=100;
COMMIT;

Now if we get the sales amounts per area, the result is something like the following. Note that your results will differ because of the random values.

-- Get the sales grouped by area
SELECT TOP(3)
Area, SUM(Amount * UnitPrice) AS TotalSales
FROM LockTest.OrderEntry
GROUP BY Area
ORDER BY Area;

And the results:

Area  TotalSales
----  ----------
0     10454
1     22608
2     22038

Simultaneous transactions, normal behaviour

Now, what happens in the normal situation if the database is updated and queried at the same time? The DML operation takes the necessary locks and can prevent the query from finishing. This can be tested with the following steps; the test requires two simultaneous connections to the database.

In session 1, add new rows in a transaction, leave the transaction open.

-- Session 1: Add data to the table and leave the transaction open
BEGIN TRANSACTION;
EXEC LockTest.AddOrders @OrderCount=10000;

And using session 2, query again for the top 3 areas. This query won't finish; it remains running, waiting for the locks to be released.

-- Session 2: Get the sales grouped by area
-- won't return until transaction in session 1 is ended
SELECT TOP(3)
Area, SUM(Amount * UnitPrice) AS TotalSales
FROM LockTest.OrderEntry
GROUP BY Area
ORDER BY Area;
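
While session 2 is blocked, you can use a third connection to verify what is going on. This diagnostic query is not part of the original walkthrough; it uses the standard sys.dm_exec_requests view, and the blocked SELECT typically shows a LCK_M_S wait (waiting for a shared lock).

-- Session 3 (optional): show blocked requests and the sessions blocking them
SELECT session_id, blocking_session_id, wait_type, wait_time
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0;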

Now when session 1 rolls back the transaction, the locks are released and the query in session 2 completes.

-- Session 1: roll back the modifications
ROLLBACK;

And immediately the results are displayed for session 2.

Area  TotalSales
----  ----------
0     10454
1     22608
2     22038

As you can see, the results are the same as before since all the modifications were rolled back.

Results when NOLOCK is used

Now we use the exact same scenario as before, but this time with the NOLOCK hint.

In session 1, again add new rows in a transaction and leave the transaction open.

-- Session 1: Add data to the table and leave the transaction open
BEGIN TRANSACTION;
EXEC LockTest.AddOrders @OrderCount=10000;

And using session 2, query again for the top 3 areas but this time with the NOLOCK hint.

-- Session 2: Get the sales grouped by area WITH NOLOCK
SELECT TOP(3)
Area, SUM(Amount * UnitPrice) AS TotalSales
FROM LockTest.OrderEntry WITH (NOLOCK)
GROUP BY Area
ORDER BY Area;

The results are shown immediately (again, your results may differ because of the random values).

Area    TotalSales
----    ----------
0       1321810
1       2417946
2       2539965

Now session 1 rolls back the transaction.

-- Session 1: roll back the modifications
ROLLBACK;

And the exact same query is executed in session 2.

-- Session 2: Get the sales grouped by area WITH NOLOCK
SELECT TOP(3)
Area, SUM(Amount * UnitPrice) AS TotalSales
FROM LockTest.OrderEntry WITH (NOLOCK)
GROUP BY Area
ORDER BY Area;

The results are back to the original values since session 1 rolled back all of its modifications.

Area  TotalSales
----  ----------
0     10454
1     22608
2     22038

Conclusions

What we saw was a dirty read: for a moment we saw data in the database which (in the sense of isolation) was never persistent. So is this a bad thing? Well, it depends on the business logic. If you're creating a banking application, I'd bet a dirty read is something you really want to avoid. On the other hand, if rollbacks are rare in your system, and when they do happen the incorrect data a reader might see has no negative impact, then you can consider using NOLOCK.
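
If blocking is the problem but dirty reads are not acceptable, it's worth knowing that SQL Server 2005 and later offer a row versioning alternative. This is outside the scope of the test above and assumes you are allowed to change a database-level option; readers then see the last committed version of a row instead of waiting for exclusive locks or reading uncommitted data.

-- Row versioning alternative to NOLOCK (replace YourDatabase with the real name)
ALTER DATABASE YourDatabase SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;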

History

  • February 11th, 2011: Created