SQL transactions can feel like magic at first: they seem to provide a giant Ctrl-Z/undo button for data changes. Nevertheless, concurrency issues still arise when multiple queries run at the same time, and you must wield the mighty TRANSACTION ISOLATION LEVEL to vanquish them. That’s what this article will discuss.
This article assumes you already have a basic understanding of SQL transactions themselves (BEGIN TRANSACTION, COMMIT, ROLLBACK). You may have seen READ COMMITTED, REPEATABLE READ, or the infamous NOLOCK, but these SQL “isolation levels” might still seem like strange voodoo. This article explains what is (and isn’t) going on when reading and writing data in T-SQL, and seeks to demystify what these isolation levels are and why you should be concerned with them.
Here’s the most important thing for understanding isolation levels:
At heart, isolation levels mean one thing: how long read locks are held inside a given transaction. (Write locks are always held until the end of the transaction.)
Also note that you can change isolation levels within a transaction. Each new isolation level will affect the read lock behavior for the subsequent queries.
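For example, switching levels mid-transaction changes how subsequent reads behave (a minimal sketch using this article’s dbo.Employees table):

```sql
BEGIN TRANSACTION;

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- This read's shared locks are released as soon as the rows are read.
SELECT LastName FROM dbo.Employees WHERE UserId = 1234;

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- This read's shared locks are now held until COMMIT.
SELECT LastName FROM dbo.Employees WHERE UserId = 1234;

COMMIT;
```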
Under the default isolation level of READ COMMITTED, a row is only protected by a read lock while that row is physically being read. A read lock (also known as a shared lock) protects data from being modified by another connection while it is being read.
Immediately after a row is read, it’s up for grabs, and someone else can change it. If you read the row again within the same transaction, it might have a different value the second time. Observe the following possible sequence (both queries are running simultaneously):
```sql
-- Query 1
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN TRANSACTION;

-- Assume this returns "Smith"
SELECT LastName FROM dbo.Employees WHERE UserId = 1234;
```

```sql
-- Query 2 runs now, on another connection
BEGIN TRANSACTION;
UPDATE dbo.Employees SET LastName = 'Jones' WHERE UserId = 1234;
COMMIT;
```

```sql
-- Query 1 continues
-- Returns "Jones"
SELECT LastName FROM dbo.Employees WHERE UserId = 1234;
COMMIT;
```
Query 1 reads the last name, but the lock is released as soon as the read completes, which lets Query 2 change the row.
You may also be surprised that this can occur even within a single statement: a single SQL statement is not atomic.
Imagine a SELECT COUNT(*) FROM table query. At some level, this involves reading each row of the table to count it. Under the READ COMMITTED isolation level, it’s entirely possible for the scan to read a row, and for that row then to be updated in a way that causes the scan to encounter it again before the statement completes. Tony Rogerson provides an example of this behavior, and Craig Freedman provides a great illustration.
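A sketch of the kind of interleaving involved, assuming the scan reads rows in LastName order (the index assumption and the names are illustrative):

```sql
-- Connection 1: counting scan, reading rows in LastName order
SELECT COUNT(*) FROM dbo.Employees;

-- Connection 2: while the scan is partway through, move an
-- already-counted row ahead of the scan position:
UPDATE dbo.Employees SET LastName = 'Zzyzx' WHERE LastName = 'Adams';

-- The scan can now encounter the same row a second time,
-- so COUNT(*) may return one more than the true row count.
```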
REPEATABLE READ resolves the previous anomalies: read locks are held until the end of the transaction. In the example, Query 1 will hold the locks for each row it reads until the COMMIT, so Query 2 has to wait until Query 1’s transaction is completed.
This leads to the tradeoff: more blocking and delays. Query 2 will be stuck waiting until Query 1 commits its transaction. If Query 1 does a lot of other work, Query 2 may wait for a long time.
This still does not prevent all anomalies, because only rows that have been read are locked.
Consider the following:
```sql
-- Query 1
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION;

-- Assume 2 employees exist: Adam James, Adam Johnson
SELECT FirstName, LastName FROM dbo.Employees WHERE FirstName = 'Adam';
```

```sql
-- Query 2 runs now, on another connection
BEGIN TRANSACTION;
INSERT INTO dbo.Employees (FirstName, LastName) VALUES ('Adam', 'Smith');
COMMIT;
```

```sql
-- Query 1 continues
-- Now returns 3 results
SELECT FirstName, LastName FROM dbo.Employees WHERE FirstName = 'Adam';
COMMIT;
```
Adam James and Adam Johnson are locked, but nothing prevents you from inserting a new Adam (Smith).
Additionally, just like READ COMMITTED, even a single statement can encounter other anomalies. The Tony Rogerson blog post mentioned earlier also shows a similar scenario for REPEATABLE READ, and Craig Freedman has another great illustration in this post.
There’s another consideration: indexes make a big difference. Imagine there is an index on FirstName that also contains LastName. Query 1 can use that index, so only the Adam rows will be locked. Without it, Query 1 must scan the whole table, so every row it reads will be locked until the transaction completes, not just the Adam rows.
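Such a covering index might look like this (the index name is illustrative):

```sql
-- Covers the query: seeks on FirstName, with LastName included,
-- so only the matching index rows need to be read and locked.
CREATE NONCLUSTERED INDEX IX_Employees_FirstName
    ON dbo.Employees (FirstName)
    INCLUDE (LastName);
```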
SERIALIZABLE solves the problems of REPEATABLE READ by introducing range locks, which prevent new data from being inserted or updated if it affects a previous query. Under SERIALIZABLE, Adam Smith could not be inserted. How does this work?
Suppose you have the following names in the table (First Last):
- Adam James
- Adam Johnson
- Adam Smith
- Fred Jones
- Steph Zander
You run the following SQL:
```sql
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;

SELECT FirstName, LastName FROM dbo.Employees WHERE FirstName = 'Fred';
```
SERIALIZABLE prevents anyone else from inserting a new Fred. Assuming you have the index described above on FirstName (including LastName), SQL will take a range lock between Fred and Steph: the range lock essentially prevents any inserts within the two values. The range lock will not block anyone else from reading those rows, but any INSERT or UPDATE that would result in a new row entering that range will be blocked.
This means that even though you only care about Fred, you cannot insert John either. It also means that in the absence of such an index, the entire table will be range locked.
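For example, with Query 1’s transaction above still open, another connection attempting either of these inserts would block (a sketch; the last names are illustrative):

```sql
-- Blocked: a new Fred would change Query 1's result set.
INSERT INTO dbo.Employees (FirstName, LastName) VALUES ('Fred', 'Bloggs');

-- Also blocked: 'John' sorts between 'Fred' and 'Steph', so it falls
-- inside the range lock even though Query 1 doesn't care about it.
INSERT INTO dbo.Employees (FirstName, LastName) VALUES ('John', 'Doe');
```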
Consider some other queries:
SELECT FirstName, LastName FROM dbo.Employees WHERE FirstName = 'Abigail';
There is no Abigail row, so the locked-out range will be everything before Adam.
SELECT FirstName, LastName FROM dbo.Employees WHERE FirstName = 'Steph';
Similarly, since there is nothing after Steph, the locked-out range will be everything after Steph.
It isn’t the case here, but note that if the condition is an equality lookup on a UNIQUE index and the matching row exists, a normal read lock on that row is enough to satisfy SERIALIZABLE. Because the index is UNIQUE, there is no possibility of another matching row being inserted. See Range Locks for more information.
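For instance, assuming UserId is covered by a unique index (such as a primary key), this SERIALIZABLE read needs only an ordinary key lock rather than a range lock:

```sql
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;

-- UserId is unique and the row exists, so locking this one row is
-- sufficient: no other row with UserId = 1234 can ever be inserted.
SELECT LastName FROM dbo.Employees WHERE UserId = 1234;

COMMIT;
```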
The main thing to understand is that SQL needs to ensure no new rows can satisfy the condition, one way or another, and that range locks are one mechanism for doing so. Hopefully, you see how this works, and that depending on the data, SERIALIZABLE can lead to a lot of locking.
READ UNCOMMITTED/NOLOCK means what it says: read locks aren’t taken. It also effectively means: this query doesn’t need to return accurate results.
You’ve already seen how even READ COMMITTED can produce unusual results, but NOLOCK allows even more scenarios. Another transaction might add or change data, but then roll back; the NOLOCK query will see it. Queries might even fail. Tony Rogerson has another post illustrating some data consistency problems with NOLOCK, and Craig Freedman illustrates the query failure scenario.
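A sketch of the rollback scenario:

```sql
-- Connection 1: makes a change, then rolls it back
BEGIN TRANSACTION;
UPDATE dbo.Employees SET LastName = 'Jones' WHERE UserId = 1234;
-- ...time passes...
ROLLBACK;

-- Connection 2: running between the UPDATE and the ROLLBACK,
-- this sees 'Jones', a value that never logically existed.
SELECT LastName FROM dbo.Employees WITH (NOLOCK) WHERE UserId = 1234;
```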
Many people use NOLOCK as a catchall solution to blocking, but it is not appropriate to use unless your query doesn’t need to be accurate. So what should you use?
Snapshot isolation is what most NOLOCK use cases probably should be using. Instead of locks, SQL provides a fixed view of the database (a snapshot). There are two variants.
SET TRANSACTION ISOLATION LEVEL SNAPSHOT
SNAPSHOT is an isolation level that you set in your query, just like the others. I like to think of it as “read-only SERIALIZABLE”. Most locks are not taken: everything simply appears as it did when the transaction began. This would resolve all the issues we saw under READ COMMITTED and REPEATABLE READ.
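It is used like any other level (this assumes snapshot isolation has been enabled on the database, as described below):

```sql
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;

-- Both reads see the database as it was when the transaction began,
-- regardless of what other connections commit in between.
SELECT LastName FROM dbo.Employees WHERE UserId = 1234;
SELECT COUNT(*) FROM dbo.Employees;

COMMIT;
```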
The SNAPSHOT isolation level is best suited to read-only queries. If multiple transactions try to change the same rows, instead of blocking indefinitely, the second transaction will fail with an update conflict error and must retry. Even if they don’t change the same data, Craig Freedman’s blog illustrates another unexpected scenario with concurrent UPDATE statements.
What’s the catch? There is a performance overhead, both in terms of data size and data I/O, because SQL Server must maintain row versions. As a result, you must opt into this feature with the ALLOW_SNAPSHOT_ISOLATION database option.
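Enabling it looks like this (substitute your own database name):

```sql
-- Allows transactions to request the SNAPSHOT isolation level.
ALTER DATABASE YourDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;
```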
A second option, READ_COMMITTED_SNAPSHOT (enabled with ALTER DATABASE YourDatabase SET READ_COMMITTED_SNAPSHOT ON), causes READ COMMITTED itself to use snapshot behavior instead of locking. It is not quite the same as SNAPSHOT. Firstly, under READ_COMMITTED_SNAPSHOT, conflicting writes block as normal instead of failing at the end (pessimistic versus optimistic concurrency).
In addition, each statement is effectively its own snapshot (instead of the entire transaction). From the READ COMMITTED section, you would still see the LastName change in this article’s example, but you would not see the abnormalities in the other two linked articles.
The primary benefit is less blocking. Under READ COMMITTED, Query 2 from our previous example would block Query 1 if it runs first. With READ_COMMITTED_SNAPSHOT turned on, it won’t. You can read more about the tradeoffs in this StackExchange question.
This, incidentally, appears to be how Oracle works by default. According to the documentation, the description of READ COMMITTED matches that of READ_COMMITTED_SNAPSHOT. An important difference is that Oracle’s SERIALIZABLE operates like SNAPSHOT.
Note that even an UPDATE (or DELETE) statement might take read locks, even though it is changing data. Consider this example:
UPDATE dbo.Employees SET LastName = 'Jones' WHERE UserId = 1234;
The database first has to read the table to find the particular record for UserId 1234. This involves read locks (if you read the query plan, this should make sense). Then it also needs a write lock to update the row. Even though it is one statement, it can be interrupted between reading and writing. This mix can result in blocking and deadlock issues, just like any other query.
UPDATE dbo.Employees WITH (UPDLOCK) SET LastName = 'Jones' WHERE UserId = 1234;
The UPDLOCK hint tells SQL Server to take update locks instead of read locks during the read portion; like write locks, they are held until the end of the transaction. This also affects SNAPSHOT isolation (which otherwise would not have taken the locks).
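A common use is a read-then-write pattern, where taking the update lock up front prevents two connections from both acquiring read locks on the same row and then deadlocking when each tries to write (a sketch):

```sql
BEGIN TRANSACTION;

-- Take the update lock during the read, so no other connection can
-- acquire a conflicting lock between the read and the write.
SELECT LastName FROM dbo.Employees WITH (UPDLOCK) WHERE UserId = 1234;

-- Decide based on what was read, then write:
UPDATE dbo.Employees SET LastName = 'Jones' WHERE UserId = 1234;

COMMIT;
```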
Depending on how your database is set up, this might improve deadlocks and blocking or make them worse. Remember what you saw earlier: if your queries are covered by indexes, the locks will be more targeted. If the query needs to read the entire table to find the data, the entire table will be write locked, even if only one row is changed (and remember that write locks are always held until the transaction completes).
Locking behavior is a complex topic (and I have skipped some details here), but I hope this provides a basic picture and offers greater insight into your queries.
Now you know that the isolation levels only affect the read portion of queries, and that even UPDATE and DELETE statements contain reads.
You’ve seen some of the anomalous results that can happen under the different isolation levels, and I’ve pointed you to examples where even a single statement can have anomalous results.
I also showed how index design affects the scope of locks and blocking.
Hopefully you now have an increased appreciation for the tradeoffs of NOLOCK and an increased awareness of how snapshot isolation can be used instead, with the tradeoff of heavier disk usage.