Understanding InnoDB transaction isolation levels

Isolation is an important part of ACID properties that guarantee that transactions are processed in a reliable manner. Isolation ensures that concurrently running transactions do not interfere with each other. Isolation ensures data consistency. If the transactions are not isolated then one transaction could modify data that other transactions are reading hence creating data inconsistency.

Now that we have understood what isolation is let’s understand isolation levels. Isolation levels determine how isolated the transactions are from each other, it could be that there is no isolation or the maximum level of isolation which serializes transactions execution. Selecting the best isolation level really depends on your application needs, but for that you must first understand all the different isolation levels and the consequences of selecting any one of them.

InnoDB supports all the four SQL-standard isolation levels, which are listed and described in the following text.

The four isolation levels

READ UNCOMMITTED

With the READ-UNCOMMITTED isolation level there isn’t much isolation present between the transactions at all. The transactions can see changes to data made by other transactions that are not committed yet. That means transactions could be reading data that may not even exist eventually because the other transaction that was updating the data rolled-back the changes and didn’t commit. This is known as dirty read. An application rarely needs to rely on dirty needs, in fact this really can’t be called an isolation level. Simply put, there isn’t really any isolation at all and hence such a system can’t really be called a transactional system.

READ COMMITTED

With the READ-COMMITTED isolation level, the phenomenon of dirty read is avoided, because any uncommitted changes is not visible to any other transaction, until the change is committed. Within this isolation level each SELECT uses its own snapshot of the committed data that was committed before the execution of the SELECT. Now because each SELECT has its own snapshot, so the same SELECT when run multiple times during the same transaction could return different result sets. This phenomenon is called non-repeatable read.

REPEATABLE READ

With the REPEATABLE-READ isolation level, the phenomenon of non-repeatable read is avoided. This isolation level returns the same result set through out the transaction execution for the same SELECT run any number of times. A snapshot of the SELECT is taken the first time that SELECT is run during the transaction and the same snapshot is used through out the transaction when the same SELECT is executed. A transaction running in this isolation level does not take into account any changes to data made by other transactions, regardless of whether the changes have been committed or not. This ensures that reads are always consistent(repeatable). This isolation level is the default for InnoDB. Although this isolation level solves the problem of non-repeatable read, but there is another possible problem phantom reads.

SERIALIZABLE

With the SERIALIZABLE isolation level, the phenomenon of phantom reads is avoided. Transactions when run in this isolation level place locks on all records that are accessed, as well as locks the resource so that records cannot be appended to the table being operated on by the transaction. Transactions when run in this fashion, run in a serialized manner. This isolation level is the strongest possible isolation level.

SERIALIZABLE isolation level can be pretty much avoided!

As the text above describes, phantom reads is the only problem that SERIALIZABLE is helping you avoid, otherwise SERIALIZABLE is pretty much same as REPEATABLE-READ. InnoDB has a special locking feature called gap-locking which helps you avoid the phantom reads problem. In the case of gap locking, a lock is set on the gap between index records, or on the gap before the first or after the last index record. Now all you have to do avoid phantom reads is to convert such reads to locking reads using SELECT with FOR UPDATE or LOCK IN SHARE MODE.

Locking and the isolation levels

The READ-UNCOMMITTED isolation level has the least number of locking done, after that comes the READ-COMMITTED isolation level which removes most of the gap-locking and hence produces fewer deadlocks, also in the case of READ-COMMITTED, locking reads only lock the index records and not the gaps before/after them. REPEATABLE-READ has a higher level of locking as compared to READ-COMMITTED, UPDATE, DELETE use next-key locking, also locking reads also use next-key locking. SERIALIZABLE has the highest level of locking, all the simple SELECTs are automatically converted to SELECT … LOCK IN SHARE MODE, and hence all records have shared locks.

Replication and isolation levels

The default replication type in MySQL is statement-based replication, and this replication type replicates the data changes by re-executing SQL statements on slaves that were executed on the master. This requires the isolation level to be stricter (involving more locking) so that the data changes are consistent in such a way that the same SQL when executed on the slave produces the same data changes. As mentioned above, READ-COMMITTED creates a situation of non-repeatable read, hence its not safe for statement-based replication. Hence, when using statement-based replication either use SERIALIZABLE or REPEATABLE-READ isolation level. If you have MySQL version >= 5.1 then you can use READ-COMMITTED with the row-based replication, because with row-based replication you have exact information about each data row changes.

Performance and isolation levels

As I have mentioned in the “Locking and the isolation levels” section, SERIALIZABLE and REPEATABLE-READ employ lots of locking and hence creating more deadlock situations, which in turn decreases performance. In fact SERIALIZABLE is the least performant of the isolation levels, as it converts even plain reads into locking reads. REPEATABLE-READ is better in terms of locking and deadlocks but READ-COMMITTED is even better because there are fewer gap-locks. But locking and deadlocks is not the only thing when considering performance, there is another issue of mutex contention that needs consideration. There is a post by Mark Callaghan comparing both REPEATABLE-READ and READ-COMMITTED in the context of mutex contention. This post can be found here.

Conclusion

I have tried to provide a detailed description of the four isolation levels and causes and effects of using any one of them, I have also mentioned locking and performance as related to the isolation levels. Now READ-COMMITTED and REPEATABLE-READ really are the only two isolation levels from which to make your selection. And that selection really depends on the kind of application you are running. I don’t think any generalized benchmark would help you make the selection between the two. So before making any decision run benchmarks specific to your application and then try to make a decision. I also sincerely hope that you are running MySQL version >= 5.1, because if you are running any other version REPEATABLE-READ is pretty much your only choice.

  • SomeGuyNamedPaul

    The way this works is via a method called Multiple Version Concurrency Control. Each row in an Inno table has a pair of hidden columns on it. One is when it was updated and one is when it expires (deleted). “when” isn’t based upon time, but rather what transaction ID we’re up to in the transaction logs. Is the cursor is strolling through the table, it knows its own transaction ID and it will only pay attention to rows that existed when the transaction started and rows that haven’t been flagged for deletion somewhere in the past.

    There is a separate execution thread that roves the tables purging expired rows.

    The upside of this is that reads don’t have to do any locking which helps concurrency a whole lot. The downsides are of course those phantom reads as well as fact that there has to be a purge worker thread (multiples in 5.6).

    In Committed Read and Repeatable Read use MVCC. Uncommitted Read simply reads the newest rows and ignore the older ones. Serializable reads will lock every row.

  • http://www.ovaistariq.net/ Ovais Tariq

    Yes multi-versioned concurrency control works in the way you have described in layman’s terms,. but this is not how transaction isolation works,. Yes consistent read is the default mode in which InnoDB processes SELECT statements in READ-COMMITTED and REPEATABLE-READ isolation levels. And consistent read is achieved through multi-versioned concurrency control. But thats not the end of the story,.

    And yes InnoDB stores additional hidden fields with each row, and a row is never updated, instead when an update request is received the older version of the row is delete-marked and a new row is inserted with the system ID of the transaction that made the change,. There is also a rollpointer stored with each row that points to the undo-log which maintains a list of changes and the system IDs of transactions that made those changes,. these list of changes are stored in the form of a linked list,,.

    And yes MVCC is used in two of the isolation levels,. but MVCC needs a separate post and it should not be confused with transaction isolation levels., both are separate topics that warrant separate discussions,.

  • SomeGuyNamedPaul

    By all means please make MVCC the topic of your next entry, I’m sure you can go further in depth on it than I can.

    MVCC just amuses me because my Informix teammates are extremely wary of seeing any transactions out there running at RR due to lock contention, meanwhile Inno gobbles up devastating numbers of queries at RR and runs just fine.

  • http://www.ovaistariq.net/ Ovais Tariq

    I sure will have a post of MVCC coming up soon :) MVCC is very important to how InnoDB performs non-locking consistent reads,.

    Well I dont know about the inner-workings of Informix but with InnoDB RR has very few lock contention issues,. in fact RC has more contention issues,.

  • http://www.prodevtips.com/2011/09/14/mysql-innodb-optimization/ MySQL / InnoDB performance optimization | ProDevTips – Web Development Tutorials

    [...] regards to transaction isolation Ovais Tariq explains: With the READ-COMMITTED isolation level, the phenomenon of dirty read is avoided, because any [...]

  • http://www.quora.com/What-is-a-good-plain-English-explanation-of-the-four-MySQL-InnoDB-transaction-isolation-levels-and-how-they-would-affect-a-typical-web-app-in-practice#ans1019596 Quora

    What is a good plain-English explanation of the four MySQL InnoDB transaction isolation levels, and how they would affect a typical web app in practice?…

    Here are some useful links: 1. http://www.ovaistariq.net/597/understanding-innodb-transaction-isolation-levels/ 2. http://en.wikipedia.org/wiki/Isolation_(database_systems) In the second link, have a look at the “Read phenomena” section. To understan…

  • 亮亮 何

    “Although this isolation level solves the problem of non-repeatable read, but there is another possible problem phantom reads.”
    Do you mean ‘write skew’ instead of ‘phantom reads’? The ‘repeatable read’ in SQL standard allows ‘phantom reads’, however, since InnoDB uses next-key locking this anomaly does not exist in this level. Looks like it’s equivalent to ‘snapshot isolation’ in Postgres and Oracle.

  • http://www.ovaistariq.net/ Ovais Tariq

    Phantom read happens when you select records in range twice in the same transaction and they return different results, because another transaction inserted in the range before you ran the select again. Eg, SELECT * FROM tbl where id between 10 and 20; Suppose when you select this range there are only three records, 10, 13 and 17. But before you run the select again, another transaction inserts the following: INSERT INTO tbl(id) VALUES(15); So when you run the same select again you will get different values: 10, 13, 15, 17. This is avoided in InnoDB by using gap locking.

blog comments powered by Disqus