MySQL locking for the busy web developer

morgo · on Nov 6, 2013

InnoDB uses MVCC, so the only operations that respect the SELECT .. FOR UPDATE queries are updates. As long as you use transactions, I wouldn't say this is a common application pattern.

It was good for the article to mention gap locking/next key locking - a lot of users are surprised by it. The only comment I would make, is that it not required when using Row-based-replication: http://www.tocker.ca/2013/09/04/row-based-replication.html

I also have a 35min video as well which describes locking: http://www.tocker.ca/2013/09/19/locking-and-concurrency-cont...

Roboprog · on Nov 6, 2013

Since MySQL + InnoDB now uses MultiVersion Concurrency Control, why does it need these locks in the article???

(MVCC should allow "speculative", isolated, changes which fail and get rolled back in case of a conflict)

morgo · on Nov 6, 2013

What @kozlovsky said :) Only advice I would add, is SELECT .. FOR UPDATE doesn't (and shouldn't) use MVCC, so it guarantees you the most recent version of the row.

In my comment I said the use case for this feature is not typical, but I can list the case:

If you needed to read the most recent value, apply logic (external to the DB) and then write back a new value you will likely want serializable to prevent lost-updates.

A hypothetical (but poor example) might be a stats counter where you read the value, add one, then save it back.

kozlovsky · on Nov 6, 2013

Without locks lost updates are possible because default isolation level is not SERIALIZABLE

mjb · on Nov 6, 2013

As I replied to your comment further down, you (and the article) are confusing the implementation's use of locks to ensure atomicity and isolation with the applications use of locks to ensure consistency. These things are different.

falcolas · on Nov 6, 2013

If you find that you have a lot of deadlocks due to gap locking, you can disable this feature:

http://dev.mysql.com/doc/refman/5.5/en/innodb-record-level-l...

agopaul · on Nov 6, 2013

Row-level locks are really sweet in case you have to handle multiple row/tables update/insert in the same transaction, in order to keep the db consistent even under sustained concurrency. But, dead locks really strike you if you don't use them carefully. I wonder if dead locks are easyer to prevent in Postgres

Roboprog · on Nov 6, 2013

MVCC is the 21st century database hotness :-)

Rather than locking rows or tables (etc), you keep versions on rows, and allow isolated changes to versioned "duplicates" of the rows in transactions. There is a process that relatively quickly checks for conflicts during a commit -- is there a "gap" in the version numbers of rows or some such thing. If inconsistency would result, the first commit (already) wins, and the second gets rolled back. Otherwise, the replacement (or new) rows become the new "master" version.

The commit check can become a bottleneck, but it shouldn't deadlock.

Almost forgot to add: PostgreSQL has been using MVCC for quite some time.

My view of MySQL was discolored pretty badly when I needed an SQL database back in 2001 to prototype some stuff from home, and discovered how "gappy" MySQL was at the time: rollback did nothing; referential integrity not enforced; lack of transaction isolation. It felt like XBase hidden under SQL syntax (something to which I did not want to regress).

OTOH, MySQL ran on Windows, and PostgreSQL didn't (at the time), so I guess that was "the win" for many, alas. Both DBs have become much more like each other in the intervening decade, though, which is good news.

agopaul · on Nov 6, 2013

This is interesting.

I think developers should be more aware of how the DBMS works under the hood.

Edit: In my case, I still use row-level lock when implementing the state machine pattern on a model. It enforces the application-level-consistency of the model state, preventing double updates in case of high concurrency, in which the second (concurrent) update would break the application logic.

mjb · on Nov 6, 2013

You are confusing two concepts: locking used by the implementation to provide isolation and atomicity, and locking used by the application to provide consistency.

Consistency is a separate concern from Isolation and Atomicity. With the default isolation level (repeatable read) it is frequently necessary to use row-level locks to ensure consistency with concurrent transactions. This is an application-level concern, and an interactions with the isolation level rather than an interaction with the implementation. Short of SSI (serializability), hazards exist which can have counterintutive impacts on data consistency. The implementation's use of MVCC is irrelevant.

Locking is also frequently used in applications to help ensure external consistency, but that's a pattern with real and significant pitfalls that should be avoided.

Roboprog · on Nov 6, 2013

I'm not going to claim to understand the entire topic of DB locking, but I intended to refer to what happens within the DB on your behalf when the application is running a transaction without any explicit locks. If I am within a transaction, and repeat a read for some reason, I expect the same result unless my process made modifications. I forget the name of that level, but that's what I expect.

My gripe with record and related level locking comes from past pain. At one past job, our DBA spent a lot of time in Sybase (a non-MVCC DB) constantly twiddling locking granularity levels to keep processes out of each others' way. (as in deadlock)

babuskov · on Nov 6, 2013

InnoDB and MVCC in Postgres are almost indentical, so it's all up to the programmer.

Well, except for one difference. InnoDB will allow you refer to tables not controlled by the InnoDB table handler, whereas PG does not have such potential problem. If you keep it all-InnoDB, there should be no difference.

kozlovsky · on Nov 6, 2013

MVCC in InnoDB and PostgreSQL are totally different. It is PostgreSQL and Oracle MVCC who have very similar, although not identical isolation behaviour (albeit very different implementation), but InnoDB MVCC stands apart from them.

At the beginning, I should point that the default isolation level in PostgreSQL and Oracle is READ COMMITTED, whereas MySQL uses REPEATABLE READ isolation level as a default. This means that in MySQL by default all normal (not FOR UPDATE) SELECT queries see database snapshot at the time when transaction started. In PostgreSQL and Oracle all SELECT queries by default see database snapshot at the time when the query (and not the transaction) started.

However, SELECT FOR UPDATE in MySQL sees database at the moment the query started, even with REPEATABLE READ isolation level. That is, it is possible that during the same transaction on default isolation level in MySQL SELECT and SELECT FOR UPDATE see totally different rows. In Oracle and PostgreSQL, SELECT and SELECT FOR UPDATE always see the same rows (at the moment of query start in READ COMMITTED mode, or at the moment of transaction start when in SERIALIZABLE mode). In other words, in Oracle and PostgreSQL SELECTs and UPDATEs see database at the same time in past, whereas in MySQL SELECTs see consistent snapshot at past, whereas SELECT FOR UPDATE and UPDATE see current data.

Also, as MySQL uses locks more aggressively, some "write skew" problems which are possible in PostgreSQL and Oracle, are not possible in MySQL.

I don't know what kind of MVCC is "better", but IMHO Oracle and PostgreSQL MVCC is much easier to understand than MySQL one. The difference between Oracle and PostgreSQL MVCC is the following: in READ COMMITTED mode, when long UPDATE query encounter some rows updated by concurrent transaction, it is blocked until concurrent transaction COMMIT or ROLLBACK. After ROLLBACK of concurrent transaction the query will continues execution, and this behavior is the same in Oracle and PostgreSQL. If concurrent transaction is committed, then PostgreSQL re-evaluates previously blocked row to determine whether it still matches the UPDATE criteria. But PostgreSQL does not re-check previously read rows which did not satisfy UPDATE criteria, whereas is is possible that these rows were also modified by concurrent transaction and now satisfy UPDATE criteria and should be updated. Oracle behaves in a different way: If long UPDATE query encounters some rows updated by concurrent transaction, and these rows no longer satisfy the UPDATE criteria, the query restarts and sees new database snapshot at the current time (remainder - I'm speaking about the default READ COMMITTED mode). This way Oracle doesn't miss rows which can be missed by PostgreSQL, but long UPDATE can potentially be restarted many times (that is, the query starvation is possible, although not likely). http://asktom.oracle.com/pls/asktom/f?p=100:11%3a%3a%3a%3a%3...

Roboprog · on Nov 6, 2013

Yikes. Thanks, though.

code_duck · on Nov 6, 2013

I'm surprised there is no mention of database engines within this article. Presumably, only InnoDB is being discussed here since MyISAM does not support transactions at all.

comice · on Nov 6, 2013

Apparently the busy web developer has no time for database engines (certainly not MyISAM anyway :)

babuskov · on Nov 6, 2013

Yup, looks like InnoDB. I find MyISAM locking to work just fine out-of-the-box.

Also, adding proper indexes is one of the basic database tasks to do if you want any reasonable performance. I'm often adjusting tables' structure outside of 3rd normal form, just to be able to use good indexes for the expected use cases.

fideloper · on Nov 6, 2013

If your apps are write-heavy, it makes sense to allow row-locking (InnoDB) vs table-locking (MyISAM).

ivanhoe · on Nov 6, 2013

Why would you use extra locking inside of a transaction? Transaction gives you the required atomicity already

mjb · on Nov 6, 2013

For Consistency (i.e. C, not I). Depending on the operations you are trying to perform and the isolation level you have chosen, locking may be the only way to update the database consistently, even within a single transaction.

See this paper for a good explanation of how C and I interact: http://aphyr.com/data/posts/294/tr-95-51.pdf

ivanhoe · on Nov 6, 2013

Not sure if we talk of the same thing? When I say transaction I mean a number of queries between START TRANSACTION and COMMIT (or ROLLBACK) SQL command, and mysql will lock automatically all tables inside that block. If you use SELECT inside the transaction innoDB will work on a data snapshot to insure the data consistency. (btw. thanks for the link I'll read it later)

ivanhoe · on Nov 6, 2013

I stand corrected here, actually it seems that in the default repeatable-read mode START TRANSACTION will not do any table/row LOCK-ing automatically..

kozlovsky · on Nov 6, 2013

It depends on isolation level. Default isolation level in MySQL is REPEATABLE READ and not SERIALIZABLE. Without locks, lost updates are sometimes possible.

ivanhoe · on Nov 6, 2013

thanks for the explanation, I guess I need to do some extra reading on this...

kozlovsky · on Nov 6, 2013

The basic scenario where lost update is possible looks this way:

    session_1: BEGIN
    session_1: SELECT data from T1 where pk = 123

    session_2: BEGIN
    session_2: SELECT data from T1 where pk = 123
    session_2: UPDATE T1 set data = "value_2" where pk = 123
    session_2: COMMIT

    session_1: UPDATE T1 set data = "updated_value" where pk = 123
    session_1: COMMIT

In this scenario, session_1 does not see "value_2", so "updated_value" does not take "value2" into account, and update of session_2 becomes lost. There are two ways to prevent this problem:

1) Using SELECT FOR UPDATE instead of SELECT. This way session_1 take exclusive lock on the corresponding row, and session_2 execution is suspended until session_1 will commit.

2) Using more strict SERIALIZABLE isolation level. In this mode, session_1 in MySQL will take shared lock on each selected row, and each SELECT effectively becomes SELECT ... LOCK IN SHARE MODE. This way session_2 also blocks until session_1 will commit.

In PostgreSQL, SERIALIZED mode implemented the other way - SELECT works as usual (without locks), but during UPDATE session_1 will detect that the row was changed by concurrent transaction, and transaction of session_1 will be terminated with the exception "could not serialize access due to concurrent update".

Roboprog · on Nov 6, 2013

Yes, yes, yes! Having the "late" commit (session 1) fail in this case is exactly what I would expect as "correct" behavior.

However, I (now) see what MJB meant about needing an explicit (table/row) lock so that if you can start such a sequence, you should be able to complete it.