Skip to content

MDEV-38817 innodb.skip_locked_nowait fails: SELECT NOWAIT gets ER_LOCK_WAIT_TIMEOUT#4684

Closed
arcivanov wants to merge 1 commit intoMariaDB:12.3from
arcivanov:MDEV-38817
Closed

MDEV-38817 innodb.skip_locked_nowait fails: SELECT NOWAIT gets ER_LOCK_WAIT_TIMEOUT#4684
arcivanov wants to merge 1 commit intoMariaDB:12.3from
arcivanov:MDEV-38817

Conversation

@arcivanov
Copy link
Copy Markdown
Contributor

Don't send OK packet early for locking reads (SELECT ... FOR UPDATE, SELECT ... LOCK IN SHARE MODE).

MDEV-38019 added an optimization in select_send::send_eof() that sends the OK packet to the client before the server finishes cleanup, allowing the client to process results sooner. The guard
!thd->transaction->stmt.is_trx_read_write() was intended to exclude write transactions, but mark_trx_read_write() is only called for DML (INSERT/UPDATE/DELETE), not for locking reads. This caused SELECT ... FOR UPDATE with autocommit to send OK before external_lock(F_UNLCK) committed and released locks, creating a race where another connection using NOWAIT could see still-held locks and immediately fail with ER_LOCK_WAIT_TIMEOUT.

Add a check for select_lock == st_select_lex::NONE to ensure the early OK path is only taken for truly non-locking SELECT statements.

…R_LOCK_WAIT_TIMEOUT`

Don't send OK packet early for locking reads (`SELECT ... FOR UPDATE`,
`SELECT ... LOCK IN SHARE MODE`).

MDEV-38019 added an optimization in `select_send::send_eof()` that sends
the OK packet to the client before the server finishes cleanup, allowing
the client to process results sooner. The guard
`!thd->transaction->stmt.is_trx_read_write()` was intended to exclude
write transactions, but `mark_trx_read_write()` is only called for DML
(INSERT/UPDATE/DELETE), not for locking reads. This caused
`SELECT ... FOR UPDATE` with autocommit to send OK before
`external_lock(F_UNLCK)` committed and released locks, creating a race
where another connection using `NOWAIT` could see still-held locks and
immediately fail with `ER_LOCK_WAIT_TIMEOUT`.

Add a check for `select_lock == st_select_lex::NONE` to ensure the early
OK path is only taken for truly non-locking SELECT statements.
@arcivanov
Copy link
Copy Markdown
Contributor Author

@grooverdan

@grooverdan grooverdan added the External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. label Feb 23, 2026
@svoj
Copy link
Copy Markdown
Contributor

svoj commented Feb 23, 2026

creating a race where another connection using NOWAIT could see still-held locks and
immediately fail with ER_LOCK_WAIT_TIMEOUT.

This sounds like a valid behaviour. Besides there're a lot more statements that'd most probably fail similarly, e.g. TRUNCATE TABLE ... NOWAIT after regular SELECT in another connection.

My thinking is MDEV-38019 invalidated many tests and it needs some systematic approach towards getting it fixed.

Not requesting any changes, just sharing a point to think about.

Copy link
Copy Markdown
Member

@gkodinov gkodinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is preliminary review. Thanks for your contribution. LGTM. Please wait for the final review.

@arcivanov
Copy link
Copy Markdown
Contributor Author

This sounds like a valid behaviour.

I have no dog in this fight (I was just asked to look at it, it's not even my bug) but to me this sounds like a violation of happens-before ACID guarantees: if the state confirmation packet arrives on the client then the happens-before relationship between the DML and the reported state of that DML has to be consistent. The arrival of the packet to the client establishes that the operation is complete. The fact that there is a background cleanup that may have side-effects on the operation that has been reported to have been completed in any other concurrent system would be interpreted as a major bug.

Note that happens-before guarantee is only implied within the same connection/transaction that first DMLed and then SELECT NOWAIT, and not within other connections that issued SELECT NOWAIT "too early" according to the some shared wall clock.

@svoj
Copy link
Copy Markdown
Contributor

svoj commented Mar 25, 2026

@arcivanov I discussed this issue with colleagues. For the time being, it was decided to consider such issues as test case bugs. At least until we find use case that is clearly broken.

We aim to implement mtr include file that'd wait for session to complete cleanup and use it in similar cases. Something similar to wait_until_count_sessions.inc. Will you be willing to implement such solution?

As for the NOWAIT, such statements may fail due to other background activity. Like InnoDB purge, event scheduler, delayed insert. One should not expect success basing on the fact that a statement in current session is completed.

We don't see this particular failure as ACID violation, because nothing have happened after SELECT ... FOR UPDATE.

But I do share your worries, it looks fragile TBH. We already seen tests failing due to InnoDB locks, MDL locks, certain statistics, performance schema hitting stale state in concurrent connection. If you have some specific use case on your mind, that you think is broken, I'd be thankful if you could share it.

Copy link
Copy Markdown
Member

@gkodinov gkodinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arcivanov can you please check if the approach taken in #4765 (review) would resolve it for you?

@arcivanov
Copy link
Copy Markdown
Contributor Author

@arcivanov can you please check if the approach taken in #4765 (review) would resolve it for you?

Resolve it for me? 😄 I never encountered this problem (we don't use the NO WAIT) - you asked me to see if I could quickly pinpoint a possible fix of this issue for you which is what I did. 😄

@grooverdan
Copy link
Copy Markdown
Member

@arcivanov , thank you very much for applying your insight into understanding the roles and interactions of the early termination wrt locking. A code change was very useful too.

As discussion progressed, it seems that a different solution is desired, and that's on us (which I'm doing now). Much appreciate your help getting to this point.

@grooverdan grooverdan closed this Apr 24, 2026
@grooverdan
Copy link
Copy Markdown
Member

I reused your text - it was very good - #4986 thanks @arcivanov

@arcivanov
Copy link
Copy Markdown
Contributor Author

Happy to help! 😄

@arcivanov arcivanov deleted the MDEV-38817 branch April 24, 2026 02:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements.

Development

Successfully merging this pull request may close these issues.

5 participants