Skip to content

MDEV-7270: Fix replicas not honoring slave_skip_errors for 1677#4653

Merged
bnestere merged 2 commits intoMariaDB:12.3from
OmarGamal10:mdev-7270
Apr 28, 2026
Merged

MDEV-7270: Fix replicas not honoring slave_skip_errors for 1677#4653
bnestere merged 2 commits intoMariaDB:12.3from
OmarGamal10:mdev-7270

Conversation

@OmarGamal10
Copy link
Copy Markdown
Contributor

@OmarGamal10 OmarGamal10 commented Feb 15, 2026

Summary

  • This PR fixes replicas crashing on column type mismatch for row-based replication, not honoring the slave_skip_errors configuration.
  • The issue comments on JIRA pointed to an upstream (mysql) test, which was helpful for tracking the bug and eventually implementing the fix.
  • The replica now skips the event gracefully (in case the error is actually caught and the column is not convertible), logs a warning and does not abort.

Fix: MDEV-7270

@ParadoxV5 ParadoxV5 requested a review from bnestere February 15, 2026 18:13
@gkodinov gkodinov added the External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. label Feb 16, 2026
Copy link
Copy Markdown
Member

@gkodinov gkodinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution! This is a preliminary review.

Please squash all of your commits into a single one and add a commit message to it that complies with CODING_STANDARDS.md.

@bnestere bnestere added the Replication Patches involved in replication label Mar 13, 2026
Copy link
Copy Markdown
Contributor

@bnestere bnestere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @OmarGamal10 !

Referencing your Zulip note

Though, I did this on main, then found out I had to do this on the oldest maintained version, which is 10.6 . Now there's some changes in the Rows_log_event::do_apply_event function, upon which I depended on with my fix, these changes do not exist in 10.6 along with some others, what should I do in this case?
...
I expect it would be to find a way to do the fix for 10.6, but the thing is the main has some helper functions and some modifications ~9 months ago-ish that makes handling the row errors much cleaner.

Let's actually keep this change on 12.3 where such changes exist (that should reduce the size of the patch). --skip-slave-errors is mostly used for testing purposes, not production, so missing this in some earlier versions won't be a big deal, and it reduces risk of accidental bugs in early GA versions. Also note, 10.6 is restricted to critical bug fixes only, so we couldn't put this in 10.6 anyway.

Sorry I wasn't able to answer earlier :( I was on vacation when you initially asked the question on Zulip, and have only just been able to take a look at your PR now.

Comment thread mysql-test/suite/rpl/t/rpl_row_slave_skip_errors-slave.opt Outdated
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Mar 29, 2026

CLA assistant check
All committers have signed the CLA.

@OmarGamal10 OmarGamal10 changed the base branch from 10.6 to 12.3 March 29, 2026 14:32
@OmarGamal10
Copy link
Copy Markdown
Contributor Author

Hi @OmarGamal10 !

Referencing your Zulip note

Though, I did this on main, then found out I had to do this on the oldest maintained version, which is 10.6 . Now there's some changes in the Rows_log_event::do_apply_event function, upon which I depended on with my fix, these changes do not exist in 10.6 along with some others, what should I do in this case?
...
I expect it would be to find a way to do the fix for 10.6, but the thing is the main has some helper functions and some modifications ~9 months ago-ish that makes handling the row errors much cleaner.

Let's actually keep this change on 12.3 where such changes exist (that should reduce the size of the patch). --skip-slave-errors is mostly used for testing purposes, not production, so missing this in some earlier versions won't be a big deal, and it reduces risk of accidental bugs in early GA versions. Also note, 10.6 is restricted to critical bug fixes only, so we couldn't put this in 10.6 anyway.

Sorry I wasn't able to answer earlier :( I was on vacation when you initially asked the question on Zulip, and have only just been able to take a look at your PR now.

No worries :). I noticed error handling changed from 10.6 to 12.3, and that comptaible_with() does not handle the error itself anymore but marks it for handling later. So I just added a bool to the thd, set in compatible_with() to skip the event in do_apply_event().

Copy link
Copy Markdown
Contributor

@bnestere bnestere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for rebasing and updating the patch, @OmarGamal10 ! I've left some notes.

Comment thread sql/rpl_utility_server.cc Outdated
Comment thread mysql-test/suite/rpl/t/rpl_row_slave_skip_errors.test Outdated
Comment thread mysql-test/suite/rpl/t/rpl_row_slave_skip_errors.test Outdated
Comment thread mysql-test/suite/rpl/t/rpl_row_slave_skip_errors.test Outdated
Comment thread sql/slave.h
@OmarGamal10 OmarGamal10 force-pushed the mdev-7270 branch 4 times, most recently from ca21a82 to c32a9d0 Compare March 31, 2026 22:06
@OmarGamal10 OmarGamal10 requested a review from bnestere March 31, 2026 22:14
Copy link
Copy Markdown
Contributor

@bnestere bnestere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the continued work, @OmarGamal10 !

I've left another round of feedback. It is quite close!

Another note for your git commit, please mention your reviewer (me):

Reviewed-by: Brandon Nesterenko <brandon.nesterenko@mariadb.com>

Comment thread sql/rpl_utility_server.cc Outdated
Comment thread mysql-test/suite/rpl/t/rpl_row_slave_skip_errors.test Outdated
call mtr.add_suppression("Slave SQL.*Error executing row event: .Table .test.t. doesn.t exist., Error_code: 1146");
call mtr.add_suppression("Slave SQL.*Column 0 of table .test.t. cannot be converted from type.* Error_code: 1677");
call mtr.add_suppression("The slave coordinator and worker threads are stopped, possibly leaving data in inconsistent state");
call mtr.add_suppression("Got error 1 during COMMIT");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nor this nor the one above it. You should only need

call mtr.add_suppression("Slave SQL.*Column 0 of table .test.t. cannot be converted from type.* Error_code: 1677");

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need any. The pattern for suppressing warnings is added to global_suppressions.

Comment thread mysql-test/suite/rpl/t/rpl_row_slave_skip_errors.test Outdated
Comment thread mysql-test/suite/rpl/t/rpl_row_slave_skip_errors.test Outdated
Comment thread mysql-test/suite/rpl/t/rpl_row_slave_skip_errors.test Outdated
Comment thread sql/rpl_record.cc Outdated
Comment thread mysql-test/suite/rpl/t/rpl_row_slave_skip_errors.test Outdated
Copy link
Copy Markdown
Member

@gkodinov gkodinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please keep working with Brandon on the final review.

@OmarGamal10 OmarGamal10 force-pushed the mdev-7270 branch 2 times, most recently from a69b765 to 91292b2 Compare April 10, 2026 11:14
@grooverdan grooverdan requested a review from bnestere April 14, 2026 03:12
Copy link
Copy Markdown
Contributor

@bnestere bnestere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @OmarGamal10 !

I only have the last suggestion.

Comment thread sql/rpl_record.cc Outdated
Copy link
Copy Markdown
Contributor

@bnestere bnestere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the continued work @OmarGamal10 ! The patch looks good.

Setting slave_skip_errors to 1677 (ER_SLAVE_CONVERSION_FAILED) doesn't work
as intended. The slave aborts replication on column mismatch. This fix
makes it log a warning and set the problematic field to the default
value gracefully.
Reviewed-by: Brandon Nesterenko <brandon.nesterenko@mariadb.com>
@bnestere bnestere enabled auto-merge (rebase) April 27, 2026 22:46
@bnestere bnestere merged commit 9cd3351 into MariaDB:12.3 Apr 28, 2026
18 of 19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. Replication Patches involved in replication

Development

Successfully merging this pull request may close these issues.

5 participants