加载中…
个人资料
  • 博客等级:
  • 博客积分:
  • 博客访问:
  • 关注人气:
  • 获赠金笔:0支
  • 赠出金笔:0支
  • 荣誉徽章:
正文 字体大小:

记一次alteruser导致的mysql主从不同步分析

(2022-05-08 15:41:35)
标签:

mysql

分类: mysql源码相关
一线同学在客户现场碰到一次较为诡异的mysql主从中断。

复现过程

1. setup a replication m1->s1 with RBR

2. Init test user on m1:
   1) create user u1@'%' identified with 'mysql_native_password';
   2) grant all on *.* to u1@'%';

3. Modify the mysql.user table to set a invalid authentication_string for test user on m1:
   1) update mysql.user set authentication_string=MD5("password") where user="u1";
   2) flush privileges;

4. Reset authentication_string to empty for test user on m1:
   1) update mysql.user set authentication_string="" where user="u1";

5. Do the alter user twice on m1:
   1) alter user u1@'%' identified with mysql_native_password by 'abc123' password expire never;
      // this one will fail with "ERROR 1396 (HY000): Operation ALTER USER failed for 'u1'@'%'"
   2) alter user u1@'%' identified with mysql_native_password by 'abc123' password expire never;
     // this one will success

6. check slave status on s1, we will see the following error:

Last_SQL_Errno: 1396, Operation ALTER USER failed for...

原因分析

第一个alter user语句报错1396,原因是mysql内存中的acl cache因为update mysql.user set authentication_string=MD5("password")的非法密码信息导致权限信息加载失败,导致acl cache中没有u1用户信息,所以报错。
而第二个alter user语句执行成功了,这块是因为第一个alter user语句在执行失败的情况下自带了flush privileges的效果,将最新的update mysql.user set authentication_string=""加载到了acl cache中,堆栈如下:
(gdb) bt
#0  reload_acl_caches (thd=0x7fff40000da0, mdl_locked=true) at /home/gaoxiaoxin/mysql-repo/mysql-8.0.28/sql/auth/sql_auth_cache.cc:3650
#1  0x00000000035eb843 in acl_end_trans_and_close_tables (thd=0x7fff40000da0, rollback_transaction=true) at /home/gaoxiaoxin/mysql-repo/mysql-8.0.28/sql/auth/sql_user_table.cc:653
#2  0x00000000035ebcf2 in log_and_commit_acl_ddl (thd=0x7fff40000da0, transactional_tables=true, extra_users=0x7fffa85f0260, rewrite_params=0x7fffa85f0120, extra_error=false, write_to_binlog=true)
    at /home/gaoxiaoxin/mysql-repo/mysql-8.0.28/sql/auth/sql_user_table.cc:766
#3  0x00000000035fd773 in mysql_alter_user (thd=0x7fff40000da0, list=..., if_exists=false) at /home/gaoxiaoxin/mysql-repo/mysql-8.0.28/sql/auth/sql_user.cc:3483
#4  0x000000000330c688 in mysql_execute_command (thd=0x7fff40000da0, first_level=true) at /home/gaoxiaoxin/mysql-repo/mysql-8.0.28/sql/sql_parse.cc:4676
#5  0x000000000330df96 in dispatch_sql_command (thd=0x7fff40000da0, parser_state=0x7fffa85f5b60) at /home/gaoxiaoxin/mysql-repo/mysql-8.0.28/sql/sql_parse.cc:5174
#6  0x00000000033045e0 in dispatch_command (thd=0x7fff40000da0, com_data=0x7fffa85f6c40, command=COM_QUERY) at /home/gaoxiaoxin/mysql-repo/mysql-8.0.28/sql/sql_parse.cc:1938
#7  0x0000000003302867 in do_command (thd=0x7fff40000da0) at /home/gaoxiaoxin/mysql-repo/mysql-8.0.28/sql/sql_parse.cc:1352
#8  0x00000000034df9e7 in handle_connection (arg=0xa785660) at /home/gaoxiaoxin/mysql-repo/mysql-8.0.28/sql/conn_handler/connection_handler_per_thread.cc:302
#9  0x000000000502d937 in pfs_spawn_thread (arg=0xaa0b950) at /home/gaoxiaoxin/mysql-repo/mysql-8.0.28/storage/perfschema/pfs.cc:2947
#10 0x00007ffff7bc16ba in start_thread (arg=0x7fffa85f7700) at pthread_create.c:333
#11 0x00007ffff5e3b41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

flush privileges的堆栈如下:
(gdb) bt
#0  reload_acl_caches (thd=0x7fff40000da0, mdl_locked=false) at /home/gaoxiaoxin/mysql-repo/mysql-8.0.28/sql/auth/sql_auth_cache.cc:3650
#1  0x00000000033554b8 in handle_reload_request (thd=0x7fff40000da0, options=1, tables=0x0, write_to_binlog=0x7fffa85f523c) at /home/gaoxiaoxin/mysql-repo/mysql-8.0.28/sql/sql_reload.cc:162
#2  0x000000000330a9ac in mysql_execute_command (thd=0x7fff40000da0, first_level=true) at /home/gaoxiaoxin/mysql-repo/mysql-8.0.28/sql/sql_parse.cc:4044
#3  0x000000000330df96 in dispatch_sql_command (thd=0x7fff40000da0, parser_state=0x7fffa85f5b60) at /home/gaoxiaoxin/mysql-repo/mysql-8.0.28/sql/sql_parse.cc:5174
#4  0x00000000033045e0 in dispatch_command (thd=0x7fff40000da0, com_data=0x7fffa85f6c40, command=COM_QUERY) at /home/gaoxiaoxin/mysql-repo/mysql-8.0.28/sql/sql_parse.cc:1938
#5  0x0000000003302867 in do_command (thd=0x7fff40000da0) at /home/gaoxiaoxin/mysql-repo/mysql-8.0.28/sql/sql_parse.cc:1352
#6  0x00000000034df9e7 in handle_connection (arg=0xa785660) at /home/gaoxiaoxin/mysql-repo/mysql-8.0.28/sql/conn_handler/connection_handler_per_thread.cc:302
#7  0x000000000502d937 in pfs_spawn_thread (arg=0xaa0b950) at /home/gaoxiaoxin/mysql-repo/mysql-8.0.28/storage/perfschema/pfs.cc:2947
#8  0x00007ffff7bc16ba in start_thread (arg=0x7fffa85f7700) at pthread_create.c:333
#9  0x00007ffff5e3b41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

而reload_acl_caches就是重新加载内存中的acl cache。所以在第一个alter user之前第2个update user语句之后如果有执行一个flush privileges的话,第一个alter user语句是不会报错,因为这时候内存中的acl cache信息是合法的了。

但是上述操作在binlog文件中记录的是:

1)
UPDATE `mysql`.`user` ...
2)
FLUSH PRIVILEGES
3)
UPDATE `mysql`.`user` ...
4)
ALTER USER 'u1'@'%' IDENTIFIED WITH 'mysql_native_password' AS '*6691484EA6B50DDDE1926A220DA01FA9E575C18A' PASSWORD EXPIRE NEVER

相当于在slave回放的时候,相比master上少执行一次alter user,只执行了第一个报错的alter user。

所以只需要在slave执行start slave sql_thread,让它重新执行一次alter user的binlog event,同步就正常了。

修复方案

这个bug报给了官方,官方认为不应该直接去修改mysql.user表,所以不认为是bug。
但也可以从源码角度考虑下如何避免上述bug,基本思想如下:
 1. 因为alter user失败的场景总是有可能隐式地执行一个flush privileges但不记录binlog,所以对于alter user失败的场景,是潜在存在导致主从不一致的可能。
2. 所以考虑alter user一开始强制做一次flush privileges,这样也增加了alter user语句行为的确定性。

代码调整非常简单,就一行,具体可以参见提给官方的bug:https://bugs.mysql.com/bug.php?id=106942

5.7 无问题原因分析

上述复现步骤在5.7中不会导致主从中断,原因如下:
在5.7中,acl_end_trans_and_close_tables是否re-flush权限信息取决于thd->transaction_rollback_request和rollback_whole_statement。
2006   result|=
2007     acl_end_trans_and_close_tables(thd,
2008                                    thd->transaction_rollback_request ||
2009                                    rollback_whole_statement);

因为update mysql.user set authentication_string=MD5("password") where user="u1";语句修改的权限信息非法,所以之后的flush privileges并没有将该用户信息加载到内存中,所以对应的后续alter user语句实际上找不到这个user,进而报错。

1916     
1917     if (!(acl_user= find_acl_user(user_from->host.str,
1918                                   user_from->user.str, TRUE)))
1919     
1920       if (if_exists)
1921       {
1922         String warn_user;
1923         append_user(thd, &warn_user, user_from, FALSE, FALSE);
1924         push_warning_printf(thd, Sql_condition::SL_NOTE,
1925                             ER_USER_DOES_NOT_EXIST,
1926                             ER_THD(thd, ER_USER_DOES_NOT_EXIST),
1927                             warn_user.c_ptr_safe());
1928         try
1929         {
1930           extra_users.insert(user_from);
1931         }
1932         catch (...) {}
1933       }
1934       else
1935       {
1936         result= TRUE;
1937         append_user(thd, &wrong_users, user_from, wrong_users.length() > 0,
1938                     false);
1939       }
1940 
1941       continue;
1942     }


5.7中,如果找不到acl_user会直接continue,不会继续后续的replace_user_table操作,也不会设置rollback_whole_statement和thd->transaction_rollback_request。所以不会触发re-flush权限信息。

而8.0中,是否会re-flush权限信息还会额外检查thd->is_error()和extra_error,只要alter user语句执行报错了,thd->is_error()必然为true,所以也必然会触发re-flush权限信息。

 683 bool log_and_commit_acl_ddl(THD *thd, bool transactional_tables,
 684                             std::set *extra_users,
 685                             Rewrite_params *rewrite_params,   
 686                             bool extra_error,                 
 687                             bool write_to_binlog) {           
 688   bool result = false;
 689   DBUG_TRACE;
 690   assert(thd);       
 691   result = thd->is_error() || extra_error || thd->transaction_rollback_request;
...
 766   if (acl_end_trans_and_close_tables(thd, result)) result = true;


转载请注明转自高孝鑫的博客!







0

阅读 收藏 喜欢 打印举报/Report
  

新浪BLOG意见反馈留言板 欢迎批评指正

新浪简介 | About Sina | 广告服务 | 联系我们 | 招聘信息 | 网站律师 | SINA English | 产品答疑

新浪公司 版权所有