首页 > 其他分享 >5.7 与 8.0 对相同文件的 LOAD DATA 语句结果不同

5.7 与 8.0 对相同文件的 LOAD DATA 语句结果不同

时间:2024-11-14 17:40:52浏览次数:1  
标签:LOAD 8.0 set utf8mb4 5.7 greatsql sec XXX row

5.7 与 8.0 对相同文件的 LOAD DATA 语句结果不同

问题描述

某客户现场支持,由MySQL 5.7.21升级MySQL 8.0.25后,通过LOAD DATA导入文件,当同一会话连续导入不同的编码(UTF8/GB18030)文件时会出现乱码。数据库版本未升级之前,相同的导入操作在MySQL 5.7.21未出现乱码。

问题分析

1)查看简化后的 LOAD DATA语句

greatsql> LOAD DATA LOCAL INFILE '/home/greatdb/TEST_UTF8_bak.txt' IGNORE INTO TABLE ASSP_SIS_PAYRES_IMP_BAK CHARACTER SET UTF8MB4 IGNORE 0 LINES  (@row) SET `D_NAME` = NULLIF(TRIM(CONVERT(UNHEX(SUBSTR(HEX(@row), 1,240)) USING UTF8MB4)),'');
Query OK, 2 rows affected (0.01 sec)
Records: 2  Deleted: 0  Skipped: 0  Warnings: 0

greatsql> LOAD DATA LOCAL INFILE '/home/greatdb/TEST_GB18030_bak.txt' IGNORE INTO TABLE ASSP_SIS_PAYRES_IMP_BAK CHARACTER SET GB18030 IGNORE 0 LINES  (@row) SET `D_NAME` = NULLIF(TRIM(CONVERT(UNHEX(SUBSTR(HEX(@row), 1,240)) USING GB18030)),'');
Query OK, 2 rows affected (0.01 sec)
Records: 2  Deleted: 0  Skipped: 0  Warnings: 0

2)查看表数据

+----------+------------------------------------------------------+
| AUTO_INC | D_NAME                                               |
+----------+------------------------------------------------------+
|        1 | xxx社会保险xxx                                        |
|        2 | xxx市路桥区xxx                                        |
|        4 | 鍙板窞甯傝矾妗ュ尯绀句細淇濋櫓浜嬩笟绠$悊涓績             |
|        5 | 鍙板窞甯傝矾妗ュ尯绀句細淇濋櫓浜嬩笟绠$悊涓績             |
+----------+------------------------------------------------------+
4 rows in set (0.00 sec)

3)检查业务表的字符集与校验集,发现字符集为 utf8mb4 、校验集为 utf8mb4_bin

4)检查数据库的字符集与校验集

greatsql> SHOW GLOBAL VARIABLES LIKE '%char%';
+--------------------------------------+--------------------------------+
| Variable_name                        | Value                          |
+--------------------------------------+--------------------------------+
| character_set_client                 | utf8mb4                        |
| character_set_connection             | utf8mb4                        |
| character_set_database               | utf8mb4                        |
| character_set_filesystem             | binary                         |
| character_set_results                | utf8mb4                        |
| character_set_server                 | utf8mb4                        |
| character_set_system                 | utf8mb3                        |
| character_sets_dir                   | /opt/mysql3301/share/charsets/ |
| validate_password_special_char_count | 1                              |
+--------------------------------------+--------------------------------+
9 rows in set (0.01 sec)

greatsql> SHOW GLOBAL VARIABLES LIKE '%coll%';
+-------------------------------+--------------------+
| Variable_name                 | Value              |
+-------------------------------+--------------------+
| collation_connection          | utf8mb4_bin        |
| collation_database            | utf8mb4_bin        |
| collation_server              | utf8mb4_bin        |
| default_collation_for_utf8mb4 | utf8mb4_general_ci |
+-------------------------------+--------------------+
4 rows in set (0.00 sec)

程序在MySQL 5.7.21跑了很长时间,一直没有问题,把数据库升级MySQL 8.0.25后,新导入的数据出现部分乱码, 由此怀疑,MySQL 8.0定长数据导入LOAD DATA @row 出现BUG。

BUG场景:同一个会话 LOAD DATA多种字符集文件,使用@临时变量切割字段。将导致导入数据乱码,向MySQL官方提BUG,已证实为BUG(编号115824)

问题复现

MySQL: 8.0.25

greatsql> SELECT VERSION();
+-----------+
| version() |
+-----------+
| 8.0.25    |
+-----------+
1 row in set (0.00 sec)


table ddl:
CREATE TABLE `assp_sis_payres_imp_bak` (
  `AUTO_INC` bigint unsigned NOT NULL AUTO_INCREMENT COMMENT '自增列',
  `D_NAME` varchar(210) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin DEFAULT NULL,
  PRIMARY KEY (`AUTO_INC`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;


greatsql> SHOW GLOBAL VARIABLES LIKE '%char%';
+--------------------------------------+--------------------------------+
| Variable_name                        | Value                          |
+--------------------------------------+--------------------------------+
| character_set_client                 | utf8mb4                        |
| character_set_connection             | utf8mb4                        |
| character_set_database               | utf8mb4                        |
| character_set_filesystem             | binary                         |
| character_set_results                | utf8mb4                        |
| character_set_server                 | utf8mb4                        |
| character_set_system                 | utf8mb3                        |
| character_sets_dir                   | /opt/mysql3301/share/charsets/ |
| validate_password_special_char_count | 1                              |
+--------------------------------------+--------------------------------+
9 rows in set (0.01 sec)

greatsql> SHOW GLOBAL VARIABLES LIKE '%coll%';
+-------------------------------+--------------------+
| Variable_name                 | Value              |
+-------------------------------+--------------------+
| collation_connection          | utf8mb4_bin        |
| collation_database            | utf8mb4_bin        |
| collation_server              | utf8mb4_bin        |
| default_collation_for_utf8mb4 | utf8mb4_general_ci |
+-------------------------------+--------------------+
4 rows in set (0.00 sec)

greatsql> TRUNCATE TABLE assp_sis_payres_imp_bak;
Query OK, 0 rows affected (0.03 sec)

greatsql> SELECT charset(@row), @row;
+---------------+------------+
| charset(@row) | @row       |
+---------------+------------+
| binary        | NULL       |
+---------------+------------+
1 row in set (0.00 sec)

greatsql> LOAD DATA LOCAL INFILE '/root/dba_zc/load/TEST_UTF8_bak.txt' IGNORE INTO TABLE ASSP_SIS_PAYRES_IMP_BAK CHARACTER SET UTF8MB4 IGNORE 0 LINES  (@row) SET `D_NAME` = NULLIF(TRIM(CONVERT(UNHEX(SUBSTR(HEX(@row), 1,240)) USING UTF8MB4)),'');
Query OK, 2 rows affected (0.01 sec)
Records: 2  Deleted: 0  Skipped: 0  Warnings: 0

greatsql> SELECT charset(@row), @row;  
+---------------+------------------------+
| charset(@row) | @row                   |
+---------------+------------------------+
| utf8mb4       | XXX路桥区社会保XXX       |
+---------------+------------------------+

greatsql> LOAD DATA LOCAL INFILE '/root/dba_zc/load/TEST_GB18030_bak.txt' IGNORE INTO TABLE ASSP_SIS_PAYRES_IMP_BAK CHARACTER SET GB18030 IGNORE 0 LINES  (@row) SET `D_NAME` = NULLIF(TRIM(CONVERT(UNHEX(SUBSTR(HEX(@row), 1,240)) USING GB18030)),'');
Query OK, 2 rows affected (0.01 sec)
Records: 2  Deleted: 0  Skipped: 0  Warnings: 0

greatsql> SELECT charset(@row), @row;  
+---------------+-----------------------------------------+
| charset(@row) | @row                                    |
+---------------+-----------------------------------------+
| gb18030       | XXX路桥区社会保XXX       |
+---------------+-----------------------------------------+

greatsql>  SELECT * FROM  ASSP_SIS_PAYRES_IMP_BAK;
+----------+---------------------------------------------------------+
| AUTO_INC | D_NAME                                                  |
+----------+---------------------------------------------------------+
|        1 | XXX路桥区社会保XXX                                        |
|        2 | XXX路桥区社会保XXX                                        |
|        4 | 鍙板窞甯傝矾妗ュ尯绀句細淇濋櫓浜嬩笟绠$悊涓績                 |
|        5 | 鍙板窞甯傝矾妗ュ尯绀句細淇濋櫓浜嬩笟绠$悊涓績                 |
+----------+---------------------------------------------------------+
4 rows in set (0.00 sec)

MySQL 5.7.21

greatsql> SELECT VERSION();
+------------+
| version()  |
+------------+
| 5.7.21-log |
+------------+
1 row in set (0.01 sec)

table ddl:
CREATE TABLE `assp_sis_payres_imp_bak` (
  `AUTO_INC` bigint unsigned NOT NULL AUTO_INCREMENT COMMENT '自增列',
  `D_NAME` varchar(210) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin DEFAULT NULL,
  PRIMARY KEY (`AUTO_INC`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

greatsql>  SHOW GLOBAL VARIABLES LIKE '%char%';
+--------------------------------------+--------------------------------+
| Variable_name                        | Value                          |
+--------------------------------------+--------------------------------+
| character_set_client                 | utf8mb4                        |
| character_set_connection             | utf8mb4                        |
| character_set_database               | utf8mb4                        |
| character_set_filesystem             | binary                         |
| character_set_results                | utf8mb4                        |
| character_set_server                 | utf8mb4                        |
| character_set_system                 | utf8                           |
| character_sets_dir                   | /opt/mysql3305/share/charsets/ |
| validate_password_special_char_count | 1                              |
+--------------------------------------+--------------------------------+
9 rows in set (0.00 sec)

greatsql> SHOW GLOBAL VARIABLES LIKE '%coll%';
+----------------------+--------------------+
| Variable_name        | Value              |
+----------------------+--------------------+
| collation_connection | utf8mb4_general_ci |
| collation_database   | utf8mb4_general_ci |
| collation_server     | utf8mb4_general_ci |
+----------------------+--------------------+
3 rows in set (0.00 sec)

greatsql> SELECT charset(@row), @row;
+---------------+------------+
| charset(@row) | @row       |
+---------------+------------+
| binary        | NULL       |
+---------------+------------+
1 row in set (0.00 sec)

greatsql> LOAD DATA LOCAL INFILE '/root/dba_zc/load/TEST_UTF8_bak.txt' IGNORE INTO TABLE ASSP_SIS_PAYRES_IMP_BAK CHARACTER SET UTF8MB4 IGNORE 0 LINES  (@row) SET `D_NAME` = NULLIF(TRIM(CONVERT(UNHEX(SUBSTR(HEX(@row), 1,240)) USING UTF8MB4)),'');
Query OK, 2 rows affected (0.01 sec)
Records: 2  Deleted: 0  Skipped: 0  Warnings: 0

greatsql> SELECT charset(@row), @row;  
+---------------+-----------------------+
| charset(@row) | @row                  |
+---------------+-----------------------+
| utf8mb4       | XXX路桥区社会保XXX      |
+---------------+-----------------------+

greatsql> LOAD DATA LOCAL INFILE '/root/dba_zc/load/TEST_GB18030_bak.txt' IGNORE INTO TABLE ASSP_SIS_PAYRES_IMP_BAK CHARACTER SET GB18030 IGNORE 0 LINES  (@row) SET `D_NAME` = NULLIF(TRIM(CONVERT(UNHEX(SUBSTR(HEX(@row), 1,240)) USING GB18030)),'');
Query OK, 2 rows affected (0.01 sec)
Records: 2  Deleted: 0  Skipped: 0  Warnings: 0

greatsql> SELECT charset(@row), @row;  
+---------------+-----------------------+
| charset(@row) | @row                  |
+---------------+-----------------------+
| gb18030       | XXX路桥区社会保XXX      |
+---------------+-----------------------+

greatsql> SELECT * FROM  ASSP_SIS_PAYRES_IMP_BAK;                                                                                                                                                 
+---------------+-----------------------------+
| AUTO_INC      | D_NAME                      |
+---------------+-----------------------------+
|        1 | XXX路桥区社会保XXX                 |
|        2 | XXX路桥区社会保XXX                 |
|        4 | XXX路桥区社会保XXX                 |
|        5 | XXX路桥区社会保XXX                 |
+---------------+-----------------------------+
4 rows in set (0.00 sec)

BUG规避方案

通过SELECT``charset(@row), @row; 可以看到@row在执行LOAD DATA后在5.7.21和8.0.25是一样的,但最终的影响不一样。虽然MySQL官方确认此问题为BUG,但没有提供规避方案或者解决方案。通过万里工程师研究后,发现一种可行的规避方案。每次执行LOAD DATA命令前执行 [set @row=_binary'';] 进行规避。

greatsql> SELECT VERSION();
+-----------+
| version() |
+-----------+
| 8.0.25    |
+-----------+
1 row in set (0.00 sec)

greatsql> SET @row=_binary'';
Query OK, 0 rows affected (0.00 sec)

greatsql> LOAD DATA LOCAL INFILE '/home/greatdb/TEST_UTF8_bak.txt' IGNORE INTO TABLE ASSP_SIS_PAYRES_IMP_BAK CHARACTER SET UTF8MB4 IGNORE 0 LINES  (@row) SET `D_NAME` = NULLIF(TRIM(CONVERT(UNHEX(SUBSTR(HEX(@row), 1,240)) USING UTF8MB4)),'');
Query OK, 2 rows affected (0.01 sec)
Records: 2  Deleted: 0  Skipped: 0  Warnings: 0

greatsql> SET @row=_binary'';
Query OK, 0 rows affected (0.00 sec)

greatsql> LOAD DATA LOCAL INFILE '/home/greatdb/TEST_GB18030_bak.txt' IGNORE INTO TABLE ASSP_SIS_PAYRES_IMP_BAK CHARACTER SET GB18030 IGNORE 0 LINES  (@row) SET `D_NAME` = NULLIF(TRIM(CONVERT(UNHEX(SUBSTR(HEX(@row), 1,240)) USING GB18030)),'');
Query OK, 2 rows affected (0.01 sec)
Records: 2  Deleted: 0  Skipped: 0  Warnings: 0

greatsql> SELECT * FROM assp_sis_payres_imp_bak;
+----------+--------------------------------------------------+
| AUTO_INC | D_NAME                                           |
+----------+--------------------------------------------------+
|        1 | XXX路桥区社会保XXX                 |
|        2 | XXX路桥区社会保XXX                 |
|        4 | XXX路桥区社会保XXX                 |
|        5 | XXX路桥区社会保XXX                 |
+----------+--------------------------------------------------+
4 rows in set (0.00 sec)

问题总结

1.BUG原因

MySQL8.0重构定长数据导入LOAD DATA @row 出现BUG.同一个数据库会话,多次执行LOAD DATA @row命令,则第n次执行LOAD DATA @row 的字符集使用的是n-1次的字符集,当文件的字符集存在不同,例如先后处理GB18030、UTF8字符集的文件就会数据乱码。此问题MySQL官方已证实为BUG(编号115824)

2.BUG触发条件

触发条件:需同时满足以下三个条件才会触发此bug。

1)LOAD DATA命令使用类似 @row临时变量 进行数据处理,例如对定长记录按字节切割出多个字段:

LINES (@row) SET COLUMN_NAME = NULLIF(TRIM(CONVERT(UNHEX(SUBSTR(HEX(@row),1,20)) USING GB18030))。

2)在同一个连接中,多次执行LOAD DATA命令,且先后处理的文件字符集存在不同(例如GB18030和UTF8)。

3)使用MySQL 8.0。

3.BUG规避办法

由万里工程师提出,与MySQL官方社区沟通证实,涉及到满足上述BUG触发条件的场景,通过在每次执行LOAD DATA命令前执行 [set @row=_binary'';] 进行规避。

参考:https://bugs.mysql.com/bug.php?id=115824


Enjoy GreatSQL

标签:LOAD,8.0,set,utf8mb4,5.7,greatsql,sec,XXX,row
From: https://www.cnblogs.com/greatsql/p/18546504

相关文章

  • C++ 编程基础(5)类与对象 | 5.7、友元
    文章目录一、友元1、友元函数1.1、声明友元函数1.2、友元函数的特性2、友元类2.1、声明友元类2.2、友元类的特性3、使用场景4、注意事项5、总结前言:在C++中,友元类和友元函数是一种特殊的访问控制机制,允许一个类授予另一个类或函数特殊的访问权限。通过使用友元......
  • 搭建fast-whisper 环境时报错 Unable to load any of {libcudnn_ops.so.9.1.0, libcud
    fast-whisper官网地址:https://github.com/SYSTRAN/faster-whisper搭建环境时,按照官方的创建环境要求执行的使用condacreate-nfast_whisperpython=3.9创建虚拟环境执行pipinstallfaster-whisper安装库pipinstallnvidia-cublas-cu12nvidia-cudnn-cu12==8.*降低nump......
  • 通过 AWR报告查看oracle 数据库服务器的负载(load average)异常高的原因
    要诊断Oracle数据库服务器的负载(loadaverage)异常高的原因,通过AWR(AutomaticWorkloadRepository)报告可以帮助你识别潜在的瓶颈或负载源。AWR报告提供了数据库的详细性能数据,涵盖了系统负载、SQL执行、I/O性能、内存使用等多方面的信息。以下是通过AWR报告查看和诊断高负......
  • MySQL5.7新特性--官方高可用方案MGR介绍
    MGR简介MySQLGroupReplication(下简称:MGR)是MySQL官方推出的一种基于Paxos协议的状态机复制。在MGR出现之前,用户常见的MySQL高可用方式,无论怎么变化架构,本质就是Master-Slave架构。MySQL5.7版本开始支持无损半同步复制(losslesssemi-syncreplication),从而进一步提示数据复制......
  • Windows平台下安装与配置MySQL5.7
    1.下载mysql安装文件下载地址:https://downloads.mysql.com/archives/installer/  2.安装与配置MySQL双击打开,接受许可协议,点击下一步:选择自定义安装,点击下一步:选择要安装的产品,点击下一步:点击执行按钮:安装完成,点击下一步进入配置界面:配置界面继续点击下一步:进......
  • 解决 SQLyog 连接 MySQL 8.0.24 报错 2058 的方法(Windows 平台)
    解决SQLyog连接MySQL8.4.2报错2058的方法(Windows平台)在使用SQLyog连接MySQL8.4.2时,你可能会遇到错误代码2058。这个错误通常是由于MySQL8.0默认使用的caching_sha2_password身份验证插件与SQLyog不兼容导致的。本文将详细介绍如何在Windows平台上解决这......
  • Goffloader:内存执行,无需磁盘
    免责声明该公众号分享的安全工具和项目均来源于网络,仅供安全研究与学习之用,如用于其他用途,由使用者承担全部法律及连带责任,与工具作者和本公众号无关。安全公司Praetorian发布了GoffLoader,这是一种旨在简化BOF文件和非托管CobaltStrikePE文件直接在内存中执行的工具,而......
  • MySQL 5.7.19 解压版安装配置详细教程
    MySQL5.7.19解压版安装配置详细教程1.软件下载首先,从MySQL官方网站下载MySQL5.7.19的解压版安装文件:下载地址:https://dev.mysql.com/get/Downloads/MySQL-5.7/mysql-5.7.19-winx64.zip2.特别说明如果你之前安装过MySQL,并且出现了错误或者想要重新安装,可以使用以......
  • [极客大挑战 2019]Upload
    直接上传shell.php,发现提示不是图片,这时先修改后缀名或者Content-Type数。我们使用burpsuite进行操作抓包后发送到repeater,直接修改后缀名不行,看来是通过Content-Type属性来判断是否为图片的,将Content-Type修改为image/jpeg,回显not!php!将文件后缀名修改为phtml,回显包含了<?......
  • error: NU1100: 无法解析 net8.0 的“System.Management.Automation (>= 7.2.0)”。
    前言最近,在使用Net调用PowerShell,碰到了一个很不常见的错误,记录一下,也许有朋友会遇到,希望有所帮助。正文错误截图如下,其实很奇怪,一样的代码,有些地方报错,有些没事。2.文字版本的错误,方便复制粘贴,如下:MicrosoftWindows[版本10.0.22000.2538](c)Micr......