我们日常做一些MySQL测试的时候,经常要造数据,通常就写个循环插入数据的存储过程。前两天碰巧看文章说,mysql_random_data_load程序能向MySQL随机插入大量数据,于是了解一下。
mysql_random_data_load是个开源的项目,github路径如下,
https://github.com/Percona-Lab/mysql_random_data_load/releases
从信息来看,作者是一个阿根廷的程序员,
0.1.12是最新的版本,
版本信息如下,
0.1.10
- Fixed argument validations
- Fixed ~/.my.cnf loading
0.1.10
- Fixed connection parameters for MySQL 5.7 (set driver's AllowNativePasswords: true)
0.1.9
- Added support for bunary and varbinary columns
- By default, read connection params from ${HOME}/.my.cnf
0.1.8
- Fixed error for triggers created with MySQL 5.6
- Added Travis-CI
- Code clean up
0.1.7
- Support for MySQL 8.0
- Added --print parameter
- Added --version parameter
- Removed qps parameter
0.1.6
- Improved generation speed (up to 50% faster)
- Improved support for TokuDB (Thanks Agustin Gallego)
- Code refactored
- Improved debug logging
- Added Query Per Seconds support (experimental)
0.1.5
- Fixed handling of NULL collation for index parser
0.1.4
- Fixed handling of time columns
- Improved support of GENERATED columns
0.1.3
- Fixed handling of nulls
0.1.2
- New table parser able to retrieve all the information for fields, indexes and foreign keys constraints.
- Support for foreign keys constraints
- Added some tests
0.1.1
- Fixed random data generation
0.1.0
- Initial version
可以下载源代码了解他的实现,Go语言写的,
看下这个文件,README.md,基本就可以知道怎么用了,作者提到了他为什么写这个程序,平时写了很多这种构造随机数的程序,不想再造轮子了,写个通用的,
Many times in my job I need to generate random data for a specific table in order to reproduce an issue.
After writing many random generators for every table, I decided to write a random data generator, able to get the table structure and generate random data for it.
Plase take into consideration that this is the first version and it doesn't support all field types yet!
他支持如下这些数据类型,
字符串类型的构造原则如下,根据长度不同,给出不同的规则,
How strings are generated
- If field size < 10 the program generates a random "first name"
- If the field size > 10 and < 30 the program generates a random "full name"
- If the field size > 30 the program generates a "lorem ipsum" paragraph having up to 100 chars.
如果字段允许NULL,随机产生10%的空值,
The program can detect if a field accepts NULLs and if it does, it will generate NULLs ramdomly (~ 10 % of the values).
还支持外键模拟,
Foreign keys support
If a field has Foreign Keys constraints,
random-data-load
will get up to--max-fk-samples
random samples from the referenced tables in order to insert valid values for the field.The number of samples to get follows this rules:
1. Get the aproximate number of rows in the referenced table using therows
field in:EXPLAIN SELECT COUNT(*) FROM <referenced schema>.<referenced table>
1.1 If the number of rows is less thanmax-fk-samples
, all rows are retrieved from the referenced table using this query:
SELECT <referenced field> FROM <referenced schema>.<referenced table>
1.2 If the number of rows is greater thanmax-fk-samples
, samples are retrieved from the referenced table using this query:
SELECT <referenced field> FROM <referenced schema>.<referenced table> WHERE RAND() <= <fk-samples-factor> LIMIT <max-fk-samples>
mysql_random_data_load支持以下参数,
测试一下,创建一张表,
create table test(id int, c1 varchar(1));
执行指令,6秒插入10万数据,
-bash-4.2$ ./mysql_random_data_load -h127.0.0.1 -u用户名 -p密码 --max-threads=2(线程数) 数据库名称 表名 100000(插入数据量)
INFO[2021-12-17T14:31:41+08:00] Starting
6s [====================================================================] 100%
6s [====================================================================] 100%
看下效果,id就只有2条相同的,
c1列有20个值,数据比较分散,
还是能满足基本的测试数据要求的,操作简单,基本的数据类型都可以支持,空值、外键这些特殊的场景,也都能支持,不错的开源工具,