背景:
在阿里巴巴推荐的MySQL建表规范里要求如下:
表达是与否概念的字段,必须使用 is_xxx 的方式命名,数据类型是 unsigned tinyint
( 1 表示是,0 表示否)。
解释:
在MySQL里表示是和否的概念 可以使用如下三种方案:
1.使用bit(1)类型,此时bit允许存储的是ASCII中的0和1. 0表示否1表示是的概念。
2.使用tinyint unsigned类型,此时存储的是十进制数字0和1.
3.使用boolean类型,MySQL并不真正支持此类型,是为兼容其他类型的数据。
TRUE等同于1,表示是;False等同于0表示否;TRUE和FALSE不区分大小写。
true和false在存储的时候是以0和1存储的。
试验验证:
1.前期使用sysbench创建一张1000万条记录的表t1,复制三个表添加不同类型的数据:
mysql> desc t1;
+----------------+-----------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+-----------+------+-----+-------------------+-----------------------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| k | int(11) | NO | MUL | 0 | |
| c | char(120) | NO | | | |
| pad | char(60) | NO | | | |
| LastModifyTime | datetime | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+----------------+-----------+------+-----+-------------------+-----------------------------+
5 rows in set (0.01 sec)
mysql> create table b1 like t1;
Query OK, 0 rows affected (0.02 sec)
mysql> create table b2 like t1;
Query OK, 0 rows affected (0.04 sec)
mysql> create table b3 like t1;
Query OK, 0 rows affected (0.06 sec)
2.添加字段:
b1表采用bit(1) 类型;
b2表采用tinyint unsigned 类型;
b3表采用bool类型。
mysql> alter table b1 add column b bit(1) not null default b'0' comment '0否 1是';
Query OK, 0 rows affected (0.08 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> alter table b2 add column b tinyint unsigned not null default 0 comment '0否 1是';
Query OK, 0 rows affected (0.02 sec)
mysql> alter table b3 add column b bool not null default false comment 'false否0 true 是1';
Query OK, 0 rows affected (0.44 sec)
Records: 0 Duplicates: 0 Warnings: 0
3.插入数据:
mysql> insert into b1 select * from t1;
Query OK, 10000000 rows affected (3 min 12.15 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
mysql> insert into b2 select * from t1;
Query OK, 10000000 rows affected (3 min 10.29 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
mysql> insert into b3 select * from t1;
Query OK, 10000000 rows affected (3 min 10.29 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
4.更新数据:
--更新操作:
mysql> update b1 set b=case when id%2=1 then b'1' when id%2=0 then b'0' end;
Query OK, 5000000 rows affected (2 min 3.22 sec)
Rows matched: 10000000 Changed: 5000000 Warnings: 0
mysql> update b2 set b=case when id%2=1 then 1 when id%2=0 then 0 end;
Query OK, 5000000 rows affected (1 min 50.12 sec)
Rows matched: 10000000 Changed: 5000000 Warnings: 0
mysql> update b3 set b=case when id%2=1 then true when id%2=0 then false end;
Query OK, 5000000 rows affected (1 min 51.17 sec)
Rows matched: 10000000 Changed: 5000000 Warnings: 0
5.数据查询:
--查询:
mysql> select id,k,lastmodifytime,bin(b) from b1 limit 2;
+----+---------+---------------------+--------+
| id | k | lastmodifytime | bin(b) |
+----+---------+---------------------+--------+
| 1 | 5014614 | 2018-09-13 09:38:02 | 1 |
| 2 | 5024801 | 2018-09-12 14:39:49 | 0 |
+----+---------+---------------------+--------+
2 rows in set (0.00 sec)
mysql> select id,k,lastmodifytime,b from b2 limit 2;
+----+---------+---------------------+---+
| id | k | lastmodifytime | b |
+----+---------+---------------------+---+
| 1 | 5014614 | 2018-09-13 09:42:06 | 1 |
| 2 | 5024801 | 2018-09-12 14:39:49 | 0 |
+----+---------+---------------------+---+
2 rows in set (0.00 sec)
mysql> select id,k,lastmodifytime,b from b3 limit 2;
+----+---------+---------------------+---+
| id | k | lastmodifytime | b |
+----+---------+---------------------+---+
| 1 | 5014614 | 2018-09-13 09:48:28 | 1 |
| 2 | 5024801 | 2018-09-12 14:39:49 | 0 |
+----+---------+---------------------+---+
2 rows in set (0.00 sec)
将b写为bit类型时候需要使用函数bin函数方可正确查询,在命令模式下查询出来的结果为空。
mysql> select id,k,lastmodifytime,b from b1 limit 2;
+----+---------+---------------------+---+
| id | k | lastmodifytime | b |
+----+---------+---------------------+---+
| 1 | 5014614 | 2018-09-13 09:38:02 | |
| 2 | 5024801 | 2018-09-12 14:39:49 | |
+----+---------+---------------------+---+
2 rows in set (0.00 sec)
在SQLyog等客户端工具中查询:
b1的查询:
id k lastmodifytime b
------ ------- ------------------- --------
1 5014614 2018-09-13 09:38:02 b'1'
2 5024801 2018-09-12 14:39:49 b'000000
b2和b3的查询:
id k lastmodifytime b
------ ------- ------------------- --------
1 5014614 2018-09-13 09:48:28 1
2 5024801 2018-09-12 14:39:49 0
结论:在易读性上使用tinyint类型和bool类型是等同的,使用bit类型则需要使用函数转换。
6.数据存储空间:
由于使用update等语句会对真实的表有碎片,在查询表真实空间大小前整理表碎片:
mysql>optimize table b1;
mysql>optimize table b2;
mysql>optimize table b3;
# du -sb b*
2868903936 b1.ibd
2868903936 b2.ibd
2868903936 b3.ibd
结论:当使用bool类型和bit(1) 的时候和tinyint是存储上是等同的。
7.查询验证:
对于b1表正确的查询语句:(b1表使用bit类型)
select id,k,lastmodifytime,b,bin(b) from b1 where b=b'0' limit 2;
对于b2表正确的查询语句:(b2表使用tinyint类型)
select id,k,lastmodifytime,b,bin(b) from b1 where b=0 limit 2;
对于b3表正确的查询语句:(b3表使用boolean类型)
select id,k,lastmodifytime,b,bin(b) from b1 where b is false limit 2;
现在对B1、B2、B3表均采用上述查询:
--B1表查询:
mysql> select id,k,lastmodifytime,b,bin(b) from b1 where b=b'0' limit 2;
+----+---------+---------------------+---+--------+
| id | k | lastmodifytime | b | bin(b) |
+----+---------+---------------------+---+--------+
| 2 | 5024801 | 2018-09-12 14:39:49 | | 0 |
| 4 | 5026450 | 2018-09-12 14:39:49 | | 0 |
+----+---------+---------------------+---+--------+
2 rows in set (0.03 sec)
mysql> select id,k,lastmodifytime,b,bin(b) from b1 where b=0 limit 2;
+----+---------+---------------------+---+--------+
| id | k | lastmodifytime | b | bin(b) |
+----+---------+---------------------+---+--------+
| 2 | 5024801 | 2018-09-12 14:39:49 | | 0 |
| 4 | 5026450 | 2018-09-12 14:39:49 | | 0 |
+----+---------+---------------------+---+--------+
2 rows in set (0.03 sec)
mysql> select id,k,lastmodifytime,b,bin(b) from b1 where b=false limit 2;
+----+---------+---------------------+---+--------+
| id | k | lastmodifytime | b | bin(b) |
+----+---------+---------------------+---+--------+
| 2 | 5024801 | 2018-09-12 14:39:49 | | 0 |
| 4 | 5026450 | 2018-09-12 14:39:49 | | 0 |
+----+---------+---------------------+---+--------+
2 rows in set (0.00 sec)
mysql> select id,k,lastmodifytime,b,bin(b) from b1 where b is false limit 2;
+----+---------+---------------------+---+--------+
| id | k | lastmodifytime | b | bin(b) |
+----+---------+---------------------+---+--------+
| 2 | 5024801 | 2018-09-12 14:39:49 | | 0 |
| 4 | 5026450 | 2018-09-12 14:39:49 | | 0 |
+----+---------+---------------------+---+--------+
2 rows in set (0.03 sec)
查询结果一致。
--B2表:
mysql> select id,k,lastmodifytime,b,bin(b) from b2 where b=b'0' limit 2;
+----+---------+---------------------+---+--------+
| id | k | lastmodifytime | b | bin(b) |
+----+---------+---------------------+---+--------+
| 2 | 5024801 | 2018-09-12 14:39:49 | 0 | 0 |
| 4 | 5026450 | 2018-09-12 14:39:49 | 0 | 0 |
+----+---------+---------------------+---+--------+
2 rows in set (0.00 sec)
mysql> select id,k,lastmodifytime,b,bin(b) from b2 where b=0 limit 2;
+----+---------+---------------------+---+--------+
| id | k | lastmodifytime | b | bin(b) |
+----+---------+---------------------+---+--------+
| 2 | 5024801 | 2018-09-12 14:39:49 | 0 | 0 |
| 4 | 5026450 | 2018-09-12 14:39:49 | 0 | 0 |
+----+---------+---------------------+---+--------+
2 rows in set (0.00 sec)
mysql> select id,k,lastmodifytime,b,bin(b) from b2 where b is false limit 2;
+----+---------+---------------------+---+--------+
| id | k | lastmodifytime | b | bin(b) |
+----+---------+---------------------+---+--------+
| 2 | 5024801 | 2018-09-12 14:39:49 | 0 | 0 |
| 4 | 5026450 | 2018-09-12 14:39:49 | 0 | 0 |
+----+---------+---------------------+---+--------+
2 rows in set (0.00 sec)
--B3表:
mysql> select id,k,lastmodifytime,b,bin(b) from b3 where b=b'0' limit 2;
+----+---------+---------------------+---+--------+
| id | k | lastmodifytime | b | bin(b) |
+----+---------+---------------------+---+--------+
| 2 | 5024801 | 2018-09-12 14:39:49 | 0 | 0 |
| 4 | 5026450 | 2018-09-12 14:39:49 | 0 | 0 |
+----+---------+---------------------+---+--------+
2 rows in set (0.03 sec)
mysql> select id,k,lastmodifytime,b,bin(b) from b3 where b=0 limit 2;
+----+---------+---------------------+---+--------+
| id | k | lastmodifytime | b | bin(b) |
+----+---------+---------------------+---+--------+
| 2 | 5024801 | 2018-09-12 14:39:49 | 0 | 0 |
| 4 | 5026450 | 2018-09-12 14:39:49 | 0 | 0 |
+----+---------+---------------------+---+--------+
2 rows in set (0.00 sec)
mysql> select id,k,lastmodifytime,b,bin(b) from b3 where b is false limit 2;
+----+---------+---------------------+---+--------+
| id | k | lastmodifytime | b | bin(b) |
+----+---------+---------------------+---+--------+
| 2 | 5024801 | 2018-09-12 14:39:49 | 0 | 0 |
| 4 | 5026450 | 2018-09-12 14:39:49 | 0 | 0 |
+----+---------+---------------------+---+--------+
2 rows in set (0.00 sec)
上述三个语句在查询的时候均可直接查询出结果。
8.对b字段加索引查询:
mysql> alter table b1 add key ix_b(b);
Query OK, 0 rows affected (11.45 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> alter table b2 add key ix_b(b);
Query OK, 0 rows affected (11.93 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> alter table b3 add key ix_b(b);
Query OK, 0 rows affected (11.59 sec)
Records: 0 Duplicates: 0 Warnings: 0
--对B1表查询的执行计划:
mysql> explain select id,k,lastmodifytime,b,bin(b),cast(b as signed) from b1 where b=b'0' limit 2\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: b1
partitions: NULL
type: ref
possible_keys: ix_b
key: ix_b
key_len: 1
ref: const
rows: 4929222
filtered: 100.00
Extra: NULL
1 row in set, 1 warning (0.00 sec)
查询走了执行计划。
mysql> explain select id,k,lastmodifytime,b,bin(b),cast(b as signed) from b1 where b='0' limit 2\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: NULL
partitions: NULL
type: NULL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: NULL
filtered: NULL
Extra: no matching row in const table
1 row in set, 1 warning (0.00 sec)
提示信息:没有匹配的数据。
mysql> explain select id,k,lastmodifytime,b,bin(b),cast(b as signed) from b1 where b=0 limit 2\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: b1
partitions: NULL
type: ref
possible_keys: ix_b
key: ix_b
key_len: 1
ref: const
rows: 4929222
filtered: 100.00
Extra: NULL
1 row in set, 1 warning (0.02 sec)
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: b1
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 9858444
filtered: 100.00
Extra: Using where
1 row in set, 1 warning (0.02 sec)
全表扫描。
mysql> explain select id,k,lastmodifytime,b,bin(b),cast(b as signed) from b1 where b=false limit 2\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: b1
partitions: NULL
type: ref
possible_keys: ix_b
key: ix_b
key_len: 1
ref: const
rows: 4929222
filtered: 100.00
Extra: NULL
1 row in set, 1 warning (0.00 sec)
走索引。
实际查询一次:
mysql> select id,k,lastmodifytime,b,bin(b),cast(b as signed) from b1 where b=b'0' limit 2;
+----+---------+---------------------+---+--------+-------------------+
| id | k | lastmodifytime | b | bin(b) | cast(b as signed) |
+----+---------+---------------------+---+--------+-------------------+
| 2 | 5024801 | 2018-09-12 14:39:49 | | 0 | 0 |
| 4 | 5026450 | 2018-09-12 14:39:49 | | 0 | 0 |
+----+---------+---------------------+---+--------+-------------------+
2 rows in set (0.00 sec)
mysql> select id,k,lastmodifytime,b,bin(b),cast(b as signed) from b1 where b=0 limit 2;
+----+---------+---------------------+---+--------+-------------------+
| id | k | lastmodifytime | b | bin(b) | cast(b as signed) |
+----+---------+---------------------+---+--------+-------------------+
| 2 | 5024801 | 2018-09-12 14:39:49 | | 0 | 0 |
| 4 | 5026450 | 2018-09-12 14:39:49 | | 0 | 0 |
+----+---------+---------------------+---+--------+-------------------+
2 rows in set (0.00 sec)
mysql> select id,k,lastmodifytime,b,bin(b),cast(b as signed) from b1 where b=false limit 2;
+----+---------+---------------------+---+--------+-------------------+
| id | k | lastmodifytime | b | bin(b) | cast(b as signed) |
+----+---------+---------------------+---+--------+-------------------+
| 2 | 5024801 | 2018-09-12 14:39:49 | | 0 | 0 |
| 4 | 5026450 | 2018-09-12 14:39:49 | | 0 | 0 |
+----+---------+---------------------+---+--------+-------------------+
2 rows in set (0.00 sec)
mysql> select id,k,lastmodifytime,b,bin(b),cast(b as signed) from b1 where b='0' limit 2;
Empty set (0.00 sec)
mysql> select id,k,lastmodifytime,b,bin(b),cast(b as signed) from b1 where b is false limit 2;
+----+---------+---------------------+---+--------+-------------------+
| id | k | lastmodifytime | b | bin(b) | cast(b as signed) |
+----+---------+---------------------+---+--------+-------------------+
| 2 | 5024801 | 2018-09-12 14:39:49 | | 0 | 0 |
| 4 | 5026450 | 2018-09-12 14:39:49 | | 0 | 0 |
+----+---------+---------------------+---+--------+-------------------+
2 rows in set (0.00 sec)
结论:针对bit类型 最正确的查询方式是
select id,k,lastmodifytime,b,bin(b),cast(b as signed) from b1 where b=b'0' limit 2;
其他的会发生隐式类型转换。
针对tinyint类型的执行计划:
mysql> explain select id,k,lastmodifytime,b,bin(b),cast(b as signed) from b2 where b is false limit 2\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: b2
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 9858444
filtered: 100.00
Extra: Using where
1 row in set, 1 warning (0.00 sec)
全表扫描。
如下的查询语句执行计划如下:
mysql> explain select id,k,lastmodifytime,b,bin(b),cast(b as signed) from b2 where b=b'0' limit 2\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: b2
partitions: NULL
type: ref
possible_keys: ix_b
key: ix_b
key_len: 1
ref: const
rows: 4929222
filtered: 100.00
Extra: NULL
1 row in set, 1 warning (0.00 sec)
查询语句:
select id,k,lastmodifytime,b,bin(b),cast(b as signed) from b2 where b=b'0' limit 2;
select id,k,lastmodifytime,b,bin(b),cast(b as signed) from b2 where b=0 limit 2;
select id,k,lastmodifytime,b,bin(b),cast(b as signed) from b2 where b='0' limit 2;
select id,k,lastmodifytime,b,bin(b),cast(b as signed) from b2 where b=false limit 2;
--针对boolean类型:
查询的执行计划如B2.
分析:
MySQL把BIT当做字符串类型, 而不是数据类型。当检索BIT(1)列的值, 结果是一个字符串且内容是二进制位0或1, 而不是ASCII值”0″或”1″.
结论:
对于bit 类型的数值不使用索引,mysql 检索bit的值是不管是数值还是字符,mysql会对where 条件进行类型转化,将字符转换为数值,并比较数值对应的ascii码,如果值为1,则返回结果,否则,结果为空。
9.bit类型在ETL中:
针对boolean类型在MySQL数据库中是以0和1存储的,查询可以直接查询出。
采用bit(1)类型存储需要做转换:
mysql> select id,k,lastmodifytime,b,bin(b),cast(b as signed) from b1 where b is false limit 2;
+----+---------+---------------------+---+--------+-------------------+
| id | k | lastmodifytime | b | bin(b) | cast(b as signed) |
+----+---------+---------------------+---+--------+-------------------+
| 2 | 5024801 | 2018-09-12 14:39:49 | | 0 | 0 |
| 4 | 5026450 | 2018-09-12 14:39:49 | | 0 | 0 |
+----+---------+---------------------+---+--------+-------------------+
2 rows in set (0.00 sec)
可以看到如不做转换的时候则查询出来为空,在做数据ETL的时候对其他数据库不便于读取。
10.MySQL bit类型的bool运算和bit类型的存储:
mysql> select b'0'=0,b'0'='0',b'110000'='0',b'110000'+0;
+--------+----------+---------------+-------------+
| b'0'=0 | b'0'='0' | b'110000'='0' | b'110000'+0 |
+--------+----------+---------------+-------------+
| 1 | 0 | 1 | 48 |
+--------+----------+---------------+-------------+
1 row in set (0.03 sec)
mysql> select b'1'=1,b'1'='1',b'110001'='1',b'110001'+0;
+--------+----------+---------------+-------------+
| b'1'=1 | b'1'='1' | b'110001'='1' | b'110001'+0 |
+--------+----------+---------------+-------------+
| 1 | 0 | 1 | 49 |
+--------+----------+---------------+-------------+
1 row in set (0.00 sec)
结论:b'110000'为十进制的48,而不是字符串0.
验证:
mysql> create table b(a bit(64));
Query OK, 0 rows affected (0.04 sec)
mysql> create table bb(a bit(65));
ERROR 1439 (42000): Display width out of range for column 'a' (max = 64)
--插入数据:
insert into b values(b'1'),(b'0'),(b'01000001'),(b'01011010'),(b'01100001'),(b'00110000'),(b'00110001');
mysql> select a,a+0,bin(a),oct(a),hex(a) from b;
+----------+------+---------+--------+--------+
| a | a+0 | bin(a) | oct(a) | hex(a) |
+----------+------+---------+--------+--------+
| | 1 | 1 | 1 | 1 |
| | 0 | 0 | 0 | 0 |
| A | 65 | 1000001 | 101 | 41 |
| Z | 90 | 1011010 | 132 | 5A |
| a | 97 | 1100001 | 141 | 61 |
| 0 | 48 | 110000 | 60 | 30 |
| 1 | 49 | 110001 | 61 | 31 |
+----------+------+---------+--------+--------+
7 rows in set (0.02 sec)
mysql> select lpad(bin(a),8,'0') la,a,a+0,bin(a),oct(a),hex(a) from b;
+----------+----------+------+---------+--------+--------+
| la | a | a+0 | bin(a) | oct(a) | hex(a) |
+----------+----------+------+---------+--------+--------+
| 00000000 | | 0 | 0 | 0 | 0 |
| 00000001 | | 1 | 1 | 1 | 1 |
| 00110000 | 0 | 48 | 110000 | 60 | 30 |
| 00110001 | 1 | 49 | 110001 | 61 | 31 |
| 01000001 | A | 65 | 1000001 | 101 | 41 |
| 01011010 | Z | 90 | 1011010 | 132 | 5A |
| 01100001 | a | 97 | 1100001 | 141 | 61 |
+----------+----------+------+---------+--------+--------+
7 rows in set (0.00 sec)
结论:bit类型表示方式为bit(m),M的取值范围为1到64.bit类型存储的二进制字符串。
bit类型的数据范围为bit(1) 到bit(64),换算成十进制范围0到2^64减1;
tinyint unsigned的数据范围,十进制表示为0到255.
bit的存储近似为(M+7)/8 bytes,而tinyint为1byte。
当使用bit(1)和tinyint的时候使用的存储空间一致。
11.JDBC和bit类型:
MySQL Type Name Return value of GetColumnTypeName Return value of GetColumnClassName
BIT(1) BIT java.lang.Boolean
BIT( > 1) BIT byte[]
TINYINT TINYINT java.lang.Boolean if the configuration property tinyInt1isBit is set to true (the default) and the storage size is 1, or java.lang.Integer if not.
在一些应用场景下tinyint会被默认读取为true或者false,而不是想要的数据0和1.
jdbc会把tinyint 认为是java.sql.Types.BIT
此时需要在连接配置上使用类似如下的配置:
jdbc:mysql://localhost/databaseName?tinyInt1isBit=false
结论:
从易于读取性和大众接受程度、软件应用的通用性上推荐使用tinyint表示是和否。