optimize table 治理mysql 高水位和空间碎片

标签：task batch detail swltest mysql table optimize

问题背景

随着系统上线时间的增长， mysql数据库的数据目录的文件系统使用率已经快要耗尽。从最初的两三个月 truancate一次日志表，到最后每周要truancate 多次日志表。
业务侧将数据库中历史数据（超过全部数据的2/3）逐渐进行了迁移（采用目标库insert，源库delete的方式），这些表中有多个超过100GB的大表，甚至有两个在500GB以上的超大表。
虽然数据被迁移了，但是空间并没有被释放出来，业务侧的事情完成了，接下来需要我们在释放这些delete之后留下的空间。

注意事项

锁表

引用在操作的准备阶段和提交阶段，会对表加排它锁。特别是准备阶段，如果表上有大事务，那 optimize操作在拿到锁之前会阻塞后续的所有操作，包括select

磁盘空间

引用执行过程中，会复制表的数据到一个中间文件中。所以需要注意有没有足够的空间进行此操作
执行过程中，表上的DML 操作会记录到一个临时文件中，文件的大小受innodb_online_alter_log_max_size控制，单位是字节。如果表上的dml比较多，执行操作前要调大 innodb_online_alter_log_max_size 参数。（此参数动态生效）

主从延时

optimize table 是一个比较耗时的操作，当操作被同步到从库之后，就会造成主从延时。对于对延时敏感的系统，建议在主库执行时加入NO_WRITE_TO_BINLOG选项，不将此操作记入binlog

查询表碎片率

使用下面的SQL 可以查询表的锁片率，数据来源于统计信息，不是非常准确，仅供参考

select 
   table_name,engine,
   table_rows,
   round((data_length+index_length+DATA_FREE)/1024/1024/1024,2 ) size_GB,
   round(DATA_FREE/1024/1024/1024,2) free_GB 
 from information_schema.tables where table_schema='xxx' 
order by data_free;

执行过程

OPTIMIZE NO_WRITE_TO_BINLOG TABLE table_name;

NO_WRITE_TO_BINLOG 使此操作不计入binlog，也就不会同步到从库。这样可以避免此语句同步到从库之后，造成主从延时。主库执行完之后，再到从库执行同样的语句。

执行开始后，需要注意一下mysql会话执行情况；
使用如下sql查询，如果会话处在altering table 状态，则说明已经开始执行，这个阶段不会再锁表，可以正常进行dml和select操作。

select id, host, user, db, state, info, time from information_schema.processlist  
where info like "optimize%"\G

optimize table 治理mysql 高水位和空间碎片_表碎片

如果有会话会话处在 waiting lock 阶段，则说明 OPTIMIZE 被大事务阻塞。建议立刻停止此操作，杀掉大事务或换个时间再执行

select id, host, user, db, state, info, time from information_schema.processlist  
where info like "%batch_task_detail%"\G

optimize table 治理mysql 高水位和空间碎片_表碎片_02

执行结果

正常执行完之后会有如下提示信息

(root@localhost) [swltest:8801]> optimize table batch_task_detail;
+---------------------------+----------+----------+-------------------------------------------------------------------+
| Table                     | Op       | Msg_type | Msg_text                                                          |
+---------------------------+----------+----------+-------------------------------------------------------------------+
| swltest.batch_task_detail | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| swltest.batch_task_detail | optimize | status   | OK                                                                |
+---------------------------+----------+----------+-------------------------------------------------------------------+
2 rows in set (13 min 41.83 sec)

如果执行失败，也会有失败的提示信息，比如：

(root@localhost) [swltest:8801]> optimize table batch_task_detail;
+---------------------------+----------+----------+--------------------------------------------------------------------------------------------------------------------------------------+
| Table                     | Op       | Msg_type | Msg_text                                                                                                                             |
+---------------------------+----------+----------+--------------------------------------------------------------------------------------------------------------------------------------+
| swltest.batch_task_detail | optimize | note     | Table does not support optimize, doing recreate + analyze instead                                                                    |
| swltest.batch_task_detail | optimize | error    | Creating index 'FTS_DOC_ID_INDEX' required more than 'innodb_online_alter_log_max_size' bytes of modification log. Please try again. |
| swltest.batch_task_detail | optimize | status   | Operation failed                                                                                                                     |
+---------------------------+----------+----------+--------------------------------------------------------------------------------------------------------------------------------------+

时间参考

机械盘中， 55G大的表执行完成大概需要 13分钟。

标签：task,batch,detail,swltest,mysql,table,optimize
From： https://blog.51cto.com/u_13482808/6681890