################
一、主从不一致
误登录从库写入数据:
第一步在从库上停掉SQL线程:stop slave sql_thread;
磁盘空间不足:删除归档日志,保留30天,set global expire_logs_days=15; PURGE MASTER LOGS BEFORE '2020-03-15 00:00:00';purge binary logs before ‘2020-08-03 00:00:00’;清理废弃备份数据
告警metadata lock:添加索引
异地备份时超时:修改mysql参数
set global net_read_timeout = 1000;
set global net_write_timeout = 1000;
Threads_running 200:慢查询,全表扫描,并且频率高,扫描行数多,导致线程阻塞:alter table add index mytable(health_id);连接池不按规范使用;并发消费线程数配置较大;单台主机建议上限100;通知业务放增加对应索引
负载高:数仓pg数据库触发了慢查询,多并发,导致cpu负载高:ps aux|grep SELECT|awk -F ' ' '{print $2}'| xargs kill 后续还需要从sql层面做一些优化
mysqldump: Error 2013: Lost connection to MySQL server during query when dumping table `orders_log` at row: 8303492》set global net_write_timeout = 1000;
mysql logical backup:[Error]: backup failed, please check the log file[error.log] 》》
max_execution_time =0
net_read_timeout =5500
net_write_timeout =1000
set global net_read_timeout=28800;
set global net_write_timeout=28800;
MySQL Slave Seconds Behind Master than 10》批量写入数据导致主从延迟》改成批量处理;多并发复制BUG》关闭多并发复制
》stop slave;
set global slave_parallel_workers=4;
start slave;
show slave status\G
bug》STOP SLAVE;
SET GLOBAL slave_parallel_workers=0;
START SLAVE;
》备份任务导致》调整备份任务
》从proxysql 的访问队列中去除该节点
磁盘使用率:由于select qa.answer_id from
opendata.question q
inner join
opendata.question_answer qa where q.disease like '%,%'
order by qa.gmt_modified desc limit 5
查询执行2小时导致临时文件高达560G》联系负责人,通知kill sql》重启》stop slave for channel '';
reset slave all for channel '';
MySQL SLAVE IO STOP 》数据库重复创建导致搜索聚合实例同步冲突》
stop slave;
set global sql_slave_skip_counter=1;
start slave for channel 'test_60_3308';
set global sql_slave_skip_counter=0;
start slave;
Free inodes is less than 20% on volume:sendmail 生成大量小文件占用了 inode》清理7天前的小文件,增加定时清理任务
SSH service is down》服务器因电源不稳定重启》手动添加VIP,消除告警》ifconfig bond0:1 10.20.200.220 netmask 255.255.255.0 up
redis. swap不足:redis实例碎片较多:save;save整理碎片,开启碎片自动整理
redis too many memory used:内存不足,可能存在大key或者没有设置过期》config set maxmemory 2G
################