今天早上发现如下task微服务假死了没有响应,然后发现如下日志
2024-03-29 01:06:17.089 [Thread-152587] ERROR com.alibaba.druid.pool.DruidPooledStatement.errorCheck [370] : CommunicationsException, druid version 1.2.11, jdbcUrl : jdbc:mysql://15.99.72.154:3306/nova?allowMultiQueries=true&useUnicode=true&characterEncoding=utf8&autoReconnect=true&useSSL=FALSE, testWhileIdle true, idle millis 279301, minIdle 50, poolingCount 4, timeBetweenEvictionRunsMillis 60000, lastValidIdleMillis 279301, driver com.mysql.jdbc.Driver, exceptionSorter com.alibaba.druid.pool.vendor.MySqlExceptionSorter
2024-03-29 01:07:03.957 [Thread-152587] ERROR com.alibaba.druid.pool.DruidDataSource.handleFatalError [1867] : {conn-10396} discard
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet successfully received from the server was 279,208 milliseconds ago. The last packet sent successfully to the server was 279,221 milliseconds ago.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:990)
从监控日志发现凌晨1点多的时候154的cpu非常高,把mysql搞崩了
后来继续排查日志发现凌晨1点多的时候有一个定时任务开始启动了,然后里面有个逻辑会执行如下sql语句
SELECT a.configitem_id as configlv1,a.content as contentlv1,b.configitem_id as configlv2,b.content as contentlv2,
c.configitem_id as configlv3,c.content as contentlv3,d.configitem_id as configlv4,d.content as contentlv4,
d.flag as flaglv4,e.configitem_id as configlv5,e.content as contentlv5,e.flag as flaglv5,e.PN,e.pulsar_id FROM m_configitem a
LEFT JOIN m_configitem b ON a.configitem_id = b.parent_item
LEFT JOIN m_configitem c ON b.configitem_id = c.parent_item
LEFT JOIN m_configitem d ON c.configitem_id = d.parent_item
LEFT JOIN m_configitem e ON d.configitem_id = e.parent_item
WHERE
a.`level` = 0
通过task微服务的日志发现1秒钟内有多个thread同时在执行上面这个sql(因为调用上面这个sql的方法被弄成了异步)
然后用explain发现这个语句会全表扫描,所以破案了
解决方法:
1,先把凌晨一点调用这个sql的方法改成同步方法,减少并发执行这个sql的数量
2,优化表,使得这个表不会进行全表扫描
标签:java,假死,cpu,content,configitem,mysql,com,id From: https://www.cnblogs.com/xiaohanlin/p/18104157