During day-to-day operation of a CDH cluster, we occasionally run into a sudden power outage in the data center. Here we walk through the steps for recovering a CDH cluster after such an outage, so that we are not left wondering where to start.
Before troubleshooting the CDH cluster itself, the servers obviously need to be powered back up, and the network between them must be working.
The recovery steps are recorded below:
First, start the CDH server service and check its status:
sudo /etc/init.d/cloudera-scm-server start
sudo /etc/init.d/cloudera-scm-server status
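On hosts that manage services with systemd (newer OS releases), the equivalent commands would be the following; this is an assumption, so check which init system your nodes actually use:
sudo systemctl start cloudera-scm-server
sudo systemctl status cloudera-scm-server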
A problem you may encounter:
cloudera-scm-server dead but pid file exists
Solution
First stop the service and confirm that it is down:
service cloudera-scm-server stop
service cloudera-scm-server status
# delete the stale cloudera-scm-server.pid
rm /var/run/cloudera-scm-server.pid
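Putting the fix together, a minimal one-shot sequence might look like this (assuming the default pid path shown above):
sudo service cloudera-scm-server stop
sudo rm -f /var/run/cloudera-scm-server.pid
sudo /etc/init.d/cloudera-scm-server start
sudo /etc/init.d/cloudera-scm-server status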
Then open the CDH admin console again and check whether it loads.
Sometimes the server starts but dies again after a few seconds.
In that case, inspect the log:
tail -f -n 400 /var/log/cloudera-scm-server/cloudera-scm-server.log
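If the log is long, filtering for errors first can speed up the diagnosis; a simple sketch using grep:
grep -iE 'error|exception' /var/log/cloudera-scm-server/cloudera-scm-server.log | tail -n 50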
A problem you may encounter:
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:989)
at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:341)
at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2189)
at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2222)
at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2017)
at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:779)
at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:47)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:389)
at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:330)
at com.mchange.v2.c3p0.DriverManagerDataSource.getConnection(DriverManagerDataSource.java:135)
at com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:182)
at com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:171)
at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.acquireResource(C3P0PooledConnectionPool.java:137)
at com.mchange.v2.resourcepool.BasicResourcePool.doAcquire(BasicResourcePool.java:1014)
at com.mchange.v2.resourcepool.BasicResourcePool.access$800(BasicResourcePool.java:32)
at com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask.run(BasicResourcePool.java:1810)
at com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:547)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:580)
at com.mysql.jdbc.StandardSocketFactory.connect(StandardSocketFactory.java:211)
at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:300)
... 20 more
Caused by: org.hibernate.exception.GenericJDBCException: Could not open connection
at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:54)
at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:125)
at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:110)
at org.hibernate.engine.jdbc.internal.LogicalConnectionImpl.obtainConnection(LogicalConnectionImpl.java:221)
at org.hibernate.engine.jdbc.internal.LogicalConnectionImpl.getConnection(LogicalConnectionImpl.java:157)
at org.hibernate.engine.transaction.internal.jdbc.JdbcTransaction.doBegin(JdbcTransaction.java:67)
at org.hibernate.engine.transaction.spi.AbstractTransactionImpl.begin(AbstractTransactionImpl.java:160)
at org.hibernate.internal.SessionImpl.beginTransaction(SessionImpl.java:1426)
at org.hibernate.ejb.TransactionImpl.begin(TransactionImpl.java:59)
... 28 more
Caused by: java.sql.SQLException: Connections could not be acquired from the underlying database!
at com.mchange.v2.sql.SqlUtils.toSQLException(SqlUtils.java:106)
at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.checkoutPooledConnection(C3P0PooledConnectionPool.java:529)
at com.mchange.v2.c3p0.impl.AbstractPoolBackedDataSource.getConnection(AbstractPoolBackedDataSource.java:128)
at org.hibernate.service.jdbc.connections.internal.C3P0ConnectionProvider.getConnection(C3P0ConnectionProvider.java:84)
at org.hibernate.internal.AbstractSessionImpl$NonContextualJdbcConnectionAccess.obtainConnection(AbstractSessionImpl.java:292)
at org.hibernate.engine.jdbc.internal.LogicalConnectionImpl.obtainConnection(LogicalConnectionImpl.java:214)
... 33 more
Caused by: com.mchange.v2.resourcepool.CannotAcquireResourceException: A ResourcePool could not acquire a resource from its primary factory or source.
at com.mchange.v2.resourcepool.BasicResourcePool.awaitAvailable(BasicResourcePool.java:1319)
at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:557)
at com.mchange.v2.resourcepool.BasicResourcePool.checkoutResource(BasicResourcePool.java:477)
at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.checkoutPooledConnection(C3P0PooledConnectionPool.java:525)
... 37 more
Solution
The errors above indicate that the server cannot connect to the MySQL database it depends on. Looking further through the log output, the MySQL connection it uses is the following:
2018-05-02 13:19:12,931 INFO main:org.hibernate.cfg.Configuration: HHH000221: Reading mappings from resource: cmf.hbm.xml
2018-05-02 13:19:13,163 INFO main:org.hibernate.service.jdbc.connections.internal.ConnectionProviderInitiator: HHH000130: Instantiating explicit connection provider: org.hibernate.service.jdbc.connections.internal.C3P0ConnectionProvider
2018-05-02 13:19:13,164 INFO main:org.hibernate.service.jdbc.connections.internal.C3P0ConnectionProvider: HHH010002: C3P0 using driver: com.mysql.jdbc.Driver at URL: jdbc:mysql://localhost/cdh?useUnicode=true&characterEncoding=UTF-8
2018-05-02 13:19:13,164 INFO main:org.hibernate.service.jdbc.connections.internal.C3P0ConnectionProvider: HHH000046: Connection properties: {user=cdhuser, password=****, autocommit=true, release_mode=auto}
2018-05-02 13:19:13,164 INFO main:org.hibernate.service.jdbc.connections.internal.C3P0ConnectionProvider: HHH000006: Autocommit mode: true
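A quick way to confirm that MySQL itself is the problem is to try the exact credentials from the log in a shell (user cdhuser, database cdh on localhost, taken from the JDBC URL above; while MySQL is down this fails with the same "Connection refused"):
mysql -h localhost -u cdhuser -p cdh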
Start MySQL with:
service mysql start
or, depending on the distribution:
sudo service mysqld start
Then check port 3306 to confirm MySQL started successfully:
netstat -ntlp | grep 3306
Once MySQL is up, starting scm-server succeeds as well.
Next, start the scm-agent service on every node:
sudo /etc/init.d/cloudera-scm-agent start
sudo /etc/init.d/cloudera-scm-agent status
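Since the agent has to be started on every node, a loop over the host list saves typing. A sketch, assuming passwordless ssh and a hypothetical nodes.txt with one hostname per line:
for host in $(cat nodes.txt); do
  ssh "$host" "sudo /etc/init.d/cloudera-scm-agent start"
done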
At this point we can operate the cluster through the CDH admin console.
After the restart, wait for the cluster services to come up; if individual nodes fail to start, check their logs and troubleshoot them one by one.
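Since this whole exercise was triggered by a power loss, it is also worth making sure the base services come back automatically on boot. The exact commands depend on the init system and service names on your hosts (a sketch: chkconfig for SysV init, systemctl for systemd):
sudo chkconfig mysqld on
sudo chkconfig cloudera-scm-server on
sudo chkconfig cloudera-scm-agent on
# or, on systemd hosts:
sudo systemctl enable mysqld cloudera-scm-server cloudera-scm-agent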
Finally, start the accompanying services, such as Presto or HBase's thrift2.
presto
cd /home/zzq/presto/presto-server-0.195
bin/launcher start
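To confirm the daemon came up, the launcher also provides a status subcommand:
bin/launcher status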
Alternatively, the server can be run in the foreground; logs and other output are written to stdout/stderr (these streams can be captured with a tool such as daemontools):
cd /home/zzq/presto/presto-server-0.195
bin/launcher run
Presto provides a web management UI for checking the state of all the nodes.
Access it on the configured HTTP port, e.g. 8080:
http://192.168.30.252:8080/
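The coordinator can also be probed from the shell through Presto's REST API, for example the /v1/info endpoint:
curl http://192.168.30.252:8080/v1/info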
thrift2
Start command:
nohup hbase thrift2 start &
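As with MySQL above, you can confirm that the Thrift2 server is listening; 9090 is HBase's default Thrift port unless overridden:
netstat -ntlp | grep 9090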