1. HDFS reports many GC timeouts
The NameNode log shows a large number of GC timeout errors, and port 30914 is not listening:
GC pool 'ParNew' had collection(s): count=1 time=0ms
GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=17577ms
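Fix:
The 17.5-second ConcurrentMarkSweep pause suggests the heap is too small. Change the NameNode startup parameter -Xmx4G to -Xmx8G. This also resolves the problem of port 30914 not listening.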
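To confirm that long pauses stop after the heap is raised, the NameNode log can be filtered for GC entries above a threshold. The following is a minimal sketch, not part of the original fix: `long_gc_pauses` is a hypothetical helper name, and the 10000 ms default threshold is an assumption.

```shell
#!/usr/bin/env bash
# Print every GC pool whose reported collection time exceeds a threshold.
# Input:  NameNode log text on stdin.
# Output: one "<pool> <milliseconds>" line per slow collection.
# long_gc_pauses is a hypothetical helper; the 10000 ms default is an assumption.
long_gc_pauses() {
  local threshold_ms="${1:-10000}"
  # grep -o emits one "GC pool ..." entry per line, even if several share a log line.
  grep -o "GC pool '[^']*' had collection(s): count=[0-9]* time=[0-9]*ms" |
    # Reduce each entry to "<pool> <ms>".
    sed -E "s/GC pool '([^']*)'.*time=([0-9]+)ms/\1 \2/" |
    # Keep only the entries slower than the threshold.
    awk -v t="$threshold_ms" '$2 + 0 > t + 0 { print $1, $2 }'
}
```

It can be fed from the pod log, for example `kubectl logs <namenode-pod> | long_gc_pauses 5000`, where the pod name is a placeholder for your own deployment.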
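2. hdfs-namenode fails to start: zkfc disconnects from ZooKeeper
This problem may be related to DNS resolution. Try restarting nodelocaldns on each server, then delete the HDFS-related znodes in ZooKeeper, and finally restart HDFS.
Restart nodelocaldns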
kubectl -n kube-system rollout restart ds nodelocaldns && watch -n1 "kubectl -n kube-system get pod | grep nodelocal"
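The `watch` loop above has to be stopped by hand once the pods look healthy. For unattended use, a possible alternative is to let `kubectl rollout status` block until the DaemonSet has finished restarting. This is a sketch under assumptions: `restart_nodelocaldns` is a hypothetical helper name, the 120s default timeout is arbitrary, and the DaemonSet is assumed to use the RollingUpdate strategy.

```shell
#!/usr/bin/env bash
# Restart the nodelocaldns DaemonSet and block until the rollout completes.
# restart_nodelocaldns is a hypothetical helper; the 120s default is an assumption.
restart_nodelocaldns() {
  local timeout="${1:-120s}"
  kubectl -n kube-system rollout restart ds nodelocaldns &&
    # Returns non-zero if the pods are not ready within the timeout.
    kubectl -n kube-system rollout status ds nodelocaldns --timeout="$timeout"
}
```

Calling `restart_nodelocaldns 300s` allows a longer wait on larger clusters.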
Clean up the HDFS-related znodes
kubectl -n component exec -it zookeeper-default-0 -- zkCli.sh delete /hadoop-ha/hdfs-k8s/ActiveStandbyElectorLock
kubectl -n component exec -it zookeeper-default-0 -- zkCli.sh delete /hadoop-ha/hdfs-k8s/ActiveBreadCrumb
kubectl -n component exec -it zookeeper-default-0 -- zkCli.sh delete /hadoop-ha/hdfs-k8s
kubectl -n component exec -it zookeeper-default-0 -- zkCli.sh delete /hadoop-ha
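The order of the four commands above matters: ZooKeeper's plain `delete` refuses a znode that still has children, so the leaves go first. The same steps can be sketched as a loop. The namespace, pod name, and paths are copied from the commands above; `delete_hdfs_znodes` is a hypothetical helper name, and dropping `-it` is an assumption made so that the loop also runs without a terminal.

```shell
#!/usr/bin/env bash
# Delete the HDFS HA znodes leaf-first, because `delete` fails on a znode with children.
# delete_hdfs_znodes is a hypothetical helper; values are taken from the commands above.
delete_hdfs_znodes() {
  local ns="component" pod="zookeeper-default-0"
  local root="/hadoop-ha" cluster="hdfs-k8s"
  local path
  for path in \
    "$root/$cluster/ActiveStandbyElectorLock" \
    "$root/$cluster/ActiveBreadCrumb" \
    "$root/$cluster" \
    "$root"; do
    # ActiveStandbyElectorLock is ephemeral and may already be gone, so a
    # failed delete is logged and the loop continues with the next path.
    kubectl -n "$ns" exec "$pod" -- zkCli.sh delete "$path" ||
      echo "delete reported a failure for $path; continuing" >&2
  done
}
```

On ZooKeeper 3.5 or later, a single `deleteall /hadoop-ha` in zkCli.sh should remove the whole subtree; check the version in your cluster before relying on it.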
Restart the HDFS service
kubectl delete -f /etc/kubernetes/hdfs/
kubectl apply -f /etc/kubernetes/hdfs/
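The delete and apply steps above can be wrapped so that the apply only runs if the delete succeeded, and so that a mistyped manifest path is caught first. This is a minimal sketch: `restart_hdfs` is a hypothetical helper name, and `--ignore-not-found` is added on the assumption that some resources may already be absent.

```shell
#!/usr/bin/env bash
# Recreate the HDFS resources from a manifest directory.
# restart_hdfs is a hypothetical helper; the default path is the one used above.
restart_hdfs() {
  local manifests="${1:-/etc/kubernetes/hdfs/}"
  # Fail early if the directory is missing, before anything is deleted.
  [ -d "$manifests" ] || { echo "manifest directory not found: $manifests" >&2; return 1; }
  # --ignore-not-found keeps the delete from failing on already-removed resources.
  kubectl delete -f "$manifests" --ignore-not-found &&
    kubectl apply -f "$manifests"
}
```

Note that `kubectl delete -f` removes every resource defined in the directory, so check whether the manifests include anything that holds data before running it.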
From: https://www.cnblogs.com/zgjj/p/16746997.html