Hadoop Troubleshooting Notes (1)

Date: 2023-10-04 12:44:06
Tags: ipc, java, Hadoop, hadoop, problem, apache, 60020, solution, org

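While testing HBase recently, I ran into a very strange problem. The cluster has 7 machines: 1 Master and 6 RegionServers. However, the Master could only control 1 of the RegionServers and could not control the other 5.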

Opening the master's log file showed the following error:

2011-04-22 16:37:21,242 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of -ROOT-,,0.70236052 to serverName=hp2,60020,1303461559353, load=(requests=0, regions=0, usedHeap=28, maxHeap=3979), trying to assign elsewhere instead; retry=0
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy interface org.apache.hadoop.hbase.ipc.HRegionInterface to /10.131.18.3:60020 after attempts=1
    at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:355)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:965)
    at org.apache.hadoop.hbase.master.ServerManager.getServerConnection(ServerManager.java:606)
    at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:541)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:920)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:730)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:710)
    at org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:1189)
    at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:432)
    at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:389)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:283)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
    at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:883)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:750)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
    at $Proxy7.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:419)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:393)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:444)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:349)
    ... 10 more

The log shows that the master cannot communicate with the RegionServer at IP address 10.131.18.3: the TCP connection to port 60020 is refused.
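The refused connection can be checked independently of HBase with a plain TCP probe run from the master. The following is only a minimal sketch: the class name `PortProbe`, the method `canConnect`, and the 3-second timeout are illustrative choices, not part of HBase; the default host and port are taken from the log above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortProbe {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    public static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers "Connection refused", timeouts, and unresolvable host names.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults come from the master log; override them on the command line.
        String host = args.length > 0 ? args[0] : "10.131.18.3";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 60020;
        System.out.println(host + ":" + port + " reachable: "
                + canConnect(host, port, 3000));
    }
}
```

If the probe fails from the master but succeeds against 127.0.0.1 when run on the RegionServer itself, that suggests the process is up but not listening on the external interface.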

Next, I went to the machine at 10.131.18.3 and looked at the startup messages in its RegionServer log:

2011-04-14 18:32:05,122 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 10 on 60020: starting
2011-04-14 18:32:05,122 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 11 on 60020: starting
2011-04-14 18:32:05,122 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 12 on 60020: starting
2011-04-14 18:32:05,122 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 13 on 60020: starting
2011-04-14 18:32:05,122 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 14 on 60020: starting
2011-04-14 18:32:05,122 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 15 on 60020: starting
2011-04-14 18:32:05,122 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 16 on 60020: starting
2011-04-14 18:32:05,122 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 17 on 60020: starting
2011-04-14 18:32:05,123 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 18 on 60020: starting
2011-04-14 18:32:05,123 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 19 on 60020: starting
2011-04-14 18:32:05,123 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 0 on 60020: starting
2011-04-14 18:32:05,123 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 1 on 60020: starting
2011-04-14 18:32:05,123 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 2 on 60020: starting
2011-04-14 18:32:05,123 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 3 on 60020: starting
2011-04-14 18:32:05,123 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 4 on 60020: starting
2011-04-14 18:32:05,123 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 5 on 60020: starting
2011-04-14 18:32:05,123 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 6 on 60020: starting
2011-04-14 18:32:05,124 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 7 on 60020: starting
2011-04-14 18:32:05,124 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 8 on 60020: starting
2011-04-14 18:32:05,124 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 9 on 60020: starting
2011-04-14 18:32:05,124 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Serving as dell4,60020,1302777124101, RPC listening on /127.0.0.1:60020, sessionid=0x12f535856620004

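This shows that the RegionServer started successfully, but its RPC server is listening on the loopback address (127.0.0.1). A socket bound to 127.0.0.1 only accepts connections from the same machine, so the master cannot reach this RegionServer. The correct listen address would be 10.131.18.3.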

Looking at the source, the code that returns the RPC bind address is:

  /** @return Bind address */
  public String getBindAddress() {
    final InetAddress addr = address.getAddress();
    if (addr != null) {
      return addr.getHostAddress();
    } else {
      LogFactory.getLog(HServerAddress.class).error("Could not resolve the"
          + " DNS name of " + stringValue);
      return null;
    }
  }

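The code itself is fine, so some configuration on the machine must be causing Java to resolve the local host name to the wrong IP address. Finally, I checked the machine's hosts file: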

[hadoop@hp2 logs]$ vi /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               hp2 localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
10.131.18.8     dell1
10.131.18.5     dell2
10.131.18.6     dell3
10.131.18.7     dell4
10.131.18.2     hp1
10.131.18.3     hp2
10.131.18.4     hp3

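That was the problem: the first line maps the host name hp2 to 127.0.0.1, ahead of the correct entry `10.131.18.3 hp2`, so hp2 resolved to the loopback address. I changed the hosts file to the following: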

# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
10.131.18.8     dell1
10.131.18.5     dell2
10.131.18.6     dell3
10.131.18.7     dell4
10.131.18.2     hp1
10.131.18.3     hp2
10.131.18.4     hp3
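Why the first line matters can be illustrated with a small lookup sketch. It assumes the resolver returns the address of the first hosts entry that lists the name, which is the usual default behavior but is not something this article verifies. The class `HostsLookup` and method `firstMatch` are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class HostsLookup {
    /**
     * Returns the address of the first hosts entry that lists the given name,
     * mimicking a resolver that stops at the first match (an assumption).
     */
    public static Optional<String> firstMatch(List<String> hostsLines, String name) {
        for (String raw : hostsLines) {
            // Drop trailing comments and surrounding whitespace.
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            if (line.isEmpty()) {
                continue;
            }
            // fields[0] is the address; the rest are the host name and its aliases.
            String[] fields = line.split("\\s+");
            for (int i = 1; i < fields.length; i++) {
                if (fields[i].equalsIgnoreCase(name)) {
                    return Optional.of(fields[0]);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> before = Arrays.asList(
                "127.0.0.1               hp2 localhost.localdomain localhost",
                "::1             localhost6.localdomain6 localhost6",
                "10.131.18.3     hp2");
        List<String> after = Arrays.asList(
                "127.0.0.1               localhost.localdomain localhost",
                "::1             localhost6.localdomain6 localhost6",
                "10.131.18.3     hp2");
        System.out.println("before: hp2 -> " + firstMatch(before, "hp2").orElse("unresolved"));
        System.out.println("after:  hp2 -> " + firstMatch(after, "hp2").orElse("unresolved"));
    }
}
```

With the original file the sketch reports 127.0.0.1 for hp2; with the corrected file it reports 10.131.18.3.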

After restarting the whole cluster, the problem was solved.
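To catch this kind of misconfiguration before starting a cluster, one option is to check on each node what the JVM resolves for the local host name. The sketch below uses only the standard `java.net.InetAddress` API; the class name `LocalAddressCheck` and the `describe` helper are illustrative. It does not reproduce HBase's own address selection logic, which this article does not show.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class LocalAddressCheck {
    /** Describes whether the given address is usable by remote hosts. */
    public static String describe(InetAddress addr) {
        if (addr.isLoopbackAddress()) {
            return addr.getHostAddress()
                    + " is a loopback address: remote hosts cannot connect to it";
        }
        return addr.getHostAddress() + " is not a loopback address";
    }

    public static void main(String[] args) throws UnknownHostException {
        // Resolves the machine's own host name through the system resolver,
        // which normally consults /etc/hosts.
        InetAddress local = InetAddress.getLocalHost();
        System.out.println("hostname: " + local.getHostName());
        System.out.println(describe(local));
    }
}
```

If this reports a loopback address on a node that has to accept connections from other machines, the hosts file is a reasonable first place to look.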

From: https://www.cnblogs.com/zjsdbk/p/17742146.html