首页 > 编程语言 >spark任务报错java.io.IOException: Failed to send RPC xxxxxx to xxxx:xxx, but got no response. Marking as

spark任务报错java.io.IOException: Failed to send RPC xxxxxx to xxxx:xxx, but got no response. Marking as

时间:2023-01-14 23:11:11浏览次数:53  
标签:netty Marking java 4.0 xxx 43 报错 io Final

## 日志信息如下
``` Attempted to get executor loss reason for executor id 17 at RPC address 192.168.48.172:59070, but got no response. Marking as slave lost. java.io.IOException: Failed to send RPC 9102760012410878153 to /192.168.48.172:59047: java.nio.channels.ClosedChannelException at org.apache.spark.network.client.TransportClient.lambda$sendRpc$2(TransportClient.java:237) ~[spark-network-common_2.11-2.2.0.jar:2.2.0] at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507) ~[netty-all-4.0.43.Final.jar:4.0.43.Final] at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481) ~[netty-all-4.0.43.Final.jar:4.0.43.Final] at io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34) ~[netty-all-4.0.43.Final.jar:4.0.43.Final] at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:431) ~[netty-all-4.0.43.Final.jar:4.0.43.Final] at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399) ~[netty-all-4.0.43.Final.jar:4.0.43.Final] at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:446) ~[netty-all-4.0.43.Final.jar:4.0.43.Final] at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131) ~[netty-all-4.0.43.Final.jar:4.0.43.Final] at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144) ~[netty-all-4.0.43.Final.jar:4.0.43.Final] at java.lang.Thread.run(Thread.java:745) [?:1.8.0_101] Caused by: java.nio.channels.ClosedChannelException at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source) ~[netty-all-4.0.43.Final.jar:4.0.43.Final] ```
## 现象
driver端显示日志内容为RPC通信错误,从而认为心跳超时,执行器被yarn杀掉,该问题有两种解决思路
1. driver或executor内存不足,GC时无法进行RPC通信从而心跳超时,定位方法 - driver端:查询driver的pid,jstat -gcutil pid查看内存使用情况,或jmap -heap pid查看内存使用 - executor端:查询executor的pid(可以从spark UI的执行器页面查看到执行器的ip和端口,通过ip和端口查询到executor所在的服务器和pid),根据pid查看内存使用情况 2. driver所在服务器与executor所在服务器之间的时间相差较多,相差1分钟以上就应该及时修改时间了,究其根本原因也很简单,两台服务器时间相差过大,造成本来就1ms内完成的通信,由于两个java进程计算的时间戳不同,造成driver认为响应超时,目前看大部分文章给的解决方式都是第一种,直接加executor内存,未必能解决问题,我们大部分集群都做了时钟同步,为什么还会造成时间相差很大呢,此时需要查看服务器是否开启了chronyd,如果你使用的是ntp,chronyd会对ntp有干扰,可以关闭chronyd
     关闭chronyd方法      ```      systemctl disable chronyd      systemctl stop chronyd      systemctl enable ntpd      systemctl start ntpd      ```   

标签:netty,Marking,java,4.0,xxx,43,报错,io,Final
From: https://www.cnblogs.com/wanghy-keepcoding/p/17052761.html

相关文章