
Spark RPC timeout causing task failures: Attempted to get executor loss reason for executor id 17 at RPC address 192.168.48

Posted: 2023-01-14 23:25:57

The log output is as follows:

Attempted to get executor loss reason for executor id 17 at RPC address 192.168.48.172:59070, but got no response. Marking as slave lost.
java.io.IOException: Failed to send RPC 9102760012410878153 to /192.168.48.172:59047: java.nio.channels.ClosedChannelException
at org.apache.spark.network.client.TransportClient.lambda$sendRpc$2(TransportClient.java:237) ~[spark-network-common_2.11-2.2.0.jar:2.2.0]
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507) ~[netty-all-4.0.43.Final.jar:4.0.43.Final]
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481) ~[netty-all-4.0.43.Final.jar:4.0.43.Final]
at io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34) ~[netty-all-4.0.43.Final.jar:4.0.43.Final]
at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:431) ~[netty-all-4.0.43.Final.jar:4.0.43.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399) ~[netty-all-4.0.43.Final.jar:4.0.43.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:446) ~[netty-all-4.0.43.Final.jar:4.0.43.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131) ~[netty-all-4.0.43.Final.jar:4.0.43.Final]
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144) ~[netty-all-4.0.43.Final.jar:4.0.43.Final]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_101]
Caused by: java.nio.channels.ClosedChannelException
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source) ~[netty-all-4.0.43.Final.jar:4.0.43.Final]

Symptom

The driver log shows an RPC communication error; the driver therefore treats the executor's heartbeat as timed out, and the executor is killed by YARN.
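For context, the heartbeat and timeout involved are governed by Spark's network settings. The values below are the documented Spark 2.x defaults, shown only for reference; raising them merely postpones the failure and does not address either of the root causes discussed next:

    spark.executor.heartbeatInterval=10s    # how often each executor heartbeats to the driver
    spark.network.timeout=120s              # default timeout for network interactions, including RPC

There are two ways to approach the problem: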

  1. The driver or an executor is short on memory: while the JVM is busy with GC it cannot service RPC traffic, so the heartbeat times out. How to locate this (a command sketch follows after this list):
  • Driver side: find the driver's pid, then run jstat -gcutil pid to watch GC activity, or jmap -heap pid to inspect heap usage.
  • Executor side: find the executor's pid (the Spark UI "Executors" page shows each executor's host and port; from those you can identify the server it runs on and its pid), then check its memory usage the same way.
  2. The clock on the driver's server differs noticeably from the clock on the executor's server; a gap of more than a minute should be corrected right away. The root cause is simple: with the two clocks far apart, an exchange that actually completes within 1 ms still looks like a timeout, because the two Java processes compute their timestamps from different wall clocks, so the driver concludes the response never arrived. Most articles only offer the first fix, adding executor memory, which does not necessarily solve the problem. Since most of our clusters already run clock synchronization, why does the skew still appear? Check whether chronyd is enabled on the server: if you rely on ntp, a running chronyd will interfere with it, so chronyd can be disabled (a quick way to measure the skew is sketched after this list).

    How to disable chronyd and re-enable ntpd

    systemctl disable chronyd
    systemctl stop chronyd
    systemctl enable ntpd
    systemctl start ntpd
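
A minimal command sketch of the memory/GC check from item 1, assuming a client-mode driver launched via spark-submit and a YARN executor; the pids are placeholders you must substitute yourself:

    # Driver side: locate the driver JVM, then sample its GC behaviour.
    jps -lm | grep SparkSubmit                  # driver main class in client mode
    jstat -gcutil <driver_pid> 1000 10          # GC utilisation, sampled every 1 s, 10 samples
    jmap -heap <driver_pid>                     # one-shot heap summary

    # Executor side: take the host/port from the Spark UI "Executors" page,
    # ssh to that host, then repeat the same checks against the executor JVM.
    jps -lm | grep CoarseGrainedExecutorBackend
    jstat -gcutil <executor_pid> 1000 10

If the FGC/FGCT columns keep climbing while old-gen usage (O) stays near 100%, the process is most likely stalled in full GC during the heartbeat window, and more memory (or less data per task) is the right fix.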
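And a quick way to confirm the clock skew from item 2 before touching chronyd; ntp.example.com is a placeholder for your own time source, and the commands should be run on both the driver host and the executor host so the offsets can be compared:

    date '+%F %T.%N'                # raw wall-clock reading on this host
    ntpdate -q ntp.example.com      # query only: prints the offset without stepping the clock
    chronyc tracking                # if chronyd is running, its estimate of the system clock offset
    timedatectl                     # whether the host believes it is NTP-synchronized

If the two hosts disagree by seconds or more, fix the time source first; adding executor memory will not help.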

From: https://www.cnblogs.com/wanghy-keepcoding/p/17052771.html
