Calico: cross-node traffic only recovers 4 minutes after a node restart

Date: 2024-10-05 23:33:37

Tags: proto, restart, calico, bgp, bird, graceful, node

bird v0.3.3

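Symptom

With Calico in BGP peer + IPIP mode, after a single node is restarted, cross-node pod traffic does not recover until more than 4 minutes have passed.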

Analysis

Kubernetes node restarts -> the bird process loads its configuration file, enters the graceful restart flow, and sits in the wait state.

Each time bird tries to establish a BGP peer connection, graceful_restart_locks++.
nest/proto.c
proto_graceful_restart_lock function

proto/bgp/bgp.c
bgp_start function

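When a connection is established successfully, graceful_restart_locks--.
Once the counter reaches 0, the timer is stopped and graceful_restart_done is triggered.
nest/proto.c
proto_graceful_restart_unlock function

proto/bgp/bgp.c
bgp_conn_enter_established_state function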
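The lock/unlock bookkeeping above can be sketched as a toy model. This is not BIRD's code: the class, its method names, and the per-peer set are hypothetical and only mirror the counter behaviour described in these notes.

```python
class GracefulRestart:
    """Toy model of the graceful restart lock counter (hypothetical, not BIRD code)."""

    def __init__(self):
        self.locks = 0        # mirrors graceful_restart_locks
        self.locked = set()   # peers holding a lock (assumption: one lock per peer)
        self.done = False     # True once recovery has finished

    def lock(self, peer):
        # Corresponds to the lock taken when a BGP protocol starts.
        if peer not in self.locked:
            self.locked.add(peer)
            self.locks += 1

    def unlock(self, peer):
        # Corresponds to the unlock when the session is established.
        if peer in self.locked:
            self.locked.remove(peer)
            self.locks -= 1
            if self.locks == 0:
                self.finish()

    def finish(self):
        # Stand-in for graceful_restart_done; the timeout path ends here too.
        self.locked.clear()
        self.locks = 0
        self.done = True


gr = GracefulRestart()
gr.lock("peer_a")
gr.lock("peer_b")
gr.unlock("peer_a")
print(gr.locks, gr.done)   # one peer is still pending
gr.unlock("peer_b")
print(gr.locks, gr.done)   # all peers are up, recovery finishes
```

As long as a single peer never reaches the established state, the counter stays above 0 and only the timeout can finish recovery.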

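In the normal case, bird's state goes through initializing -> starting -> start -> wait -> feed -> up. After that, routes to the other nodes' subnets are added on the local node, pointing at tunl0.
nest/proto.c
proto_notify_state function
When the BGP state changes, proto_notify_state is called back to update the state and write a log line, for example from bgp_conn_enter_established_state.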
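To confirm that the routes pointing at tunl0 have actually appeared, the kernel routing table can be inspected. The sketch below parses text in the shape of `ip route` output; the sample lines, addresses, and the function name are made up for illustration and are not taken from the affected cluster.

```python
def tunnel_routes(ip_route_output, dev="tunl0"):
    """Return (prefix, next_hop) pairs for routes that leave through `dev`."""
    routes = []
    for line in ip_route_output.splitlines():
        tokens = line.split()
        # Map each keyword to the token that follows it ("via" -> next hop, ...).
        following = dict(zip(tokens[1:], tokens[2:]))
        if following.get("dev") == dev and "via" in following:
            routes.append((tokens[0], following["via"]))
    return routes


# Illustrative sample only (assumed shape of `ip route` output on a Calico node).
SAMPLE = """\
default via 10.0.0.1 dev eth0
blackhole 10.244.0.0/26 proto bird
10.244.1.0/26 via 10.0.0.2 dev tunl0 proto bird onlink
10.244.2.0/26 via 10.0.0.3 dev tunl0 proto bird onlink
"""

for prefix, next_hop in tunnel_routes(SAMPLE):
    print(prefix, "->", next_hop)
```

An empty result on a freshly restarted node would be consistent with bird still sitting in the wait state.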

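In the abnormal case, bird cannot establish a BGP connection with bird on the other nodes, so the routes are only added after the 240 s timeout expires.
nest/protocol.h
The default graceful restart timeout is 240 s, and it is not exposed as a configurable parameter.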

conf/conf.c
config_alloc function
Initializes the configuration.

nest/proto.c
graceful_restart_init function
The timer hook is graceful_restart_done and the timeout is 240 s.

nest/proto.c
graceful_restart_done function
Logs "graceful restart done", walks all BGP connections, logs each state change to up, and sets graceful_restart_locks to 0.
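The timing that follows from this walk-through can be written as a small calculation: recovery ends when the last peer comes up, or at the 240 s timeout if any peer never does. The function and its input format are hypothetical; only the 240 s default comes from the notes above.

```python
GR_WAIT_SECONDS = 240  # default graceful restart timeout quoted in these notes


def recovery_finish_time(established_at, wait=GR_WAIT_SECONDS):
    """Seconds after restart at which graceful restart recovery ends.

    established_at maps a peer name to the time (in seconds) its BGP session
    came up, or to None if the session never came up.
    """
    times = list(established_at.values())
    if any(t is None or t > wait for t in times):
        return wait                    # the timer fires graceful_restart_done
    return max(times, default=0)       # the last unlock ends recovery early


print(recovery_finish_time({"peer_a": 5, "peer_b": 12}))
print(recovery_finish_time({"peer_a": 5, "peer_b": None}))
```

The second call models the symptom: one unreachable peer is enough to hold route installation back for the full 240 s, which matches the delay of roughly 4 minutes.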

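Solution

Before restarting a node manually, make sure the bird processes on the other nodes are healthy; add monitoring to the environment to confirm that they stay healthy.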
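One possible shape for such a check is a script that reads the text of bird's `show protocols` command and reports BGP sessions that are not established. The column layout, the sample rows, and the peer names below are assumptions for illustration; verify them against the real output before relying on this.

```python
def peers_not_established(show_protocols_output):
    """Return the names of BGP protocols whose info does not say Established."""
    down = []
    for line in show_protocols_output.splitlines():
        fields = line.split()
        # Assumed columns: name, proto, table, state, since, info...
        if len(fields) >= 2 and fields[1] == "BGP" and "Established" not in fields:
            down.append(fields[0])
    return down


# Illustrative sample only (assumed output format, made-up peers).
SAMPLE_PROTOCOLS = """\
name     proto    table    state  since       info
static1  Static   master   up     10:00:00
Mesh_10_0_0_2 BGP master up 10:00:05 Established
Mesh_10_0_0_3 BGP master start 10:00:05 Active Socket: Connection refused
"""

problem_peers = peers_not_established(SAMPLE_PROTOCOLS)
if problem_peers:
    print("do not restart yet, unhealthy peers:", problem_peers)
```

Running the same check periodically on every node would cover the monitoring half of the recommendation.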

From: https://www.cnblogs.com/WJQ2017/p/18448726
