
Which process is listening on the vxlan UDP socket?

Date: 2025-01-06 21:11:13

intro

k8s is a distributed virtualization system, so networking matters a great deal in it. How pods on different nodes communicate over the network is a basic and important problem that has to be solved. The Kubernetes documentation on Networking and Network Policy lists the commonly used networking add-ons. That list is clearly sorted alphabetically (not by usage frequency), and it includes the fairly popular flannel model, which is also one of the more common k8s network models in practice.

Flannel is an overlay network provider that can be used with Kubernetes.

The flannel homepage describes the main technology flannel relies on, VXLAN:

Flannel runs a small, single binary agent called flanneld on each host, and is responsible for allocating a subnet lease to each host out of a larger, preconfigured address space. Flannel uses either the Kubernetes API or etcd directly to store the network configuration, the allocated subnets, and any auxiliary data (such as the host's public IP). Packets are forwarded using one of several backend mechanisms including VXLAN and various cloud integrations.

I had not heard of this VXLAN before, so common sense says the first step is to confirm the wire format of the protocol: VXLAN packet format

  • 8-byte VXLAN header—VXLAN information for the frame.
    Flags—If the I bit is 1, the VXLAN ID is valid. If the I bit is 0, the VXLAN ID is invalid. All other bits are reserved and set to 0.
    24-bit VXLAN ID—Identifies the VXLAN of the frame. It is also called the virtual network identifier (VNI).
  • 8-byte outer UDP header for VXLAN—The default VXLAN destination UDP port number is 4789.
  • 20-byte outer IP header—Valid addresses of VTEPs or VXLAN multicast groups on the transport network. Devices in the transport network forward VXLAN packets based on the outer IP header.
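To make the layout above concrete, here is a small Go sketch of my own (not taken from flannel or the kernel; the names `vniFlag` and `parseVXLAN` are made up for illustration) that parses the 8-byte VXLAN header at the start of the UDP payload, checking the I flag and extracting the 24-bit VNI:

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// vniFlag is the I bit in the first flags byte: when set, the VNI field is valid.
const vniFlag = 0x08

// parseVXLAN extracts the 24-bit VNI from an 8-byte VXLAN header
// (the start of the UDP payload arriving on port 4789/8472).
func parseVXLAN(payload []byte) (vni uint32, err error) {
	if len(payload) < 8 {
		return 0, errors.New("short VXLAN header")
	}
	if payload[0]&vniFlag == 0 {
		return 0, errors.New("I flag not set: VNI invalid")
	}
	// Bytes 4..6 hold the VNI; byte 7 is reserved.
	word := binary.BigEndian.Uint32(payload[4:8])
	return word >> 8, nil
}

func main() {
	// Flags=0x08 (I bit), 3 reserved bytes, then VNI=1 (as used by flannel.1), 1 reserved byte.
	hdr := []byte{0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00}
	vni, err := parseVXLAN(hdr)
	fmt.Println(vni, err) // 1 <nil>
}
```

Note that nothing in these 8 bytes says "VXLAN"; only the UDP destination port does, which is exactly the point made next.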

One rather "peculiar" aspect of this format: the tunneled data sits directly inside the UDP payload, with no type-like field anywhere declaring that this is a VXLAN packet. Searching online confirms it: the protocol does not mark a packet as VXLAN with a dedicated field value; instead, any datagram arriving at a specific UDP port is treated as a VXLAN packet.

The destination UDP port in the outer UDP header is specified in the VXLAN specification (Port 4789). This means it is a well-known service. So an UDP packet that arrives on Port 4789 is expected to be a VXLAN packet¹ in the same way that a TCP packet that arrives on Port 80 is expected to be a HTTP packet¹.

The draft you linked to is outdated and is missing this port number (although it mentions that the port number is to be obtained from IANA).

¹) When I talk about VXLAN/HTTP packets I mean of course the respective UDP/TCP packets with VXLAN/HTTP header/protocol inside.

So this UDP port appears to be system-level: the kernel must be aware of it, which in turn requires a corresponding socket instance. As we know, sockets are normally created by user-space processes, and when a packet arrives at a UDP socket, the process listening on that socket needs to be woken up.

Which finally brings us to the question: if this socket is created by the kernel, which process gets woken up to handle a packet when one arrives on it?

UDP socket

In the kernel, when data is received on a UDP socket, the first check is whether the socket has its encapsulation field set. If it does, the normal socket-receive / process-wakeup logic is skipped, and the registered receive callback (encap_rcv) is invoked instead.

udp_rcv >> __udp4_lib_rcv >> udp_unicast_rcv_skb >> udp_queue_rcv_skb >> udp_queue_rcv_one_skb


/* returns:
 *  -1: error
 *   0: success
 *  >0: "udp encap" protocol resubmission
 *
 * Note that in the success and error cases, the skb is assumed to
 * have either been requeued or freed.
 */
static int udp_queue_rcv_one_skb(struct sock *sk, struct sk_buff *skb)
{
	int drop_reason = SKB_DROP_REASON_NOT_SPECIFIED;
	struct udp_sock *up = udp_sk(sk);
	int is_udplite = IS_UDPLITE(sk);

	/*
	 *	Charge it to the socket, dropping if the queue is full.
	 */
	if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb)) {
		drop_reason = SKB_DROP_REASON_XFRM_POLICY;
		goto drop;
	}
	nf_reset_ct(skb);

	if (static_branch_unlikely(&udp_encap_needed_key) &&
	    READ_ONCE(up->encap_type)) {
		int (*encap_rcv)(struct sock *sk, struct sk_buff *skb);

		/*
		 * This is an encapsulation socket so pass the skb to
		 * the socket's udp_encap_rcv() hook. Otherwise, just
		 * fall through and pass this up the UDP socket.
		 * up->encap_rcv() returns the following value:
		 * =0 if skb was successfully passed to the encap
		 *    handler or was discarded by it.
		 * >0 if skb should be passed on to UDP.
		 * <0 if skb should be resubmitted as proto -N
		 */

		/* if we're overly short, let UDP handle it */
		encap_rcv = READ_ONCE(up->encap_rcv);
		if (encap_rcv) {
			int ret;

			/* Verify checksum before giving to encap */
			if (udp_lib_checksum_complete(skb))
				goto csum_error;

			ret = encap_rcv(sk, skb);
			if (ret <= 0) {
				__UDP_INC_STATS(sock_net(sk),
						UDP_MIB_INDATAGRAMS,
						is_udplite);
				return -ret;
			}
		}

		/* FALLTHROUGH -- it's a UDP Packet */
	}

Correspondingly, the UDP socket structure defines function pointers specific to encapsulation-type sockets.

struct udp_sock {
///...
	/*
	 * For encapsulation sockets.
	 */
	int (*encap_rcv)(struct sock *sk, struct sk_buff *skb);
	void (*encap_err_rcv)(struct sock *sk, struct sk_buff *skb, int err,
			      __be16 port, u32 info, u8 *payload);
	int (*encap_err_lookup)(struct sock *sk, struct sk_buff *skb);
	void (*encap_destroy)(struct sock *sk);
	///...
};

vxlan

Registration

When vxlan starts, it registers vxlan_rcv as the encap_rcv function on the socket it creates.

/* Create new listen socket if needed */
static struct vxlan_sock *vxlan_socket_create(struct net *net, bool ipv6,
					      __be16 port, u32 flags,
					      int ifindex)
{
///...
	/* Mark socket as an encapsulation socket. */
	memset(&tunnel_cfg, 0, sizeof(tunnel_cfg));
	tunnel_cfg.sk_user_data = vs;
	tunnel_cfg.encap_type = 1;
	tunnel_cfg.encap_rcv = vxlan_rcv;
	tunnel_cfg.encap_err_lookup = vxlan_err_lookup;
	tunnel_cfg.encap_destroy = NULL;
///...
}

Callback

The body of the corresponding vxlan_rcv function decapsulates the encapsulated packet and then calls gro_cells_receive.

/* Callback from net/ipv4/udp.c to receive packets */
static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
{
	struct vxlan_vni_node *vninode = NULL;
	struct vxlan_dev *vxlan;
	struct vxlan_sock *vs;
	struct vxlanhdr unparsed;
	struct vxlan_metadata _md;
	struct vxlan_metadata *md = &_md;
	__be16 protocol = htons(ETH_P_TEB);
	bool raw_proto = false;
	void *oiph;
	__be32 vni = 0;
	int nh;

	/* Need UDP and VXLAN header to be present */
	if (!pskb_may_pull(skb, VXLAN_HLEN))
		goto drop;

	unparsed = *vxlan_hdr(skb);
	/* VNI flag always required to be set */
	if (!(unparsed.vx_flags & VXLAN_HF_VNI)) {
		netdev_dbg(skb->dev, "invalid vxlan flags=%#x vni=%#x\n",
			   ntohl(vxlan_hdr(skb)->vx_flags),
			   ntohl(vxlan_hdr(skb)->vx_vni));
		/* Return non vxlan pkt */
		goto drop;
	}
	unparsed.vx_flags &= ~VXLAN_HF_VNI;
	unparsed.vx_vni &= ~VXLAN_VNI_MASK;

	vs = rcu_dereference_sk_user_data(sk);
	if (!vs)
		goto drop;

	vni = vxlan_vni(vxlan_hdr(skb)->vx_vni);

	vxlan = vxlan_vs_find_vni(vs, skb->dev->ifindex, vni, &vninode);
	if (!vxlan)
		goto drop;

	/* For backwards compatibility, only allow reserved fields to be
	 * used by VXLAN extensions if explicitly requested.
	 */
	if (vs->flags & VXLAN_F_GPE) {
		if (!vxlan_parse_gpe_proto(&unparsed, &protocol))
			goto drop;
		unparsed.vx_flags &= ~VXLAN_GPE_USED_BITS;
		raw_proto = true;
	}

	if (__iptunnel_pull_header(skb, VXLAN_HLEN, protocol, raw_proto,
				   !net_eq(vxlan->net, dev_net(vxlan->dev))))
		goto drop;

	if (vs->flags & VXLAN_F_REMCSUM_RX)
		if (unlikely(!vxlan_remcsum(&unparsed, skb, vs->flags)))
			goto drop;

	if (vxlan_collect_metadata(vs)) {
		IP_TUNNEL_DECLARE_FLAGS(flags) = { };
		struct metadata_dst *tun_dst;

		__set_bit(IP_TUNNEL_KEY_BIT, flags);
		tun_dst = udp_tun_rx_dst(skb, vxlan_get_sk_family(vs), flags,
					 key32_to_tunnel_id(vni), sizeof(*md));

		if (!tun_dst)
			goto drop;

		md = ip_tunnel_info_opts(&tun_dst->u.tun_info);

		skb_dst_set(skb, (struct dst_entry *)tun_dst);
	} else {
		memset(md, 0, sizeof(*md));
	}

	if (vs->flags & VXLAN_F_GBP)
		vxlan_parse_gbp_hdr(&unparsed, skb, vs->flags, md);
	/* Note that GBP and GPE can never be active together. This is
	 * ensured in vxlan_dev_configure.
	 */

	if (unparsed.vx_flags || unparsed.vx_vni) {
		/* If there are any unprocessed flags remaining treat
		 * this as a malformed packet. This behavior diverges from
		 * VXLAN RFC (RFC7348) which stipulates that bits in reserved
		 * in reserved fields are to be ignored. The approach here
		 * maintains compatibility with previous stack code, and also
		 * is more robust and provides a little more security in
		 * adding extensions to VXLAN.
		 */
		goto drop;
	}

	if (!raw_proto) {
		if (!vxlan_set_mac(vxlan, vs, skb, vni))
			goto drop;
	} else {
		skb_reset_mac_header(skb);
		skb->dev = vxlan->dev;
		skb->pkt_type = PACKET_HOST;
	}

	/* Save offset of outer header relative to skb->head,
	 * because we are going to reset the network header to the inner header
	 * and might change skb->head.
	 */
	nh = skb_network_header(skb) - skb->head;

	skb_reset_network_header(skb);

	if (!pskb_inet_may_pull(skb)) {
		DEV_STATS_INC(vxlan->dev, rx_length_errors);
		DEV_STATS_INC(vxlan->dev, rx_errors);
		vxlan_vnifilter_count(vxlan, vni, vninode,
				      VXLAN_VNI_STATS_RX_ERRORS, 0);
		goto drop;
	}

	/* Get the outer header. */
	oiph = skb->head + nh;

	if (!vxlan_ecn_decapsulate(vs, oiph, skb)) {
		DEV_STATS_INC(vxlan->dev, rx_frame_errors);
		DEV_STATS_INC(vxlan->dev, rx_errors);
		vxlan_vnifilter_count(vxlan, vni, vninode,
				      VXLAN_VNI_STATS_RX_ERRORS, 0);
		goto drop;
	}

	rcu_read_lock();

	if (unlikely(!(vxlan->dev->flags & IFF_UP))) {
		rcu_read_unlock();
		dev_core_stats_rx_dropped_inc(vxlan->dev);
		vxlan_vnifilter_count(vxlan, vni, vninode,
				      VXLAN_VNI_STATS_RX_DROPS, 0);
		goto drop;
	}

	dev_sw_netstats_rx_add(vxlan->dev, skb->len);
	vxlan_vnifilter_count(vxlan, vni, vninode, VXLAN_VNI_STATS_RX, skb->len);
	gro_cells_receive(&vxlan->gro_cells, skb);

	rcu_read_unlock();

	return 0;

drop:
	/* Consume bad packet */
	kfree_skb(skb);
	return 0;
}

The core of gro_cells_receive is to append the packet to the tail of a napi_skbs queue (__skb_queue_tail(&cell->napi_skbs, skb)) and, if needed, schedule processing (napi_schedule(&cell->napi)).

struct gro_cell {
	struct sk_buff_head	napi_skbs;
	struct napi_struct	napi;
};
int gro_cells_receive(struct gro_cells *gcells, struct sk_buff *skb)
{
	struct net_device *dev = skb->dev;
	struct gro_cell *cell;
	int res;

	rcu_read_lock();
	if (unlikely(!(dev->flags & IFF_UP)))
		goto drop;

	if (!gcells->cells || skb_cloned(skb) || netif_elide_gro(dev)) {
		res = netif_rx(skb);
		goto unlock;
	}

	cell = this_cpu_ptr(gcells->cells);

	if (skb_queue_len(&cell->napi_skbs) > READ_ONCE(net_hotdata.max_backlog)) {
drop:
		dev_core_stats_rx_dropped_inc(dev);
		kfree_skb(skb);
		res = NET_RX_DROP;
		goto unlock;
	}

	__skb_queue_tail(&cell->napi_skbs, skb);
	if (skb_queue_len(&cell->napi_skbs) == 1)
		napi_schedule(&cell->napi);

	res = NET_RX_SUCCESS;

unlock:
	rcu_read_unlock();
	return res;
}
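One scheduling detail worth noting above: napi_schedule is called only when the queue length becomes 1, i.e., on the empty-to-non-empty transition, so a burst of packets triggers a single wakeup and the poll loop drains the rest. A minimal user-space Go sketch of this pattern (my own illustration, not kernel code; `cell`, `receive` and `poll` are hypothetical names):

```go
package main

import (
	"fmt"
	"sync"
)

// cell loosely mimics a gro_cell: a queue plus a "poller scheduled" kick counter.
type cell struct {
	mu    sync.Mutex
	queue []int
	kicks int // how many times the poller was scheduled
}

// receive appends an item and schedules the poller only on the
// empty->non-empty transition, like napi_schedule in gro_cells_receive.
func (c *cell) receive(pkt int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.queue = append(c.queue, pkt)
	if len(c.queue) == 1 {
		c.kicks++ // stands in for napi_schedule(&cell->napi)
	}
}

// poll drains the whole queue in one scheduled run, like the napi poll loop.
func (c *cell) poll() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	n := len(c.queue)
	c.queue = c.queue[:0]
	return n
}

func main() {
	var c cell
	for i := 0; i < 5; i++ { // a burst of 5 packets...
		c.receive(i)
	}
	fmt.Println(c.kicks, c.poll()) // ...one kick, five packets drained
}
```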

I will not analyze the kernel's napi mechanism here; if you are interested, you can refer to the kernel documentation or the linuxfoundation documentation.

All we need to know here is this: once a packet enters the napi flow, it follows the same path as packets received from a physical network device.

flanneld

When local data arrives at the flannel device, the user-space flanneld uses the node IP to cluster IP mapping it has learned from k8s to set the outer (node) IP of the IP tunnel, completing the bridge from the virtual network to the real (node) network.

///@file: udp_network_amd64.go

func (n *network) Run(ctx context.Context) {
	defer func() {
		n.tun.Close()
		n.conn.Close()
		n.ctl.Close()
		n.ctl2.Close()
	}()

	// one for each goroutine below
	wg := sync.WaitGroup{}
	defer wg.Wait()

	wg.Add(1)
	go func() {
		runCProxy(n.tun, n.conn, n.ctl2, n.tunNet.IP, n.MTU())
		wg.Done()
	}()

The next-hop address (i.e., the node's IP address) is obtained from the local routing information.

///@file: proxy_amd64.c
static struct sockaddr_in *find_route(in_addr_t dst) {
	size_t i;

	for( i = 0; i < routes_cnt; i++ ) {
		if( contains(routes[i].dst, dst) ) {
			// packets for same dest tend to come in bursts. swap to front make it faster for subsequent ones
			if( i != 0 ) {
				struct route_entry tmp = routes[i];
				routes[i] = routes[0];
				routes[0] = tmp;
			}

			return &routes[0].next_hop;
		}
	}

	return NULL;
}

Verification

Looking at the open UDP ports on a k8s node, we can see that port 8472, which Linux vxlan listens on by default, has no corresponding process (because the socket was created by the kernel). Note that the Linux kernel's default VXLAN destination port is the pre-IANA 8472 rather than the standardized 4789.

tsecer@harry: sudo netstat -ulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
udp        0      0 0.0.0.0:8472            0.0.0.0:*                           -                   
udp        0      0 127.0.0.54:53           0.0.0.0:*                           906/systemd-resolve 
udp        0      0 127.0.0.53:53           0.0.0.0:*                           906/systemd-resolve 
udp        0      0 0.0.0.0:36993           0.0.0.0:*                           907/systemd-timesyn 
tsecer@harry: 
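The netstat output above is derived from /proc/net/udp, where each local address appears as a hex ADDR:PORT pair and each socket has an inode; for a kernel-owned socket that inode simply maps to no process fd, hence the "-" in the PID column. A small Go helper of my own (hypothetical, for illustration) that decodes the hex format shows that the entry 00000000:2118 is exactly 0.0.0.0:8472:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net/netip"
	"strconv"
	"strings"
)

// decodeProcUDP parses one "ADDR:PORT" field from /proc/net/udp,
// e.g. "00000000:2118". The IPv4 address is hex in host (little-endian)
// byte order; the port is big-endian hex.
func decodeProcUDP(field string) (netip.Addr, uint16, error) {
	hexAddr, hexPort, ok := strings.Cut(field, ":")
	if !ok {
		return netip.Addr{}, 0, fmt.Errorf("bad field %q", field)
	}
	a, err := strconv.ParseUint(hexAddr, 16, 32)
	if err != nil {
		return netip.Addr{}, 0, err
	}
	p, err := strconv.ParseUint(hexPort, 16, 16)
	if err != nil {
		return netip.Addr{}, 0, err
	}
	var b [4]byte
	binary.LittleEndian.PutUint32(b[:], uint32(a))
	return netip.AddrFrom4(b), uint16(p), nil
}

func main() {
	addr, port, _ := decodeProcUDP("00000000:2118")
	fmt.Println(addr, port) // 0.0.0.0 8472
}
```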
  • master node
tsecer@harry: ip route
default via 172.16.0.1 dev eth0 
10.244.0.0/24 dev cni0 proto kernel scope link src 10.244.0.1 
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink 
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink 
172.16.0.0/16 dev eth0 proto kernel scope link src 172.16.0.2 
tsecer@harry: ifconfig
cni0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.244.0.1  netmask 255.255.255.0  broadcast 10.244.0.255
        inet6 fe80::1ccb:adff:fed6:37e8  prefixlen 64  scopeid 0x20<link>
        ether 1e:cb:ad:d6:37:e8  txqueuelen 1000  (Ethernet)
        RX packets 545  bytes 47161 (47.1 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 550  bytes 78600 (78.6 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.0.2  netmask 255.255.0.0  broadcast 172.16.255.255
        inet6 fe80::5815:f0ff:fe6b:7402  prefixlen 64  scopeid 0x20<link>
        ether 76:52:14:85:92:e5  txqueuelen 1000  (Ethernet)
        RX packets 11738  bytes 39007251 (39.0 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 8660  bytes 1873516 (1.8 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.244.0.0  netmask 255.255.255.255  broadcast 0.0.0.0
        inet6 fe80::c83:dfff:fe06:e796  prefixlen 64  scopeid 0x20<link>
        ether 0e:83:df:06:e7:96  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 12 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 64416  bytes 19403001 (19.4 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 64416  bytes 19403001 (19.4 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

veth32294f1c: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet6 fe80::286d:e5ff:fe73:d371  prefixlen 64  scopeid 0x20<link>
        ether 2a:6d:e5:73:d3:71  txqueuelen 1000  (Ethernet)
        RX packets 270  bytes 27128 (27.1 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 296  bytes 39663 (39.6 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

veth4f50c6e6: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet6 fe80::601a:ebff:fe60:806b  prefixlen 64  scopeid 0x20<link>
        ether 62:1a:eb:60:80:6b  txqueuelen 1000  (Ethernet)
        RX packets 277  bytes 27747 (27.7 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 304  bytes 42785 (42.7 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

tsecer@harry: ip -d link show flannel.1
5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default 
    link/ether 0e:83:df:06:e7:96 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535 
    vxlan id 1 local 172.16.0.2 dev eth0 srcport 0 0 dstport 8472 nolearning ttl auto ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 
tsecer@harry: bridge fdb show
33:33:00:00:00:01 dev bond0 self permanent
33:33:00:00:00:01 dev dummy0 self permanent
33:33:00:00:00:01 dev eth0 self permanent
01:00:5e:00:00:01 dev eth0 self permanent
33:33:ff:6b:74:02 dev eth0 self permanent
e2:4c:d4:4e:7e:63 dev flannel.1 dst 172.16.0.3 self permanent
3a:ae:e8:eb:08:4d dev flannel.1 dst 172.16.0.4 self permanent
33:33:00:00:00:01 dev cni0 self permanent
01:00:5e:00:00:6a dev cni0 self permanent
33:33:00:00:00:6a dev cni0 self permanent
01:00:5e:00:00:01 dev cni0 self permanent
33:33:ff:d6:37:e8 dev cni0 self permanent
1e:cb:ad:d6:37:e8 dev cni0 vlan 1 master cni0 permanent
1e:cb:ad:d6:37:e8 dev cni0 master cni0 permanent
72:e5:28:b2:78:a7 dev veth4f50c6e6 master cni0 
62:1a:eb:60:80:6b dev veth4f50c6e6 vlan 1 master cni0 permanent
62:1a:eb:60:80:6b dev veth4f50c6e6 master cni0 permanent
33:33:00:00:00:01 dev veth4f50c6e6 self permanent
01:00:5e:00:00:01 dev veth4f50c6e6 self permanent
33:33:ff:60:80:6b dev veth4f50c6e6 self permanent
22:c5:ff:88:71:85 dev veth32294f1c master cni0 
2a:6d:e5:73:d3:71 dev veth32294f1c vlan 1 master cni0 permanent
2a:6d:e5:73:d3:71 dev veth32294f1c master cni0 permanent
33:33:00:00:00:01 dev veth32294f1c self permanent
01:00:5e:00:00:01 dev veth32294f1c self permanent
33:33:ff:73:d3:71 dev veth32294f1c self permanent
tsecer@harry: 
  • node1
laborant@node-01:~$ PS1="tsecer@node1: "
tsecer@node1: ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.0.3  netmask 255.255.0.0  broadcast 172.16.255.255
        inet6 fe80::d8d1:bcff:fe7c:ea17  prefixlen 64  scopeid 0x20<link>
        ether 46:4f:70:5f:9a:23  txqueuelen 1000  (Ethernet)
        RX packets 8623  bytes 38842461 (38.8 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5142  bytes 610762 (610.7 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.244.1.0  netmask 255.255.255.255  broadcast 0.0.0.0
        inet6 fe80::e04c:d4ff:fe4e:7e63  prefixlen 64  scopeid 0x20<link>
        ether e2:4c:d4:4e:7e:63  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 12 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 12  bytes 890 (890.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 12  bytes 890 (890.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

tsecer@node1: ip route
default via 172.16.0.1 dev eth0 
10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink 
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink 
172.16.0.0/16 dev eth0 proto kernel scope link src 172.16.0.3 
tsecer@node1: ip -d link show flannel.1
5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default 
    link/ether e2:4c:d4:4e:7e:63 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535 
    vxlan id 1 local 172.16.0.3 dev eth0 srcport 0 0 dstport 8472 nolearning ttl auto ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 
tsecer@node1: bridge fdb show
33:33:00:00:00:01 dev bond0 self permanent
33:33:00:00:00:01 dev dummy0 self permanent
33:33:00:00:00:01 dev eth0 self permanent
01:00:5e:00:00:01 dev eth0 self permanent
33:33:ff:7c:ea:17 dev eth0 self permanent
3a:ae:e8:eb:08:4d dev flannel.1 dst 172.16.0.4 self permanent
0e:83:df:06:e7:96 dev flannel.1 dst 172.16.0.2 self permanent
tsecer@node1: 
  • node2
default via 172.16.0.1 dev eth0 
10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink 
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink 
172.16.0.0/16 dev eth0 proto kernel scope link src 172.16.0.4 
tsecer@node2: ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.0.4  netmask 255.255.0.0  broadcast 172.16.255.255
        inet6 fe80::828:47ff:fe7f:c3ba  prefixlen 64  scopeid 0x20<link>
        ether e6:6d:8d:55:fb:89  txqueuelen 1000  (Ethernet)
        RX packets 8881  bytes 38925854 (38.9 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5978  bytes 719168 (719.1 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.244.2.0  netmask 255.255.255.255  broadcast 0.0.0.0
        inet6 fe80::38ae:e8ff:feeb:84d  prefixlen 64  scopeid 0x20<link>
        ether 3a:ae:e8:eb:08:4d  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 12 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 12  bytes 890 (890.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 12  bytes 890 (890.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

tsecer@node2: ip -d link show flannel.1
5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default 
    link/ether 3a:ae:e8:eb:08:4d brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535 
    vxlan id 1 local 172.16.0.4 dev eth0 srcport 0 0 dstport 8472 nolearning ttl auto ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 
tsecer@node2: bridge fdb show
33:33:00:00:00:01 dev bond0 self permanent
33:33:00:00:00:01 dev dummy0 self permanent
33:33:00:00:00:01 dev eth0 self permanent
01:00:5e:00:00:01 dev eth0 self permanent
33:33:ff:7f:c3:ba dev eth0 self permanent
e2:4c:d4:4e:7e:63 dev flannel.1 dst 172.16.0.3 self permanent
0e:83:df:06:e7:96 dev flannel.1 dst 172.16.0.2 self permanent
tsecer@node2: 

outro

We can now answer the question posed at the beginning: when this kernel UDP socket receives a packet, it does not go through the usual flow of appending the packet to the socket receive queue and waking a process. Instead, the callback registered in the socket structure (encap_rcv) is invoked directly as a function call, with no process involved.

The callback vxlan registers when creating the socket is vxlan_rcv, which decapsulates the packet and hands it to the kernel's napi framework. The napi framework then runs the decapsulated packet through the same receive processing (routing and so on) as a packet arriving from a physical eth NIC. From this point on, what the kernel network stack sees is the inner packet that has emerged from the tunnel, i.e., the payload the sender put into it.

From: https://www.cnblogs.com/tsecer/p/18656299
