Basic introduction to overlay networks
Basic architecture of an overlay network

Overlay networking is a virtualization technique layered on top of a traditional network architecture: relying on the existing network underneath, it binds an application to its own virtual network so that the application no longer has to care about how the underlying physical network transports the traffic. Put simply, as long as the hosts can reach one another at L3 (the network layer), the overlay builds a virtualized L2 (data link layer) network on top of it. Within that virtual network, each container can be assigned a fixed, LAN-style IP address and containers communicate with each other through those IPs.

Networking terms

Network namespace: Linux network namespaces isolate independent copies of the network protocol stack from one another, so by default they cannot communicate. Docker uses this feature to isolate the networks of different containers.
Veth pair: a veth device pair is introduced to allow communication between different network namespaces (a minimal hand-built example is sketched at the end of this section).
iptables/Netfilter: Netfilter runs in kernel mode and executes the rules hooked into the stack (filtering, modification, dropping, and so on); iptables is the user-space tool that maintains Netfilter's rule tables in the kernel. Together they provide the flexible packet-processing machinery of the Linux network stack.
Bridge: a bridge is a layer-2 device; it connects the different ports Linux supports and provides many-to-many communication in the same way a switch does.
Routing: Linux contains a full routing implementation; when the IP layer sends or forwards data, it consults the routing table to decide where the packet should go.

Network types

1. Overlay network: manages communication between the Docker daemons in the swarm. A service can be attached to one or more existing overlay networks so that services can talk to each other.
2. Ingress network: a special overlay network used to load-balance across the nodes of a service. When any swarm node receives a request on a published port, it hands the request to a module called IPVS. IPVS keeps track of all the IP addresses participating in that service, picks one of them, and routes the request to it over the ingress network. The ingress network is created automatically when a swarm is initialized or joined; in most cases it needs no custom configuration, but Docker 17.05 and later allow you to customize it.
3. docker_gwbridge network: a bridge network that connects the overlay networks (including ingress) to the physical network of an individual Docker daemon. By default, every container a service runs in is connected to the docker_gwbridge network of its local Docker daemon host.

docker_gwbridge and ingress are created by swarm automatically once docker swarm init has been run.
docker_gwbridge is of type bridge and connects the containers on a host to that host itself.
ingress is of type overlay and routes service traffic between containers on multiple hosts; it is the network that exposes services to external access and routes external requests to containers on different hosts.

The ingress-sbox "container" is not actually a container but a network namespace; the service VIP addresses are kept here. You can enter it with nsenter --net=/run/docker/netns/ingress_sbox sh. Every node in the cluster has an ingress-sbox namespace, and every one of them holds the VIP addresses.

When a published service port is accessed on any swarm node, IPVS (IP Virtual Service) on that node forwards the request to the swarm node where a task actually runs (which is why every node has an ingress-sbox namespace and the VIPs). This provides three functions: 1. load balancing for external access; 2. exposing the service port on every swarm node; 3. internal load balancing through IPVS.

Internal traffic: containers reach each other over the overlay network using the service name; the IP the service name resolves to is not a real container IP but a VIP.

How forwarding works inside a swarm node

When we access port 80 on any node, the request works as long as that node is part of the swarm, whether or not the service is deployed on it, provided the published port matches. The local request is forwarded into the ingress_sbox network namespace, where LVS (IPVS) forwards it to the IP and port 80 of a concrete service container.

The two most important points for understanding swarm networking:
First: how does the outside world reach a service deployed in the swarm? This can be called inbound traffic, and in swarm it is solved by ingress.
Second: how does a service deployed in the swarm reach other destinations? This splits into two parts: east-west traffic, i.e. how containers on different swarm nodes communicate with each other, which swarm solves with the overlay network; and north-south traffic, i.e. how containers in the swarm reach the outside world such as the internet, which is solved by a Linux bridge plus iptables NAT.

Example: create a network and a service

[root@localhost7D ~]# docker network create -d overlay zzhz-net
bi7ckblr4gjjxyumtld6dw8l2
[root@localhost7D ~]# docker network ls
NETWORK ID     NAME              DRIVER    SCOPE
c23d7007f44f   bridge            bridge    local
f13ac876a7b5   docker_gwbridge   bridge    local
b156b4b51d7a   host              host      local
aaeakbe5en53   ingress           overlay   swarm
37d856a21214   none              null      local
bi7ckblr4gjj   zzhz-net          overlay   swarm
[root@localhost7D ~]# docker service create -p 80:80 --replicas 4 --network zzhz-net --name AAAA harbor1.abc.com/web/nginx:v1
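To make the namespace and veth-pair concepts above concrete, here is a minimal hand-built sketch that connects two namespaces with a veth pair. The names ns1, ns2, veth-a, veth-b and the 10.99.0.0/24 addresses are made up for illustration and are not anything Docker creates:

# create two isolated network namespaces (hypothetical names ns1/ns2)
ip netns add ns1
ip netns add ns2
# create a veth pair and move one end into each namespace
ip link add veth-a type veth peer name veth-b
ip link set veth-a netns ns1
ip link set veth-b netns ns2
# assign addresses (illustrative 10.99.0.0/24 subnet) and bring the links up
ip netns exec ns1 ip addr add 10.99.0.1/24 dev veth-a
ip netns exec ns2 ip addr add 10.99.0.2/24 dev veth-b
ip netns exec ns1 ip link set veth-a up
ip netns exec ns2 ip link set veth-b up
# the two namespaces can now talk through the veth pair
ip netns exec ns1 ping -c 1 10.99.0.2
# clean up
ip netns del ns1
ip netns del ns2

Docker does the same thing for every container: one end of a veth pair becomes the container's eth0/eth1, while the other end is attached to a bridge such as docker_gwbridge or the br0 inside an overlay namespace.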
I. Understanding the swarm overlay network architecture
Example environment:
localhost7B  192.168.80.110  manager
localhost7C  192.168.80.120  manager
localhost7D  192.168.80.130  manager
localhost7E  192.168.80.140  worker

1. Network state after initializing the Docker swarm

[root@localhost7D ~]# docker network ls
NETWORK ID     NAME              DRIVER    SCOPE
c23d7007f44f   bridge            bridge    local
f13ac876a7b5   docker_gwbridge   bridge    local
b156b4b51d7a   host              host      local
aaeakbe5en53   ingress           overlay   swarm
37d856a21214   none              null      local

[root@localhost7D ~]# brctl show
bridge name       bridge id           STP enabled   interfaces
docker0           8000.0242605669a8   no
docker_gwbridge   8000.0242df5ee34d   no            vethf21f3e0
virbr0            8000.525400fbc09b   yes           virbr0-nic

[root@localhost7D ~]# ip a
11: docker_gwbridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:df:5e:e3:4d brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.1/16 brd 172.18.255.255 scope global docker_gwbridge
       valid_lft forever preferred_lft forever
    inet6 fe80::42:dfff:fe5e:e34d/64 scope link
       valid_lft forever preferred_lft forever
53: vethf21f3e0@if52: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker_gwbridge state UP group default
    link/ether a2:6b:4f:7e:05:7a brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::a06b:4fff:fe7e:57a/64 scope link
       valid_lft forever preferred_lft forever

2. How does this node reach the service inside containers? Start in /run/docker/netns, the directory where Docker keeps the network namespaces it creates:

[root@localhost7D ~]# ll /run/docker/netns/
-r--r--r-- 1 root root 0 Nov 24 15:56 1-aaeakbe5en
-r--r--r-- 1 root root 0 Nov 24 15:56 ingress_sbox

3. The Docker-created namespaces have to be entered with nsenter in order to run commands inside them:

[root@localhost7D ~]# nsenter --net=/run/docker/netns/ingress_sbox ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
50: eth0@if51: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether 02:42:0a:00:00:03 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.0.0.3/24 brd 10.0.0.255 scope global eth0
       valid_lft forever preferred_lft forever
52: eth1@if53: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:12:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet 172.18.0.2/16 brd 172.18.255.255 scope global eth1
       valid_lft forever preferred_lft forever

[root@localhost7D ~]# nsenter --net=/run/docker/netns/1-aaeakbe5en ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether 12:af:c1:50:b2:71 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.1/24 brd 10.0.0.255 scope global br0
       valid_lft forever preferred_lft forever
49: vxlan0@if49: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master br0 state UNKNOWN group default
    link/ether ea:98:22:00:6c:8c brd ff:ff:ff:ff:ff:ff link-netnsid 0
51: veth0@if50: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master br0 state UP group default
    link/ether 12:af:c1:50:b2:71 brd ff:ff:ff:ff:ff:ff link-netnsid 1

# Inspect the docker_gwbridge network
[root@localhost7D ~]# docker network inspect docker_gwbridge
[
    {
        "Name": "docker_gwbridge",
        "Id": "f13ac876a7b53c5a8951732a6ee2bcaa7ab77ae6f154b9df9707a3923ec11884",
        "Created": "2022-11-24T10:46:54.680529973+08:00",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [ { "Subnet": "172.18.0.0/16", "Gateway": "172.18.0.1" } ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": { "Network": "" },
        "ConfigOnly": false,
        "Containers": {
            "ingress-sbox": {
                "Name": "gateway_ingress-sbox",
                "EndpointID": "5e722a2141d09fc97f3f606a8adfd672d7c077d4fde39615980d7095ce469e59",
                "MacAddress": "02:42:ac:12:00:02",
                "IPv4Address": "172.18.0.2/16",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.bridge.enable_icc": "false",
            "com.docker.network.bridge.enable_ip_masquerade": "true",
            "com.docker.network.bridge.name": "docker_gwbridge"
        },
        "Labels": {}
    }
]

# Inspect the ingress network
[root@localhost7D ~]# docker network inspect ingress
[
    {
        "Name": "ingress",
        "Id": "aaeakbe5en53has5tnzg07juk",
        "Created": "2022-11-24T15:56:51.659654935+08:00",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [ { "Subnet": "10.0.0.0/24", "Gateway": "10.0.0.1" } ]    # ingress subnet and gateway
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": true,
        "ConfigFrom": { "Network": "" },
        "ConfigOnly": false,
        "Containers": {
            "ingress-sbox": {
                "Name": "ingress-endpoint",
                "EndpointID": "b996066d17897e77ecdfb303c0dba844cdeb0432721bd2a8f987663a641de602",
                "MacAddress": "02:42:0a:00:00:03",
                "IPv4Address": "10.0.0.3/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4096"            # VXLAN ID
        },
        "Labels": {},
        "Peers": [                                                               # nodes with established tunnels
            { "Name": "9add565cade0", "IP": "192.168.80.130" },
            { "Name": "6235db13c5a7", "IP": "192.168.80.140" },
            { "Name": "57875cbb2af5", "IP": "192.168.80.110" },
            { "Name": "2ffc2ab9b510", "IP": "192.168.80.120" }
        ]
    }
]
This gives us a rough topology diagram (the part in the red box in the figure).
II. Network architecture after creating a service
2.1 Create the service

[root@localhost7B ~]# docker service create --replicas 4 -p 80:80 --name nginxA harbor1.abc.com/web/nginx:v1

To see how this node reaches the service inside the containers, look again under /run/docker/netns, the directory that holds Docker's network namespaces:

[root@localhost7D ~]# ll /run/docker/netns/
-r--r--r-- 1 root root 0 Nov 24 15:56 1-aaeakbe5en      # ingress network
-r--r--r-- 1 root root 0 Nov 28 11:05 447eb8141dfb      # network namespace of the nginxA task
-r--r--r-- 1 root root 0 Nov 24 15:56 ingress_sbox

There are now three namespaces here. One is ingress_sbox (one end attaches to the docker_gwbridge bridge and the other end to the ingress overlay network, so the host can reach the sbox through docker_gwbridge and from there enter the overlay network; the end attached to the ingress network load-balances packets with IPVS, forwarding them to a container's virtual IP). Of the remaining two, one is the network used by the nginx service we created and the other is the ingress network created by swarm.

By fetching the ID of the overlay network on which the nginx "service" was assigned its virtual IP, we can tell that of the two remaining namespaces, 1-aaeakbe5en is the ingress network (its ID is identical on every node) and 447eb8141dfb is the network used by nginx:

[root@localhost7D ~]# docker inspect nginxA | grep -i virtualips -A 4
"VirtualIPs": [
    {
        "NetworkID": "aaeakbe5en53has5tnzg07juk",
        "Addr": "10.0.0.64/24"        # the VIP
    }

2.2 Bridge information on the host

[root@localhost7D ~]# brctl show
bridge name       bridge id           STP enabled   interfaces
docker0           8000.0242605669a8   no
docker_gwbridge   8000.0242df5ee34d   no            veth747768c
                                                    vethf21f3e0
virbr0            8000.525400fbc09b   yes           virbr0-nic

2.3 Service tasks

[root@localhost7D ~]# docker service ps nginxA
ID             NAME       IMAGE                          NODE                      DESIRED STATE   CURRENT STATE            ERROR   PORTS
8lbqtqf8ysfr   nginxA.1   harbor1.abc.com/web/nginx:v1   localhost7B.localdomain   Running         Running 13 minutes ago
wjthkfg1sy9w   nginxA.2   harbor1.abc.com/web/nginx:v1   localhost7e.localdomain   Running         Running 13 minutes ago
8hccaxcscwi6   nginxA.3   harbor1.abc.com/web/nginx:v1   localhost7D.localdomain   Running         Running 13 minutes ago
qdjrcpd40jjc   nginxA.4   harbor1.abc.com/web/nginx:v1   localhost7C.localdomain   Running         Running 13 minutes ago

2.4 Host interfaces

[root@localhost7D ~]# ifconfig
veth747768c: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet6 fe80::e4c7:ffff:fe1b:73e6 prefixlen 64 scopeid 0x20<link>
        ether e6:c7:ff:1b:73:e6 txqueuelen 0 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 438 bytes 79639 (77.7 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

vethf21f3e0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet6 fe80::a06b:4fff:fe7e:57a prefixlen 64 scopeid 0x20<link>
        ether a2:6b:4f:7e:05:7a txqueuelen 0 (Ethernet)
        RX packets 4606 bytes 252910 (246.9 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 156252 bytes 36468814 (34.7 MiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

[root@localhost7D ~]# ip a
53: vethf21f3e0@if52: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker_gwbridge state UP group default
    link/ether a2:6b:4f:7e:05:7a brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::a06b:4fff:fe7e:57a/64 scope link
       valid_lft forever preferred_lft forever
126: veth747768c@if125: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker_gwbridge state UP group default
    link/ether e6:c7:ff:1b:73:e6 brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet6 fe80::e4c7:ffff:fe1b:73e6/64 scope link
       valid_lft forever preferred_lft forever

2.5 Enter the three Docker namespaces with nsenter and inspect them

# ingress_sbox:
[root@localhost7D ~]# nsenter --net=/run/docker/netns/ingress_sbox ip a
50: eth0@if51: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether 02:42:0a:00:00:03 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.0.0.3/24 brd 10.0.0.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 10.0.0.64/32 brd 10.0.0.64 scope global eth0
       valid_lft forever preferred_lft forever
52: eth1@if53: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:12:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet 172.18.0.2/16 brd 172.18.255.255 scope global eth1
       valid_lft forever preferred_lft forever

# nginx task namespace:
[root@localhost7D ~]# nsenter --net=/run/docker/netns/447eb8141dfb ip a
123: eth0@if124: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether 02:42:0a:00:00:42 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.0.0.66/24 brd 10.0.0.255 scope global eth0
       valid_lft forever preferred_lft forever
125: eth1@if126: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:12:00:03 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet 172.18.0.3/16 brd 172.18.255.255 scope global eth1
       valid_lft forever preferred_lft forever

# ingress network:
[root@localhost7D ~]# nsenter --net=/run/docker/netns/1-aaeakbe5en ip a
49: vxlan0@if49: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master br0 state UNKNOWN group default
    link/ether ea:98:22:00:6c:8c brd ff:ff:ff:ff:ff:ff link-netnsid 0
51: veth0@if50: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master br0 state UP group default
    link/ether 12:af:c1:50:b2:71 brd ff:ff:ff:ff:ff:ff link-netnsid 1
124: veth16@if123: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master br0 state UP group default
    link/ether 02:c4:fa:38:92:36 brd ff:ff:ff:ff:ff:ff link-netnsid 2

2.6 Inspect the networks

# docker_gwbridge now also holds the gateway endpoint of the nginx task
[root@localhost7D ~]# docker network inspect docker_gwbridge
[
    {
        "Name": "docker_gwbridge",
        "Id": "f13ac876a7b53c5a8951732a6ee2bcaa7ab77ae6f154b9df9707a3923ec11884",
        "Created": "2022-11-24T10:46:54.680529973+08:00",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [ { "Subnet": "172.18.0.0/16", "Gateway": "172.18.0.1" } ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": { "Network": "" },
        "ConfigOnly": false,
        "Containers": {
            "57f1dd25225792466d1409b4770b82d76bf8b97defbf91ae31c7d6127c66ced1": {
                "Name": "gateway_447eb8141dfb",
                "EndpointID": "74c104f4510ec7ea68d8ad79ed92fcdf88218b891039ce76418643ba0b0ed488",
                "MacAddress": "02:42:ac:12:00:03",
                "IPv4Address": "172.18.0.3/16",
                "IPv6Address": ""
            },
            "ingress-sbox": {
                "Name": "gateway_ingress-sbox",
                "EndpointID": "5e722a2141d09fc97f3f606a8adfd672d7c077d4fde39615980d7095ce469e59",
                "MacAddress": "02:42:ac:12:00:02",
                "IPv4Address": "172.18.0.2/16",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.bridge.enable_icc": "false",
            "com.docker.network.bridge.enable_ip_masquerade": "true",
            "com.docker.network.bridge.name": "docker_gwbridge"
        },
        "Labels": {}
    }
]

# ingress now also holds the endpoint of the local nginxA.3 task
[root@localhost7D ~]# docker network inspect ingress
[
    {
        "Name": "ingress",
        "Id": "aaeakbe5en53has5tnzg07juk",
        "Created": "2022-11-24T15:56:51.659654935+08:00",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [ { "Subnet": "10.0.0.0/24", "Gateway": "10.0.0.1" } ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": true,
        "ConfigFrom": { "Network": "" },
        "ConfigOnly": false,
        "Containers": {
            "57f1dd25225792466d1409b4770b82d76bf8b97defbf91ae31c7d6127c66ced1": {
                "Name": "nginxA.3.8hccaxcscwi65c13clswks3vp",
                "EndpointID": "c54628c48c7f2fea55163169e53e643d59155cccd50449df71d7eeb79b3a5547",
                "MacAddress": "02:42:0a:00:00:42",
                "IPv4Address": "10.0.0.66/24",
                "IPv6Address": ""
            },
            "ingress-sbox": {
                "Name": "ingress-endpoint",
                "EndpointID": "b996066d17897e77ecdfb303c0dba844cdeb0432721bd2a8f987663a641de602",
                "MacAddress": "02:42:0a:00:00:03",
                "IPv4Address": "10.0.0.3/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4096"
        },
        "Labels": {},
        "Peers": [
            { "Name": "9add565cade0", "IP": "192.168.80.130" },
            { "Name": "6235db13c5a7", "IP": "192.168.80.140" },
            { "Name": "57875cbb2af5", "IP": "192.168.80.110" },
            { "Name": "2ffc2ab9b510", "IP": "192.168.80.120" }
        ]
    }
]

# Enter the container and check its IP addresses and routes
[root@localhost7D ~]# docker exec -it nginxA.3.8hccaxcscwi65c13clswks3vp bash
[root@57f1dd252257 nginx-1.21.0]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
123: eth0@if124: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether 02:42:0a:00:00:42 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.0.0.66/24 brd 10.0.0.255 scope global eth0
       valid_lft forever preferred_lft forever
125: eth1@if126: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:12:00:03 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet 172.18.0.3/16 brd 172.18.255.255 scope global eth1
       valid_lft forever preferred_lft forever

[root@57f1dd252257 nginx-1.21.0]# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
        inet 10.0.0.66 netmask 255.255.255.0 broadcast 10.0.0.255
        ether 02:42:0a:00:00:42 txqueuelen 0 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 172.18.0.3 netmask 255.255.0.0 broadcast 172.18.255.255
        ether 02:42:ac:12:00:03 txqueuelen 0 (Ethernet)
        RX packets 617 bytes 112626 (109.9 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

[root@57f1dd252257 nginx-1.21.0]# ip route
default via 172.18.0.1 dev eth1
10.0.0.0/24 dev eth0 proto kernel scope link src 10.0.0.66
172.18.0.0/16 dev eth1 proto kernel scope link src 172.18.0.3

Figure: we now have the complete network topology.
III. The packet forwarding path
1. When we access port 80 on the host, the packet first hits the PREROUTING chain of the raw table. The raw table contains no special rules, so it can be ignored (and is not shown again below); it can be checked with iptables -t raw -L -n -v.

2. Next comes the mangle table. The host's mangle table likewise defines no special rules, so it is skipped as well; check it with iptables -t mangle -L -n -v.

3. Then the PREROUTING chain of the nat table. Because we are accessing the host's own IP, the first rule matches and the packet enters the DOCKER-INGRESS chain (ADDRTYPE match dst-type LOCAL, i.e. the destination is a local address), which DNATs traffic for destination port 80 to 172.18.0.2:80:

[root@localhost7D netns]# iptables -t nat -L -n -v
Chain PREROUTING (policy ACCEPT 15035 packets, 1645K bytes)
 pkts bytes target           prot opt in   out   source      destination
43618 2621K DOCKER-INGRESS   all  --  *    *     0.0.0.0/0   0.0.0.0/0    ADDRTYPE match dst-type LOCAL   # destination is a local address
41695 2503K DOCKER           all  --  *    *     0.0.0.0/0   0.0.0.0/0    ADDRTYPE match dst-type LOCAL

Chain DOCKER-INGRESS (2 references)
 pkts bytes target   prot opt in   out   source      destination
    0     0 DNAT     tcp  --  *    *     0.0.0.0/0   0.0.0.0/0    tcp dpt:80 to:172.18.0.2:80
41694 2503K RETURN   all  --  *    *     0.0.0.0/0   0.0.0.0/0

4. We already know docker_gwbridge (172.18.0.1). The two addresses are in the same subnet, so 172.18.0.2 must also be attached to docker_gwbridge:

[root@localhost7D netns]# brctl show
bridge name       bridge id           STP enabled   interfaces
docker0           8000.0242605669a8   no
docker_gwbridge   8000.0242df5ee34d   no            veth747768c
                                                    vethf21f3e0
virbr0            8000.525400fbc09b   yes           virbr0-nic

5. Our request has now become 192.168.80.140:80 -> 172.18.0.2:80, so it continues through the FORWARD chain. Because the rewritten destination belongs to the 172.18.0.0/16 subnet, the packet is routed onto the docker_gwbridge bridge (172.18.0.1); the FORWARD chain of the filter table accepts traffic from docker_gwbridge toward a non-docker_gwbridge destination (the eth1 interface of ingress_sbox):

[root@localhost7D netns]# iptables -t filter -vnL
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 4576  186K ACCEPT all -- docker_gwbridge !docker_gwbridge 0.0.0.0/0 0.0.0.0/0

6. Finally the POSTROUTING chain. The mangle table has nothing special, so we look straight at the nat table: it performs a MASQUERADE, i.e. rewrites the packet's source to the IP assigned to the host's own interface, 192.168.80.140:

[root@localhost7D netns]# iptables -t nat -vnL
Chain POSTROUTING (policy ACCEPT 14494 packets, 1982K bytes)
   28  1858 MASQUERADE all -- * !docker_gwbridge 172.18.0.0/16 0.0.0.0/0

At this point the packet has travelled from the host onto the docker_gwbridge bridge (172.18.0.1/16) and, through the interface vethf21f3e0@if52, on to the eth1 interface of ingress_sbox (172.18.0.2). Recall from the introduction: inside the swarm node, this local request is handed to the ingress_sbox network namespace, where LVS/IPVS load-balances it to the IP and port 80 of a concrete service container.

7. Looking at the docker_gwbridge network information again, ingress-sbox is exactly the namespace we are looking for, and gateway_ingress-sbox is its endpoint on this bridge:

[root@localhost7D netns]# docker network inspect docker_gwbridge
"ingress-sbox": {
    "Name": "gateway_ingress-sbox",
    "EndpointID": "5e722a2141d09fc97f3f606a8adfd672d7c077d4fde39615980d7095ce469e59",
    "MacAddress": "02:42:ac:12:00:02",
    "IPv4Address": "172.18.0.2/16",
    "IPv6Address": ""
}

Next we have to analyse the routing once the packet is inside ingress_sbox, so we use nsenter to enter that namespace and read its iptables tables:

# run the following in /run/docker/netns; nat table
nsenter --net=ingress_sbox iptables -t nat -L -n -v
# mangle table
nsenter --net=ingress_sbox iptables -t mangle -L -n -v
# filter table
nsenter --net=ingress_sbox iptables -t filter -L -n -v

8. First the PREROUTING chain of the mangle table: packets to destination port 80 are marked 0x10a, which is 266 in decimal (the mark is written in hexadecimal). The nat table defines no PREROUTING rules, so the packet continues to the INPUT chain, which puts the same mark 266 on packets destined for 10.0.0.64, the service VIP. In other words, inside ingress_sbox requests for port 80 of this address are set up to be load-balanced:

[root@localhost7D netns]# nsenter --net=ingress_sbox iptables -t mangle -L -n -v
Chain PREROUTING (policy ACCEPT)
target   prot opt source      destination
MARK     tcp  --  0.0.0.0/0   0.0.0.0/0    tcp dpt:80 MARK set 0x10a

Chain INPUT (policy ACCEPT)
target   prot opt source      destination
MARK     all  --  0.0.0.0/0   10.0.0.64    MARK set 0x10a

9. ingress_sbox uses IPVS to load-balance the packet to one of the containers' virtual IPs (this is the routing mesh), so we use the ipvsadm command to see the result (install it on the host with yum install -y ipvsadm). The IPVS table shows that firewall mark 266 is balanced round-robin across the task IPs:

[root@localhost7D ~]# nsenter --net=/run/docker/netns/ingress_sbox sh
sh-4.2# ipvsadm -l
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
FWM  266 rr
  -> 10.0.0.65:0                  Masq    1      0          0
  -> 10.0.0.66:0                  Masq    1      0          0
  -> 10.0.0.67:0                  Masq    1      0          0
  -> 10.0.0.68:0                  Masq    1      0          0

10. The packet, originally addressed to the VIP 10.0.0.64, is thus rewritten by IPVS to one of the task IPs. In the POSTROUTING chain of the nat table, traffic handled by ipvs and heading into 10.0.0.0/24 is additionally SNATed to 10.0.0.3, the eth0 interface of ingress_sbox; through this eth0 interface the packet is handed to the ingress network via its veth0 port:

[root@localhost7D netns]# nsenter --net=ingress_sbox iptables -t nat -L -n -v
Chain POSTROUTING (policy ACCEPT 28 packets, 1858 bytes)
 pkts bytes target               prot opt in   out   source      destination
    0     0 DOCKER_POSTROUTING   all  --  *    *     0.0.0.0/0   127.0.0.11
    4   208 SNAT                 all  --  *    *     0.0.0.0/0   10.0.0.0/24   ipvs to:10.0.0.3

11. Finally the packet reaches the ingress network, which decides what to do based on the destination address. If the destination lives in a container on this host, the packet is sent out of the corresponding veth interface. If the destination is not in the local network environment, the packet is VXLAN-encapsulated (the outer source/destination IP and MAC are added) and sent through vxlan0 to the VTEP of the node that owns the destination; there it is decapsulated, handed back to that node's ingress network and delivered to the corresponding interface. In this example the destination IP lives in the virtual network built on the local machine, so the packet goes from the veth16 port on br0 to the eth0 interface of the local nginx task (10.0.0.66). (See the capture sketch after step 12 for the cross-node case.)

12. Finally, pick any host, find its nginx container and check its IP: it matches one of the entries in the ipvsadm output above.
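To confirm the cross-node branch of step 11, you can watch the VXLAN encapsulation on the physical network while hitting a published port. This is only a sketch: ens33 stands for the host's physical interface in this environment and may be named differently, and 192.168.80.110 is one of the example nodes above:

# on one swarm node, capture VXLAN traffic (VXLAN encapsulates frames in UDP port 4789)
tcpdump -nn -i ens33 udp port 4789
# from another machine, request the published port of a node that runs no local task;
# the capture should show the inner request being tunnelled to a node that does run one
curl http://192.168.80.110/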