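Bridge networking is Docker's default network mode. On a bridge network, Docker creates a virtual network interface for each container and assigns the container an IP address. Over the bridge, containers can talk to the host and to other containers, and they can publish ports for external access.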
How containers communicate with each other
First, create two containers:
$ docker container run -d --rm --name box1 busybox /bin/sh -c "while true; do sleep 3600; done"
e6e89f95de12eeda726fed5f4f909d32be2ea13c3cecb350acd86bc13394b769
$ docker container run -d --rm --name box2 busybox /bin/sh -c "while true; do sleep 3600; done"
c0c1a152155bcf66bed71fdc51e558f4c3b1c3632866c61a69303a4da10c2f54
$ docker container ls
CONTAINER ID   IMAGE     COMMAND                  CREATED          STATUS          PORTS     NAMES
c0c1a152155b   busybox   "/bin/sh -c 'while t…"   31 seconds ago   Up 30 seconds             box2
e6e89f95de12   busybox   "/bin/sh -c 'while t…"   41 seconds ago   Up 40 seconds             box1
Next, look up box2's IP address and try to ping box2 from box1:
$ docker container exec -it box2 ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
3: sit0@NONE: <NOARP> mtu 1480 qdisc noop qlen 1000
link/sit 0.0.0.0 brd 0.0.0.0
21: eth0@if22: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue
link/ether 02:42:ac:11:00:03 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.3/16 brd 172.17.255.255 scope global eth0
valid_lft forever preferred_lft forever
$ docker container exec -it box1 ping 172.17.0.3 -c 3
PING 172.17.0.3 (172.17.0.3): 56 data bytes
64 bytes from 172.17.0.3: seq=0 ttl=64 time=0.886 ms
64 bytes from 172.17.0.3: seq=1 ttl=64 time=0.049 ms
64 bytes from 172.17.0.3: seq=2 ttl=64 time=0.106 ms
--- 172.17.0.3 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.049/0.347/0.886 ms
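The two manual steps above can be scripted: read box2's address from `ip addr`, then ping it from box1. Below is a minimal POSIX shell sketch. `ipv4_of` and `ping_container` are hypothetical helper names, and the script assumes the image ships an `ip` command, as busybox does. The `-it` flags are dropped because the output of the first `exec` is piped rather than shown on a terminal.

```shell
# Read `ip addr` output on stdin and print the IPv4 address of
# interface $1 without its prefix length (e.g. "172.17.0.3").
ipv4_of() {
  awk -v dev="$1" '$1 == "inet" && $NF == dev { split($2, a, "/"); print a[1] }'
}

# Hypothetical helper: ping container $2 from container $1 by IP address.
ping_container() (
  ip=$(docker container exec "$2" ip addr | ipv4_of eth0)
  docker container exec "$1" ping -c 3 "$ip"
)
```

For example, `ping_container box1 box2` repeats the test above. From the host, `docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' box2` reports the same address without entering the container.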
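Why can box1 ping box2, and how do containers communicate with each other?
Docker uses Linux namespaces to isolate resources such as each container's network stack. Yet the ip netns command on the host shows no network namespaces at all. The reason is that ip netns only lists namespaces that have an entry under /var/run/netns, and Docker does not create entries there. This makes analyzing the network and troubleshooting harder.
The following steps make the containers' network namespaces visible to ip netns.
First, get the PID of each container's main process: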
$ docker inspect box1 | grep Pid
"Pid": 43568,
"PidMode": "",
"PidsLimit": null,
$ docker inspect box2 | grep Pid
"Pid": 43640,
"PidMode": "",
"PidsLimit": null,
Then, as root, link each process's network namespace into /var/run/netns on the host:
$ ln -s /proc/43568/ns/net /var/run/netns/box1
$ ln -s /proc/43640/ns/net /var/run/netns/box2
If /var/run/netns does not exist, create it as root first (mkdir -p /var/run/netns).
Running ip netns list now shows the containers' network namespaces:
$ ip netns list
box2 (id: 3)
box1 (id: 2)
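Creating the links by hand gets tedious when there are many containers. The sketch below performs the same steps for every running container and assumes root access. `netns_name` and `link_container_netns` are hypothetical helper names. Here `docker inspect -f '{{.State.Pid}}'` prints only the PID, so no `grep` is needed.

```shell
# Docker's .Name field has a leading "/" (e.g. "/box1"); strip it so the
# value can be used as a namespace name.
netns_name() {
  printf '%s\n' "${1#/}"
}

# Expose the network namespace of every running container under
# /var/run/netns so that `ip netns` can list it. Run as root.
link_container_netns() (
  mkdir -p /var/run/netns
  for id in $(docker container ls -q); do
    pid=$(docker inspect -f '{{.State.Pid}}' "$id")
    name=$(netns_name "$(docker inspect -f '{{.Name}}' "$id")")
    # -f replaces a stale link left by an earlier container with the same name.
    ln -sfn "/proc/$pid/ns/net" "/var/run/netns/$name"
  done
)
```

The links point into /proc/<pid>, so they go stale when a container restarts, and the function has to be run again. As an alternative that creates no link at all, `nsenter -t <pid> -n ip addr` (from util-linux) runs a command inside the same namespace.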
Check the IP addresses in namespaces box1 and box2:
$ ip netns exec box1 ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
3: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
link/sit 0.0.0.0 brd 0.0.0.0
19: eth0@if20: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.17.0.2/16 brd 172.17.255.255 scope global eth0
valid_lft forever preferred_lft forever
$ ip netns exec box2 ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
3: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
link/sit 0.0.0.0 brd 0.0.0.0
21: eth0@if22: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:ac:11:00:03 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.17.0.3/16 brd 172.17.255.255 scope global eth0
valid_lft forever preferred_lft forever
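Namespace box1 has the IP address 172.17.0.2 and namespace box2 has 172.17.0.3. Both addresses are in the same subnet, 172.17.0.0/16, and Docker lets namespaces on the same subnet communicate through a Linux bridge.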
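Two addresses are in the same subnet when they agree on every bit covered by the prefix length. The following is a POSIX shell sketch of that check, applied to the addresses above. `ip_to_int` and `in_cidr` are hypothetical helper names.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Succeed if IPv4 address $1 lies inside the CIDR block $2 (e.g. 172.17.0.0/16):
# both addresses must match on the top $bits bits.
in_cidr() (
  net=${2%/*}
  bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
)
```

`in_cidr 172.17.0.3 172.17.0.0/16` succeeds, so box1 can reach box2 directly on the bridge. `in_cidr 14.119.104.189 172.17.0.0/16` fails, so traffic to an outside address like that one has to be routed by the host instead.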
By default, Docker creates a bridge named docker0:
$ ip link show type bridge
9: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
link/ether 02:42:53:1d:f7:5f brd ff:ff:ff:ff:ff:ff
Next, list the interfaces attached to docker0:
$ brctl show docker0
bridge name     bridge id               STP enabled     interfaces
docker0         8000.0242531df75f       no              vetha7d1dd5
                                                        vethadaa66f
docker0 has two veth interfaces attached: vetha7d1dd5 and vethadaa66f.
Now look at the veth interfaces on the host:
$ ip link show type veth
20: vethadaa66f@if19: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default
link/ether 52:4c:41:8c:91:01 brd ff:ff:ff:ff:ff:ff link-netns box1
22: vetha7d1dd5@if21: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default
link/ether 8a:e9:19:ce:72:cb brd ff:ff:ff:ff:ff:ff link-netns box2
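Namespace box1 is connected to docker0 by the veth pair eth0 (ifindex 19) <-> vethadaa66f (ifindex 20). Namespace box2 is connected by the veth pair eth0 (ifindex 21) <-> vetha7d1dd5 (ifindex 22). The @ifN suffix on each end gives the index of its peer. Both host-side ends are attached to docker0, so the bridge forwards frames between them, and that is how box1 and box2 communicate.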
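Matching veth ends by eye works for two containers, but the pairing can also be looked up. Inside a container, `/sys/class/net/eth0/iflink` holds the interface index of eth0's peer (20 for box1, which matches `eth0@if20`). That index identifies the host-side veth. Below is a minimal sketch. `ifname_by_index` and `host_veth_of` are hypothetical helper names.

```shell
# Read `ip -o link` output on stdin and print the name (the part before "@")
# of the interface whose index is $1, e.g. "vethadaa66f" for index 20.
ifname_by_index() {
  awk -F': ' -v idx="$1" '$1 == idx { split($2, a, "@"); print a[1] }'
}

# Hypothetical helper: print the host-side veth interface of container $1.
host_veth_of() (
  idx=$(docker container exec "$1" cat /sys/class/net/eth0/iflink)
  ip -o link show type veth | ifname_by_index "$idx"
)
```

In the environment above, `host_veth_of box1` should print vethadaa66f.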
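Network topology: box1 eth0 (172.17.0.2) <-> vethadaa66f <-> docker0 <-> vetha7d1dd5 <-> box2 eth0 (172.17.0.3)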
How containers reach external networks
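Network namespaces plus a bridge only allow communication between the namespaces themselves. To reach external networks, the host must also perform SNAT with iptables.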
Ping Baidu from box1:
$ docker exec -it box1 ping www.baidu.com -c 3
PING www.baidu.com (14.119.104.189): 56 data bytes
64 bytes from 14.119.104.189: seq=0 ttl=51 time=9.908 ms
64 bytes from 14.119.104.189: seq=1 ttl=51 time=14.939 ms
64 bytes from 14.119.104.189: seq=2 ttl=51 time=11.023 ms
--- www.baidu.com ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 9.908/11.956/14.939 ms
Inspect the nat table rules:
$ iptables -nvxL -t nat
Chain PREROUTING (policy ACCEPT 20 packets, 3083 bytes)
pkts bytes target prot opt in out source destination
0 0 DOCKER all -- * * 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL
Chain INPUT (policy ACCEPT 1 packets, 229 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 2 packets, 137 bytes)
pkts bytes target prot opt in out source destination
0 0 DOCKER all -- * * 0.0.0.0/0 !127.0.0.0/8 ADDRTYPE match dst-type LOCAL
Chain POSTROUTING (policy ACCEPT 2 packets, 137 bytes)
pkts bytes target prot opt in out source destination
6 300 MASQUERADE all -- * !docker0 172.17.0.0/16 0.0.0.0/0
Chain DOCKER (2 references)
pkts bytes target prot opt in out source destination
0 0 RETURN all -- docker0 * 0.0.0.0/0 0.0.0.0/0
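The POSTROUTING chain of the nat table contains a MASQUERADE rule, a form of SNAT that uses the address of the outgoing interface. It matches packets from 172.17.0.0/16 that leave through any interface other than docker0. Their source address is rewritten to the host's address, which lets the containers communicate with external networks.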
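Two conditions must hold for this SNAT path to work. First, the kernel must forward packets between docker0 and the uplink (`net.ipv4.ip_forward` = 1, which Docker normally enables). Second, a MASQUERADE rule must cover the container subnet. Below is a minimal check. `masq_sources` and `check_snat` are hypothetical names. `iptables -S` prints rules in the same syntax used to add them, which is easier to parse than the `-L` table.

```shell
# Read `iptables -t nat -S POSTROUTING` output on stdin and print the
# source network (-s) of every MASQUERADE rule.
masq_sources() {
  awk '/-j MASQUERADE/ { for (i = 1; i < NF; i++) if ($i == "-s") print $(i + 1) }'
}

# Hypothetical helper: show whether container traffic can be NATed outward. Run as root.
check_snat() {
  echo "net.ipv4.ip_forward = $(cat /proc/sys/net/ipv4/ip_forward)"
  echo "MASQUERADE sources:"
  iptables -t nat -S POSTROUTING | masq_sources
}
```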
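Now, in both the filter and nat tables, flush all rules (-F), delete the user-defined chains (-X), and zero the counters (-Z):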
$ iptables -t filter -F
$ iptables -t filter -X
$ iptables -t filter -Z
$ iptables -t nat -F
$ iptables -t nat -X
$ iptables -t nat -Z
Listing the rules again shows that both the rules and the user-defined chains are gone:
$ iptables -t filter -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
$ iptables -t nat -L
Chain PREROUTING (policy ACCEPT)
target prot opt source destination
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
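Pinging Baidu again fails. Even the name lookup fails, because box1's DNS queries also have to leave the host and are no longer masqueraded: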
$ docker exec -it box1 ping www.baidu.com -c 3
ping: bad address 'www.baidu.com'
Add a NAT rule manually with iptables:
$ iptables -t nat -A POSTROUTING -s 172.17.0.0/16 -j MASQUERADE
Ping Baidu again, and this time it works:
$ docker exec -it box1 ping www.baidu.com -c 3
PING www.baidu.com (14.119.104.189): 56 data bytes
64 bytes from 14.119.104.189: seq=0 ttl=51 time=16.015 ms
64 bytes from 14.119.104.189: seq=1 ttl=51 time=9.960 ms
64 bytes from 14.119.104.189: seq=2 ttl=51 time=9.247 ms
--- www.baidu.com ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 9.247/11.740/16.015 ms
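The rule Docker installs (shown earlier) also carries `! -o docker0`, which excludes traffic leaving through the bridge itself from masquerading. The manual rule above has no such restriction. The sketch below restores exactly Docker's form of the rule. It uses `iptables -C` to test for an identical rule first, so running it twice does not create duplicates. `ensure_masquerade` is a hypothetical helper name.

```shell
# Append "-s $1 ! -o $2 -j MASQUERADE" to nat/POSTROUTING unless an
# identical rule already exists (iptables -C exits non-zero when it does not).
ensure_masquerade() {
  iptables -t nat -C POSTROUTING -s "$1" ! -o "$2" -j MASQUERADE 2>/dev/null ||
    iptables -t nat -A POSTROUTING -s "$1" ! -o "$2" -j MASQUERADE
}
```

Usage, as root: `ensure_masquerade 172.17.0.0/16 docker0`.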
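Depending on the environment, the default policy of the filter table's FORWARD chain may be DROP. Flushing with -F removes rules but does not change a chain's policy. In that case, container traffic still cannot be forwarded until the policy is changed to ACCEPT: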
$ iptables -P FORWARD ACCEPT
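To check the policy before changing it, use `iptables -S FORWARD`, which prints the policy on its first line (for example `-P FORWARD DROP`). Below is a sketch that only switches the policy when needed. `chain_policy` and `ensure_forward_accept` are hypothetical helper names.

```shell
# Read `iptables -S <chain>` output on stdin and print the chain's default policy.
chain_policy() {
  awk '$1 == "-P" { print $3 }'
}

# Hypothetical helper: set FORWARD to ACCEPT only if it is something else. Run as root.
ensure_forward_accept() (
  policy=$(iptables -t filter -S FORWARD | chain_policy)
  if [ "$policy" != "ACCEPT" ]; then
    iptables -t filter -P FORWARD ACCEPT
  fi
)
```

Setting the policy to ACCEPT opens forwarding for all traffic. A narrower option is to keep the DROP policy and accept only bridge traffic, for example with `iptables -A FORWARD -i docker0 -j ACCEPT` and `iptables -A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT`.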
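Running iptables -F so bluntly wiped out all of Docker's rules. To get Docker's default rules back, restart Docker, which recreates them (on systemd hosts, systemctl restart docker does the same):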
$ service docker restart
Alternatively, if you don't mind the effort, you can add the rules back by hand one at a time.
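Restarting Docker only rebuilds Docker's own rules. Anything else that was flushed is lost. Saving the tables before experimenting avoids that problem. `iptables-save` and `iptables-restore` are the standard tools for this, and a restore replaces the current contents of each saved table with the saved state. The backup path and the helper names below are assumptions for illustration.

```shell
# Hypothetical backup location; any path writable by root works.
IPT_BACKUP=/root/iptables.before

# Save all tables before experimenting, and restore them afterwards (run as root).
backup_iptables()  { iptables-save > "$IPT_BACKUP"; }
restore_iptables() { iptables-restore < "$IPT_BACKUP"; }

# Read iptables-save output on stdin and print "<table> <chain>" for every
# chain it declares, e.g. to confirm the backup contains Docker's DOCKER chain.
saved_chains() {
  awk '/^\*/ { table = substr($1, 2) } /^:/ { print table, substr($1, 2) }'
}
```

For example, `saved_chains < /root/iptables.before | grep DOCKER` confirms that the backup was taken while Docker's chains still existed.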
How port forwarding works
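When creating a container, the -p option maps a host port to a container port, so requests to the host port are forwarded into the container.
First, create an nginx web container that maps host port 8080 to container port 80: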
$ docker container run -d --rm --name web -p 8080:80 nginx
441c77091abfeb9498d4fd21d62594d75363fb42338c4ec51a42b6f01d80e418
A request to port 8080 on the host reaches the container:
$ curl localhost:8080
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
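`docker port` reports the same mapping from Docker's side without reading iptables. For this container it prints lines such as `80/tcp -> 0.0.0.0:8080`, and recent versions add an IPv6 line such as `80/tcp -> [::]:8080`. Below is a small parser. `host_ports` is a hypothetical helper name.

```shell
# Read `docker port <container>` output on stdin and print each host port once.
# The host port is whatever follows the last ":" of the last field.
host_ports() {
  awk '{ n = split($NF, a, ":"); print a[n] }' | sort -u
}
```

Usage: `docker port web | host_ports`.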
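How is this port forwarding implemented? Once again by our old friend iptables, this time using DNAT. One caveat applies to the request above. The OUTPUT chain's jump to DOCKER excludes destinations in 127.0.0.0/8, so with Docker's default userland proxy enabled, a request to localhost:8080 from the host is relayed by the docker-proxy process rather than by the DNAT rule. Requests from other hosts, or requests to the host's own IP address, are handled by DNAT.
List the nat table rules: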
$ iptables -t nat -nvL
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 DOCKER all -- * * 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 DOCKER all -- * * 0.0.0.0/0 !127.0.0.0/8 ADDRTYPE match dst-type LOCAL
Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 MASQUERADE all -- * !docker0 172.17.0.0/16 0.0.0.0/0
0 0 MASQUERADE tcp -- * * 172.17.0.2 172.17.0.2 tcp dpt:80
Chain DOCKER (2 references)
pkts bytes target prot opt in out source destination
0 0 RETURN all -- docker0 * 0.0.0.0/0 0.0.0.0/0
0 0 DNAT tcp -- !docker0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:8080 to:172.17.0.2:80
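The POSTROUTING chain of the nat table gains the rule below. Both its source and its destination are the web container (172.17.0.2, port 80). It masquerades hairpin traffic, that is, connections from the container to its own published port that were DNATed back to itself. Ordinary outbound traffic is already covered by the 172.17.0.0/16 MASQUERADE rule: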
Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 MASQUERADE tcp -- * * 172.17.0.2 172.17.0.2 tcp dpt:80
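The DOCKER chain, which is referenced by both PREROUTING and OUTPUT, also gains the rule below. It DNATs TCP requests for host port 8080 that did not arrive from docker0 to 172.17.0.2:80: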
Chain DOCKER (2 references)
pkts bytes target prot opt in out source destination
0 0 DNAT tcp -- !docker0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:8080 to:172.17.0.2:80
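Next, start a container without -p and set up the port forwarding by hand with iptables.
First, stop the previous web container (it was started with --rm, so stopping it also removes it). Then start a new nginx container named web without publishing any port: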
$ docker container run -d --rm --name web nginx
The iptables listing now shows only Docker's base rules. No new forwarding rule has been added:
$ iptables -t nat -nvL
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 DOCKER all -- * * 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 DOCKER all -- * * 0.0.0.0/0 !127.0.0.0/8 ADDRTYPE match dst-type LOCAL
Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 MASQUERADE all -- * !docker0 172.17.0.0/16 0.0.0.0/0
Chain DOCKER (2 references)
pkts bytes target prot opt in out source destination
0 0 RETURN all -- docker0 * 0.0.0.0/0 0.0.0.0/0
Accordingly, port 8080 on the host's address (172.19.85.122 in this environment) is not reachable:
$ curl 172.19.85.122:8080
curl: (7) Failed to connect to 172.19.85.122 port 8080: Connection refused
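Add a DNAT rule that forwards host port 8080 to the web container (172.17.0.2 here; check the address with docker inspect):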
$ iptables -t nat -I DOCKER ! -i docker0 -p tcp --dport 8080 -j DNAT --to-destination 172.17.0.2:80
$ iptables -t nat -nvL
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 DOCKER all -- * * 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
2 120 DOCKER all -- * * 0.0.0.0/0 !127.0.0.0/8 ADDRTYPE match dst-type LOCAL
Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 MASQUERADE all -- * !docker0 172.17.0.0/16 0.0.0.0/0
Chain DOCKER (2 references)
pkts bytes target prot opt in out source destination
0 0 DNAT tcp -- !docker0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:8080 to:172.17.0.2:80
0 0 RETURN all -- docker0 * 0.0.0.0/0 0.0.0.0/0
The web container is now reachable through port 8080 on the host:
$ curl 172.19.85.122:8080
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
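The manual DNAT rule above can be packaged so that it is just as easy to remove, because `iptables -D` deletes a rule given the same specification used to insert it. Below is a minimal sketch that mirrors the two rules Docker created for `-p 8080:80`: the DNAT rule and the hairpin MASQUERADE rule. It assumes the daemon's DOCKER chain exists in the nat table and handles only TCP on the default bridge. `port_rules`, `publish_port` and `unpublish_port` are hypothetical names.

```shell
# Apply iptables verb $1 (-I to insert, -D to delete) to the two rules that
# forward host TCP port $2 to container address $3, port $4. Run as root.
port_rules() {
  # DNAT: requests for host port $2 not arriving from docker0 go to $3:$4.
  iptables -t nat "$1" DOCKER ! -i docker0 -p tcp --dport "$2" \
    -j DNAT --to-destination "$3:$4"
  # Hairpin: masquerade the container reaching its own published port.
  iptables -t nat "$1" POSTROUTING -s "$3" -d "$3" -p tcp --dport "$4" -j MASQUERADE
}

publish_port()   { port_rules -I "$@"; }
unpublish_port() { port_rules -D "$@"; }
```

`publish_port 8080 172.17.0.2 80` reproduces the setup above, and `unpublish_port 8080 172.17.0.2 80` removes both rules again.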