System Performance: Networking (Part 3)

Tags: 0.0, performance, system, network, nf, tracing, conntrack, tcp, netfilter

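NAT rewrites the source or destination IP address of IP packets, so it is widely used to work around the shortage of public IP addresses. It lets multiple hosts in a network reach external resources by sharing a single public IP address. (For example, cloud hosts commonly get outbound internet access through one configured NAT that serves all of them.)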

1. Tool introduction

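SystemTap is a dynamic tracing framework for Linux. It converts a user-supplied script into a kernel module and runs it, which makes it possible to monitor and trace kernel behavior.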

Install: yum install systemtap kernel-devel yum-utils kernel && stap-prep

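Next, write the tracing script. It hooks calls to the kernel function kfree_skb() and counts the locations where packets are dropped. Save the script below to a file, then pass that file to the stap command to run the drop tracker.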

#! /usr/bin/env stap
############################################################
# Dropwatch.stp
# Author: Neil Horman <nhorman@redhat.com>
# An example script to mimic the behavior of the dropwatch utility
# http://fedorahosted.org/dropwatch
############################################################

# Array to hold the list of drop points we find
global locations

# Note when we turn the monitor on and off
probe begin { printf("Monitoring for dropped packets\n") }
probe end { printf("Stopping dropped packet monitor\n") }

# Increment a drop counter for every location we drop at
probe kernel.trace("kfree_skb") { locations[$location] <<< 1 }

# Every 5 seconds report our drop locations
probe timer.sec(5)
{
  printf("\n")
  foreach (l in locations-) {
    printf("%d packets dropped at %s\n",
           @count(locations[l]), symname(l))
  }
  delete locations
}

2. NAT performance case study

In terminal 1, run the nginx container:

$ docker run --name nginx --privileged -p 8080:8080 -itd feisky/nginx:nat

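Run iptables to confirm that the DNAT rule was created. In the PREROUTING chain, requests whose destination is a local address jump to the DOCKER chain; in the DOCKER chain, TCP requests with destination port 8080 are DNATed to port 8080 on 172.17.0.2.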

$ iptables -nL -t nat
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination
DOCKER     all  --  0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type LOCAL
...
Chain DOCKER (2 references)
target     prot opt source               destination
RETURN     all  --  0.0.0.0/0            0.0.0.0/0
DNAT       tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:8080 to:172.17.0.2:8080

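In terminal 2, run ab. The report shows about 76 requests per second, a mean time per request of about 65 s, and a mean connection setup time of 1300 ms. Throughput has dropped sharply. What is the cause? Kernel packet drops are a good place to start.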

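# -c sets the number of concurrent requests to 5000; -n sets the total number of requests to 100,000
# -r keeps going when a socket receive error occurs; -s sets the per-request timeout to 30 s
$ ab -c 5000 -n 100000 -r -s 30 http://192.168.0.30:8080/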
...
Requests per second:    76.47 [#/sec] (mean)
Time per request:       65380.868 [ms] (mean)
Time per request:       13.076 [ms] (mean, across all concurrent requests)
Transfer rate:          44.79 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0 1300 5578.0      1   65184
Processing:     0 37916 59283.2      1  130682
Waiting:        0    2   8.7      1     414
Total:          1 39216 58711.6   1021  130682
...
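
The relationship between the figures in this report can be checked with a little arithmetic. The sketch below is only an illustration: the values are copied from the output above, and the function names are hypothetical. It relies on how ab derives its two "Time per request" lines: the mean figure is the per-request time across all concurrent requests multiplied by the concurrency level.

```python
# Values copied from the ab report above.
CONCURRENCY = 5000                 # ab -c
REQUESTS_PER_SECOND = 76.47        # "Requests per second"
TIME_ACROSS_ALL_MS = 13.076        # "Time per request (mean, across all concurrent requests)"


def mean_time_per_request_ms(concurrency, time_across_all_ms):
    """Mean time per request as seen by one of the concurrent clients."""
    return concurrency * time_across_all_ms


def mean_time_from_throughput_ms(concurrency, requests_per_second):
    """Same quantity estimated from throughput: c / rps seconds, in ms."""
    return concurrency / requests_per_second * 1000


by_product = mean_time_per_request_ms(CONCURRENCY, TIME_ACROSS_ALL_MS)
by_throughput = mean_time_from_throughput_ms(CONCURRENCY, REQUESTS_PER_SECOND)

print(f"from per-request time: {by_product:.0f} ms")
print(f"from throughput:       {by_throughput:.0f} ms")
```

Both estimates land close to the reported 65380.868 ms; the small gap in the second one comes from the rounded requests-per-second value.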

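Analyze the problem with the SystemTap script from section 1. Once stap is running it prints the results below: the vast majority of drops happen in nf_hook_slow.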

10031 packets dropped at nf_hook_slow
676 packets dropped at tcp_v4_rcv
7284 packets dropped at nf_hook_slow
268 packets dropped at tcp_v4_rcv
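
The script prints a fresh report every 5 seconds, so one location shows up several times. A minimal sketch for adding the intervals together is shown below. It assumes the output has been captured as text in exactly the "N packets dropped at X" form shown above; the function and variable names are illustrative.

```python
import re
from collections import Counter

# The two report intervals shown above, captured as plain text.
SAMPLE = """\
10031 packets dropped at nf_hook_slow
676 packets dropped at tcp_v4_rcv
7284 packets dropped at nf_hook_slow
268 packets dropped at tcp_v4_rcv
"""

LINE_RE = re.compile(r"^(\d+) packets dropped at (\S+)$")


def sum_drops(text):
    """Sum drop counts per kernel function across all report intervals."""
    totals = Counter()
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # skip blank lines and anything that is not a drop report
            totals[match.group(2)] += int(match.group(1))
    return totals


totals = sum_drops(SAMPLE)
grand_total = sum(totals.values())
for location, count in totals.most_common():
    print(f"{location}: {count} ({count / grand_total:.1%})")
```

On the sample above, nf_hook_slow accounts for roughly 95% of all drops, which is why the analysis continues there.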

To confirm whether NAT is the cause, use the perf record and perf report commands.

# Record for a while (30 s here; press Ctrl+C to stop earlier)
$ perf record -a -g -- sleep 30
# Print the report
$ perf report -g graph,0

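In the perf report interface, type the search command /, enter nf_hook_slow in the dialog that pops up, and expand the call stack. The most frequent callers are ipv4_conntrack_in (on receiving a packet, it looks up the connection in the connection tracking table and allocates a tracking object for new connections), br_nf_pre_routing (container networking is implemented with a bridge), and iptable_nat_ipv4_in (on receiving a packet, it performs DNAT and forwards packets received on port 8080 to the container).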

DNAT is, at bottom, connection tracking (conntrack). Take a look at the conntrack options the kernel exposes.

$ sysctl -a | grep conntrack
net.netfilter.nf_conntrack_count = 180        # current number of tracked connections
net.netfilter.nf_conntrack_max = 1000         # maximum number of tracked connections
net.netfilter.nf_conntrack_buckets = 65536    # size of the connection tracking table
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 60
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 120
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
...
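
To see how close the table is to its limit, the count can be compared with the maximum. The sketch below assumes the sysctl output above has been captured as text; the helper names are hypothetical and the sample only repeats the three counters from the listing.

```python
# The three counters from the sysctl listing above.
SAMPLE = """\
net.netfilter.nf_conntrack_count = 180
net.netfilter.nf_conntrack_max = 1000
net.netfilter.nf_conntrack_buckets = 65536
"""


def parse_sysctl(text):
    """Parse 'key = value' lines into a dict, keeping integer values only."""
    values = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if value.isdigit():
            values[key] = int(value)
    return values


def conntrack_usage(values):
    """Fraction of the conntrack limit currently in use."""
    count = values["net.netfilter.nf_conntrack_count"]
    limit = values["net.netfilter.nf_conntrack_max"]
    return count / limit


settings = parse_sysctl(SAMPLE)
print(f"conntrack table usage: {conntrack_usage(settings):.0%}")
```

The snapshot shows the table at 18% of its limit while idle; under the ab load, with 5000 concurrent connections, the limit of 1000 is reached quickly.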

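The ab run used 5000 concurrent requests and 100,000 requests in total, so a connection tracking limit of 1000 is clearly not enough. You can also check the kernel log for errors: for example, dmesg | tail reports "nf_conntrack: table full". The message says the connection tracking table is full, so is it enough to simply raise the limit?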

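The connection tracking table is a hash table held in memory, so tracking too many connections consumes a lot of memory. Each entry (bucket) of the table is a linked list, and with the table full the average list length equals nf_conntrack_max divided by nf_conntrack_buckets.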

# A connection tracking object is 376 bytes; a list entry is 16 bytes
nf_conntrack_max * connection tracking object size + nf_conntrack_buckets * list entry size
= 1000 * 376 + 65536 * 16 B
= 1.4 MB
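
The same estimate as a small function, in case other limits need to be tried. This is a sketch only: the two sizes (376 and 16 bytes) are the ones quoted above and may differ between kernel builds, and the function name is illustrative.

```python
CONNTRACK_OBJECT_BYTES = 376   # size quoted in the text; may vary by kernel
LIST_ENTRY_BYTES = 16          # size quoted in the text; may vary by kernel


def conntrack_memory_bytes(max_entries, buckets,
                           object_bytes=CONNTRACK_OBJECT_BYTES,
                           entry_bytes=LIST_ENTRY_BYTES):
    """Upper bound on conntrack memory: tracked objects plus the bucket array."""
    return max_entries * object_bytes + buckets * entry_bytes


total = conntrack_memory_bytes(max_entries=1000, buckets=65536)
print(f"{total} bytes = {total / 1e6:.1f} MB")
```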

Next, raise nf_conntrack_max and run ab again: the request rate goes up.

$ sysctl -w net.netfilter.nf_conntrack_max=131072
$ sysctl -w net.netfilter.nf_conntrack_buckets=65536
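
Raising the limit is not free, so it is worth comparing the old and new settings. The sketch below uses the object and list entry sizes quoted earlier (376 and 16 bytes, which are assumptions that depend on the kernel) and reports both the worst-case memory and the average list length per bucket when the table is full. The helper name is hypothetical.

```python
OBJECT_BYTES = 376   # conntrack object size quoted in the text
ENTRY_BYTES = 16     # list entry size quoted in the text
BUCKETS = 65536      # nf_conntrack_buckets, unchanged by the tuning


def describe(max_entries, buckets=BUCKETS):
    """Worst-case memory and average chain length for a given limit."""
    memory_bytes = max_entries * OBJECT_BYTES + buckets * ENTRY_BYTES
    return {
        "max": max_entries,
        "memory_mib": memory_bytes / (1024 * 1024),
        "avg_chain": max_entries / buckets,
    }


for limit in (1000, 131072):
    row = describe(limit)
    print(f"max={row['max']:>6}  memory={row['memory_mib']:.1f} MiB  "
          f"avg chain={row['avg_chain']:.3f}")
```

With the new limit, a full table costs about 48 MiB and each bucket holds two entries on average, which is still a modest price on most servers.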

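What does the connection tracking table contain? Each entry records the protocol, connection state, source IP, source port, destination IP, destination port, and tracking status. If a large number of entries are in TIME_WAIT, they are cleaned up automatically once they time out. The default timeout is 120 s, which you can check with sysctl net.netfilter.nf_conntrack_tcp_timeout_time_wait.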
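
To make those fields concrete, here is a minimal parsing sketch. It assumes entries in the text form printed by the conntrack command-line tool (conntrack -L), which this article does not show; the sample line is illustrative rather than captured output, and parse_entry is a hypothetical helper.

```python
# Illustrative entry (not captured output): protocol name, protocol number,
# seconds until expiry, TCP state, original tuple, reply tuple, flags.
SAMPLE = (
    "tcp 6 7 TIME_WAIT src=192.168.0.2 dst=192.168.0.96 sport=51744 dport=8080 "
    "src=172.17.0.2 dst=192.168.0.2 sport=8080 dport=51744 [ASSURED] mark=0 use=1"
)

TUPLE_KEYS = ("src", "dst", "sport", "dport")


def parse_entry(line):
    """Split one conntrack entry into protocol, state, tuples and flags."""
    tokens = line.split()
    entry = {
        "protocol": tokens[0],
        "timeout_s": int(tokens[2]),
        "state": None,
        "original": {},
        "reply": {},
        "flags": [],
    }
    for token in tokens[3:]:
        if token.startswith("[") and token.endswith("]"):
            entry["flags"].append(token[1:-1])
        elif "=" in token:
            key, value = token.split("=", 1)
            if key in TUPLE_KEYS:
                # The first occurrence of a key belongs to the original
                # direction, the second to the reply direction.
                side = "reply" if key in entry["original"] else "original"
                entry[side][key] = value
        else:
            entry["state"] = token
    return entry


entry = parse_entry(SAMPLE)
print(entry["protocol"], entry["state"], entry["timeout_s"])
print("original:", entry["original"])
print("reply:   ", entry["reply"])
```

In the sample, the reply tuple's source is the container address 172.17.0.2, which is how the DNAT translation shows up in the tracking entry.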

From: https://blog.51cto.com/u_12191723/6069074