A while back, during WAF concurrency stress testing, I hit a problem: tests driven by the traffic tester looked normal, but when validating timeout and packet-loss behavior in a real environment, the concurrent CPS was consistently very low.
Checking cat /proc/net/netstat showed a very large value for OfoPruned (the LINUX_MIB_OFOPRUNED counter). Only after reading the kernel code did I find that when memory runs short, or rmem exceeds sk_rcvbuf, the kernel frees the ofo (out-of-order) queue, and it frees the whole thing. At the time we changed "free everything" to "free the highest-sequence 50%", and the improvement was obvious.
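For reference, /proc/net/netstat stores each extended counter as a header/value line pair (a line naming the fields, then a line with the values in the same order). Here is a minimal userspace C sketch for pulling out OfoPruned; buffer sizes and error handling are kept simple, so treat it as a reading aid only:

/* read_ofopruned.c - print the OfoPruned counter from /proc/net/netstat */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char hdr[4096], val[4096];
    FILE *f = fopen("/proc/net/netstat", "r");

    if (!f) {
        perror("fopen");
        return 1;
    }
    /* lines come in pairs: header line, then value line */
    while (fgets(hdr, sizeof(hdr), f) && fgets(val, sizeof(val), f)) {
        char *hs, *vs;
        char *h = strtok_r(hdr, " \n", &hs);
        char *v = strtok_r(val, " \n", &vs);

        while (h && v) {          /* walk header and value tokens in lockstep */
            if (strcmp(h, "OfoPruned") == 0)
                printf("OfoPruned = %s\n", v);
            h = strtok_r(NULL, " \n", &hs);
            v = strtok_r(NULL, " \n", &vs);
        }
    }
    fclose(f);
    return 0;
}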
Looking at a recent kernel today, I found that upstream has since changed this behavior as well.
During TCP receive processing, if the socket's receive buffer already exceeds its configured limit (sk_rcvbuf), or TCP's total memory usage exceeds the global threshold, the kernel calls tcp_prune_queue to try to reclaim memory held by the receive queues. It first calls tcp_collapse_ofo_queue to merge overlapping data in the out_of_order_queue, then tcp_collapse to fold the data in sk_receive_queue into fewer skb structures; finally, if memory usage is still too high, it calls tcp_prune_ofo_queue to drop packets from the out_of_order_queue.
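For orientation, here is a condensed sketch of that call sequence, paraphrased from tcp_prune_queue() in net/ipv4/tcp_input.c of recent kernels. It is abridged from memory (some bookkeeping and the final abuse-handling path are omitted), so consult the actual source tree for the authoritative code:

/* Abridged sketch of tcp_prune_queue(); not the verbatim upstream code. */
static int tcp_prune_queue(struct sock *sk, const struct sk_buff *in_skb)
{
    struct tcp_sock *tp = tcp_sk(sk);

    NET_INC_STATS(sock_net(sk), LINUX_MIB_PRUNECALLED);

    if (atomic_read(&sk->sk_rmem_alloc) >= sk->sk_rcvbuf)
        tcp_clamp_window(sk);            /* shrink the advertised window */
    else if (tcp_under_memory_pressure(sk))
        tcp_adjust_rcv_ssthresh(sk);

    if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf)
        return 0;

    /* Step 1: merge overlapping segments in the ooo queue. */
    tcp_collapse_ofo_queue(sk);

    /* Step 2: fold sk_receive_queue data into fewer skbs. */
    if (!skb_queue_empty(&sk->sk_receive_queue))
        tcp_collapse(sk, &sk->sk_receive_queue, NULL,
                     skb_peek(&sk->sk_receive_queue), NULL,
                     tp->copied_seq, tp->rcv_nxt);

    if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf)
        return 0;

    /* Step 3: collapsing did not help, drop ooo packets outright. */
    tcp_prune_ofo_queue(sk, in_skb);

    if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf)
        return 0;

    return -1;    /* still over limit: caller drops the incoming data */
}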
The principles behind the latest change are spelled out in the comment on the function:
/*
 * Clean the out-of-order queue to make room.
 * We drop high sequences packets to :
 * 1) Let a chance for holes to be filled.
 *    This means we do not drop packets from ooo queue if their sequence
 *    is before incoming packet sequence.
 * 2) not add too big latencies if thousands of packets sit there.
 *    (But if application shrinks SO_RCVBUF, we could still end up
 *    freeing whole queue here)
 * 3) Drop at least 12.5 % of sk_rcvbuf to avoid malicious attacks.
 *
 * Return true if queue has shrunk.
 */
1) Give holes a chance to be filled. This means packets are not dropped from the ooo queue if their sequence is before the incoming packet's sequence.
2) Do not add too much latency when thousands of packets sit in the queue. (But if the application shrinks SO_RCVBUF, we could still end up freeing the whole queue here.)
3) Drop at least 12.5% of sk_rcvbuf per batch (goal = sk->sk_rcvbuf >> 3) to blunt malicious attacks; for example, with a 4 MB receive buffer, each pruning batch frees at least 512 KB.
The corresponding implementation in the latest kernel is:
/*
 * Clean the out-of-order queue to make room.
 * We drop high sequences packets to :
 * 1) Let a chance for holes to be filled.
 *    This means we do not drop packets from ooo queue if their sequence
 *    is before incoming packet sequence.
 * 2) not add too big latencies if thousands of packets sit there.
 *    (But if application shrinks SO_RCVBUF, we could still end up
 *    freeing whole queue here)
 * 3) Drop at least 12.5 % of sk_rcvbuf to avoid malicious attacks.
 *
 * Return true if queue has shrunk.
 */
static bool tcp_prune_ofo_queue(struct sock *sk, const struct sk_buff *in_skb)
{
    struct tcp_sock *tp = tcp_sk(sk);
    struct rb_node *node, *prev;
    bool pruned = false;
    int goal;

    if (RB_EMPTY_ROOT(&tp->out_of_order_queue))
        return false;

    goal = sk->sk_rcvbuf >> 3;
    node = &tp->ooo_last_skb->rbnode;

    do {
        struct sk_buff *skb = rb_to_skb(node);

        /* If incoming skb would land last in ofo queue, stop pruning. */
        if (after(TCP_SKB_CB(in_skb)->seq, TCP_SKB_CB(skb)->seq))
            break;
        pruned = true;
        prev = rb_prev(node);
        rb_erase(node, &tp->out_of_order_queue);
        goal -= skb->truesize;
        tcp_drop_reason(sk, skb, SKB_DROP_REASON_TCP_OFO_QUEUE_PRUNE);
        tp->ooo_last_skb = rb_to_skb(prev);
        if (!prev || goal <= 0) {
            if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf &&
                !tcp_under_memory_pressure(sk))
                break;
            goal = sk->sk_rcvbuf >> 3;
        }
        node = prev;
    } while (node);

    if (pruned) {
        NET_INC_STATS(sock_net(sk), LINUX_MIB_OFOPRUNED);
        /* Reset SACK state. A conforming SACK implementation will
         * do the same at a timeout based retransmit. When a connection
         * is in a sad state like this, we care only about integrity
         * of the connection not performance.
         */
        if (tp->rx_opt.sack_ok)
            tcp_sack_reset(&tp->rx_opt);
    }
    return pruned;
}
Two things worth noting:
1) /* If incoming skb would land last in ofo queue, stop pruning. */ — the walk starts at ooo_last_skb, the highest sequence in the tree, and moves backwards via rb_prev(); it stops as soon as it reaches packets whose sequence is below the incoming skb's, so existing holes still get a chance to be filled (the toy model below illustrates this).
2) SACK state is reset (tcp_sack_reset) only if the ofo queue was actually pruned.
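To make the policy concrete, here is a self-contained userspace toy model of the loop above. Everything in it (toy_prune, struct pkt, the sample numbers) is illustrative only, and a plain > comparison stands in for the kernel's wrap-safe after():

#include <stdbool.h>
#include <stdio.h>

struct pkt { unsigned int seq; int truesize; bool dropped; };

/* Walk from the highest sequence down, dropping until at least
 * rcvbuf/8 bytes are freed; stop early at sequences below in_seq. */
static bool toy_prune(struct pkt *q, int n, unsigned int in_seq, int rcvbuf)
{
    int goal = rcvbuf >> 3;               /* 12.5% of the receive buffer */
    bool pruned = false;
    int i;

    for (i = n - 1; i >= 0; i--) {        /* highest sequence first */
        if (in_seq > q[i].seq)            /* incoming would land last: stop */
            break;
        q[i].dropped = true;
        pruned = true;
        goal -= q[i].truesize;
        if (goal <= 0)                    /* one batch freed; the real kernel
                                           * re-checks memory here and may
                                           * start another batch */
            break;
    }
    return pruned;
}

int main(void)
{
    struct pkt q[] = {
        { 1000, 1500, false }, { 3000, 1500, false },
        { 5000, 1500, false }, { 7000, 1500, false },
    };
    int i;

    /* incoming seq 4000: only seq 5000 and 7000 sit above it */
    toy_prune(q, 4, 4000, 32768);         /* goal = 4096 bytes */
    for (i = 0; i < 4; i++)
        printf("seq=%u dropped=%d\n", q[i].seq, q[i].dropped);
    return 0;
}

Running it drops only seq 5000 and 7000: the two packets above the incoming sequence 4000, and only until the 12.5% goal (4096 bytes here) is met, while the lower sequences that could still fill the hole survive.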
Back when we patched the kernel ourselves, we went straight for the blunt approach: drop 50% of the queue by default.
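For completeness, here is a hypothetical reconstruction of that kind of change, assuming a pre-4.9 kernel where out_of_order_queue was still a plain sk_buff list (not yet an rb-tree) and tcp_prune_ofo_queue purged it wholesale; the function name and details are illustrative, not the original patch:

/* Hypothetical sketch only: instead of __skb_queue_purge()-ing the
 * whole list-based ooo queue, free packets from the tail (highest
 * sequences) until about half of the queued bytes are released. */
static bool tcp_prune_ofo_queue_half(struct sock *sk)
{
    struct tcp_sock *tp = tcp_sk(sk);
    struct sk_buff *skb;
    int freed = 0, total = 0;

    skb_queue_walk(&tp->out_of_order_queue, skb)
        total += skb->truesize;
    if (!total)
        return false;

    NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_OFOPRUNED);
    /* drop from the tail so lower sequences, closer to the hole, survive */
    while (freed < total / 2 &&
           (skb = __skb_dequeue_tail(&tp->out_of_order_queue)) != NULL) {
        freed += skb->truesize;
        __kfree_skb(skb);
    }
    sk_mem_reclaim(sk);
    return true;
}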
From: https://www.cnblogs.com/codestack/p/18226533