TCP Performance Optimization

  • Packet Size, Window Size and Socket Buffer in TCP
  • Nagle and Delayed ACK
  • SO_SNDBUF and SO_RCVBUF
  • TCP Window FULL
  • TCP Window ZERO
  • SACK
  • BDP (bandwidth-delay product) and RTT (round trip time)
  • Retransmission timeout (RTO)
  • Interactive Data Flow and Bulk Data Flow
  • RWND and CWND

TCP Window Size

  • SO_SNDBUF send buffer size at the sender
  • SO_RCVBUF receive buffer size at the receiver

These buffer sizes determine the TCP Window Size advertised during the TCP handshake and therefore affect throughput.

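  • Use the Wireshark TCP time-sequence graph to analyze problems
  • The receive window size affects performance, and the larger the RTT, the more pronounced the effect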

BDP (bytes) = RTT (seconds) * (Bandwidth (bits/s) / 8)
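
As a rough illustration of the formula above, the following C sketch computes the BDP and the throughput ceiling that a fixed window imposes at a given RTT. The function names bdp_bytes and window_limited_rate, and the choice of milliseconds for the RTT argument, are assumptions made for this example and not part of any API.

```c
#include <assert.h>
#include <stdint.h>

/* Bandwidth-delay product in bytes.
 * bandwidth_bps: path bandwidth in bits per second
 * rtt_ms:        round-trip time in milliseconds */
uint64_t bdp_bytes(uint64_t bandwidth_bps, uint64_t rtt_ms)
{
    /* bits/s divided by 8 gives bytes/s; scale by the RTT in seconds */
    return bandwidth_bps / 8 * rtt_ms / 1000;
}

/* Upper bound on throughput (bytes per second) when at most
 * window_bytes can be in flight during one round trip. */
uint64_t window_limited_rate(uint64_t window_bytes, uint64_t rtt_ms)
{
    assert(rtt_ms > 0); /* a zero RTT would divide by zero */
    return window_bytes * 1000 / rtt_ms;
}
```

For example, a 100 Mbit/s path with a 100 ms RTT gives bdp_bytes(100000000, 100) = 1250000 bytes, while a 65535-byte window on the same path caps throughput at 655350 bytes/s. This is why a window smaller than the BDP hurts more as the RTT grows.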

TCP_QUICKACK

$ man 7 tcp
TCP_QUICKACK (since Linux 2.4.4)
              Enable quickack mode if set or disable quickack mode if cleared. In quickack mode, acks are sent immediately, rather than delayed if
              needed in accordance to normal TCP operation. This flag is not permanent, it only enables a switch to or from quickack mode.
              Subsequent operation of the TCP protocol will once again enter/leave quickack mode depending on internal protocol processing and
              factors such as delayed ack timeouts occurring and data transfer. This option should not be used in code intended to be portable.

TCP_QUICKACK is not permanent; it has to be set again after every call to recv.
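
One way to follow that rule is to wrap recv so the option is re-applied after each read. The sketch below is Linux-only (the man page itself warns that TCP_QUICKACK is not portable); the wrapper name recv_quickack is hypothetical, and ignoring the setsockopt result is a simplification chosen for this example.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>

/* recv() wrapper that switches quickack mode back on after every read. */
ssize_t recv_quickack(int fd, void *buf, size_t len, int flags)
{
    ssize_t n = recv(fd, buf, len, flags);
    int saved_errno = errno;   /* keep recv()'s errno for the caller */
    int on = 1;

    /* The kernel may have left quickack mode on its own, so request it
     * again. A failure here only means ACKs may be delayed as usual. */
    (void)setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &on, sizeof(on));

    errno = saved_errno;
    return n;
}
```

Callers use it exactly like recv; the return value and errno are those of the underlying recv call.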

Transfer Rate

  • RWND
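  • CWND: Congestion Window. It controls how many segments the sender may put on the wire per unit of time. Within one RTT (Round-Trip Time, what people usually call the ping value) the sender may send at most CWND packets (note: packets, not bytes), so TCP effectively uses CWND/RTT to control the sending rate. The value is computed dynamically in response to packet loss
  • SS: Slow Start. When a TCP transfer begins, the rate ramps up gradually; unless a packet is lost, it keeps growing exponentially (this holds for standard congestion control algorithms such as cubic; many other algorithms and vendor implementations modify slow-start growth and may not be exponential)
  • CA: Congestion Avoidance. Once the sender detects a loss it lowers CWND and the rate drops. When CWND grows again, it no longer grows exponentially as in SS but linearly (this describes standard Reno-style congestion control; cubic grows CWND along a cubic curve in this phase, and other algorithms or vendor implementations may change the growth behaviour as well)
  • ssthresh: Slow Start Threshold. When the sender detects a loss it records the current CWND and computes a suitable ssthresh (ssthresh <= CWND at the time of the loss). As CWND grows back from a small value, the sender switches from SS to CA once CWND reaches ssthresh. If the loss shows up as an acknowledgment timeout (the sender never receives the peer's ACK), however, the sender drops CWND all the way back to its initial value
  • tcp_wmem: corresponds to the send buffer, which in turn bounds the size of the sender's sliding window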

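In the figure above, a single packet loss drops cwnd to 1 and ssthresh to cwnd/2, wiping out all the progress made so far. This is too conservative: in most real cases the public network still has spare bandwidth, and the loss probability rises because the path is long, not because bandwidth has run out, so there is no need to back off this hard (TCP was designed mainly with LANs and twisted-pair links in mind, hence its conservative bias). The larger the RTT (a long fat pipe), the worse the problem gets, and it shows up as severe jitter in transfer speed.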

  • Timeout retransmission: ssthresh drops to cwnd/2 and cwnd drops to 1
  • Fast retransmit: CWND is halved and ssthresh is lowered accordingly
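
The two reactions above, together with the SS and CA growth rules, can be sketched as a small state machine. This is a simplified Reno-style model counted in whole segments per RTT, not the code of any real stack and not how cubic behaves. The names cc_state, cc_on_rtt_acked, cc_on_timeout and cc_on_fast_retransmit are hypothetical, and both the floor of 2 segments for ssthresh (borrowed from RFC 5681) and the capping of slow start at ssthresh are assumptions made to keep the example short.

```c
#include <stdint.h>

struct cc_state {
    uint32_t cwnd;      /* congestion window, in segments */
    uint32_t ssthresh;  /* slow start threshold, in segments */
};

static uint32_t half_with_floor(uint32_t cwnd)
{
    uint32_t half = cwnd / 2;
    return half < 2 ? 2 : half;   /* assumed minimum of 2 segments */
}

/* One full RTT in which every segment was acknowledged. */
void cc_on_rtt_acked(struct cc_state *s)
{
    if (s->cwnd < s->ssthresh) {
        s->cwnd *= 2;                   /* slow start: doubles per RTT */
        if (s->cwnd > s->ssthresh)
            s->cwnd = s->ssthresh;      /* simplification: stop at ssthresh */
    } else {
        s->cwnd += 1;                   /* congestion avoidance: +1 per RTT */
    }
}

/* Retransmission timeout: ssthresh = cwnd/2, cwnd back to 1. */
void cc_on_timeout(struct cc_state *s)
{
    s->ssthresh = half_with_floor(s->cwnd);
    s->cwnd = 1;
}

/* Fast retransmit: halve the window and continue from there. */
void cc_on_fast_retransmit(struct cc_state *s)
{
    s->ssthresh = half_with_floor(s->cwnd);
    s->cwnd = s->ssthresh;
}
```

Starting from cwnd = 1 and ssthresh = 16, four loss-free RTTs bring cwnd to 16 and the fifth to 17; a timeout at that point resets cwnd to 1 with ssthresh = 8, whereas a fast retransmit would have continued from cwnd = 8. The gap between those two outcomes is what makes timeouts so costly on long paths.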

Socket Options

#include <sys/socket.h>

int getsockopt(int sockfd, int level, int optname, void *optval, socklen_t *optlen);
int setsockopt(int sockfd, int level, int optname, const void *optval, socklen_t optlen);

/* Both return: 0 if OK, -1 on error */
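
To show the two calls working together, here is a minimal sketch that requests a receive buffer size and then reads back what the kernel actually applied. The helper name set_rcvbuf is hypothetical; the same pattern works for SO_SNDBUF.

```c
#include <sys/socket.h>

/* Request a receive buffer of `bytes` for socket `fd`.
 * Returns the size the kernel reports afterwards, or -1 on error. */
int set_rcvbuf(int fd, int bytes)
{
    int actual = 0;
    socklen_t len = sizeof(actual);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    /* optlen is a value-result argument: size in, bytes written out */
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) < 0)
        return -1;
    return actual;
}
```

The value read back can differ from the request. On Linux, for instance, socket(7) documents that the kernel doubles the requested size to leave room for bookkeeping and caps it at net.core.rmem_max. For TCP, tcp(7) also notes that the buffer size must be set before listen or connect to take effect on the connection.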

Option list

level                       optname                          get  set  Description                                  Flag  Datatype
SOL_SOCKET                  SO_BROADCAST                     x    x    Permit sending of broadcast datagrams        x     int
                            SO_DEBUG                         x    x    Enable debug tracing                         x     int
                            SO_DONTROUTE                     x    x    Bypass routing table lookup                  x     int
                            SO_ERROR                         x    -    Get pending error and clear                  -     int
                            SO_KEEPALIVE                     x    x    Periodically test if connection still alive  x     int
                            SO_LINGER                        x    x    Linger on close if data to send              -     linger{}
                            SO_OOBINLINE                     x    x    Leave received out-of-band data inline       x     int
                            SO_RCVBUF                        x    x    Receive buffer size                          -     int
                            SO_SNDBUF                        x    x    Send buffer size                             -     int
                            SO_RCVLOWAT                      x    x    Receive buffer low-water mark                -     int
                            SO_SNDLOWAT                      x    x    Send buffer low-water mark                   -     int
                            SO_RCVTIMEO                      x    x    Receive timeout                              -     timeval{}
                            SO_SNDTIMEO                      x    x    Send timeout                                 -     timeval{}
                            SO_REUSEADDR                     x    x    Allow local address reuse                    x     int
                            SO_REUSEPORT                     x    x    Allow local port reuse                       x     int
                            SO_TYPE                          x    -    Get socket type                              -     int
                            SO_USELOOPBACK                   x    x    Routing socket gets copy of what it sends    x     int
IPPROTO_IP                  IP_HDRINCL                       x    x    IP header included with data                 x     int
                            IP_OPTIONS                       x    x    IP header options                            -     (see text)
                            IP_RECVDSTADDR                   x    x    Return destination IP address                x     int
                            IP_RECVIF                        x    x    Return received interface index              x     int
                            IP_TOS                           x    x    Type-of-service and precedence               -     int
                            IP_TTL                           x    x    TTL                                          -     int
                            IP_MULTICAST_IF                  x    x    Specify outgoing interface                   -     in_addr{}
                            IP_MULTICAST_TTL                 x    x    Specify outgoing TTL                         -     u_char
                            IP_MULTICAST_LOOP                x    x    Specify loopback                             -     u_char
                            IP_{ADD,DROP}_MEMBERSHIP         -    x    Join or leave multicast group                -     ip_mreq{}
                            IP_{BLOCK,UNBLOCK}_SOURCE        -    x    Block or unblock multicast source            -     ip_mreq_source{}
                            IP_{ADD,DROP}_SOURCE_MEMBERSHIP  -    x    Join or leave source-specific multicast      -     ip_mreq_source{}
IPPROTO_ICMPV6              ICMP6_FILTER                     x    x    Specify ICMPv6 message types to pass         -     icmp6_filter{}
IPPROTO_IPV6                IPV6_CHECKSUM                    x    x    Offset of checksum field for raw sockets     -     int
                            IPV6_DONTFRAG                    x    x    Drop instead of fragment large packets       x     int
                            IPV6_NEXTHOP                     x    x    Specify next-hop address                     -     sockaddr_in6{}
                            IPV6_PATHMTU                     x    -    Retrieve current path MTU                    -     ip6_mtuinfo{}
                            IPV6_RECVDSTOPTS                 x    x    Receive destination options                  x     int
                            IPV6_RECVHOPLIMIT                x    x    Receive unicast hop limit                    x     int
                            IPV6_RECVHOPOPTS                 x    x    Receive hop-by-hop options                   x     int
                            IPV6_RECVPATHMTU                 x    x    Receive path MTU                             x     int
                            IPV6_RECVPKTINFO                 x    x    Receive packet information                   x     int
                            IPV6_RECVRTHDR                   x    x    Receive source route                         x     int
                            IPV6_RECVTCLASS                  x    x    Receive traffic class                        x     int
                            IPV6_UNICAST_HOPS                x    x    Default unicast hop limit                    -     int
                            IPV6_USE_MIN_MTU                 x    x    Use minimum MTU                              x     int
                            IPV6_V6ONLY                      x    x    Disable v4 compatibility                     x     int
                            IPV6_XXX                         x    x    Sticky ancillary data                        -     (see text)
                            IPV6_MULTICAST_IF                x    x    Specify outgoing interface                   -     u_int
                            IPV6_MULTICAST_HOPS              x    x    Specify outgoing hop limit                   -     int
                            IPV6_MULTICAST_LOOP              x    x    Specify loopback                             x     u_int
                            IPV6_JOIN_GROUP                  -    x    Join multicast group                         -     ipv6_mreq{}
                            IPV6_LEAVE_GROUP                 -    x    Leave multicast group                        -     ipv6_mreq{}
IPPROTO_IP or IPPROTO_IPV6  MCAST_JOIN_GROUP                 -    x    Join multicast group                         -     group_req{}
                            MCAST_LEAVE_GROUP                -    x    Leave multicast group                        -     group_req{}
                            MCAST_BLOCK_SOURCE               -    x    Block multicast source                       -     group_source_req{}
                            MCAST_UNBLOCK_SOURCE             -    x    Unblock multicast source                     -     group_source_req{}
                            MCAST_JOIN_SOURCE_GROUP          -    x    Join source-specific multicast               -     group_source_req{}
                            MCAST_LEAVE_SOURCE_GROUP         -    x    Leave source-specific multicast              -     group_source_req{}
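
The Datatype column matters because optval must point to a value of exactly that type. As a small sketch, the first helper below sets SO_RCVTIMEO, which takes a timeval{}, and the second reads SO_TYPE, which is get-only and returns an int. The helper names set_recv_timeout and get_socket_type are hypothetical.

```c
#include <sys/socket.h>
#include <sys/time.h>

/* SO_RCVTIMEO takes a struct timeval, not an int. */
int set_recv_timeout(int fd, long seconds)
{
    struct timeval tv = { .tv_sec = seconds, .tv_usec = 0 };
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/* SO_TYPE can only be read. Returns SOCK_STREAM, SOCK_DGRAM, ...
 * or -1 on error. */
int get_socket_type(int fd)
{
    int type = 0;
    socklen_t len = sizeof(type);

    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) < 0)
        return -1;
    return type;
}
```

Passing the wrong type or size for optval typically makes the call fail with an error or silently misreads the value, so it is worth checking the table before writing each call.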

Ref