- Packet Size, Window Size and Socket Buffer in TCP
- Nagle and Delayed ACK
- SO_SNDBUF and SO_RCVBUF
- TCP Window FULL
- TCP Window ZERO
- SACK
- BDP (bandwidth-delay product) and RTT (round-trip time)
- Retransmission timeout (RTO)
- Interactive Data Flow and Bulk Data Flow
- RWND and CWND
TCP Window Size
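- SO_SNDBUF: send buffer size at the sender
- SO_RCVBUF: receive buffer size at the receiver
- Both are used for the TCP Window Size in the TCP handshake and affect throughput
- Use the Wireshark TCP time-sequence graph to analyze problems
- The receive window size affects performance; the larger the RTT, the more visible the effect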
BDP (bytes) = RTT (s) * (Bandwidth (bit/s) / 8)
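A minimal sketch of the BDP formula, plus the throughput ceiling that a fixed window imposes (at most one window per RTT). The function names and the sample figures (100 Mbit/s, 100 ms, a 65535-byte window) are assumptions chosen for illustration, not values from these notes.

```c
/* BDP in bytes for a link of bandwidth_bps (bit/s) and a round trip of rtt_s (seconds). */
double bdp_bytes(double bandwidth_bps, double rtt_s) {
    return rtt_s * (bandwidth_bps / 8.0);
}

/* Upper bound on throughput (bit/s) when at most window_bytes can be in flight per RTT. */
double window_limited_bps(double window_bytes, double rtt_s) {
    return window_bytes * 8.0 / rtt_s;
}
```

With the assumed figures, `bdp_bytes(100e6, 0.1)` is about 1.25 MB, while `window_limited_bps(65535, 0.1)` is only about 5.24 Mbit/s, which is why a small receive window hurts more as the RTT grows.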
TCP_QUICKACK
$ man 7 tcp
TCP_QUICKACK (since Linux 2.4.4)
Enable quickack mode if set or disable quickack mode if cleared. In quickack mode, acks are sent immediately, rather than delayed if
needed in accordance to normal TCP operation. This flag is not permanent, it only enables a switch to or from quickack mode.
Subsequent operation of the TCP protocol will once again enter/leave quickack mode depending on internal protocol processing and factors
such as delayed ack timeouts occurring and data transfer. This option should not be used in code intended to be portable.
TCP_QUICKACK is not permanent; it has to be set again after every call to recv.
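Since the flag does not stick, one way to apply it is a small wrapper that re-arms it after each read. This is a sketch under assumptions: `enable_quickack` and `recv_quickack` are hypothetical helper names, and the `#ifdef` fallback exists only because TCP_QUICKACK is Linux-specific.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Switch the socket into quickack mode for now.
 * Returns 0 on success, -1 on error or where TCP_QUICKACK is not defined. */
int enable_quickack(int fd) {
#ifdef TCP_QUICKACK
    int on = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &on, sizeof(on));
#else
    (void)fd;
    return -1;
#endif
}

/* Hypothetical recv() wrapper: read, then set TCP_QUICKACK again,
 * because the kernel may have left quickack mode in the meantime. */
ssize_t recv_quickack(int fd, void *buf, size_t len) {
    ssize_t n = recv(fd, buf, len, 0);
    if (n >= 0)
        (void)enable_quickack(fd);  /* best effort; the data was already read */
    return n;
}
```

A caller would replace `recv(fd, buf, len, 0)` with `recv_quickack(fd, buf, len)` in its read loop.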
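Transfer rate
- RWND: Receive Window, the amount of data the receiver advertises it can still accept.
- CWND: Congestion Window. It controls how many packets the sender may send per unit of time. Within one RTT (Round-Trip Time, what people usually call the ping value) the sender can send at most CWND packets (note: packets, not bytes; Linux tracks CWND in segments). TCP uses CWND/RTT to control the speed. The value is computed dynamically based on packet loss.
- SS: Slow Start. When TCP starts transmitting, the speed ramps up gradually; unless a packet is lost, it keeps growing exponentially (this holds for the standard TCP congestion control algorithms, cubic for example; many other congestion control algorithms or vendors may have modified the slow-start growth, so it is not necessarily exponential).
- CA: Congestion Avoidance. Once the sender detects packet loss it lowers CWND and the speed drops. When CWND grows again it no longer grows exponentially as in SS but linearly (this holds for the classic algorithm, Reno for example; cubic instead grows along a cubic curve, and other algorithms or vendors may have changed the growth behavior as well).
- ssthresh: Slow Start Threshold. When the sender detects packet loss it records the current CWND and computes a reasonable ssthresh (ssthresh <= CWND at the time of the loss). When CWND grows again from a small value and reaches ssthresh, the sender switches from SS to CA. On an ACK timeout, however (the sender never receives the acknowledgment from the peer), the sender drops CWND straight back to its initial value.
- tcp_wmem: corresponds to the send buffer, which bounds the sender's sliding window.
In the figure above, as soon as a packet is lost, cwnd drops to 1 and ssthresh drops to cwnd/2, so all the progress is lost at once. This is too conservative. In most real cases the public network still has spare bandwidth and the path is simply long; the loss probability rises because of the path length, not because bandwidth ran out, so there is no need to back off this hard (TCP was designed mainly with LANs and twisted-pair links in mind, hence the conservative bias). The larger the RTT (a long fat pipe), the worse the problem gets, and it shows up as heavily fluctuating transfer speed.
- Timeout retransmission: ssthresh drops to cwnd/2 and cwnd drops to 1
- Fast retransmit: CWND is halved and ssthresh is lowered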
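The rules above can be sketched as a tiny Reno-style state machine. This is a simplification for illustration, not the Linux implementation: the type and function names are made up, cwnd and ssthresh are counted in segments, growth is applied once per RTT, slow start is clamped at ssthresh, and ssthresh is assumed never to fall below 2.

```c
/* Simplified sender state; both fields are in segments. */
struct cc_state {
    unsigned cwnd;
    unsigned ssthresh;
};

enum cc_event { EV_RTT_ACKED, EV_FAST_RETRANSMIT, EV_TIMEOUT };

/* Half of cwnd, but never less than 2 segments (assumed floor). */
static unsigned half_cwnd(unsigned cwnd) {
    return cwnd / 2 < 2 ? 2 : cwnd / 2;
}

void cc_on_event(struct cc_state *s, enum cc_event ev) {
    switch (ev) {
    case EV_RTT_ACKED:                  /* a full window was acknowledged */
        if (s->cwnd < s->ssthresh) {
            s->cwnd *= 2;               /* slow start: exponential growth */
            if (s->cwnd > s->ssthresh)
                s->cwnd = s->ssthresh;  /* clamp, then continue in CA */
        } else {
            s->cwnd += 1;               /* congestion avoidance: linear growth */
        }
        break;
    case EV_FAST_RETRANSMIT:            /* loss seen via duplicate ACKs */
        s->ssthresh = half_cwnd(s->cwnd);
        s->cwnd = s->ssthresh;          /* cwnd is halved, not reset */
        break;
    case EV_TIMEOUT:                    /* retransmission timer expired */
        s->ssthresh = half_cwnd(s->cwnd);
        s->cwnd = 1;                    /* back to the start */
        break;
    }
}
```

Starting from `{1, 16}`, four acknowledged RTTs take cwnd through 2, 4, 8, 16; the fifth adds just one segment; a timeout then sends cwnd back to 1, which is the sawtooth the notes complain about on long fat pipes.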
Socket Options
#include <sys/socket.h>
int getsockopt(int sockfd, int level, int optname, void *optval, socklen_t *optlen);
int setsockopt(int sockfd, int level, int optname, const void *optval, socklen_t optlen);
/* Both return: 0 if OK, -1 on error */
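A short sketch of the two calls applied to SO_SNDBUF / SO_RCVBUF. The helper names are hypothetical. Reading the value back matters because the kernel may not apply exactly what was requested; on Linux, for instance, the requested size is doubled for bookkeeping overhead and capped by a system-wide limit.

```c
#include <sys/socket.h>

/* Read an int-valued SOL_SOCKET option such as SO_SNDBUF or SO_RCVBUF.
 * Returns 0 if OK, -1 on error. */
int get_sock_int(int fd, int optname, int *value) {
    socklen_t len = sizeof(*value);
    return getsockopt(fd, SOL_SOCKET, optname, value, &len);
}

/* Request a buffer size, then report what the kernel actually applied.
 * Returns 0 if OK, -1 on error. */
int set_buf_size(int fd, int optname, int requested, int *applied) {
    if (setsockopt(fd, SOL_SOCKET, optname, &requested, sizeof(requested)) == -1)
        return -1;
    return get_sock_int(fd, optname, applied);
}
```

For example, `set_buf_size(fd, SO_RCVBUF, 65536, &applied)` requests a 64 KiB receive buffer and stores the effective size in `applied`.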
Parameter list
level | optname | get | set | Description | Flag | Datatype |
---|---|---|---|---|---|---|
SOL_SOCKET | SO_BROADCAST | x | x | Permit sending of broadcast datagrams | x | int |
SOL_SOCKET | SO_DEBUG | x | x | Enable debug tracing | x | int |
SOL_SOCKET | SO_DONTROUTE | x | x | Bypass routing table lookup | x | int |
SOL_SOCKET | SO_ERROR | x | | Get pending error and clear | | int |
SOL_SOCKET | SO_KEEPALIVE | x | x | Periodically test if connection still alive | x | int |
SOL_SOCKET | SO_LINGER | x | x | Linger on close if data to send | | linger{} |
SOL_SOCKET | SO_OOBINLINE | x | x | Leave received out-of-band data inline | x | int |
SOL_SOCKET | SO_RCVBUF | x | x | Receive buffer size | | int |
SOL_SOCKET | SO_SNDBUF | x | x | Send buffer size | | int |
SOL_SOCKET | SO_RCVLOWAT | x | x | Receive buffer low-water mark | | int |
SOL_SOCKET | SO_SNDLOWAT | x | x | Send buffer low-water mark | | int |
SOL_SOCKET | SO_RCVTIMEO | x | x | Receive timeout | | timeval{} |
SOL_SOCKET | SO_SNDTIMEO | x | x | Send timeout | | timeval{} |
SOL_SOCKET | SO_REUSEADDR | x | x | Allow local address reuse | x | int |
SOL_SOCKET | SO_REUSEPORT | x | x | Allow local port reuse | x | int |
SOL_SOCKET | SO_TYPE | x | | Get socket type | | int |
SOL_SOCKET | SO_USELOOPBACK | x | x | Routing socket gets copy of what it sends | x | int |
IPPROTO_IP | IP_HDRINCL | x | x | IP header included with data | x | int |
IPPROTO_IP | IP_OPTIONS | x | x | IP header options | | (see text) |
IPPROTO_IP | IP_RECVDSTADDR | x | x | Return destination IP address | x | int |
IPPROTO_IP | IP_RECVIF | x | x | Return received interface index | x | int |
IPPROTO_IP | IP_TOS | x | x | Type-of-service and precedence | | int |
IPPROTO_IP | IP_TTL | x | x | TTL | | int |
IPPROTO_IP | IP_MULTICAST_IF | x | x | Specify outgoing interface | | in_addr{} |
IPPROTO_IP | IP_MULTICAST_TTL | x | x | Specify outgoing TTL | | u_char |
IPPROTO_IP | IP_MULTICAST_LOOP | x | x | Specify loopback | | u_char |
IPPROTO_IP | IP_{ADD,DROP}_MEMBERSHIP | | x | Join or leave multicast group | | ip_mreq{} |
IPPROTO_IP | IP_{BLOCK,UNBLOCK}_SOURCE | | x | Block or unblock multicast source | | ip_mreq_source{} |
IPPROTO_IP | IP_{ADD,DROP}_SOURCE_MEMBERSHIP | | x | Join or leave source-specific multicast | | ip_mreq_source{} |
IPPROTO_ICMPV6 | ICMP6_FILTER | x | x | Specify ICMPv6 message types to pass | | icmp6_filter{} |
IPPROTO_IPV6 | IPV6_CHECKSUM | x | x | Offset of checksum field for raw sockets | | int |
IPPROTO_IPV6 | IPV6_DONTFRAG | x | x | Drop instead of fragment large packets | x | int |
IPPROTO_IPV6 | IPV6_NEXTHOP | x | x | Specify next-hop address | | sockaddr_in6{} |
IPPROTO_IPV6 | IPV6_PATHMTU | x | | Retrieve current path MTU | | ip6_mtuinfo{} |
IPPROTO_IPV6 | IPV6_RECVDSTOPTS | x | x | Receive destination options | x | int |
IPPROTO_IPV6 | IPV6_RECVHOPLIMIT | x | x | Receive unicast hop limit | x | int |
IPPROTO_IPV6 | IPV6_RECVHOPOPTS | x | x | Receive hop-by-hop options | x | int |
IPPROTO_IPV6 | IPV6_RECVPATHMTU | x | x | Receive path MTU | x | int |
IPPROTO_IPV6 | IPV6_RECVPKTINFO | x | x | Receive packet information | x | int |
IPPROTO_IPV6 | IPV6_RECVRTHDR | x | x | Receive source route | x | int |
IPPROTO_IPV6 | IPV6_RECVTCLASS | x | x | Receive traffic class | x | int |
IPPROTO_IPV6 | IPV6_UNICAST_HOPS | x | x | Default unicast hop limit | | int |
IPPROTO_IPV6 | IPV6_USE_MIN_MTU | x | x | Use minimum MTU | x | int |
IPPROTO_IPV6 | IPV6_V6ONLY | x | x | Disable v4 compatibility | x | int |
IPPROTO_IPV6 | IPV6_XXX | x | x | Sticky ancillary data | | (see text) |
IPPROTO_IPV6 | IPV6_MULTICAST_IF | x | x | Specify outgoing interface | | u_int |
IPPROTO_IPV6 | IPV6_MULTICAST_HOPS | x | x | Specify outgoing hop limit | | int |
IPPROTO_IPV6 | IPV6_MULTICAST_LOOP | x | x | Specify loopback | x | u_int |
IPPROTO_IPV6 | IPV6_JOIN_GROUP | | x | Join multicast group | | ipv6_mreq{} |
IPPROTO_IPV6 | IPV6_LEAVE_GROUP | | x | Leave multicast group | | ipv6_mreq{} |
IPPROTO_IP or IPPROTO_IPV6 | MCAST_JOIN_GROUP | | x | Join multicast group | | group_req{} |
IPPROTO_IP or IPPROTO_IPV6 | MCAST_LEAVE_GROUP | | x | Leave multicast group | | group_req{} |
IPPROTO_IP or IPPROTO_IPV6 | MCAST_BLOCK_SOURCE | | x | Block multicast source | | group_source_req{} |
IPPROTO_IP or IPPROTO_IPV6 | MCAST_UNBLOCK_SOURCE | | x | Unblock multicast source | | group_source_req{} |
IPPROTO_IP or IPPROTO_IPV6 | MCAST_JOIN_SOURCE_GROUP | | x | Join source-specific multicast | | group_source_req{} |
IPPROTO_IP or IPPROTO_IPV6 | MCAST_LEAVE_SOURCE_GROUP | | x | Leave source-specific multicast | | group_source_req{} |
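The Flag and Datatype columns decide what is passed as optval. A rough sketch with one option of each kind, using hypothetical helper names: a flag option (int, nonzero enables), a struct-valued option (timeval{}), and a get-only option (SO_TYPE).

```c
#include <sys/socket.h>
#include <sys/time.h>

/* Flag option (int): nonzero enables, zero disables. Returns 0 or -1. */
int set_reuseaddr(int fd, int enabled) {
    return setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &enabled, sizeof(enabled));
}

/* Struct-valued option (timeval{}): receive timeout in whole seconds. Returns 0 or -1. */
int set_recv_timeout(int fd, long seconds) {
    struct timeval tv;
    tv.tv_sec = seconds;
    tv.tv_usec = 0;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/* Get-only option: returns the socket type (SOCK_STREAM, SOCK_DGRAM, ...) or -1 on error. */
int socket_type(int fd) {
    int type = 0;
    socklen_t len = sizeof(type);
    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) == -1)
        return -1;
    return type;
}
```

Passing the wrong size or type for optval is a common source of EINVAL, so the Datatype column is worth checking before each call.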