Demonstrations of tcpretrans, the Linux eBPF/bcc version.
This tool traces the kernel TCP retransmit function to show details of these
retransmits. For example:
# ./tcpretrans
TIME PID IP LADDR:LPORT T> RADDR:RPORT STATE
01:55:05 0 4 10.153.223.157:22 R> 69.53.245.40:34619 ESTABLISHED
01:55:05 0 4 10.153.223.157:22 R> 69.53.245.40:34619 ESTABLISHED
01:55:17 0 4 10.153.223.157:22 R> 69.53.245.40:22957 ESTABLISHED
[...]
This output shows three TCP retransmits, the first two were for an IPv4
connection from 10.153.223.157 port 22 to 69.53.245.40 port 34619. The TCP
state was "ESTABLISHED" at the time of the retransmit. The on-CPU PID at the
time of the retransmit is printed, in this case 0 (the kernel, which will
be the case most of the time).
Retransmits are usually a sign of poor network health, and this tool is
useful for their investigation. Unlike using tcpdump, this tool has very
low overhead, as it only traces the retransmit function. It also prints
additional kernel details: the state of the TCP session at the time of the
retransmit.
A -l option will include TCP tail loss probe attempts:
# ./tcpretrans -l
TIME PID IP LADDR:LPORT T> RADDR:RPORT STATE
01:55:45 0 4 10.153.223.157:22 R> 69.53.245.40:51601 ESTABLISHED
01:55:46 0 4 10.153.223.157:22 R> 69.53.245.40:51601 ESTABLISHED
01:55:46 0 4 10.153.223.157:22 R> 69.53.245.40:51601 ESTABLISHED
01:55:53 0 4 10.153.223.157:22 L> 69.53.245.40:46444 ESTABLISHED
01:56:06 0 4 10.153.223.157:22 R> 69.53.245.40:46444 ESTABLISHED
01:56:06 0 4 10.153.223.157:22 R> 69.53.245.40:46444 ESTABLISHED
01:56:08 0 4 10.153.223.157:22 R> 69.53.245.40:46444 ESTABLISHED
01:56:08 0 4 10.153.223.157:22 R> 69.53.245.40:46444 ESTABLISHED
01:56:08 1938 4 10.153.223.157:22 R> 69.53.245.40:46444 ESTABLISHED
01:56:08 0 4 10.153.223.157:22 R> 69.53.245.40:46444 ESTABLISHED
01:56:08 0 4 10.153.223.157:22 R> 69.53.245.40:46444 ESTABLISHED
[...]
See the "L>" in the "T>" column. These are attempts: the kernel probably
sent a TLP, but in some cases it might not have been ultimately sent.
To spot heavily retransmitting flows quickly one can use the -c flag. It will
count occurring retransmits per flow.
# ./tcpretrans.py -c
Tracing retransmits ... Hit Ctrl-C to end
^C
LADDR:LPORT RADDR:RPORT RETRANSMITS
192.168.10.50:60366 <-> 172.217.21.194:443 700
192.168.10.50:666 <-> 172.213.11.195:443 345
192.168.10.50:366 <-> 172.212.22.194:443 211
[...]
This can ease to quickly isolate congested or otherwise awry network paths
responsible for clamping tcp performance.
USAGE message:
# ./tcpretrans -h
usage: tcpretrans [-h] [-l]
Trace TCP retransmits
optional arguments:
-h, --help show this help message and exit
-l, --lossprobe include tail loss probe attempts
-c, --count count occurred retransmits per flow
examples:
./tcpretrans # trace TCP retransmits
./tcpretrans -l # include TLP attempts