Chapter 7. Identifying application read socket buffer bottlenecks
If TCP applications do not clear the read socket buffers frequently enough, performance can suffer and packets can be lost. Red Hat Enterprise Linux provides different utilities to identify such problems.
7.1. Identifying receive buffer collapsing and pruning
When the data in the receive queue exceeds the receive buffer size, the TCP stack tries to free some space by removing unnecessary metadata from the socket buffer. This step is known as collapsing.
If collapsing fails to free sufficient space for additional traffic, the kernel prunes new data that arrives. This means that the kernel removes the data from memory and the packet is lost.
To avoid collapsing and pruning operations, monitor whether TCP buffer collapsing and pruning happen on your server and, if they do, tune the TCP buffers.
Procedure
Use the nstat utility to query the TcpExtTCPRcvCollapsed and TcpExtRcvPruned counters:

    # nstat -az TcpExtTCPRcvCollapsed TcpExtRcvPruned
    #kernel
    TcpExtRcvPruned                 0                  0.0
    TcpExtTCPRcvCollapsed           612859             0.0
Wait some time and re-run the nstat command:

    # nstat -az TcpExtTCPRcvCollapsed TcpExtRcvPruned
    #kernel
    TcpExtRcvPruned                 0                  0.0
    TcpExtTCPRcvCollapsed           620358             0.0
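Comparing the two runs by hand is easy for two counters, but the same check can be scripted. The following Python sketch is one possible way to do it and is not part of the nstat utility: it parses the text printed by nstat -az and reports the counters whose absolute value grew between the runs. The function names parse_nstat and counter_increases are illustrative, and the sample strings are copied from the outputs above.

```python
def parse_nstat(output: str) -> dict:
    """Parse `nstat -az` text into {counter_name: absolute_value}."""
    counters = {}
    for line in output.splitlines():
        fields = line.split()
        # Skip the "#kernel" header and anything that is not "name value rate".
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        try:
            counters[fields[0]] = int(fields[1])
        except ValueError:
            continue
    return counters


def counter_increases(first: str, second: str) -> dict:
    """Return {counter_name: growth} for counters that grew between two runs."""
    before, after = parse_nstat(first), parse_nstat(second)
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] > before[name]
    }


# Sample text taken from the two nstat runs shown in this procedure.
first_run = """#kernel
TcpExtRcvPruned                 0                  0.0
TcpExtTCPRcvCollapsed           612859             0.0
"""
second_run = """#kernel
TcpExtRcvPruned                 0                  0.0
TcpExtTCPRcvCollapsed           620358             0.0
"""

print(counter_increases(first_run, second_run))
# prints {'TcpExtTCPRcvCollapsed': 7499}
```

An empty result means that neither counter grew between the two runs. Any entry in the result indicates that collapsing or pruning happened in that interval.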
If the values of the counters have increased compared to the first run, tuning is required:
- If the application uses the setsockopt(SO_RCVBUF) call, consider removing it. With this call, the application only uses the receive buffer size specified in the call and turns off the socket's ability to auto-tune its size.
- If the application does not use the setsockopt(SO_RCVBUF) call, tune the default and maximum values of the TCP read socket buffer.
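To help recognize the call in application code, the following Python sketch shows what setting SO_RCVBUF looks like and how to read back the resulting size. The helper name rcvbuf_before_and_after and the 65536-byte value are arbitrary examples, not values recommended by this procedure. The size reported afterwards depends on the system: on Linux, the kernel doubles the requested value to allow for bookkeeping overhead and limits it according to net.core.rmem_max, so treat the printed numbers as illustrative only.

```python
import socket


def rcvbuf_before_and_after(requested: int) -> tuple:
    """Return the receive buffer size reported before and after SO_RCVBUF is set."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Size chosen by the system for a new, unconnected TCP socket.
        before = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        # This is the kind of call to look for in the application source.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        # Size the system reports after the explicit request.
        after = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return before, after


# 65536 is an arbitrary example size.
before, after = rcvbuf_before_and_after(65536)
print(f"default: {before} bytes, after setsockopt: {after} bytes")
```

Applications written in other languages use the equivalent setsockopt call with the SOL_SOCKET level and the SO_RCVBUF option, which is what to search for when reviewing the source.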
Display the receive backlog queue (Recv-Q):

    # ss -nti
    State   Recv-Q   Send-Q   Local Address:Port   Peer Address:Port   Process
    ESTAB   0        0        192.0.2.1:443        192.0.2.125:41574
          :7,7 ... lastrcv:543 ...
    ESTAB   78       0        192.0.2.1:443        192.0.2.56:42612
          :7,7 ... lastrcv:658 ...
    ESTAB   88       0        192.0.2.1:443        192.0.2.97:40313
          :7,7 ... lastrcv:5764 ...
    ...
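The Recv-Q and lastrcv values in this output can also be extracted and compared across several runs with a script. The following Python sketch is one way to do that, under these assumptions: each connection line starts with the state followed by numeric Recv-Q and Send-Q fields, and the lastrcv field appears on the same line or on the detail line that follows. The names parse_ss and stalled_connections are illustrative. The first sample is the output shown above; the second sample is invented for the example.

```python
import re

LASTRCV = re.compile(r"\blastrcv:(\d+)")


def parse_ss(output: str) -> dict:
    """Map (local, peer) to {"recv_q": int, "lastrcv": int or None}."""
    conns = {}
    current = None
    for line in output.splitlines():
        fields = line.split()
        # Connection line: State Recv-Q Send-Q Local Peer [Process]
        if len(fields) >= 5 and fields[1].isdigit() and fields[2].isdigit():
            current = (fields[3], fields[4])
            conns[current] = {"recv_q": int(fields[1]), "lastrcv": None}
        match = LASTRCV.search(line)
        if match and current is not None:
            conns[current]["lastrcv"] = int(match.group(1))
    return conns


def stalled_connections(samples: list) -> list:
    """Return connections whose Recv-Q pattern suggests the application is not reading."""
    if len(samples) < 2:
        return []  # a single sample cannot show a trend
    suspects = []
    # Only consider connections that are present in every sample.
    common = set(samples[0]).intersection(*samples[1:])
    for conn in sorted(common):
        queues = [s[conn]["recv_q"] for s in samples]
        idle = [s[conn]["lastrcv"] for s in samples]
        growing_queue = all(a < b for a, b in zip(queues, queues[1:]))
        constant_queue = queues[0] > 0 and len(set(queues)) == 1
        growing_idle = None not in idle and all(a < b for a, b in zip(idle, idle[1:]))
        if growing_queue or (constant_queue and growing_idle):
            suspects.append(conn)
    return suspects


sample_1 = """State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
ESTAB 0 0 192.0.2.1:443 192.0.2.125:41574
      :7,7 ... lastrcv:543 ...
ESTAB 78 0 192.0.2.1:443 192.0.2.56:42612
      :7,7 ... lastrcv:658 ...
ESTAB 88 0 192.0.2.1:443 192.0.2.97:40313
      :7,7 ... lastrcv:5764 ...
"""
# Hypothetical second run a few seconds later (values invented for illustration).
sample_2 = """State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
ESTAB 0 0 192.0.2.1:443 192.0.2.125:41574
      :7,7 ... lastrcv:120 ...
ESTAB 78 0 192.0.2.1:443 192.0.2.56:42612
      :7,7 ... lastrcv:3660 ...
ESTAB 152 0 192.0.2.1:443 192.0.2.97:40313
      :7,7 ... lastrcv:40 ...
"""

for local, peer in stalled_connections([parse_ss(sample_1), parse_ss(sample_2)]):
    print(local, peer)
```

With these two samples, the sketch flags the connection from 192.0.2.56 (Recv-Q unchanged at 78 while lastrcv grows) and the connection from 192.0.2.97 (Recv-Q grows from 88 to 152). Two samples are the minimum; more runs give a more reliable picture.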
Run the ss -nti command multiple times with a few seconds waiting time between each run.
If the output lists only one case of a high value in the Recv-Q column, the application was between two receive operations. However, if the value in Recv-Q stays constant while lastrcv continually grows, or Recv-Q continually increases over time, one of the following problems can be the cause:
- The application does not check its socket buffers often enough. Contact the application vendor for details about how you can solve this problem.
- The application does not get enough CPU time. To further debug this problem:
  - Display on which CPU cores the application runs:

        # ps -eo pid,tid,psr,pcpu,stat,wchan:20,comm
        PID    TID    PSR  %CPU  STAT  WCHAN                 COMMAND
        ...
        44594  44594  5    0.0   Ss    do_select             httpd
        44595  44595  3    0.0   S     skb_wait_for_more_pa  httpd
        44596  44596  5    0.0   Sl    pipe_read             httpd
        44597  44597  5    0.0   Sl    pipe_read             httpd
        44602  44602  5    0.0   Sl    pipe_read             httpd
        ...

    The PSR column displays the CPU cores the process is currently assigned to.
  - Identify other processes running on the same cores and consider assigning them to other cores.
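Finding the processes that share a core with the application can be scripted by grouping the ps output on the PSR column. The following Python sketch assumes the column order produced by the ps command above (PID, TID, PSR, %CPU, STAT, WCHAN, COMMAND). The function names processes_by_core and neighbours are illustrative, and the backup-job line in the sample is a hypothetical process added to show a match.

```python
from collections import defaultdict


def processes_by_core(ps_output: str) -> dict:
    """Group (pid, command) pairs by the PSR value of each process."""
    cores = defaultdict(list)
    for line in ps_output.splitlines():
        fields = line.split()
        # Expected columns: PID TID PSR %CPU STAT WCHAN COMMAND
        if len(fields) < 7 or not fields[0].isdigit() or not fields[2].isdigit():
            continue  # header, "..." lines, or anything unexpected
        pid, psr = int(fields[0]), int(fields[2])
        command = " ".join(fields[6:])
        cores[psr].append((pid, command))
    return dict(cores)


def neighbours(ps_output: str, app: str) -> dict:
    """For each core used by `app`, list the other processes on that core."""
    result = {}
    for core, procs in processes_by_core(ps_output).items():
        if any(cmd == app for _, cmd in procs):
            result[core] = [(pid, cmd) for pid, cmd in procs if cmd != app]
    return result


sample = """PID    TID    PSR  %CPU  STAT  WCHAN                 COMMAND
44594  44594  5    0.0   Ss    do_select             httpd
44595  44595  3    0.0   S     skb_wait_for_more_pa  httpd
44596  44596  5    0.0   Sl    pipe_read             httpd
44597  44597  5    0.0   Sl    pipe_read             httpd
44602  44602  5    0.0   Sl    pipe_read             httpd
44610  44610  5    12.3  R     -                     backup-job
"""  # the backup-job line is hypothetical

for core, others in neighbours(sample, "httpd").items():
    print(f"core {core}: {others}")
```

In this sample, core 5 is shared with the hypothetical backup-job process, while core 3 has no other process listed. Because PSR reflects where the scheduler placed each process when ps ran, repeating the check a few times gives a more representative result.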