Chapter 17. Detecting false sharing
False sharing occurs when a processor core on a symmetric multiprocessing (SMP) system modifies a data item on a cache line that other processors are using to access different data items, which are not shared between the processors.
The modification forces the other processors that use the cache line to invalidate their copy and request an updated one, even though they do not need, and might not even have access to, the updated version of the modified data item.
You can use the perf c2c command to detect false sharing.
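To make the definition concrete, the following minimal sketch uses Python's ctypes module to lay out two logically independent counters and check whether they land on the same cache line. The 64-byte line size and the structure and field names are assumptions chosen for illustration; the actual line size depends on the CPU.

```python
import ctypes

CACHE_LINE_SIZE = 64  # assumption: common on x86_64; check your CPU


class PackedCounters(ctypes.Structure):
    # Hypothetical layout: two independent counters placed back to back.
    _fields_ = [
        ("reader_hits", ctypes.c_int64),
        ("writer_hits", ctypes.c_int64),
    ]


class PaddedCounters(ctypes.Structure):
    # Hypothetical layout: padding pushes the second counter
    # onto the next cache line.
    _fields_ = [
        ("reader_hits", ctypes.c_int64),
        ("padding", ctypes.c_char * (CACHE_LINE_SIZE - ctypes.sizeof(ctypes.c_int64))),
        ("writer_hits", ctypes.c_int64),
    ]


def same_cache_line(struct_type, field_a, field_b, line_size=CACHE_LINE_SIZE):
    """Return True if both fields fall on the same cache line,
    assuming the structure itself starts on a cache-line boundary."""
    offset_a = getattr(struct_type, field_a).offset
    offset_b = getattr(struct_type, field_b).offset
    return offset_a // line_size == offset_b // line_size


print(same_cache_line(PackedCounters, "reader_hits", "writer_hits"))  # True
print(same_cache_line(PaddedCounters, "reader_hits", "writer_hits"))  # False
```

If different cores update the two counters, the packed layout is a candidate for false sharing because every write to one counter invalidates the line that holds the other. The padded layout trades memory for separate cache lines.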
17.1. The purpose of perf c2c
The c2c subcommand of the perf tool enables Shared Data Cache-to-Cache (C2C) analysis. You can use the perf c2c command to inspect cache-line contention to detect both true and false sharing.
Cache-line contention occurs when a processor core on a symmetric multiprocessing (SMP) system modifies data items on a cache line that is in use by other processors. All other processors using this cache line must then invalidate their copy and request an updated one. This can lead to degraded performance.
The perf c2c command provides the following information:
- Cache lines where contention has been detected
- Processes reading and writing the data
- Instructions causing the contention
- The Non-Uniform Memory Access (NUMA) nodes involved in the contention
17.2. Detecting cache-line contention with perf c2c
You can use the perf c2c command to detect cache-line contention in a system. The perf c2c command supports the same options as perf record as well as some options exclusive to the c2c subcommand. The recorded data is stored in a perf.data file in the current directory for later analysis.
Prerequisites
- You have the perf user space tool installed. For more information, see Installing perf.
Procedure
- Use perf c2c to detect cache-line contention:
  # perf c2c record -a sleep <seconds>
  This command samples and records cache-line contention data across all CPUs for a period of <seconds>, as dictated by the sleep command. You can replace the sleep command with any command over which you want to collect cache-line contention data.
17.3. Visualizing a perf.data file recorded with perf c2c record
You can visualize the perf.data file that is recorded with the perf c2c command.
Prerequisites
- You have the perf user space tool installed. For more information, see Installing perf.
- A perf.data file recorded by using the perf c2c command is available in the current directory. For more information, see Detecting cache-line contention with perf c2c.
Procedure
- Open the perf.data file for analysis:
  # perf c2c report --stdio
  This command renders the perf.data file as several tables within the terminal.
17.4. Interpretation of perf c2c report output
The visualization displayed by running the perf c2c report --stdio command sorts the data into several tables:
- Trace Event Information
  Provides a high-level summary of all the load and store samples that the perf c2c record command collects.
- Global Shared Cache Line Event Information
  Provides statistics over the shared cache lines.
- c2c Details
  Provides information about what events were sampled and how the perf c2c report data is organized within the visualization.
- Shared Data Cache Line Table
  Provides a one-line summary for the hottest cache lines where false sharing is detected. By default, the table is sorted in descending order by the amount of remote Hitm detected per cache line.
- Shared Cache Line Distribution Pareto
  Provides a variety of information about each cache line experiencing contention:
  - The cache lines are numbered in the NUM column, starting at 0.
  - The Data address Offset column contains the virtual address of each cache line, followed by the offsets into the cache line at which the different accesses occurred.
  - The Pid column contains the process ID.
  - The Code Address column contains the instruction pointer code address.
  - The columns under the cycles label show average load latencies.
  - The cpu cnt column displays how many different CPUs the samples came from, that is, how many different CPUs were waiting for the data indexed at that location.
  - The Symbol column displays the function name or symbol.
  - The Shared Object column displays the name of the ELF image where the samples come from. The name [kernel.kallsyms] is used when the samples come from the kernel.
  - The Source:Line column displays the source file and line number.
  - The Node{cpu list} column displays which specific CPUs the samples came from for each node.
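The relationship between a cache-line address and the offsets listed for it can be sketched in a few lines of Python. This is an illustration only: the 64-byte line size, the variable names, and the addresses are hypothetical and do not come from a real report.

```python
CACHE_LINE_SIZE = 64  # assumption: typical x86_64 line size


def split_address(address, line_size=CACHE_LINE_SIZE):
    """Split a virtual address into (cache-line base, offset within line)."""
    offset = address % line_size
    return address - offset, offset


# Hypothetical addresses of two variables touched by different threads.
accesses = {"lock_word": 0x602180, "stats_counter": 0x6021A8}

for name, address in accesses.items():
    base, offset = split_address(address)
    print(f"{name}: line {base:#x}, offset {offset:#x}")
```

Both hypothetical variables resolve to the same cache-line base at different offsets. Two unrelated accesses showing up at different offsets within one contended line is the pattern to look for when you suspect false sharing rather than true sharing.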
17.5. Detecting false sharing with perf c2c
You can detect false sharing with the perf c2c command.
Prerequisites
- You have the perf user space tool installed. For more information, see Installing perf.
- A perf.data file recorded by using the perf c2c command is available in the current directory. For more information, see Detecting cache-line contention with perf c2c.
Procedure
1. Open the perf.data file:
   # perf c2c report --stdio
   This opens the perf.data file in the terminal.
2. In the Trace Event Information table, locate the row containing the values for LLC Misses to Remote Cache (HITM).
   The percentage in the value column of the LLC Misses to Remote Cache (HITM) row represents the percentage of last-level cache (LLC) misses that occurred across NUMA nodes in modified cache lines. It is a key indicator that false sharing has occurred.
3. Inspect the Rmt column of the LLC Load Hitm field of the Shared Data Cache Line Table.
   The table is sorted in descending order by the amount of remote Hitm detected per cache line. A high number in the Rmt column of the LLC Load Hitm section indicates false sharing. Inspect the cache line on which it occurred to debug the false sharing activity.
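The report already computes and sorts these values, so the following Python sketch only restates the arithmetic behind the two steps. It assumes that the percentage is the remote HITM count divided by the total LLC misses, as the description above suggests. The function names and sample counts are hypothetical and are not taken from a real report.

```python
def remote_hitm_percent(remote_hitm, llc_misses):
    """Percentage of LLC misses served from a modified line in a remote cache."""
    if llc_misses == 0:
        return 0.0
    return 100.0 * remote_hitm / llc_misses


def rank_by_remote_hitm(cache_lines):
    """Sort (address, remote_hitm) pairs so the most contended line is first."""
    return sorted(cache_lines, key=lambda row: row[1], reverse=True)


# Hypothetical sample counts for illustration only.
print(remote_hitm_percent(remote_hitm=400, llc_misses=1000))

lines = [(0x602100, 12), (0x602180, 388), (0x6021C0, 0)]
for address, remote_hitm in rank_by_remote_hitm(lines):
    print(f"{address:#x} {remote_hitm}")
```

A line that dominates the ranking, such as the second hypothetical entry here, is the one to examine first in the Shared Cache Line Distribution Pareto table.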