Chapter 16. Profiling memory accesses with perf mem
You can use the perf mem command to sample memory accesses on your system.
The mem subcommand of the perf tool enables the sampling of memory accesses (loads and stores). The perf mem command provides information about memory latency, types of memory accesses, functions causing cache hits and misses, and, by recording the data symbol, the memory locations where these hits and misses occur.
16.1. Sampling memory access with perf mem
You can use the perf mem command to sample memory accesses on your system. The command takes the same options as perf record and perf report, as well as some options exclusive to the mem subcommand. The recorded data is stored in a perf.data file in the current directory for later analysis.
Prerequisites
- You have the perf user space tool installed. For more information, see Installing perf.
Procedure
Sample the memory accesses:
# perf mem record -a sleep <seconds>
This command samples memory accesses across all CPUs for a period of <seconds> seconds, as dictated by the sleep command. You can replace the sleep command with any command during which you want to sample memory access data. By default, perf mem samples both memory loads and stores. To select only one memory operation, use the -t option and specify either load or store between perf mem and record. For loads, information about the memory hierarchy level, TLB memory accesses, bus snoops, and memory locks is captured.
Open the perf.data file for analysis:
# perf mem report
If you have used the example commands, the output is:
Available samples
35k cpu/mem-loads,ldlat=30/P
54k cpu/mem-stores/P
The cpu/mem-loads,ldlat=30/P line denotes data collected over memory loads and the cpu/mem-stores/P line denotes data collected over memory stores. Highlight the category of interest and press Enter to view the data.
Optional: Sort your results to investigate different aspects of interest when displaying the data. For example, to sort data over memory loads by the type of memory access that occurred during the sampling period, in descending order of the overhead each type accounts for:
# perf mem -t load report --sort=mem
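The record and report steps of this procedure can be sketched as a small shell script. This is a minimal sketch, not part of the perf tooling: the run helper, the DRY_RUN variable, and the duration variable are hypothetical names introduced here for illustration, and the 10-second window is an arbitrary assumption. By default the script only prints the commands so that you can review them first.

```shell
#!/bin/sh
# Hypothetical helper (not part of perf): print the command when DRY_RUN is 1
# (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Assumed sampling window in seconds; adjust it to your workload.
duration=10

# Sample only memory loads on all CPUs while the sleep command runs.
run perf mem -t load record -a sleep "$duration"

# Open perf.data and sort the load samples by type of memory access.
run perf mem -t load report --sort=mem
```

To execute the commands instead of printing them, run the script as root with DRY_RUN=0 set in the environment.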
16.2. Interpretation of perf mem report output
When you run the perf mem report command without any modifiers, it sorts the data into several columns:
- Overhead
- Indicates percentage of overall samples collected in that particular function.
- Samples
- Displays the number of samples accounted for by that row.
- Local Weight
- Displays the access latency in processor core cycles.
- Memory Access
- Displays the type of memory access that occurred.
- Symbol
- Displays the function name or symbol.
- Shared Object
- Displays the name of the ELF image where the samples come from (the name [kernel.kallsyms] is used when the samples come from the kernel).
- Data Symbol
- Displays the address of the memory location that row was targeting.
Important: The Data Symbol column might display a raw address due to dynamic allocation of memory or stack memory being accessed.
- Snoop
- Displays bus transactions.
- TLB Access
- Displays TLB memory accesses.
- Locked
- Indicates whether a function was memory locked.
In default mode, the functions are sorted in descending order, with those with the highest overhead displayed first.
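To group the report by one of the columns above, you can pass a matching key to the --sort option. The following is a minimal sketch: sort_key_for is a hypothetical helper, and the column-to-key mapping is an assumption based on the sort keys listed in the perf-mem and perf-report manual pages, so check man perf-mem on your system before relying on it. The script only prints the resulting command and does not run perf.

```shell
#!/bin/sh
# Hypothetical helper: map a report column name to the sort key assumed to
# correspond to it. Overhead and Samples are computed per row and have no
# key here, so they fall through to the error branch.
sort_key_for() {
    case "$1" in
        "Local Weight")  echo local_weight ;;
        "Memory Access") echo mem ;;
        "Symbol")        echo sym ;;
        "Shared Object") echo dso ;;
        "Data Symbol")   echo symbol_daddr ;;
        "Snoop")         echo snoop ;;
        "TLB Access")    echo tlb ;;
        "Locked")        echo locked ;;
        *) echo "no sort key known for: $1" >&2; return 1 ;;
    esac
}

# Print the report command that groups samples by the accessed data address.
key=$(sort_key_for "Data Symbol") && echo "perf mem report --sort=$key"
```

Several keys can be combined in a comma-separated list, for example --sort=mem,sym.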