Chapter 25. Profiling memory accesses with perf mem
You can use the perf mem command to sample memory accesses on your system.
25.1. The purpose of perf mem
The mem subcommand of the perf tool enables the sampling of memory accesses (loads and stores). The perf mem command provides information about memory latency, types of memory accesses, functions causing cache hits and misses, and, by recording the data symbol, the memory locations where these hits and misses occur.
25.2. Sampling memory access with perf mem
This procedure describes how to use the perf mem command to sample memory accesses on your system. The command accepts the same options as perf record and perf report, as well as some options exclusive to the mem subcommand. The recorded data is stored in a perf.data file in the current directory for later analysis.
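Because perf mem accepts the options of perf record and perf report, you should be able to keep samples in a named file instead of the default ./perf.data. The following is a minimal bash sketch, not an official helper. The function names mem_record_to and mem_report_from are hypothetical. The sketch assumes that perf mem record forwards -o (the perf record output-file option) to perf record, and that -i selects the input file for perf mem report, as it does for perf report.

```shell
# Hypothetical helpers (not part of perf): store memory samples in a named
# file instead of ./perf.data, then analyze that file later.

# Usage: mem_record_to FILE [perf record options...] COMMAND [ARGS...]
# -o is a perf record option that perf mem record passes through.
mem_record_to() {
    local file="$1"
    shift
    perf mem record -o "$file" "$@"
}

# Usage: mem_report_from FILE [perf report options...]
# -i reads samples from FILE instead of the default ./perf.data.
mem_report_from() {
    local file="$1"
    shift
    perf mem report -i "$file" "$@"
}
```

For example, as root, run `mem_record_to /tmp/mem.data -a sleep 5` and then `mem_report_from /tmp/mem.data`.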
Prerequisites
- You have the perf user space tool installed as described in Installing perf.
Procedure
1. Sample the memory accesses:
   # perf mem record -a sleep seconds
   This example samples memory accesses across all CPUs for the duration of the sleep command, where seconds is the number of seconds to sample. You can replace the sleep command with any command during which you want to sample memory access data. By default, perf mem samples both memory loads and stores. To sample only one memory operation, place the -t option followed by either load or store between perf mem and record. For loads, information about the memory hierarchy level, TLB memory accesses, bus snoops, and memory locks is captured.
2. Open the perf.data file for analysis:
   # perf mem report
   If you have used the example commands, the output is:
   Available samples
   35k cpu/mem-loads,ldlat=30/P
   54k cpu/mem-stores/P
   The cpu/mem-loads,ldlat=30/P line denotes data collected about memory loads, and the cpu/mem-stores/P line denotes data collected about memory stores. Highlight the category of interest and press Enter to view the data.
   Alternatively, you can sort your results to investigate different aspects of interest when displaying the data. For example, to sort data about memory loads by the type of memory access that occurred during the sampling period, in descending order of the overhead each type accounts for:
   # perf mem -t load report --sort=mem
   For example, the output can be:
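If you repeatedly profile only memory loads, you can wrap the procedure steps in small bash functions. This is a minimal sketch. The function names and the 10-second default are hypothetical choices. The --stdio option is a perf report option that perf mem report accepts; it prints the report as plain text instead of opening the interactive browser.

```shell
# Hypothetical wrappers around the commands from the procedure.
# Run them as root so that -a can sample all CPUs.

# Sample only memory loads on all CPUs for the given number of
# seconds (default: 10). The -t option goes between "perf mem" and "record".
mem_record_loads() {
    local seconds="${1:-10}"
    perf mem -t load record -a sleep "$seconds"
}

# Print the load samples from ./perf.data as text, sorted by the type
# of memory access (the 'Memory Access' column).
mem_report_loads() {
    perf mem -t load report --stdio --sort=mem
}
```

For example, run `mem_record_loads 30` and then `mem_report_loads | less` to page through the text report.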
25.3. Interpretation of perf mem report output
The table displayed by running the perf mem report command without any modifiers sorts the data into several columns:
- The 'Overhead' column: Indicates the percentage of overall samples collected in that particular function.
- The 'Samples' column: Displays the number of samples accounted for by that row.
- The 'Local Weight' column: Displays the access latency in processor core cycles.
- The 'Memory Access' column: Displays the type of memory access that occurred.
- The 'Symbol' column: Displays the function name or symbol.
- The 'Shared Object' column: Displays the name of the ELF image where the samples come from (the name [kernel.kallsyms] is used when the samples come from the kernel).
- The 'Data Symbol' column: Displays the address of the memory location that the row was targeting. Because of dynamic memory allocation or stack memory accesses, this column often displays a raw address instead of a symbol name.
- The 'Snoop' column: Displays bus snoop transactions.
- The 'TLB Access' column: Displays TLB memory accesses.
- The 'Locked' column: Indicates whether the memory access was locked.
In the default mode, the functions are sorted in descending order of overhead, with the highest-overhead functions displayed first.
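To show only some of these columns, you can pass a comma-separated list of sort keys to --sort, as the procedure does with --sort=mem. In the following sketch, the mapping from column names to sort keys is an assumption based on the memory-mode sort keys of perf report. Check perf-report(1) on your system before relying on it. The function name mem_report_columns and its default key list are hypothetical.

```shell
# Hypothetical helper: print a text report limited to selected columns.
# Assumed mapping of report columns to perf report sort keys:
#   'Local Weight' -> local_weight    'Memory Access' -> mem
#   'Symbol'       -> sym             'Shared Object' -> dso
#   'Data Symbol'  -> symbol_daddr    'Snoop'         -> snoop
#   'TLB Access'   -> tlb             'Locked'        -> locked
# Usage: mem_report_columns [KEY1,KEY2,...]
mem_report_columns() {
    # Default: which functions access which data addresses, and how.
    local keys="${1:-sym,symbol_daddr,mem}"
    perf mem report --stdio --sort="$keys"
}
```

For example, `mem_report_columns symbol_daddr,tlb` groups the samples by data address and TLB access.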