Chapter 16. Profiling memory accesses with perf mem
The mem subcommand of the perf tool samples memory accesses (loads and stores) on your system. The perf mem command provides information about memory latency, types of memory accesses, functions causing cache hits and misses, and, by recording the data symbol, the memory locations where these hits and misses occur.
16.1. Sampling memory access with perf mem
You can use the perf mem command to sample memory accesses on your system. The command takes the same options as perf record and perf report, as well as some options exclusive to the mem subcommand. The recorded data is stored in a perf.data file in the current directory for later analysis.
Prerequisites
- You have the perf user space tool installed. For more information, see Installing perf.
Procedure
Sample the memory accesses:
# perf mem record -a sleep <seconds>
This command samples memory accesses across all CPUs for a period of <seconds> seconds, as dictated by the sleep command. You can replace the sleep command with any command during which you want to sample memory access data. By default, perf mem samples both memory loads and stores. You can select only one memory operation by using the -t option and specifying either load or store between perf mem and record. For loads, information about the memory hierarchy level, TLB memory accesses, bus snoops, and memory locks is captured.
Open the perf.data file for analysis:
# perf mem report
If you have used the example commands, the output is:
Available samples
35k cpu/mem-loads,ldlat=30/P
54k cpu/mem-stores/P
The cpu/mem-loads,ldlat=30/P line denotes data collected over memory loads and the cpu/mem-stores/P line denotes data collected over memory stores. Highlight the category of interest and press Enter to view the data.
Optional: Sort your results to investigate different aspects of interest when displaying the data. For example, to sort data over memory loads by the type of memory access that occurred during the sampling period, in descending order of the overhead they account for:
# perf mem -t load report --sort=mem
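The placement of the -t option in the procedure above is easy to get wrong, so the following is a minimal sketch that assembles the record and report command lines for you. The helper names perf_mem_record_cmd and perf_mem_report_cmd are hypothetical and are not part of perf. They only print the command so that you can review it before running it.

```shell
#!/bin/sh
# Sketch only: perf_mem_record_cmd and perf_mem_report_cmd are hypothetical
# helpers, not part of perf. They print a command line instead of running it.

# Usage: perf_mem_record_cmd <both|load|store> <seconds>
perf_mem_record_cmd() {
    case "$1" in
        both)
            # Default behaviour: sample both loads and stores.
            printf 'perf mem record -a sleep %s\n' "$2"
            ;;
        load|store)
            # The -t option goes between "perf mem" and "record".
            printf 'perf mem -t %s record -a sleep %s\n' "$1" "$2"
            ;;
        *)
            echo "usage: perf_mem_record_cmd <both|load|store> <seconds>" >&2
            return 1
            ;;
    esac
}

# Usage: perf_mem_report_cmd <both|load|store> [sort-key]
perf_mem_report_cmd() {
    cmd='perf mem'
    case "$1" in
        both) ;;
        load|store) cmd="$cmd -t $1" ;;
        *)
            echo "usage: perf_mem_report_cmd <both|load|store> [sort-key]" >&2
            return 1
            ;;
    esac
    cmd="$cmd report"
    # Append --sort only when a sort key such as "mem" was supplied.
    if [ -n "${2:-}" ]; then
        cmd="$cmd --sort=$2"
    fi
    printf '%s\n' "$cmd"
}
```

For example, perf_mem_record_cmd load 10 prints perf mem -t load record -a sleep 10, and perf_mem_report_cmd load mem prints the sorted report command used in the optional step.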
16.2. Interpretation of perf mem report output
When you run the perf mem report command without any modifiers, it sorts the data into several columns:
- Overhead
- Indicates percentage of overall samples collected in that particular function.
- Samples
- Displays the number of samples accounted for by that row.
- Local Weight
- Displays the access latency in processor core cycles.
- Memory Access
- Displays the type of memory access that occurred.
- Symbol
- Displays the function name or symbol.
- Shared Object
- Displays the name of the ELF image where the samples come from (the name [kernel.kallsyms] is used when the samples come from the kernel).
- Data Symbol
- Displays the address of the memory location that row was targeting.
Important: The Data Symbol column might display a raw address due to dynamic allocation of memory or stack memory being accessed.
- Snoop
- Displays bus transactions.
- TLB Access
- Displays TLB memory accesses.
- Locked
- Indicates whether a function was memory locked.
In default mode, the functions are sorted in descending order, with those with the highest overhead displayed first.
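To re-sort the report by one of the columns above, you pass the matching key to --sort. The following sketch maps the column headings to the sort keys that, as far as the perf-report manual page documents them for memory profiling, correspond to each column. The helper name perf_mem_sort_key is hypothetical, and the key names are an assumption to verify against man perf-report on your system, because the available keys can differ between perf versions.

```shell
#!/bin/sh
# Sketch only: perf_mem_sort_key is a hypothetical helper, not part of perf.
# It maps a column heading from the report to the sort key assumed to
# correspond to it (verify the keys with "man perf-report").
perf_mem_sort_key() {
    case "$1" in
        "Local Weight")  echo local_weight ;;
        "Memory Access") echo mem ;;
        "Symbol")        echo sym ;;
        "Shared Object") echo dso ;;
        "Data Symbol")   echo symbol_daddr ;;
        "Snoop")         echo snoop ;;
        "TLB Access")    echo tlb ;;
        "Locked")        echo locked ;;
        *)
            # Overhead and Samples are computed per row and are not
            # handled by this sketch.
            echo "no sort key known for column: $1" >&2
            return 1
            ;;
    esac
}
```

With the helper defined, a command such as perf mem -t load report --sort="$(perf_mem_sort_key "Data Symbol")" would group load samples by the memory location they targeted.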