Chapter 25. Profiling memory accesses with perf mem
You can use the perf mem command to sample memory accesses on your system.
25.1. The purpose of perf mem
The mem subcommand of the perf tool enables the sampling of memory accesses (loads and stores). The perf mem command provides information about memory latency, types of memory accesses, functions causing cache hits and misses, and, by recording the data symbol, the memory locations where these hits and misses occur.
25.2. Sampling memory access with perf mem
This procedure describes how to use the perf mem command to sample memory accesses on your system. The command takes the same options as perf record and perf report, as well as some options exclusive to the mem subcommand. The recorded data is stored in a perf.data file in the current directory for later analysis.
Prerequisites
- You have the perf user space tool installed as described in Installing perf.
Procedure
1. Sample the memory accesses:
# perf mem record -a sleep seconds
This example samples memory accesses across all CPUs for a period of seconds seconds, as dictated by the sleep command. You can replace the sleep command with any command during which you want to sample memory access data. By default, perf mem samples both memory loads and stores. You can select only one memory operation by using the -t option and specifying either "load" or "store" between perf mem and record. For loads, information about the memory hierarchy level, TLB memory accesses, bus snoops, and memory locks is captured.
2. Open the perf.data file for analysis:
# perf mem report
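The record step above can be sketched as a small helper that composes the command line programmatically, which is convenient when driving perf from automation. This is a minimal sketch; the function name is my own, and only the perf options themselves come from the procedure above.

```python
import shlex

def perf_mem_record_argv(duration_s, operation=None):
    """Build the argv for a system-wide `perf mem record` run.

    `operation` may be "load" or "store" to restrict sampling to one
    memory operation via -t; by default perf mem samples both.
    Note that -t goes between `perf mem` and `record`.
    """
    argv = ["perf", "mem"]
    if operation is not None:
        if operation not in ("load", "store"):
            raise ValueError("operation must be 'load' or 'store'")
        argv += ["-t", operation]
    argv += ["record", "-a", "sleep", str(duration_s)]
    return argv

print(shlex.join(perf_mem_record_argv(10)))
# -> perf mem record -a sleep 10
print(shlex.join(perf_mem_record_argv(5, operation="load")))
# -> perf mem -t load record -a sleep 5
```

The resulting argv list can be passed to subprocess.run; sampling still requires the same privileges as running perf mem record directly.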
If you have used the example commands, the output is:
Available samples
35k cpu/mem-loads,ldlat=30/P
54k cpu/mem-stores/P
The cpu/mem-loads,ldlat=30/P line denotes data collected over memory loads and the cpu/mem-stores/P line denotes data collected over memory stores. Highlight the category of interest and press Enter to view the data:
Samples: 35K of event 'cpu/mem-loads,ldlat=30/P', Event count (approx.): 4067062
Overhead  Samples  Local Weight  Memory access  Symbol  Shared Object  Data Symbol  Data Object  Snoop  TLB access  Locked
0.07%  29  98  L1 or L1 hit  [.] 0x000000000000a255  libspeexdsp.so.1.5.0  [.] 0x00007f697a3cd0f0  anon  None  L1 or L2 hit  No
0.06%  26  97  L1 or L1 hit  [.] 0x000000000000a255  libspeexdsp.so.1.5.0  [.] 0x00007f697a3cd0f0  anon  None  L1 or L2 hit  No
0.06%  25  96  L1 or L1 hit  [.] 0x000000000000a255  libspeexdsp.so.1.5.0  [.] 0x00007f697a3cd0f0  anon  None  L1 or L2 hit  No
0.06%  1  2325  Uncached or N/A hit  [k] pci_azx_readl  [kernel.kallsyms]  [k] 0xffffb092c06e9084  [kernel.kallsyms]  None  L1 or L2 hit  No
0.06%  1  2247  Uncached or N/A hit  [k] pci_azx_readl  [kernel.kallsyms]  [k] 0xffffb092c06e8164  [kernel.kallsyms]  None  L1 or L2 hit  No
0.05%  1  2166  L1 or L1 hit  [.] 0x00000000038140d6  libxul.so  [.] 0x00007ffd7b84b4a8  [stack]  None  L1 or L2 hit  No
0.05%  1  2117  Uncached or N/A hit  [k] check_for_unclaimed_mmio  [kernel.kallsyms]  [k] 0xffffb092c1842300  [kernel.kallsyms]  None  L1 or L2 hit  No
0.05%  22  95  L1 or L1 hit  [.] 0x000000000000a255  libspeexdsp.so.1.5.0  [.] 0x00007f697a3cd0f0  anon  None  L1 or L2 hit  No
0.05%  1  1898  L1 or L1 hit  [.] 0x0000000002a30e07  libxul.so  [.] 0x00007f610422e0e0  anon  None  L1 or L2 hit  No
0.05%  1  1878  Uncached or N/A hit  [k] pci_azx_readl  [kernel.kallsyms]  [k] 0xffffb092c06e8164  [kernel.kallsyms]  None  L2 miss  No
0.04%  18  94  L1 or L1 hit  [.] 0x000000000000a255  libspeexdsp.so.1.5.0  [.] 0x00007f697a3cd0f0  anon  None  L1 or L2 hit  No
0.04%  1  1593  Local RAM or RAM hit  [.] 0x00000000026f907d  libxul.so  [.] 0x00007f3336d50a80  anon  Hit  L2 miss  No
0.03%  1  1399  L1 or L1 hit  [.] 0x00000000037cb5f1  libxul.so  [.] 0x00007fbe81ef5d78  libxul.so  None  L1 or L2 hit  No
0.03%  1  1229  LFB or LFB hit  [.] 0x0000000002962aad  libxul.so  [.] 0x00007fb6f1be2b28  anon  None  L2 miss  No
0.03%  1  1202  LFB or LFB hit  [.] __pthread_mutex_lock  libpthread-2.29.so  [.] 0x00007fb75583ef20  anon  None  L1 or L2 hit  No
0.03%  1  1193  Uncached or N/A hit  [k] pci_azx_readl  [kernel.kallsyms]  [k] 0xffffb092c06e9164  [kernel.kallsyms]  None  L2 miss  No
0.03%  1  1191  L1 or L1 hit  [k] azx_get_delay_from_lpib  [kernel.kallsyms]  [k] 0xffffb092ca7efcf0  [kernel.kallsyms]  None  L1 or L2 hit  No
Alternatively, you can sort your results to investigate different aspects of interest when displaying the data. For example, to sort data over memory loads by the type of memory access that occurred during the sampling period, in descending order of the overhead each type accounts for:
# perf mem -t load report --sort=mem
For example, the output can be:
Samples: 35K of event 'cpu/mem-loads,ldlat=30/P', Event count (approx.): 40670
Overhead  Samples  Memory access
31.53%   9725  LFB or LFB hit
29.70%  12201  L1 or L1 hit
23.03%   9725  L3 or L3 hit
12.91%   2316  Local RAM or RAM hit
 2.37%    743  L2 or L2 hit
 0.34%      9  Uncached or N/A hit
 0.10%     69  I/O or N/A hit
 0.02%    825  L3 miss
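To post-process such sorted output outside the interactive TUI, the rows can be parsed programmatically. A minimal sketch, assuming a plain three-column "Overhead / Samples / Memory access" text layout like the example above; the exact spacing of real perf output may differ, so treat the format here as illustrative:

```python
# Parse rows like "31.53%  9725  LFB or LFB hit" and keep them as
# (overhead_percent, sample_count, access_type) tuples.
SAMPLE = """\
31.53%  9725  LFB or LFB hit
29.70% 12201  L1 or L1 hit
23.03%  9725  L3 or L3 hit
12.91%  2316  Local RAM or RAM hit
 2.37%   743  L2 or L2 hit
"""

def parse_rows(text):
    rows = []
    for line in text.splitlines():
        # Split on whitespace at most twice so the access-type
        # description ("LFB or LFB hit") stays in one piece.
        overhead, samples, access = line.split(None, 2)
        rows.append((float(overhead.rstrip("%")), int(samples), access.strip()))
    return rows

rows = parse_rows(SAMPLE)
# perf already emits the rows in descending order of overhead:
assert rows == sorted(rows, key=lambda r: r[0], reverse=True)
```

This makes it easy, for example, to sum the overhead attributed to RAM hits across several recordings.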
Additional resources
- perf-mem(1) man page on your system
25.3. Interpretation of perf mem report output
The table displayed by running the perf mem report command without any modifiers sorts the data into several columns:
- The 'Overhead' column
- Indicates the percentage of overall samples collected in that particular function.
- The 'Samples' column
- Displays the number of samples accounted for by that row.
- The 'Local Weight' column
- Displays the access latency in processor core cycles.
- The 'Memory Access' column
- Displays the type of memory access that occurred.
- The 'Symbol' column
- Displays the function name or symbol.
- The 'Shared Object' column
- Displays the name of the ELF image where the samples come from (the name [kernel.kallsyms] is used when the samples come from the kernel).
- The 'Data Symbol' column
- Displays the address of the memory location that row was targeting.
Oftentimes, due to dynamic allocation of memory or stack memory being accessed, the 'Data Symbol' column will display a raw address.
- The 'Snoop' column
- Displays bus snoop transactions.
- The 'TLB Access' column
- Displays TLB memory accesses.
- The 'Locked' column
- Indicates whether the memory access was locked.
In default mode, the functions are sorted in descending order with those with the highest overhead displayed first.
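The column layout described above maps naturally onto a small record type, which is handy when scripting over report output. A sketch with hypothetical field names of my own choosing; only the column meanings come from the list above:

```python
from dataclasses import dataclass

@dataclass
class MemReportRow:
    overhead_pct: float   # 'Overhead': percentage of overall samples
    samples: int          # 'Samples': samples accounted for by this row
    local_weight: int     # 'Local Weight': access latency in core cycles
    memory_access: str    # 'Memory Access': e.g. "L1 or L1 hit"
    symbol: str           # 'Symbol': function name, or a raw address
    shared_object: str    # 'Shared Object': ELF image name
    data_symbol: str      # 'Data Symbol': targeted memory location
    snoop: str            # 'Snoop': bus snoop transaction, e.g. "None", "Hit"
    tlb_access: str       # 'TLB Access': e.g. "L1 or L2 hit"
    locked: str           # 'Locked': "Yes" or "No"

rows = [
    MemReportRow(0.04, 1, 1593, "Local RAM or RAM hit", "[.] 0x26f907d",
                 "libxul.so", "[.] 0x7f3336d50a80", "Hit", "L2 miss", "No"),
    MemReportRow(0.07, 29, 98, "L1 or L1 hit", "[.] 0xa255",
                 "libspeexdsp.so.1.5.0", "[.] 0x7f697a3cd0f0", "None",
                 "L1 or L2 hit", "No"),
]
# Default mode: descending order of overhead, highest first.
rows.sort(key=lambda r: r.overhead_pct, reverse=True)
```

Sorting by other fields (for example local_weight to surface the slowest individual accesses) mirrors what perf mem report --sort offers interactively.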