Chapter 23. Monitoring application performance with perf
You can use the perf tool to monitor and analyze application performance.
23.1. Attaching perf record to a running process
You can attach perf record to a running process. This instructs perf record to sample and record performance data only in the specified processes.
Prerequisites
- You have the perf user space tool installed as described in Installing perf.
Procedure
Attach perf record to a running process:
$ perf record -p ID1,ID2 sleep seconds
The previous example samples and records performance data of the processes with the process IDs ID1 and ID2 for the number of seconds that you pass as the seconds argument to the sleep command. You can also configure perf to record events in specific threads:
$ perf record -t ID1,ID2 sleep seconds
Note: When using the -t flag and stipulating thread IDs, perf disables inheritance by default. You can enable inheritance by adding the --inherit option.
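As an illustration of the ID1,ID2 syntax above, the following is a minimal sketch that joins several IDs with commas and prints the resulting perf record command line rather than running it. The helper names join_ids and attach_cmd, the process IDs 1234 and 5678, and the 10-second duration are assumptions made for this example; they are not part of perf.

```shell
#!/bin/sh
# Sketch only: builds the perf record command line instead of running it.

# Join the given IDs with commas, the form that -p and -t expect.
join_ids() {
    printf '%s\n' "$@" | paste -sd, -
}

# Print a perf record command for a mode (-p or -t), a duration in
# seconds, and one or more IDs. The argument order is this sketch's choice.
attach_cmd() {
    mode=$1
    seconds=$2
    shift 2
    echo "perf record $mode $(join_ids "$@") sleep $seconds"
}

# Hypothetical process IDs 1234 and 5678, sampled for 10 seconds.
attach_cmd -p 10 1234 5678
```

Printing the command first makes it easy to review the ID list before you attach to real processes. On systems that provide pgrep, its -d option can produce the comma-separated list directly.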
23.2. Capturing call graph data with perf record
You can configure the perf record tool so that it records which function calls other functions in the performance profile. This helps to identify a bottleneck if several processes are calling the same function.
Prerequisites
- You have the perf user space tool installed as described in Installing perf.
Procedure
Sample and record performance data with the --call-graph option:
$ perf record --call-graph method command
- Replace command with the command during which you want to sample data. If you do not specify a command, then perf record samples data until you manually stop it by pressing Ctrl+C.
- Replace method with one of the following unwinding methods:
  fp - Uses the frame pointer method. Depending on compiler optimization, such as with binaries built with the GCC option -fomit-frame-pointer, this may not be able to unwind the stack.
  dwarf - Uses DWARF Call Frame Information to unwind the stack.
  lbr - Uses the last branch record hardware on Intel processors.
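To make the method and command placeholders concrete, here is a small sketch that checks the unwinding method against the three values listed above and prints the matching perf record command line. The function name callgraph_cmd and the ./my_app binary are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Sketch only: validates the unwinding method and prints the command line.

callgraph_cmd() {
    method=$1
    shift
    case "$method" in
        fp|dwarf|lbr)
            # Remaining arguments are the command to profile.
            echo "perf record --call-graph $method $*"
            ;;
        *)
            echo "unknown unwinding method: $method" >&2
            return 1
            ;;
    esac
}

# ./my_app is a hypothetical binary used only for illustration.
callgraph_cmd dwarf ./my_app
```

Which method works best depends on how the binary was built and on the hardware, so a wrapper like this only guards against typos in the method name; it does not choose a method for you.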
Additional resources
- perf-record(1) man page on your system
23.3. Analyzing perf.data with perf report
You can use perf report to display and analyze a perf.data file.
Prerequisites
- You have the perf user space tool installed as described in Installing perf.
- There is a perf.data file in the current directory.
- If the perf.data file was created with root access, you need to run perf report with root access too.
Procedure
Display the contents of the perf.data file for further analysis:
# perf report
This command displays output similar to the following:
Samples: 2K of event 'cycles', Event count (approx.): 235462960
Overhead  Command      Shared Object                 Symbol
   2.36%  kswapd0      [kernel.kallsyms]             [k] page_vma_mapped_walk
   2.13%  sssd_kcm     libc-2.28.so                  [.] memset_avx2_erms
   2.13%  perf         [kernel.kallsyms]             [k] smp_call_function_single
   1.53%  gnome-shell  libc-2.28.so                  [.] strcmp_avx2
   1.17%  gnome-shell  libglib-2.0.so.0.5600.4       [.] g_hash_table_lookup
   0.93%  Xorg         libc-2.28.so                  [.] memmove_avx_unaligned_erms
   0.89%  gnome-shell  libgobject-2.0.so.0.5600.4    [.] g_object_unref
   0.87%  kswapd0      [kernel.kallsyms]             [k] page_referenced_one
   0.86%  gnome-shell  libc-2.28.so                  [.] memmove_avx_unaligned_erms
   0.83%  Xorg         [kernel.kallsyms]             [k] alloc_vmap_area
   0.63%  gnome-shell  libglib-2.0.so.0.5600.4       [.] g_slice_alloc
   0.53%  gnome-shell  libgirepository-1.0.so.1.0.0  [.] g_base_info_unref
   0.53%  gnome-shell  ld-2.28.so                    [.] _dl_find_dso_for_object
   0.49%  kswapd0      [kernel.kallsyms]             [k] vma_interval_tree_iter_next
   0.48%  gnome-shell  libpthread-2.28.so            [.] pthread_getspecific
   0.47%  gnome-shell  libgirepository-1.0.so.1.0.0  [.] 0x0000000000013b1d
   0.45%  gnome-shell  libglib-2.0.so.0.5600.4       [.] g_slice_free1
   0.45%  gnome-shell  libgobject-2.0.so.0.5600.4    [.] g_type_check_instance_is_fundamentally_a
   0.44%  gnome-shell  libc-2.28.so                  [.] malloc
   0.41%  swapper      [kernel.kallsyms]             [k] apic_timer_interrupt
   0.40%  gnome-shell  ld-2.28.so                    [.] _dl_lookup_symbol_x
   0.39%  kswapd0      [kernel.kallsyms]             [k] raw_callee_save___pv_queued_spin_unlock
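If you want to narrow a report like this to its heaviest entries, one possible approach is to filter a plain-text version of it. The sketch below assumes rows in the column layout shown above (overhead, command, shared object, symbol) and a symbol name without spaces. The function name top_symbols and the threshold of 2 percent are assumptions for this example, and the sample rows are copied from the output above.

```shell
#!/bin/sh
# Sketch only: keeps report rows whose overhead is at or above a threshold.

# $1 is the minimum overhead in percent; report rows are read from stdin.
# Prints the overhead, the command, and the last field (the symbol).
top_symbols() {
    awk -v min="$1" '
        $1 ~ /%$/ {
            pct = $1
            sub(/%/, "", pct)
            if (pct + 0 >= min) print $1, $2, $NF
        }'
}

# Demonstration on three rows taken from the sample output.
top_symbols 2 <<'EOF'
   2.36%  kswapd0      [kernel.kallsyms]  [k] page_vma_mapped_walk
   2.13%  sssd_kcm     libc-2.28.so       [.] memset_avx2_erms
   0.39%  kswapd0      [kernel.kallsyms]  [k] raw_callee_save___pv_queued_spin_unlock
EOF
```

With a real perf.data file, you could feed the function from perf report --stdio, which writes the report as text to standard output instead of opening the interactive view. Lines that do not start with a percentage, such as the header comments, are skipped by the filter.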
Additional resources
- perf-report(1) man page on your system