Chapter 31. Analyzing application performance
Perf is a performance analysis tool. It provides a simple command-line interface and abstracts CPU hardware differences in Linux performance measurements. Perf is based on the perf_events interface exported by the kernel.
One advantage of perf is that it is both kernel and architecture neutral. The analysis data can be reviewed without requiring a specific system configuration.
Prerequisites
- The perf package must be installed on the system.
- You have administrator privileges.
31.1. Collecting system-wide statistics
The perf record command collects performance samples. With the -a option, it collects them system-wide, across all CPUs. It can be used on all processors.
Procedure
Collect system-wide performance statistics.
# perf record -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.725 MB perf.data (~31655 samples) ]
In this example, the -a option selects all CPUs, and the process was terminated after a few seconds. The results show that perf collected 0.725 MB of data and stored it in a newly created perf.data file.
Verification
Ensure that the results file was created.
# ls perf.data
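Stopping the recording with Ctrl+C suits an interactive session. For scripted runs, a fixed-duration recording is often easier to reproduce. The following is a minimal sketch, assuming that perf record is given a workload command after -- and stops when that command exits. The helper name record_system_wide and its default arguments are hypothetical and chosen only for illustration.

```shell
# Sketch: record system-wide samples for a fixed number of seconds.
# record_system_wide is a hypothetical helper name, not part of perf.
record_system_wide() {
    secs=${1:-10}          # duration in seconds (default: 10)
    out=${2:-perf.data}    # output file (default: perf.data)

    # -a samples all CPUs, -o selects the output file, and the recording
    # ends when the "sleep" workload exits. Requires root privileges.
    perf record -a -o "$out" -- sleep "$secs" || return 1

    # Same idea as the Verification step: the results file must exist
    # and must not be empty.
    [ -s "$out" ]
}

# Example (run as root):
#   record_system_wide 5 /tmp/perf.data && echo "recording saved"
```

The function returns a non-zero status if perf fails or if no data was written, so it can be used directly in a condition.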
31.2. Archiving performance analysis results
You can analyze perf results on other systems by using the perf archive command. This might not be necessary if:
- Dynamic Shared Objects (DSOs), such as binaries and libraries, are already present on the analysis system, for example in the ~/.debug/ cache.
- Both systems have the same set of binaries.
Procedure
Create an archive of the results from the perf command.
# perf archive
On the analysis system, extract the perf.data.tar.bz2 archive into the ~/.debug cache.
# tar xvf perf.data.tar.bz2 -C ~/.debug
31.3. Analyzing performance analysis results
The data from the perf record command can now be investigated directly using the perf report command.
Procedure
Analyze the results directly from the perf.data file or from an archived tarball.
# perf report
The report is sorted by CPU usage, shown as a percentage, with the highest consumers first. It shows whether each sample occurred in the kernel or in the user space of the process.
The report shows information about the module from which the sample was taken:
- A kernel sample that did not take place in a kernel module is marked with the notation [kernel.kallsyms].
- A kernel sample that took place in a kernel module is marked with the module name in brackets, [module], for example [ext4].
- For a process in user space, the results might show the shared library linked with the process.
The report denotes whether the sample occurred in kernel or user space.
- The result [.] indicates user space.
- The result [k] indicates kernel space.
Finer grained details are available for review, including data appropriate for experienced perf developers.
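For scripted review, the report can also be printed as plain text with perf report --stdio and then filtered on the [k] and [.] markers described above. The following is a minimal sketch: the helper name count_by_space is hypothetical, and the parsing assumes that header lines of the text report start with # and that each sample line carries one of the two markers.

```shell
# Sketch: count report entries by kernel space ([k]) and user space ([.]).
# count_by_space is a hypothetical helper; it reads report text on stdin.
count_by_space() {
    awk '
        /^#/      { next }            # skip header and comment lines
        /\[k\]/   { kernel++; next }  # entry resolved in kernel space
        /\[\.\]/  { user++ }          # entry resolved in user space
        END       { printf "kernel=%d user=%d\n", kernel, user }
    '
}

# Example:
#   perf report --stdio -i perf.data | count_by_space
```

The counts are numbers of report entries, not sample totals. For the share of CPU time, read the percentage column of the report itself.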
31.4. Listing pre-defined events
The perf list command shows the events that are available for measurement, including hardware events, software events, and tracepoints.
Procedure
List pre-defined hardware and software events:
# perf list
List of pre-defined events (to be used in -e):
  cpu-cycles OR cycles                               [Hardware event]
  stalled-cycles-frontend OR idle-cycles-frontend    [Hardware event]
  stalled-cycles-backend OR idle-cycles-backend      [Hardware event]
  instructions                                       [Hardware event]
  cache-references                                   [Hardware event]
  cache-misses                                       [Hardware event]
  branch-instructions OR branches                    [Hardware event]
  branch-misses                                      [Hardware event]
  bus-cycles                                         [Hardware event]
  cpu-clock                                          [Software event]
  task-clock                                         [Software event]
  page-faults OR faults                              [Software event]
  minor-faults                                       [Software event]
  major-faults                                       [Software event]
  context-switches OR cs                             [Software event]
  cpu-migrations OR migrations                       [Software event]
  alignment-faults                                   [Software event]
  emulation-faults                                   [Software event]
  ...[output truncated]...
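The full list is long, so it often helps to narrow it. The perf list command accepts an event class such as hw, sw, or tracepoint, or an event glob, as an argument. The following sketch also shows one way to pull the primary event names for a single class out of the listing. The helper name events_of_type is hypothetical, and the parsing assumes the "name [OR alias] [Type event]" layout of the sample output, which can differ between perf versions.

```shell
# Sketch: print the primary name of every event of one class.
# events_of_type is a hypothetical helper; it reads perf list output on stdin.
events_of_type() {
    # $1 is the class label as printed in the listing, for example
    # "Hardware" or "Software".
    awk -v type="$1" '
        index($0, "[" type " event]") { print $1 }
    '
}

# Examples:
#   perf list hw                             # hardware events only
#   perf list sw | events_of_type Software   # software event names
#   perf list 'ext4:*'                       # tracepoints matching a glob
```

The printed names can be passed to the -e option of other perf commands.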
31.5. Getting statistics about specified events
You can view specific events using the perf stat command.
Procedure
View the number of context switches with the perf stat command:
# perf stat -e context-switches -a sleep 5
Performance counter stats for 'sleep 5':
    15,619 context-switches
    5.002060064 seconds time elapsed
The results show that 15,619 context switches took place in 5 seconds.
View file system activity by running a script. The following shows an example script:
# for i in {1..100}; do touch /tmp/$i; sleep 1; done
In another terminal, run the perf stat command:
# perf stat -e ext4:ext4_request_inode -a sleep 5
Performance counter stats for 'sleep 5':
    5 ext4:ext4_request_inode
    5.002253620 seconds time elapsed
The results show that in 5 seconds the script asked to create 5 files, which corresponds to 5 inode requests.
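A raw count is easier to compare across runs when it is normalized by the elapsed time. The following sketch derives a per-second rate from context-switch output in the format shown in the first step. The helper name ctx_switch_rate is hypothetical, and the parsing assumes the default human-readable layout with comma thousands separators. Because perf stat writes its summary to standard error, the usage comment redirects it with 2>&1.

```shell
# Sketch: compute context switches per second from perf stat output.
# ctx_switch_rate is a hypothetical helper; it reads the summary on stdin.
ctx_switch_rate() {
    awk '
        # Counter line, e.g. "15,619 context-switches": drop the commas.
        $2 == "context-switches"        { gsub(",", "", $1); count = $1 }
        # Timing line, e.g. "5.002060064 seconds time elapsed".
        $2 == "seconds" && $3 == "time" { secs = $1 }
        END { if (secs > 0) printf "%.1f\n", count / secs }
    '
}

# Example (run as root):
#   perf stat -e context-switches -a sleep 5 2>&1 | ctx_switch_rate
```

For the sample output in the first step, 15,619 context switches over 5.002060064 seconds works out to about 3122.5 context switches per second.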
31.6. Additional resources
- perf help COMMAND
- perf(1) man page on your system