Chapter 34. Analyzing application performance
Perf is a performance analysis tool. It provides a simple command-line interface and abstracts away CPU hardware differences in Linux performance measurements. Perf is based on the perf_events interface exported by the kernel.
One advantage of perf is that it is both kernel and architecture neutral. The analysis data can be reviewed without requiring a specific system configuration.
Prerequisites
- The perf package must be installed on the system.
- You have administrator privileges.
34.1. Collecting system-wide statistics
The perf record command collects system-wide statistics. It can be used on all processors.
Procedure
Collect system-wide performance statistics.
# perf record -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.725 MB perf.data (~31655 samples) ]
In this example, the -a option selects all CPUs, and the process was terminated after a few seconds. The results show that perf collected 0.725 MB of data and stored it in a newly created perf.data file.
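Instead of interrupting the recording by hand, the sampling window can be bounded by giving perf record a short command to run. The following is a minimal sketch, not part of the original procedure. The function name record_system_wide and its defaults are hypothetical, and it assumes that sleep is an acceptable placeholder workload.

```shell
# Hypothetical helper: record on all CPUs for a fixed number of seconds.
# -a selects all CPUs; -o names the output file (perf.data by default).
record_system_wide() {
    seconds="${1:-5}"          # length of the sampling window
    output="${2:-perf.data}"   # where the samples are written
    # perf record stops when the command it launched (sleep) exits.
    perf record -a -o "$output" sleep "$seconds"
}
```

For example, record_system_wide 10 /tmp/sys.data would sample every CPU for about ten seconds and write the result to /tmp/sys.data.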
Verification
Ensure that the results file was created.
# ls perf.data
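In a script, the same verification can also confirm that the file is not empty. This is a small sketch under that assumption; the function name check_perf_data is hypothetical.

```shell
# Hypothetical helper: succeed only if the results file exists and is non-empty.
check_perf_data() {
    file="${1:-perf.data}"
    if [ -s "$file" ]; then
        # -s is true for an existing file with a size greater than zero.
        echo "ok: $file"
    else
        echo "missing or empty: $file" >&2
        return 1
    fi
}
```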
34.2. Archiving performance analysis results
You can analyze perf results on another system by using the perf archive command. This might not be necessary if:
- Dynamic Shared Objects (DSOs), such as binaries and libraries, are already present on the analysis system, for example in the ~/.debug/ cache.
- Both systems have the same set of binaries.
Procedure
Create an archive of the results from the perf command. By default, the archive is written to perf.data.tar.bz2.
# perf archive
On the analysis system, extract the archive into the ~/.debug cache.
# tar xvf perf.data.tar.bz2 -C ~/.debug
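On the analysis system, the archive produced by perf archive is unpacked into the debug cache so that perf report can resolve symbols. The following sketch wraps that step. The function name unpack_perf_archive is hypothetical, and it assumes the default archive name perf.data.tar.bz2 and a tar that detects the compression automatically, as GNU tar does.

```shell
# Hypothetical helper: unpack a perf archive into the debug cache.
unpack_perf_archive() {
    archive="${1:-perf.data.tar.bz2}"   # archive copied from the recorded system
    cache="${2:-$HOME/.debug}"          # cache directory that perf searches for DSOs
    # Create the cache directory if needed, then extract into it.
    mkdir -p "$cache" && tar xf "$archive" -C "$cache"
}
```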
34.3. Analyzing performance analysis results
The data from the perf record feature can now be investigated directly using the perf report command.
Procedure
Analyze the results directly from the perf.data file or from an archived tarball.
# perf report
The report is sorted by the percentage of CPU usage attributed to each application, highest first. It shows whether each sample occurred in the kernel or in the user space of the process.
The report shows information about the module from which the sample was taken:
- A kernel sample that did not take place in a kernel module is marked with the notation [kernel.kallsyms].
- A kernel sample that took place in a kernel module is marked as [module], for example [ext4].
- For a process in user space, the results might show the shared library linked with the process.
The report denotes whether the sample occurred in kernel or user space:
- The result [.] indicates user space.
- The result [k] indicates kernel space.
Finer-grained details are available for review, including data appropriate for experienced perf developers.
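The [k] and [.] markers make it possible to total the kernel and user share of a text report. The following sketch reads the output of perf report --stdio on standard input. The function name summarize_space is hypothetical, and it assumes the default layout in which the first column is the overhead percentage and call-graph columns are not enabled.

```shell
# Hypothetical helper: sum the overhead of kernel ([k]) and user ([.]) samples
# from text produced by "perf report --stdio".
summarize_space() {
    awk '
        /^#/ || NF == 0 { next }            # skip header comments and blank lines
        {
            pct = $1
            sub(/%$/, "", pct)              # first field is the overhead, e.g. 12.34%
            for (i = 2; i <= NF; i++) {
                if ($i == "[k]") { kernel += pct; break }
                if ($i == "[.]") { user += pct; break }
            }
        }
        END { printf "kernel %.2f%%\nuser %.2f%%\n", kernel, user }
    '
}
```

A typical invocation would be perf report --stdio | summarize_space.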
34.4. Listing pre-defined events
A range of options is available for capturing hardware and tracepoint activity.
Procedure
List pre-defined hardware and software events:
# perf list
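The full event list is long, so it is often narrowed to one event type. The following sketch passes a type keyword to perf list. The function name list_events_of_type is hypothetical, and the set of accepted keywords is limited here to a few that perf list is known to support.

```shell
# Hypothetical helper: list only one class of pre-defined events.
list_events_of_type() {
    case "$1" in
        hw|sw|cache|tracepoint|pmu)
            # perf list accepts an event type as its argument.
            perf list "$1"
            ;;
        *)
            echo "usage: list_events_of_type hw|sw|cache|tracepoint|pmu" >&2
            return 2
            ;;
    esac
}
```

For example, list_events_of_type sw shows only software events such as context-switches.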
34.5. Getting statistics about specified events
You can view specific events using the perf stat command.
Procedure
View the number of context switches with the perf stat feature.
The results show that in 5 seconds, 15619 context switches took place.
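A measurement like the one described above can be taken by counting a single event system-wide for a fixed interval. The following is a sketch, not necessarily the exact command used for the result quoted above. It assumes the context-switches software event and uses sleep to bound the interval. The function names are hypothetical, and the CSV field order (value, unit, event name) is an assumption about the output of the -x option.

```shell
# Hypothetical helper: count one event on all CPUs for a number of seconds
# and print the usual human-readable perf stat summary.
count_event() {
    event="$1"
    seconds="${2:-5}"
    perf stat -e "$event" -a sleep "$seconds"
}

# Hypothetical helper: print only the counter value.
# -x, selects comma-separated output, which perf stat writes to stderr.
count_event_value() {
    event="$1"
    seconds="${2:-5}"
    perf stat -x, -e "$event" -a sleep "$seconds" 2>&1 >/dev/null |
        awk -F, -v ev="$event" '$3 == ev { print $1 }'
}
```

For example, count_event context-switches 5 counts context switches on all CPUs for five seconds.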
View file system activity by running a script. The following shows an example script:
# for i in {1..100}; do touch /tmp/$i; sleep 1; done
In another terminal, run the perf stat command.
The results show that in 5 seconds the script asked to create 5 files, indicating that there are 5 inode requests.
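The two-terminal experiment can also be run from a single script. The following sketch starts a file-creating loop in the background and counts inode requests while it runs. It assumes an ext4 file system and the ext4:ext4_request_inode tracepoint, neither of which is stated above; on another file system a different tracepoint would be needed. The function name measure_inode_requests is hypothetical.

```shell
# Hypothetical helper: create one file per second in the background and
# count inode requests system-wide for the given number of seconds.
measure_inode_requests() {
    seconds="${1:-5}"
    parent="${2:-/tmp}"     # must be on ext4 for this tracepoint to fire
    workdir=$(mktemp -d "$parent/perf-inode.XXXXXX") || return 1

    # Background loop: touch a new file every second until killed.
    ( i=1; while :; do touch "$workdir/$i"; i=$((i + 1)); sleep 1; done ) &
    loop_pid=$!

    # Assumption: ext4:ext4_request_inode is the tracepoint of interest.
    perf stat -e ext4:ext4_request_inode -a sleep "$seconds"
    status=$?

    # Stop the loop and remove the files it created.
    kill "$loop_pid" 2>/dev/null
    wait "$loop_pid" 2>/dev/null
    rm -rf "$workdir"
    return "$status"
}
```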