Dieser Inhalt ist in der von Ihnen ausgewählten Sprache nicht verfügbar.

5.4.2. Using Perf

Using the basic PCL infrastructure for collecting statistics or samples of program execution is relatively straightforward. This section provides simple examples of overall statistics and sampling.

To collect statistics on make and its children, use the following command:

perf stat -- make all

# perf stat -- make all

Copy to Clipboard

Toggle word wrap

The perf command collects a number of different hardware and software counters. It then prints the following information:

Performance counter stats for 'make all':

  244011.782059  task-clock-msecs         #      0.925 CPUs 
          53328  context-switches         #      0.000 M/sec
            515  CPU-migrations           #      0.000 M/sec
        1843121  page-faults              #      0.008 M/sec
   789702529782  cycles                   #   3236.330 M/sec
  1050912611378  instructions             #      1.331 IPC  
   275538938708  branches                 #   1129.203 M/sec
     2888756216  branch-misses            #      1.048 %    
     4343060367  cache-references         #     17.799 M/sec
      428257037  cache-misses             #      1.755 M/sec

  263.779192511  seconds time elapsed

Performance counter stats for 'make all':

  244011.782059  task-clock-msecs         #      0.925 CPUs 
          53328  context-switches         #      0.000 M/sec
            515  CPU-migrations           #      0.000 M/sec
        1843121  page-faults              #      0.008 M/sec
   789702529782  cycles                   #   3236.330 M/sec
  1050912611378  instructions             #      1.331 IPC  
   275538938708  branches                 #   1129.203 M/sec
     2888756216  branch-misses            #      1.048 %    
     4343060367  cache-references         #     17.799 M/sec
      428257037  cache-misses             #      1.755 M/sec

  263.779192511  seconds time elapsed

Copy to Clipboard

Toggle word wrap

The perf tool can also record samples. For example, to record data on the make command and its children, use:

perf record -- make all

# perf record -- make all

Copy to Clipboard

Toggle word wrap

This prints out the file in which the samples are stored, along with the number of samples collected:

[ perf record: Woken up 42 times to write data ]
[ perf record: Captured and wrote 9.753 MB perf.data (~426109 samples) ]

[ perf record: Woken up 42 times to write data ]
[ perf record: Captured and wrote 9.753 MB perf.data (~426109 samples) ]

Copy to Clipboard

Toggle word wrap

As of Red Hat Enterprise Linux 6.4, a new functionality to the {} group syntax has been added that allows the creation of event groups based on the way they are specified on the command line.

The current --group or -g options remain the same; if it is specified for record, stat, or top command, all the specified events become members of a single group with the first event as a group leader.

The new {} group syntax allows the creation of a group like:

perf record -e '{cycles, faults}' ls

# perf record -e '{cycles, faults}' ls

Copy to Clipboard

Toggle word wrap

The above results in a single event group containing cycles and faults events, with the cycles event as the group leader.

All groups are created with regards to threads and CPUs. As such, recording an event group within two threads on a server with four CPUs will create eight separate groups.

It is possible to use a standard event modifier for a group. This spans over all events in the group and updates each event modifier settings.

perf record -r '{faults:k,cache-references}:p'

# perf record -r '{faults:k,cache-references}:p'

Copy to Clipboard

Toggle word wrap

The above command results in the :kp modifier being used for faults, and the :p modifier being used for the cache-references event.

Performance Counters for Linux (PCL) Tools conflict with OProfile

Both OProfile and Performance Counters for Linux (PCL) use the same hardware Performance Monitoring Unit (PMU). If OProfile is currently running while attempting to use the PCL perf command, an error message like the following occurs when starting OProfile:

Error: open_counter returned with 16 (Device or resource busy). /bin/dmesg may provide additional information.

Fatal: Not all events could be opened.

Error: open_counter returned with 16 (Device or resource busy). /bin/dmesg may provide additional information.

Fatal: Not all events could be opened.

Copy to Clipboard

Toggle word wrap

To use the perf command, first shut down OProfile:

opcontrol --deinit

# opcontrol --deinit

Copy to Clipboard

Toggle word wrap

You can then analyze perf.data to determine the relative frequency of samples. The report output includes the command, object, and function for the samples. Use perf report to output an analysis of perf.data. For example, the following command produces a report of the executable that consumes the most time:

perf report --sort=comm

# perf report --sort=comm

Copy to Clipboard

Toggle word wrap

The resulting output:

# Samples: 1083783860000
#
# Overhead          Command
# ........  ...............
#
    48.19%         xsltproc
    44.48%        pdfxmltex
     6.01%             make
     0.95%             perl
     0.17%       kernel-doc
     0.05%          xmllint
     0.05%              cc1
     0.03%               cp
     0.01%            xmlto
     0.01%               sh
     0.01%          docproc
     0.01%               ld
     0.01%              gcc
     0.00%               rm
     0.00%              sed
     0.00%   git-diff-files
     0.00%             bash
     0.00%   git-diff-index

# Samples: 1083783860000
#
# Overhead          Command
# ........  ...............
#
    48.19%         xsltproc
    44.48%        pdfxmltex
     6.01%             make
     0.95%             perl
     0.17%       kernel-doc
     0.05%          xmllint
     0.05%              cc1
     0.03%               cp
     0.01%            xmlto
     0.01%               sh
     0.01%          docproc
     0.01%               ld
     0.01%              gcc
     0.00%               rm
     0.00%              sed
     0.00%   git-diff-files
     0.00%             bash
     0.00%   git-diff-index

Copy to Clipboard

Toggle word wrap

The column on the left shows the relative frequency of the samples. This output shows that make spends most of this time in xsltproc and the pdfxmltex. To reduce the time for the make to complete, focus on xsltproc and pdfxmltex. To list the functions executed by xsltproc, run:

perf report -n --comm=xsltproc

# perf report -n --comm=xsltproc

Copy to Clipboard

Toggle word wrap

This generates:

comm: xsltproc
# Samples: 472520675377
#
# Overhead  Samples                    Shared Object  Symbol
# ........ ..........  .............................  ......
#
    45.54%215179861044  libxml2.so.2.7.6               [.] xmlXPathCmpNodesExt
    11.63%54959620202  libxml2.so.2.7.6               [.] xmlXPathNodeSetAdd__internal_alias
     8.60%40634845107  libxml2.so.2.7.6               [.] xmlXPathCompOpEval
     4.63%21864091080  libxml2.so.2.7.6               [.] xmlXPathReleaseObject
     2.73%12919672281  libxml2.so.2.7.6               [.] xmlXPathNodeSetSort__internal_alias
     2.60%12271959697  libxml2.so.2.7.6               [.] valuePop
     2.41%11379910918  libxml2.so.2.7.6               [.] xmlXPathIsNaN__internal_alias
     2.19%10340901937  libxml2.so.2.7.6               [.] valuePush__internal_alias

comm: xsltproc
# Samples: 472520675377
#
# Overhead  Samples                    Shared Object  Symbol
# ........ ..........  .............................  ......
#
    45.54%215179861044  libxml2.so.2.7.6               [.] xmlXPathCmpNodesExt
    11.63%54959620202  libxml2.so.2.7.6               [.] xmlXPathNodeSetAdd__internal_alias
     8.60%40634845107  libxml2.so.2.7.6               [.] xmlXPathCompOpEval
     4.63%21864091080  libxml2.so.2.7.6               [.] xmlXPathReleaseObject
     2.73%12919672281  libxml2.so.2.7.6               [.] xmlXPathNodeSetSort__internal_alias
     2.60%12271959697  libxml2.so.2.7.6               [.] valuePop
     2.41%11379910918  libxml2.so.2.7.6               [.] xmlXPathIsNaN__internal_alias
     2.19%10340901937  libxml2.so.2.7.6               [.] valuePush__internal_alias

Copy to Clipboard

Toggle word wrap

Dieser Inhalt ist in der von Ihnen ausgewählten Sprache nicht verfügbar.

5.4.2. Using Perf

Lernen

Testen, kaufen und verkaufen

Communitys

Über Red Hat Dokumentation

Mehr Inklusion in Open Source

Über Red Hat

Theme

Red Hat legal and privacy links

Red Hat legal and privacy links