4.3. Profiling
The following sections showcase scripts that profile kernel activity by monitoring function calls.
4.3.1. Counting Function Calls Made
This section describes how to identify how many times the system called a specific kernel function in a 30-second sample. Depending on your use of wildcards, you can also use this script to target multiple kernel functions.
例 4.21. functioncallcount.stp
#! /usr/bin/env stap # The following line command will probe all the functions # in kernel's memory management code: # # stap functioncallcount.stp "*@mm/*.c" probe kernel.function(@1).call { # probe functions listed on commandline called[probefunc()] <<< 1 # add a count efficiently } global called probe end { foreach (fn in called-) # Sort by call count (in decreasing order) # (fn+ in called) # Sort by function name printf("%s %d\n", fn, @count(called[fn])) exit() }
例 4.21 “functioncallcount.stp” takes the targeted kernel function as an argument. The argument supports wildcards, which enables you to target multiple kernel functions up to a certain extent.
The output of 例 4.21 “functioncallcount.stp” contains the name of the function called and how many times it was called during the sample time (in alphabetical order). 例 4.22 “例 4.21 “functioncallcount.stp” Sample Output” contains an excerpt from the output of stap functioncallcount.stp "*@mm/*.c":
例 4.22. 例 4.21 “functioncallcount.stp” Sample Output
[...] __vma_link 97 __vma_link_file 66 __vma_link_list 97 __vma_link_rb 97 __xchg 103 add_page_to_active_list 102 add_page_to_inactive_list 19 add_to_page_cache 19 add_to_page_cache_lru 7 all_vm_events 6 alloc_pages_node 4630 alloc_slabmgmt 67 anon_vma_alloc 62 anon_vma_free 62 anon_vma_lock 66 anon_vma_prepare 98 anon_vma_unlink 97 anon_vma_unlock 66 arch_get_unmapped_area_topdown 94 arch_get_unmapped_exec_area 3 arch_unmap_area_topdown 97 atomic_add 2 atomic_add_negative 97 atomic_dec_and_test 5153 atomic_inc 470 atomic_inc_and_test 1 [...]
4.3.2. Call Graph Tracing
This section describes how to trace incoming and outgoing function calls.
例 4.23. para-callgraph.stp
#! /usr/bin/env stap function trace(entry_p, extra) { %( $# > 1 %? if (tid() in trace) %) printf("%s%s%s %s\n", thread_indent (entry_p), (entry_p>0?"->":"<-"), probefunc (), extra) } %( $# > 1 %? global trace probe $2.call { trace[tid()] = 1 } probe $2.return { delete trace[tid()] } %) probe $1.call { trace(1, $$parms) } probe $1.return { trace(-1, $$return) }
例 4.23 “para-callgraph.stp” takes two command-line arguments:
- The function(s) whose entry/exit you'd like to trace ($1).
- A second optional trigger function (
$2
), which enables or disables tracing on a per-thread basis. Tracing in each thread will continue as long as the trigger function has not exited yet.
例 4.23 “para-callgraph.stp” uses
thread_indent()
; as such, its output contains the timestamp, process name, and thread ID of $1
(the probe function you are tracing). For more information about thread_indent()
, see its entry in SystemTap Functions.
The following example contains an excerpt from the output for stap para-callgraph.stp 'kernel.function("*@fs/*.c")' 'kernel.function("sys_read")':
例 4.24. 例 4.23 “para-callgraph.stp” Sample Output
[...] 267 gnome-terminal(2921): <-do_sync_read return=0xfffffffffffffff5 269 gnome-terminal(2921):<-vfs_read return=0xfffffffffffffff5 0 gnome-terminal(2921):->fput file=0xffff880111eebbc0 2 gnome-terminal(2921):<-fput 0 gnome-terminal(2921):->fget_light fd=0x3 fput_needed=0xffff88010544df54 3 gnome-terminal(2921):<-fget_light return=0xffff8801116ce980 0 gnome-terminal(2921):->vfs_read file=0xffff8801116ce980 buf=0xc86504 count=0x1000 pos=0xffff88010544df48 4 gnome-terminal(2921): ->rw_verify_area read_write=0x0 file=0xffff8801116ce980 ppos=0xffff88010544df48 count=0x1000 7 gnome-terminal(2921): <-rw_verify_area return=0x1000 12 gnome-terminal(2921): ->do_sync_read filp=0xffff8801116ce980 buf=0xc86504 len=0x1000 ppos=0xffff88010544df48 15 gnome-terminal(2921): <-do_sync_read return=0xfffffffffffffff5 18 gnome-terminal(2921):<-vfs_read return=0xfffffffffffffff5 0 gnome-terminal(2921):->fput file=0xffff8801116ce980
4.3.3. Determining Time Spent in Kernel and User Space
This section illustrates how to determine the amount of time any given thread is spending in either kernel or user-space.
例 4.25. thread-times.stp
#! /usr/bin/env stap probe perf.sw.cpu_clock!, timer.profile { // NB: To avoid contention on SMP machines, no global scalars/arrays used, // only contention-free statistics aggregates. tid=tid(); e=execname() if (!user_mode()) kticks[e,tid] <<< 1 else uticks[e,tid] <<< 1 ticks <<< 1 tids[e,tid] <<< 1 } global uticks, kticks, ticks global tids probe timer.s(5), end { allticks = @count(ticks) printf ("%16s %5s %7s %7s (of %d ticks)\n", "comm", "tid", "%user", "%kernel", allticks) foreach ([e,tid] in tids- limit 20) { uscaled = @count(uticks[e,tid])*10000/allticks // SystemTap only performs integer arithmetic. // To avoid losing precision the decimal point is shifted // to the right four places (*100000). Think of this as // the original result value x.xxyy becoming xxxyy.0. // The integer percentage xxx is obtained // by dividing by 100 and the fractional percentage // is obtained with a modulo 100 operation. kscaled = @count(kticks[e,tid])*10000/allticks printf ("%16s %5d %3d.%02d%% %3d.%02d%%\n", e, tid, uscaled/100, uscaled%100, kscaled/100, kscaled%100) } printf("\n") delete uticks delete kticks delete ticks delete tids }
例 4.25 “thread-times.stp” lists the top 20 processes currently taking up CPU time within a 5-second sample, along with the total number of CPU ticks made during the sample. The output of this script also notes the percentage of CPU time each process used, as well as whether that time was spent in kernel space or user space.
例 4.26 “例 4.25 “thread-times.stp” Sample Output” contains a 5-second sample of the output for 例 4.25 “thread-times.stp”:
例 4.26. 例 4.25 “thread-times.stp” Sample Output
tid %user %kernel (of 20002 ticks) 0 0.00% 87.88% 32169 5.24% 0.03% 9815 3.33% 0.36% 9859 0.95% 0.00% 3611 0.56% 0.12% 9861 0.62% 0.01% 11106 0.37% 0.02% 32167 0.08% 0.08% 3897 0.01% 0.08% 3800 0.03% 0.00% 2886 0.02% 0.00% 3243 0.00% 0.01% 3862 0.01% 0.00% 3782 0.00% 0.00% 21767 0.00% 0.00% 2522 0.00% 0.00% 3883 0.00% 0.00% 3775 0.00% 0.00% 3943 0.00% 0.00% 3873 0.00% 0.00%
4.3.4. Monitoring Polling Applications
This section describes how to identify and monitor which applications are polling. Doing so allows you to track unnecessary or excessive polling, which can help you pinpoint areas for improvement in terms of CPU usage and power savings.
例 4.27. timeout.stp
#! /usr/bin/env stap # Copyright (C) 2009 Red Hat, Inc. # Written by Ulrich Drepper <drepper@redhat.com> # Modified by William Cohen <wcohen@redhat.com> global process, timeout_count, to global poll_timeout, epoll_timeout, select_timeout, itimer_timeout global nanosleep_timeout, futex_timeout, signal_timeout probe syscall.poll, syscall.epoll_wait { if (timeout) to[pid()]=timeout } probe syscall.poll.return { p = pid() if ($return == 0 && to[p] > 0 ) { poll_timeout[p]++ timeout_count[p]++ process[p] = execname() delete to[p] } } probe syscall.epoll_wait.return { p = pid() if ($return == 0 && to[p] > 0 ) { epoll_timeout[p]++ timeout_count[p]++ process[p] = execname() delete to[p] } } probe syscall.select.return { if ($return == 0) { p = pid() select_timeout[p]++ timeout_count[p]++ process[p] = execname() } } probe syscall.futex.return { if (errno_str($return) == "ETIMEDOUT") { p = pid() futex_timeout[p]++ timeout_count[p]++ process[p] = execname() } } probe syscall.nanosleep.return { if ($return == 0) { p = pid() nanosleep_timeout[p]++ timeout_count[p]++ process[p] = execname() } } probe kernel.function("it_real_fn") { p = pid() itimer_timeout[p]++ timeout_count[p]++ process[p] = execname() } probe syscall.rt_sigtimedwait.return { if (errno_str($return) == "EAGAIN") { p = pid() signal_timeout[p]++ timeout_count[p]++ process[p] = execname() } } probe syscall.exit { p = pid() if (p in process) { delete process[p] delete timeout_count[p] delete poll_timeout[p] delete epoll_timeout[p] delete select_timeout[p] delete itimer_timeout[p] delete futex_timeout[p] delete nanosleep_timeout[p] delete signal_timeout[p] } } probe timer.s(1) { ansi_clear_screen() printf (" pid | poll select epoll itimer futex nanosle signal| process\n") foreach (p in timeout_count- limit 20) { printf ("%5d |%7d %7d %7d %7d %7d %7d %7d| %-.38s\n", p, poll_timeout[p], select_timeout[p], epoll_timeout[p], itimer_timeout[p], futex_timeout[p], nanosleep_timeout[p], signal_timeout[p], process[p]) } }
例 4.27 “timeout.stp” tracks how many times each application used the following system calls over time:
poll
select
epoll
itimer
futex
nanosleep
signal
In some applications, these system calls are used excessively. As such, they are normally identified as "likely culprits" for polling applications. Note, however, that an application may be using a different system call to poll excessively; sometimes, it is useful to find out the top system calls used by the system (see 第 4.3.5 节 “Tracking Most Frequently Used System Calls” for instructions). Doing so can help you identify any additional suspects, which you can add to 例 4.27 “timeout.stp” for tracking.
例 4.28. 例 4.27 “timeout.stp” Sample Output
uid | poll select epoll itimer futex nanosle signal| process 28937 | 148793 0 0 4727 37288 0 0| firefox 22945 | 0 56949 0 1 0 0 0| scim-bridge 0 | 0 0 0 36414 0 0 0| swapper 4275 | 23140 0 0 1 0 0 0| mixer_applet2 4191 | 0 14405 0 0 0 0 0| scim-launcher 22941 | 7908 1 0 62 0 0 0| gnome-terminal 4261 | 0 0 0 2 0 7622 0| escd 3695 | 0 0 0 0 0 7622 0| gdm-binary 3483 | 0 7206 0 0 0 0 0| dhcdbd 4189 | 6916 0 0 2 0 0 0| scim-panel-gtk 1863 | 5767 0 0 0 0 0 0| iscsid 2562 | 0 2881 0 1 0 1438 0| pcscd 4257 | 4255 0 0 1 0 0 0| gnome-power-man 4278 | 3876 0 0 60 0 0 0| multiload-apple 4083 | 0 1331 0 1728 0 0 0| Xorg 3921 | 1603 0 0 0 0 0 0| gam_server 4248 | 1591 0 0 0 0 0 0| nm-applet 3165 | 0 1441 0 0 0 0 0| xterm 29548 | 0 1440 0 0 0 0 0| httpd 1862 | 0 0 0 0 0 1438 0| iscsid
You can increase the sample time by editing the timer in the second probe (
timer.s()
). The output of 例 4.21 “functioncallcount.stp” contains the name and UID of the top 20 polling applications, along with how many times each application performed each polling system call (over time). 例 4.28 “例 4.27 “timeout.stp” Sample Output” contains an excerpt of the script:
4.3.5. Tracking Most Frequently Used System Calls
例 4.27 “timeout.stp” from 第 4.3.4 节 “Monitoring Polling Applications” helps you identify which applications are polling by pointing out which ones used the following system calls most frequently:
poll
select
epoll
itimer
futex
nanosleep
signal
However, in some systems, a different system call might be responsible for excessive polling. If you suspect that a polling application is using a different system call to poll, you need to identify first the top system calls used by the system. To do this, use 例 4.29 “topsys.stp”.
例 4.29. topsys.stp
#! /usr/bin/env stap # # This script continuously lists the top 20 systemcalls in the interval # 5 seconds # global syscalls_count probe syscall.* { syscalls_count[name]++ } function print_systop () { printf ("%25s %10s\n", "SYSCALL", "COUNT") foreach (syscall in syscalls_count- limit 20) { printf("%25s %10d\n", syscall, syscalls_count[syscall]) } delete syscalls_count } probe timer.s(5) { print_systop () printf("--------------------------------------------------------------\n") }
例 4.29 “topsys.stp” lists the top 20 system calls used by the system per 5-second interval. It also lists how many times each system call was used during that period. Refer to 例 4.30 “例 4.29 “topsys.stp” Sample Output” for a sample output.
例 4.30. 例 4.29 “topsys.stp” Sample Output
-------------------------------------------------------------- SYSCALL COUNT gettimeofday 1857 read 1821 ioctl 1568 poll 1033 close 638 open 503 select 455 write 391 writev 335 futex 303 recvmsg 251 socket 137 clock_gettime 124 rt_sigprocmask 121 sendto 120 setitimer 106 stat 90 time 81 sigreturn 72 fstat 66 --------------------------------------------------------------
4.3.6. Tracking System Call Volume Per Process
This section illustrates how to determine which processes are performing the highest volume of system calls. In previous sections, we've described how to monitor the top system calls used by the system over time (第 4.3.5 节 “Tracking Most Frequently Used System Calls”). We've also described how to identify which applications use a specific set of "polling suspect" system calls the most (第 4.3.4 节 “Monitoring Polling Applications”). Monitoring the volume of system calls made by each process provides more data in investigating your system for polling processes and other resource hogs.
例 4.31. syscalls_by_proc.stp
#! /usr/bin/env stap # Copyright (C) 2006 IBM Corp. # # This file is part of systemtap, and is free software. You can # redistribute it and/or modify it under the terms of the GNU General # Public License (GPL); either version 2, or (at your option) any # later version. # # Print the system call count by process name in descending order. # global syscalls probe begin { print ("Collecting data... Type Ctrl-C to exit and display results\n") } probe syscall.* { syscalls[execname()]++ } probe end { printf ("%-10s %-s\n", "#SysCalls", "Process Name") foreach (proc in syscalls-) printf("%-10d %-s\n", syscalls[proc], proc) }
例 4.31 “syscalls_by_proc.stp” lists the top 20 processes performing the highest number of system calls. It also lists how many system calls each process performed during the time period. See 例 4.32 “例 4.29 “topsys.stp” Sample Output” for a sample output.
例 4.32. 例 4.29 “topsys.stp” Sample Output
Collecting data... Type Ctrl-C to exit and display results #SysCalls Process Name 1577 multiload-apple 692 synergyc 408 pcscd 376 mixer_applet2 299 gnome-terminal 293 Xorg 206 scim-panel-gtk 95 gnome-power-man 90 artsd 85 dhcdbd 84 scim-bridge 78 gnome-screensav 66 scim-launcher [...]
If you prefer the output to display the process IDs instead of the process names, use the following script instead.
例 4.33. syscalls_by_pid.stp
#! /usr/bin/env stap # Copyright (C) 2006 IBM Corp. # # This file is part of systemtap, and is free software. You can # redistribute it and/or modify it under the terms of the GNU General # Public License (GPL); either version 2, or (at your option) any # later version. # # Print the system call count by process ID in descending order. # global syscalls probe begin { print ("Collecting data... Type Ctrl-C to exit and display results\n") } probe syscall.* { syscalls[pid()]++ } probe end { printf ("%-10s %-s\n", "#SysCalls", "PID") foreach (pid in syscalls-) printf("%-10d %-d\n", syscalls[pid], pid) }
As indicated in the output, you need to manually exit the script in order to display the results. You can add a timed expiration to either script by simply adding a timer.s() probe; for example, to instruct the script to expire after 5 seconds, add the following probe to the script:
probe timer.s(5) { exit() }