3.3. Red Hat Enterprise Linux-Specific Information
Monitoring bandwidth and CPU utilization under Red Hat Enterprise Linux entails using the tools discussed in Chapter 2, Resource Monitoring; therefore, if you have not yet read that chapter, you should do so before continuing.
3.3.1. Monitoring Bandwidth on Red Hat Enterprise Linux
As stated in Section 2.4.2, “Monitoring Bandwidth”, it is difficult to directly monitor bandwidth utilization. However, by examining device-level statistics, it is possible to roughly gauge whether insufficient bandwidth is an issue on your system.
By using vmstat, it is possible to determine if overall device activity is excessive by examining the bi and bo fields; in addition, taking note of the si and so fields gives you a bit more insight into how much disk activity is due to swap-related I/O:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  0      0 248088 158636 480804    0    0     2     6  120   120 10  3 87  0
In this example, the bi field shows two blocks/second read from block devices (primarily disk drives), while the bo field shows six blocks/second written to block devices. We can determine that none of this activity was due to swapping, as the si and so fields both show a swap-related I/O rate of zero kilobytes/second.
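The field lookup described above can also be done programmatically. The following is a minimal Python sketch, not part of vmstat itself; the function name parse_vmstat is hypothetical, and the code assumes the two-line header layout shown in the sample output above.

```python
# Sample text copied from the vmstat output shown above.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  0      0 248088 158636 480804    0    0     2     6  120   120 10  3 87  0
"""


def parse_vmstat(text):
    """Map vmstat column names to the integer values of the last data row.

    Assumption: the second non-blank line holds the column names and the
    last non-blank line holds the most recent sample.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    names = lines[1].split()
    values = lines[-1].split()
    return dict(zip(names, (int(v) for v in values)))


stats = parse_vmstat(SAMPLE)
swap_io = stats["si"] + stats["so"]  # combined swap-in and swap-out rate

print("bi:", stats["bi"], "bo:", stats["bo"])
print("swap-related I/O present:", swap_io > 0)
```

On a live system, the same function could be fed the text captured from a vmstat run instead of the SAMPLE string, provided the column layout matches.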
By using iostat, it is possible to gain a bit more insight into disk-related activity:
Linux 2.4.21-1.1931.2.349.2.2.entsmp (raptor.example.com)     07/21/2003

avg-cpu:  %user   %nice    %sys   %idle
           5.34    4.60    2.83   87.24

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
dev8-0            1.10         6.21        25.08     961342    3881610
dev8-1            0.00         0.00         0.00         16          0
This output shows us that the device with major number 8 (which is /dev/sda, the first SCSI disk) averaged slightly more than one I/O operation per second (the tps field). Most of the I/O activity for this device was writes (the Blk_wrtn field), with slightly more than 25 blocks written each second (the Blk_wrtn/s field).
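The claim that most of the activity was writes can be checked with a little arithmetic on the cumulative block counts. The following Python sketch is illustrative only; the function name parse_iostat_devices is hypothetical, and the code assumes the device report layout shown above (a header row followed by one row per device).

```python
# Device report copied from the iostat output shown above.
SAMPLE = """\
Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
dev8-0            1.10         6.21        25.08     961342    3881610
dev8-1            0.00         0.00         0.00         16          0
"""


def parse_iostat_devices(text):
    """Return {device: {column: value}} for a basic iostat device report."""
    rows = [ln.split() for ln in text.splitlines() if ln.strip()]
    columns = rows[0][1:]  # skip the leading "Device:" label
    return {
        fields[0]: dict(zip(columns, (float(v) for v in fields[1:])))
        for fields in rows[1:]
    }


devices = parse_iostat_devices(SAMPLE)
sda = devices["dev8-0"]

# Fraction of all blocks transferred so far that were writes.
write_share = sda["Blk_wrtn"] / (sda["Blk_read"] + sda["Blk_wrtn"])
print(f"dev8-0 write share: {write_share:.1%}")
```

Because Blk_read and Blk_wrtn are cumulative totals, this ratio describes the whole reporting period rather than any single interval.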
If more detail is required, use iostat's -x option:
Linux 2.4.21-1.1931.2.349.2.2.entsmp (raptor.example.com)     07/21/2003

avg-cpu:  %user   %nice    %sys   %idle
           5.37    4.54    2.81   87.27

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s   rkB/s   wkB/s avgrq-sz
/dev/sda    13.57   2.86  0.36  0.77   32.20   29.05   16.10   14.53    54.52
/dev/sda1    0.17   0.00  0.00  0.00    0.34    0.00    0.17    0.00   133.40
/dev/sda2    0.00   0.00  0.00  0.00    0.00    0.00    0.00    0.00    11.56
/dev/sda3    0.31   2.11  0.29  0.62    4.74   21.80    2.37   10.90    29.42
/dev/sda4    0.09   0.75  0.04  0.15    1.06    7.24    0.53    3.62    43.01
Over and above the longer lines containing more fields, the first thing to keep in mind is that this iostat output is now displaying statistics on a per-partition level. By using df to associate mount points with device names, it is possible to use this report to determine if, for example, the partition containing /home/ is experiencing an excessive workload.
Actually, each line output from iostat -x is longer and contains more information than this; here is the remainder of each line (with the device column added for easier reading):
Device:    avgqu-sz   await  svctm  %util
/dev/sda       0.24   20.86   3.80   0.43
/dev/sda1      0.00  141.18 122.73   0.03
/dev/sda2      0.00    6.00   6.00   0.00
/dev/sda3      0.12   12.84   2.68   0.24
/dev/sda4      0.11   57.47   8.94   0.17
In this example, it is interesting to note that /dev/sda2 is the system swap partition; it is obvious from the many fields reading 0.00 for this partition that swapping is not a problem on this system.
Another interesting point to note is /dev/sda1. The statistics for this partition are unusual; the overall activity seems low, but why are the average I/O request size (the avgrq-sz field), average wait time (the await field), and the average service time (the svctm field) so much larger than those of the other partitions? The answer is that this partition contains the /boot/ directory, which is where the kernel and initial ramdisk are stored. When the system boots, the read I/Os (notice that only the rsec/s and rkB/s fields are non-zero; no writing is done here on a regular basis) used during the boot process are for large numbers of blocks, resulting in the relatively long wait and service times iostat displays.
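The two observations above (an idle swap partition and an unusually slow /boot/ partition) can be picked out of the extended report mechanically. The following Python sketch is one way to do so under stated assumptions: the function name parse_extended is hypothetical, and the input is the trailing portion of the iostat -x report exactly as shown above.

```python
# Remainder of the iostat -x report, copied from the output shown above.
SAMPLE = """\
Device:    avgqu-sz   await  svctm  %util
/dev/sda       0.24   20.86   3.80   0.43
/dev/sda1      0.00  141.18 122.73   0.03
/dev/sda2      0.00    6.00   6.00   0.00
/dev/sda3      0.12   12.84   2.68   0.24
/dev/sda4      0.11   57.47   8.94   0.17
"""


def parse_extended(text):
    """Return {device: {column: value}} for the extended statistics rows."""
    rows = [ln.split() for ln in text.splitlines() if ln.strip()]
    columns = rows[0][1:]  # skip the leading "Device:" label
    return {
        fields[0]: dict(zip(columns, (float(v) for v in fields[1:])))
        for fields in rows[1:]
    }


stats = parse_extended(SAMPLE)

# Device whose requests waited longest on average.
slowest = max(stats, key=lambda dev: stats[dev]["await"])

# Devices showing neither a request queue nor any utilization.
idle = [dev for dev, s in stats.items()
        if s["avgqu-sz"] == 0.0 and s["%util"] == 0.0]

print("longest average wait:", slowest)
print("idle devices:", idle)
```

A high await value alone does not indicate a problem; as with /dev/sda1 here, it should be read together with the utilization and request rate for the same device.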
It is possible to use sar for a longer-term overview of I/O statistics; for example, sar -b displays a general I/O report:
Linux 2.4.21-1.1931.2.349.2.2.entsmp (raptor.example.com)     07/21/2003

12:00:00 AM       tps      rtps      wtps   bread/s   bwrtn/s
12:10:00 AM      0.51      0.01      0.50      0.25     14.32
12:20:01 AM      0.48      0.00      0.48      0.00     13.32
…
06:00:02 PM      1.24      0.00      1.24      0.01     36.23
Average:         1.11      0.31      0.80     68.14     34.79
Here, like iostat's initial display, the statistics are grouped for all block devices.
Another I/O-related report is produced using sar -d:
Linux 2.4.21-1.1931.2.349.2.2.entsmp (raptor.example.com)     07/21/2003

12:00:00 AM       DEV       tps    sect/s
12:10:00 AM    dev8-0      0.51     14.57
12:10:00 AM    dev8-1      0.00      0.00
12:20:01 AM    dev8-0      0.48     13.32
12:20:01 AM    dev8-1      0.00      0.00
…
06:00:02 PM    dev8-0      1.24     36.25
06:00:02 PM    dev8-1      0.00      0.00
Average:       dev8-0      1.11    102.93
Average:       dev8-1      0.00      0.00
This report provides per-device information, but with little detail.
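Even with little detail, the per-device rows are enough to find when each device was busiest. The following Python sketch is illustrative; the function name peak_sectors is hypothetical, and the sample contains only the interval rows visible in the report above (the elided rows and the Average: lines are left out, so the result reflects just these rows).

```python
# Interval rows copied from the sar -d output shown above.
SAMPLE = """\
12:00:00 AM       DEV       tps    sect/s
12:10:00 AM    dev8-0      0.51     14.57
12:10:00 AM    dev8-1      0.00      0.00
12:20:01 AM    dev8-0      0.48     13.32
12:20:01 AM    dev8-1      0.00      0.00
06:00:02 PM    dev8-0      1.24     36.25
06:00:02 PM    dev8-1      0.00      0.00
"""


def peak_sectors(text):
    """Return {device: (timestamp, sect/s)} for each device's busiest interval.

    Assumption: every data row has five whitespace-separated fields
    (time, AM/PM, device, tps, sect/s); other rows are skipped.
    """
    peaks = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5 or fields[2] == "DEV":
            continue  # header, blank, elided, or summary row
        when = " ".join(fields[:2])
        dev, sect = fields[2], float(fields[4])
        if dev not in peaks or sect > peaks[dev][1]:
            peaks[dev] = (when, sect)
    return peaks


peaks = peak_sectors(SAMPLE)
for dev, (when, sect) in sorted(peaks.items()):
    print(f"{dev}: peak {sect} sect/s at {when}")
```

Note that the Average: value for dev8-0 in the full report is higher than any interval listed here, which suggests the busiest intervals fall within the elided portion of the report.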
While there are no explicit statistics showing bandwidth utilization for a given bus or datapath, we can at least determine what the devices are doing and use their activity to indirectly determine the bus loading.