1.190. sosreport
1.190.1. RHBA-2010:0201: bug fix and enhancement update
- When kmod-gfs2 was installed on a Red Hat Enterprise Linux 5 system, its version of gfs2.ko could take precedence over the one supplied with the kernel. As a consequence, the wrong version of the Global File System would be used and the module would be incorrectly set to weak-update. SOS now warns system administrators if gfs2.ko has been set to use weak updates and instructs them to remove kmod-gfs2 and reboot the system before proceeding any further. (BZ#507390)
- groupd can erroneously assign the fence domain ID 00000000. This can result in LVM commands becoming permanently locked. To alert system administrators to this issue, a check has been added to SOS that examines the output of group_tool -v for a string of zeroes against fence and, if it finds one, generates a warning message that instructs administrators on how to remedy the problem. (BZ#499468)
- SOS was inadvertently copying all subdirectories and files relative to where the command was executed (including paths created by symlinks) into /tmp. (This problem did not occur if absolute link entries were used.) SOS no longer traverses directories relative to the current working directory. As a result, this potentially large amount of data is no longer copied erroneously. (BZ#530385)
- SOS had no capability to detect or report problems with CMAN services, which can occur when groupd becomes stuck in a state that needs to be resolved before cluster operations can continue. To rectify this, SOS now checks the output of group_tool -v to detect whether CMAN services are set to anything other than none. If they are, a warning prompts the system administrator to investigate the cause of the potential problem. (BZ#499472)
- SOS's progress reporting was inaccurate, due to problems with output buffering and the wrong placement of error messages. When the sosreport command was run from a terminal, the percentage-completed figure would go up and down. Furthermore, after it had reached 100%, the real time and estimated finish time would continue to grow together for several more seconds. A new, more reliable progress indication system has been added. (BZ#502442)
- SOS would erroneously report that one or more NFS exports did not have an fsid attribute set even if the fsid had been specified in the fs resource. This was due to an omission in the cluster.py file, which was only searching the services tag and not the resources tag, in which fsid is set (as part of best practice) if the file system is a shared resource. cluster.py has now been patched to account for all scenarios, so false reports of missing fsids are no longer generated. (BZ#507674)
- The sosreport -k general.syslogsize=15 command did not limit log file sizes to 15 MB, contrary to expected behavior. This was because the limits were being erroneously applied to /var/log/messages.* instead of /var/log/messages. As a result, huge reports were generated and sosreport could even fail if the process used all the space in /tmp. The limits are now applied to /var/log/messages, so the huge reports are no longer generated. (BZ#516551)
- The list of installed RPMs generated by sosreport was in a non-standard format. Rather than the accepted format of name-[epoch:]version-release.arch, it was in the form name-version-release-arch. This was inconvenient for users wishing to paste output into programs such as yum. The list of installed RPMs is now in the name-[epoch:]version-release.arch format, which makes it usable with the yum and rpm commands. (BZ#482755)
- A problem occurred when sosreport deliberately obscured fencing passwords in /etc/cluster/cluster.conf. It would break the XML formatting by removing the quotation marks that surrounded the masked version of the password. A further problem was that the passwords in backup files (such as /etc/cluster/cluster.conf.1) were not obscured. To resolve this issue, changes have been made to password masking to ensure the XML remains well-formed and the process is applied to any backup configuration files that may exist. As a result, security is enhanced and files no longer need manual rectification before tests can be run on cluster.conf. (BZ#497588)
- SOS reports were including all of the contents of the /tftpboot directory, which resulted in huge files (potentially greater than 1 GB) if multiple boot media had been created in that location. To address this issue, the contents of /tftpboot are now excluded from the report. (BZ#523263)
- Previously, the sar plugin included in SOS ignored the locale setting and created sar files with time data presented in the default format. With this update, the plugin honors the locale setting and generates sar files with time data in the expected format (that is, the same format as sar files created by sysstat cron jobs). (BZ#525010)
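The password-masking fix above (BZ#497588) comes down to replacing an attribute value while leaving its quotation marks in place. The following is a minimal Python sketch of that idea, assuming the fencing passwords appear as passwd="..." attributes. The function name, the regular expression, and the mask string are illustrative and are not taken from the SOS source.

```python
import re

# Assumption: fencing passwords are stored as passwd="..." (or passwd='...')
# attributes. The quote character is captured so it can be written back.
_PASSWD_ATTR = re.compile(r"""(\bpasswd\s*=\s*)(["'])(.*?)\2""")


def mask_passwords(xml_text, mask="***"):
    """Replace every passwd attribute value with `mask`, keeping the quotes."""
    def _replace(match):
        prefix, quote = match.group(1), match.group(2)
        return prefix + quote + mask + quote

    return _PASSWD_ATTR.sub(_replace, xml_text)
```

Because the quote character is captured and written back, the masked text can still be parsed as XML. The same function could be applied to each backup file, such as /etc/cluster/cluster.conf.1.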
- SOS was only gathering limited data on some aspects of system performance. This has been expanded to include sources such as:
/var/log/cron*
parted <hard disk device> print
tune2fs -l <filesystem>
/etc/inittab
service <service name> status
/etc/kdump.conf
/sbin/mdadm -D /dev/md*
/etc/lvm
/proc/buddyinfo
- SOS was not gathering dmraid information, which can be extremely useful for troubleshooting. Functionality has been added that enables SOS to report dm-raid signatures, if these are detected, and send this information to support engineers. This information is reported even if SOS is being run in rescue mode at one of the service levels at which dm-raid systems will not boot. The information is gathered via these specific commands:
dmraid -V
dmraid -b
dmraid -r
dmraid -s
dmraid -tay
dmraid -rD
The output of these commands is now gathered by sosreport, resulting in much quicker identification and resolution of support issues. (BZ#507672)
- SOS was not reporting configuration information for running OpenAIS systems, making troubleshooting extremely difficult, especially if the systems in question had been heavily customized. SOS has been modified so that detailed OpenAIS cluster information is now captured, leading to faster and more accurate troubleshooting. (BZ#521344)
- The rh-upload-core script included with SOS has been improved. Most significantly, the script can now upload any file, not just vmcores. (BZ#523750)
- SOS reports include the name service cache daemon (nscd) configuration file found at /etc/nscd.conf. With this update, SOS reports also include the debug logs if nscd is running with debugging enabled. Note: as with syslog, SOS limits the debug log files to 50 MiB. If nscd runs in debug mode for extended periods (for example, a week), the debug log files can grow to hundreds of MiB or more. (BZ#536960)
- SOS now checks for the presence of Apache QPID (specifically, it checks for qpidd, the QPID daemon) and, if detected, collects QPID configuration, state and log information and saves it to sos_command/broker/ in the sosreport. (BZ#557851)
- SOS did not gather the SELinux audit log files (/var/log/audit/*). It has now been amended to gather the last fifty entries of this log, meaning that SELinux problems can be investigated more easily. (BZ#443984)
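Keeping only the last fifty entries of a log can be done without loading the whole file into memory. The sketch below is an illustration in Python, assuming one audit entry per line. The function name and its parameters are hypothetical and do not describe how SOS itself implements the limit.

```python
from collections import deque


def last_entries(path, count=50):
    """Return the last `count` lines of the file at `path`."""
    with open(path, "r", errors="replace") as handle:
        # A deque with maxlen discards the oldest line each time a new one
        # is appended, so memory use stays bounded for very large logs.
        return list(deque(handle, maxlen=count))
```

A call such as last_entries("/var/log/audit/audit.log") would return at most fifty lines, oldest first.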
- SOS did not gather sound card information, leading to prolonged support calls whilst this information was manually collected and sent in by users. A new Python-based plug-in has been written that collects information about sound cards via ALSA. As a result, users no longer have to hunt for this information themselves during a support call. (BZ#478009)
- SOS was not reporting the output of the lsb_release command. If /etc/redhat-release was corrupted or missing, it was impossible for support to confirm which version of Red Hat Enterprise Linux was in use. lsb_release provides a useful fallback. A plug-in has been added to SOS to gather the data provided by the lsb_release command and the associated /etc files. It also outputs a message informing the user if /etc/redhat-release is missing. By reporting this information, the system version can be identified quickly and accurately, assisting in the troubleshooting process. (BZ#479111)
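The fallback described above can be sketched in a few lines of Python. This is an illustration only: the function name, the return shape, and the warning text are assumptions, and the actual SOS plug-in may gather considerably more than the single description line shown here.

```python
import subprocess


def release_description(release_file="/etc/redhat-release"):
    """Return (description, warning); warning is None if the file is usable."""
    try:
        with open(release_file) as handle:
            text = handle.read().strip()
        if text:
            return text, None
    except OSError:
        pass  # Missing or unreadable file: fall through to lsb_release.

    warning = "%s is missing or empty" % release_file
    try:
        # -d prints the distribution description, -s selects the short form.
        proc = subprocess.run(["lsb_release", "-d", "-s"],
                              capture_output=True, text=True, check=True)
        description = proc.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        description = None  # lsb_release is not installed or failed.
    return description, warning
```

Returning the warning alongside the description lets the caller report the missing file while still recording whatever version information was available.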
- SOS was not able to collect data relating to the Quagga routing suite if this was installed on a user's system. As a result, troubleshooting these systems was more difficult. To remedy this problem, a plug-in has been added to SOS that collects Quagga-related configuration files, thereby providing the requisite information to quickly and easily identify and remedy problems with these systems. (BZ#485191)
- SOS was not able to collect cron data. As a result, troubleshooting was sometimes difficult. To resolve this issue, SOS has been patched so that it now includes data from /var/spool/cron, allowing support engineers to see how tasks are scheduled on a given system. (BZ#485559)
- SOS was not able to report information about the Cobbler Linux installation suite, limiting engineers' attempts to troubleshoot systems. To rectify this, SOS now has a plug-in that gathers the following files if they are present:
/etc/cobbler
/var/log/cobbler
/var/lib/rhn/kickstarts
/var/lib/cobbler/snippets
/var/lib/cobbler/config
/var/lib/cobbler/kickstarts
/var/lib/cobbler/triggers
As a result, troubleshooting Cobbler is now much easier. (BZ#495934)
- SOS was not able to report information about the iSCSI Initiator if this was present on the system. Thus, information about transmission of SCSI commands over IP networks could not be gathered by the reporting tool. To rectify this problem, SOS now reports on:
/etc/iscsi/iscsid.conf
/etc/iscsi/initiatorname.iscsi
/var/lib/iscsi/
This makes debugging problems involving iSCSI much easier. (BZ#512889)
- SOS was not able to capture multicast information. This made it hard to debug OpenAIS clusters, as they use multicast IGMP groups to send messages. For the purposes of troubleshooting, it is important to know which groups are available and active on a node, so SOS has been enhanced to report the following information:
netstat -agn
ip mroute show
ip maddr show
As a result, OpenAIS clusters can now be debugged by troubleshooters much more easily. (BZ#514294)
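For readers who want to gather the same multicast information by hand, the three commands listed above can be run and captured together. The following Python sketch shows one way to do that. The helper name, the dictionary layout, and the 30-second timeout are assumptions made for illustration, and this is not the code SOS uses.

```python
import subprocess

# The commands named in the multicast item above.
MULTICAST_COMMANDS = [
    ["netstat", "-agn"],
    ["ip", "mroute", "show"],
    ["ip", "maddr", "show"],
]


def collect_outputs(commands=MULTICAST_COMMANDS):
    """Run each command and map its command line to the captured stdout."""
    results = {}
    for argv in commands:
        key = " ".join(argv)
        try:
            proc = subprocess.run(argv, capture_output=True, text=True,
                                  timeout=30)
            results[key] = proc.stdout
        except (OSError, subprocess.TimeoutExpired) as exc:
            # A missing binary or a hung command is recorded rather than
            # raised, so the remaining commands are still collected.
            results[key] = "command failed: %s" % exc
    return results
```

Recording failures instead of raising them means one unavailable tool does not prevent the other outputs from being collected.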