Chapter 31. Using Advanced Error Reporting
When you use Advanced Error Reporting (AER), you receive notifications of error events for Peripheral Component Interconnect Express (PCIe) devices. RHEL enables this kernel feature by default and collects the reported errors in the kernel logs. Moreover, if you use the rasdaemon program, these errors are parsed and stored in its database.
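Because the kernel writes AER notifications to the kernel log, you can also inspect them there directly. The following is a minimal sketch, not a tool shipped with RHEL, that counts AER notification lines per PCIe device in kernel log text read from standard input. The function name aer_count_by_device is hypothetical. The sketch assumes that each notification line contains the device address immediately followed by an `AER:` tag, as in `pcieport 10003:00:00.0: AER: Corrected error received`. It counts messages, not individual errors: a single "Multiple Corrected error received" message can stand for several errors.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of RHEL): count AER notification lines
# per PCIe device in kernel log text read from standard input.
aer_count_by_device() {
    awk '
        # Only lines that carry an "AER:" tag are counted.
        /AER:/ {
            for (i = 2; i <= NF; i++) {
                if ($i == "AER:") {
                    dev = $(i - 1)        # e.g. "10003:00:00.0:"
                    sub(/:$/, "", dev)    # drop the trailing colon
                    count[dev]++
                    break
                }
            }
        }
        # Print "<device> <number of AER lines>" for every device seen.
        END { for (d in count) print d, count[d] }
    ' | sort
}
```

For example, to summarize the kernel messages of the current boot, run `journalctl -k --no-pager | aer_count_by_device`.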
31.1. Overview of AER
Advanced Error Reporting (AER) is a kernel feature that provides enhanced error reporting for Peripheral Component Interconnect Express (PCIe) devices. The AER kernel driver attaches to root ports that support the PCIe AER capability in order to:
- Gather comprehensive error information
- Report errors to the users
- Perform error recovery actions
When AER captures an error, it sends an error message to the console. For a correctable error, the console output is a warning.
Example 31.1. Example AER output
Feb 5 15:41:33 hostname kernel: pcieport 10003:00:00.0: AER: Corrected error received: id=ae00
Feb 5 15:41:33 hostname kernel: pcieport 10003:00:00.0: AER: Multiple Corrected error received: id=ae00
Feb 5 15:41:33 hostname kernel: pcieport 10003:00:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0000(Receiver ID)
Feb 5 15:41:33 hostname kernel: pcieport 10003:00:00.0: device [8086:2030] error status/mask=000000c0/00002000
Feb 5 15:41:33 hostname kernel: pcieport 10003:00:00.0: [ 6] Bad TLP
Feb 5 15:41:33 hostname kernel: pcieport 10003:00:00.0: [ 7] Bad DLLP
Feb 5 15:41:33 hostname kernel: pcieport 10003:00:00.0: AER: Multiple Corrected error received: id=ae00
Feb 5 15:41:33 hostname kernel: pcieport 10003:00:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0000(Receiver ID)
Feb 5 15:41:33 hostname kernel: pcieport 10003:00:00.0: device [8086:2030] error status/mask=00000040/00002000
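In the example output, the error status value is a bitmask: 0xc0 has bits 6 and 7 set, which the kernel reports as Bad TLP and Bad DLLP, while 0x40 has only bit 6 set. The following minimal sketch decodes such a status/mask pair. The function name decode_cor_status is hypothetical, and the bit names are taken from the Correctable Error Status register layout in the PCIe specification rather than from the kernel's own output strings. The sketch also assumes that bits set in the mask value should be ignored.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decode a corrected-error "status/mask" pair such as
# "000000c0/00002000" into the set, unmasked bit positions.
# Bit names follow the PCIe Correctable Error Status register layout.
decode_cor_status() {
    local status=$(( 16#${1%/*} ))   # hex text before "/" is the status
    local mask=$(( 16#${1#*/} ))     # hex text after "/" is the mask
    local active=$(( status & ~mask ))
    local -a names=(
        [0]="Receiver Error"
        [6]="Bad TLP"
        [7]="Bad DLLP"
        [8]="REPLAY_NUM Rollover"
        [12]="Replay Timer Timeout"
        [13]="Advisory Non-Fatal Error"
        [14]="Corrected Internal Error"
        [15]="Header Log Overflow"
    )
    local bit
    for (( bit = 0; bit < 32; bit++ )); do
        if (( active & (1 << bit) )); then
            # Same "[ N] name" layout as the kernel log lines.
            printf '[%2d] %s\n' "$bit" "${names[$bit]:-unknown}"
        fi
    done
}
```

For example, `decode_cor_status 000000c0/00002000` prints `[ 6] Bad TLP` and `[ 7] Bad DLLP`, which matches the two lines that follow the first status/mask line in the log.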
31.2. Collecting and displaying AER messages
To collect and display AER messages, use the rasdaemon program.
Procedure
1. Install the rasdaemon package:
# yum install rasdaemon
2. Enable and start the rasdaemon service:
# systemctl enable --now rasdaemon
Created symlink /etc/systemd/system/multi-user.target.wants/rasdaemon.service → /usr/lib/systemd/system/rasdaemon.service.
3. Issue the ras-mc-ctl command:
# ras-mc-ctl --summary
# ras-mc-ctl --errors
The --summary option displays a summary of the logged errors, and the --errors option displays the errors stored in the error database.
Additional resources
- The rasdaemon(8) manual page
- The ras-mc-ctl(8) manual page