Chapter 31. Using Advanced Error Reporting

When you use Advanced Error Reporting (AER), you receive notifications of error events for Peripheral Component Interconnect Express (PCIe) devices. RHEL enables this kernel feature by default and collects the reported errors in the kernel logs. If you also run the rasdaemon program, it parses these errors and stores them in its database.

31.1. Overview of AER

Advanced Error Reporting (AER) is a kernel feature that provides enhanced error reporting for Peripheral Component Interconnect Express (PCIe) devices. The AER kernel driver attaches to root ports that support the PCIe AER capability in order to:

  • Gather comprehensive error information
  • Report errors to the users
  • Perform error recovery actions

When AER captures an error, it sends an error message to the console. For a correctable error, the console output is a warning.

Example 31.1. Example AER output

Feb  5 15:41:33 hostname kernel: pcieport 10003:00:00.0: AER: Corrected error received: id=ae00
Feb  5 15:41:33 hostname kernel: pcieport 10003:00:00.0: AER: Multiple Corrected error received: id=ae00
Feb  5 15:41:33 hostname kernel: pcieport 10003:00:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0000(Receiver ID)
Feb  5 15:41:33 hostname kernel: pcieport 10003:00:00.0:   device [8086:2030] error status/mask=000000c0/00002000
Feb  5 15:41:33 hostname kernel: pcieport 10003:00:00.0:    [ 6] Bad TLP
Feb  5 15:41:33 hostname kernel: pcieport 10003:00:00.0:    [ 7] Bad DLLP
Feb  5 15:41:33 hostname kernel: pcieport 10003:00:00.0: AER: Multiple Corrected error received: id=ae00
Feb  5 15:41:33 hostname kernel: pcieport 10003:00:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0000(Receiver ID)
Feb  5 15:41:33 hostname kernel: pcieport 10003:00:00.0:   device [8086:2030] error status/mask=00000040/00002000
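
The fields in this output can be read programmatically. The following Python code is a minimal sketch, not part of RHEL or rasdaemon, that assumes the line layout shown in Example 31.1. It counts the "PCIe Bus Error" lines per device and decodes the hexadecimal status/mask pair into bit positions. The function names and the BIT_NAMES table are hypothetical. Only the names for bits 6 and 7 come from the example above, and any other bit is reported by number.

```python
import re
from collections import Counter

# Sample lines copied from Example 31.1 (timestamp and host name omitted).
SAMPLE = [
    "pcieport 10003:00:00.0: AER: Corrected error received: id=ae00",
    "pcieport 10003:00:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0000(Receiver ID)",
    "pcieport 10003:00:00.0:   device [8086:2030] error status/mask=000000c0/00002000",
    "pcieport 10003:00:00.0:    [ 6] Bad TLP",
    "pcieport 10003:00:00.0:    [ 7] Bad DLLP",
    "pcieport 10003:00:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0000(Receiver ID)",
    "pcieport 10003:00:00.0:   device [8086:2030] error status/mask=00000040/00002000",
]

# Summary line: device address, severity, and reporting layer.
BUS_ERROR = re.compile(
    r"(?P<device>[0-9a-f]+:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"PCIe Bus Error: severity=(?P<severity>[^,]+), type=(?P<layer>[^,]+),"
)
# Detail line: error status and mask registers as 8-digit hex values.
STATUS = re.compile(r"error status/mask=(?P<status>[0-9a-f]{8})/(?P<mask>[0-9a-f]{8})")

# Bit names taken from the example output only; other bits are left unnamed.
BIT_NAMES = {6: "Bad TLP", 7: "Bad DLLP"}


def set_bits(hex_value):
    """Return the positions of the bits set in a hexadecimal register value."""
    value = int(hex_value, 16)
    return [bit for bit in range(32) if (value >> bit) & 1]


def parse_status(line):
    """Return (status_bits, mask_bits) for a status/mask line, else None."""
    match = STATUS.search(line)
    if match is None:
        return None
    return set_bits(match["status"]), set_bits(match["mask"])


def count_bus_errors(lines):
    """Count "PCIe Bus Error" lines per (device, severity, layer)."""
    counts = Counter()
    for line in lines:
        match = BUS_ERROR.search(line)
        if match:
            counts[(match["device"], match["severity"], match["layer"])] += 1
    return counts


if __name__ == "__main__":
    print(count_bus_errors(SAMPLE))
    for line in SAMPLE:
        decoded = parse_status(line)
        if decoded is not None:
            status, mask = decoded
            names = [BIT_NAMES.get(bit, f"bit {bit}") for bit in status]
            print("reported:", names, "masked bits:", mask)
```

For the first status value in the example, 000000c0, bits 6 and 7 are set, which matches the "[ 6] Bad TLP" and "[ 7] Bad DLLP" lines that the kernel prints below it.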

31.2. Collecting and displaying AER messages

To collect and display AER messages, use the rasdaemon program.

Procedure

  1. Install the rasdaemon package.

    # yum install rasdaemon

  2. Enable and start the rasdaemon service.

    # systemctl enable --now rasdaemon
    Created symlink /etc/systemd/system/multi-user.target.wants/rasdaemon.service → /usr/lib/systemd/system/rasdaemon.service.

  3. Issue the ras-mc-ctl command.

    # ras-mc-ctl --summary
    # ras-mc-ctl --errors

    With the --summary option, the command displays a summary of the logged errors. With the --errors option, it displays the errors stored in the error database.
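
If you want to call ras-mc-ctl from a script, for example from a periodic health check, the command in step 3 can be wrapped as follows. This is a minimal sketch under stated assumptions: the run_ras_mc_ctl function name is hypothetical, the wrapper accepts only the two options used in this procedure, and it returns None when the ras-mc-ctl binary is not found in PATH. The format of the command output is not parsed or assumed here.

```python
import shutil
import subprocess

# Only the two options used in the procedure above are accepted.
ALLOWED_OPTIONS = ("--summary", "--errors")


def run_ras_mc_ctl(option):
    """Run ras-mc-ctl with one allowed option and return its standard output.

    Returns None if ras-mc-ctl is not installed or not in PATH.
    """
    if option not in ALLOWED_OPTIONS:
        raise ValueError(f"unsupported option: {option!r}")
    path = shutil.which("ras-mc-ctl")
    if path is None:
        return None
    result = subprocess.run(
        [path, option], capture_output=True, text=True, check=False
    )
    return result.stdout


if __name__ == "__main__":
    output = run_ras_mc_ctl("--summary")
    print(output if output is not None else "ras-mc-ctl not found")
```

As with the commands in the procedure, run the script as root.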

Additional resources

  • The rasdaemon(8) manual page
  • The ras-mc-ctl(8) manual page

© 2024 Red Hat, Inc.