Chapter 32. Using Advanced Error Reporting


When you use Advanced Error Reporting (AER), you receive notifications of error events for Peripheral Component Interconnect Express (PCIe) devices. RHEL enables this kernel feature by default and collects the reported errors in the kernel logs. If you use the rasdaemon program, it parses these errors and stores them in its database.

32.1. Overview of AER

Advanced Error Reporting (AER) is a kernel feature that provides enhanced error reporting for Peripheral Component Interconnect Express (PCIe) devices. The AER kernel driver attaches to root ports that support the PCIe AER capability in order to:

  • Gather comprehensive error information
  • Report errors to the users
  • Perform error recovery actions

When AER captures an error, it sends an error message to the console. For a correctable error, the console output is a warning.

Example 32.1. Example AER output

Feb  5 15:41:33 hostname kernel: pcieport 10003:00:00.0: AER: Corrected error received: id=ae00
Feb  5 15:41:33 hostname kernel: pcieport 10003:00:00.0: AER: Multiple Corrected error received: id=ae00
Feb  5 15:41:33 hostname kernel: pcieport 10003:00:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0000(Receiver ID)
Feb  5 15:41:33 hostname kernel: pcieport 10003:00:00.0:   device [8086:2030] error status/mask=000000c0/00002000
Feb  5 15:41:33 hostname kernel: pcieport 10003:00:00.0:    [ 6] Bad TLP
Feb  5 15:41:33 hostname kernel: pcieport 10003:00:00.0:    [ 7] Bad DLLP
Feb  5 15:41:33 hostname kernel: pcieport 10003:00:00.0: AER: Multiple Corrected error received: id=ae00
Feb  5 15:41:33 hostname kernel: pcieport 10003:00:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0000(Receiver ID)
Feb  5 15:41:33 hostname kernel: pcieport 10003:00:00.0:   device [8086:2030] error status/mask=00000040/00002000
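The device line in this output carries two hexadecimal fields: the error status register and the error mask register of the reporting device. Each set bit in the status field corresponds to one of the bracketed lines that follow it, so 000000c0 (bits 6 and 7) produces the Bad TLP and Bad DLLP lines. The following Python sketch decodes these fields for correctable errors only, using the bit layout of the Correctable Error Status register from the PCIe Base Specification. The function name decode_correctable and its regular expression are illustrative assumptions, not part of any RHEL tool. Uncorrectable errors use a different bit layout that this sketch does not handle.

```python
import re

# Bit positions in the PCIe Correctable Error Status (and Mask) register,
# per the PCIe Base Specification. Unlisted bits are reserved.
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}

# Matches the "error status/mask=XXXXXXXX/YYYYYYYY" field of an AER device line.
STATUS_MASK_RE = re.compile(r"error status/mask=([0-9a-fA-F]+)/([0-9a-fA-F]+)")


def _set_bits(value):
    """Return (bit, name) pairs for every known correctable-error bit set in value."""
    return [(bit, name) for bit, name in sorted(CORRECTABLE_BITS.items())
            if value & (1 << bit)]


def decode_correctable(line):
    """Decode the status and mask fields of a severity=Corrected device line.

    Hypothetical helper: returns a tuple (status_bits, masked_bits) of
    (bit, name) lists, and raises ValueError if the line has no such field.
    """
    match = STATUS_MASK_RE.search(line)
    if match is None:
        raise ValueError("line has no 'error status/mask=' field")
    status = int(match.group(1), 16)
    mask = int(match.group(2), 16)
    return _set_bits(status), _set_bits(mask)


if __name__ == "__main__":
    example = ("pcieport 10003:00:00.0:   device [8086:2030] "
               "error status/mask=000000c0/00002000")
    status_bits, masked_bits = decode_correctable(example)
    for bit, name in status_bits:
        # Same "[ N] name" layout as the bracketed lines in the kernel output.
        print(f"[{bit:2d}] {name}")
    print("masked:", masked_bits)
```

For the first device line of Example 32.1, decode_correctable returns [(6, 'Bad TLP'), (7, 'Bad DLLP')] for the status and [(13, 'Advisory Non-Fatal Error')] for the mask, which means that Advisory Non-Fatal errors are masked on this device. The second device line (status 00000040) decodes to Bad TLP only.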

32.2. Collecting and displaying AER messages

To collect and display AER messages, use the rasdaemon program.

Procedure

  1. Install the rasdaemon package.

    # yum install rasdaemon
  2. Enable and start the rasdaemon service.

    # systemctl enable --now rasdaemon
    Created symlink /etc/systemd/system/multi-user.target.wants/rasdaemon.service → /usr/lib/systemd/system/rasdaemon.service.
  3. Run the ras-mc-ctl command.

    # ras-mc-ctl --summary
    # ras-mc-ctl --errors

    The --summary option displays a summary of the logged errors, and the --errors option displays the errors stored in the error database.
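rasdaemon stores the errors it collects in an SQLite database, which ras-mc-ctl reads. If you want to process those records in your own scripts, the following Python sketch counts AER events per device and error type. It assumes the database is at /var/lib/rasdaemon/ras-mc_event.db and contains an aer_event table with dev_name and err_type columns; both are assumptions about the installed rasdaemon version, so check them (for example, with the .schema command of the sqlite3 command-line shell) before relying on the script. The function name summarize_aer is hypothetical, and ras-mc-ctl remains the supported interface.

```python
import sqlite3
import sys
from collections import Counter

# ASSUMPTION: default rasdaemon database location; it may differ between versions.
DEFAULT_DB = "/var/lib/rasdaemon/ras-mc_event.db"


def summarize_aer(db_path=DEFAULT_DB):
    """Count AER events per (dev_name, err_type) in a rasdaemon database.

    ASSUMPTION: the database has an 'aer_event' table with 'dev_name' and
    'err_type' columns.
    """
    # Open read-only so this script cannot modify the database rasdaemon writes to.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        rows = conn.execute("SELECT dev_name, err_type FROM aer_event").fetchall()
    finally:
        conn.close()
    # Each row is a (dev_name, err_type) tuple, so Counter groups identical pairs.
    return Counter(rows)


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_DB
    for (dev, err_type), count in summarize_aer(path).most_common():
        print(f"{dev}\t{err_type}\t{count}")
```

Reading the default database typically requires root privileges. Opening it read-only keeps the script from interfering with the running rasdaemon service.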

Additional resources

  • The rasdaemon(8) manual page
  • The ras-mc-ctl(8) manual page
© 2024 Red Hat, Inc.