Chapter 1. Troubleshooting Red Hat Edge Manager


When working with devices in Red Hat Edge Manager, troubleshooting begins with interpreting the structured status messages provided by the device. By identifying the specific phase and component where a failure occurred, you can quickly determine whether an issue is caused by local resource constraints, network connectivity, or configuration errors.

1.1. Troubleshooting device error codes

To improve security and performance, Red Hat Edge Manager uses structured error codes in device status responses. These codes replace verbose system logs with categorized, actionable summaries, ensuring sensitive data (like credentials) is never exposed in the API or UI.

1.1.1. Error message anatomy

Every error message follows a standardized format, capped at 250 characters, that helps you quickly pinpoint the phase, component, and specific cause of a failure.

The error message format is as follows:

[timestamp] While <Phase>, <Component> failed [for "<Element>"]: <Category> issue - <STATUS_CODE>
| Field | Description | Examples |
|---|---|---|
| Phase | The stage of the operation where the error occurred. | Preparing, ApplyingUpdate, Rebooting |
| Component | The specific system area affected. | os, config, applications, systemd |
| Element | The specific resource (file, service, or image). | /etc/app.conf, fleet-agent.service, quay.io/app |
| Category | The functional area of the failure. | Network, Security, Resource |
| Status Code | The standardized gRPC-based error code. | UNAVAILABLE, PERMISSION_DENIED, INTERNAL |
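
As a rough illustration of how these fields can be pulled out of a status message, the following shell sketch splits a message into its parts. It assumes that the timestamp is wrapped in literal square brackets and that the for "<Element>" clause is optional. The function name parse_status_message, the pipe-separated output, and the sample timestamp are hypothetical and are not part of Red Hat Edge Manager.

```shell
# Hypothetical helper: split a status message into
# timestamp|phase|component|element|category|status_code.
# Assumes literal [ ] around the timestamp and an optional for "<Element>" clause.
parse_status_message() {
    parsed=$(printf '%s\n' "$1" | sed -nE \
        's/^\[([^]]*)\] While ([^,]+), ([^ ]+) failed( for "([^"]*)")?: ([A-Za-z]+) issue - ([A-Z_]+)$/\1|\2|\3|\5|\6|\7/p')
    # sed prints nothing when the message does not match the expected format.
    [ -n "$parsed" ] || return 1
    printf '%s\n' "$parsed"
}

# Illustrative message only; the timestamp format on a real device may differ.
parse_status_message '[2025-01-01T00:00:00Z] While Preparing, applications failed for "quay.io/app": Network issue - UNAVAILABLE'
```

For the sample message, the function prints 2025-01-01T00:00:00Z|Preparing|applications|quay.io/app|Network|UNAVAILABLE. When a message has no element, the fourth field is empty, and the function returns a non-zero status for input that does not match the format.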

1.1.1.1. Error reference & resolution

Use the table below to identify the root cause of a status code and the recommended next steps for resolution.

| Category | Status Code | Common Causes | Recommended Action |
|---|---|---|---|
| Network | UNAVAILABLE / DEADLINE_EXCEEDED | DNS failure, registry unreachable, or connection timeout. Image non-existent or inaccessible due to registry permissions. | Check device internet connectivity and firewall rules for registry access. Verify the image name/tag and registry-level access permissions. |
| Security | PERMISSION_DENIED / UNAUTHENTICATED | Invalid credentials, expired tokens, or insufficient permissions. | Verify registry credentials and ensure the device identity is valid. |
| Configuration | INVALID_ARGUMENT / FAILED_PRECONDITION | Syntax errors in YAML/JSON or missing mandatory fields. Invalid element, token, or path format. | Validate your configuration spec against the schema. |
| Filesystem | NOT_FOUND / ALREADY_EXISTS | Missing files, directory conflicts, or path errors. | Verify the existence of required local resources or mount points. |
| Resource | RESOURCE_EXHAUSTED | Disk full, Out of Memory (OOM), or CPU throttling. | Check device telemetry for disk usage and memory pressure. |
| System | INTERNAL / UNKNOWN | Unexpected system faults or unclassified errors. | See Deep dive debugging (section 1.1.1.2) to correlate with journal logs. |
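
When triaging many devices, the reference table can be encoded as a small lookup. The sketch below maps a status code to its category and a shortened form of the recommended action. The function name triage_hint and the abbreviated wording are hypothetical; the table remains the authoritative reference.

```shell
# Hypothetical helper: print the category and a first check for a status code,
# mirroring the reference table in this section.
triage_hint() {
    case "$1" in
        UNAVAILABLE|DEADLINE_EXCEEDED)
            echo "Network: check connectivity, firewall rules, and the image name/tag" ;;
        PERMISSION_DENIED|UNAUTHENTICATED)
            echo "Security: verify registry credentials and the device identity" ;;
        INVALID_ARGUMENT|FAILED_PRECONDITION)
            echo "Configuration: validate the configuration spec against the schema" ;;
        NOT_FOUND|ALREADY_EXISTS)
            echo "Filesystem: verify required local resources and mount points" ;;
        RESOURCE_EXHAUSTED)
            echo "Resource: check device telemetry for disk usage and memory pressure" ;;
        INTERNAL|UNKNOWN)
            echo "System: correlate with the device journal logs" ;;
        *)
            # Codes outside the table are reported on stderr.
            echo "Unrecognized status code: $1" >&2
            return 1 ;;
    esac
}

triage_hint RESOURCE_EXHAUSTED
```

The function returns a non-zero status for codes that the table does not list, so a calling script can fall back to manual inspection.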

1.1.1.2. Deep dive debugging

While API status responses are sanitized for security, full error details—including stack traces and raw Go error chains—are preserved in the local device journal.

Procedure

If you encounter an UNKNOWN or INTERNAL error, or if the status message is truncated, you can map the status code to the detailed log:

  1. Retrieve the device status and note the timestamp and component from the message field:

    flightctl get device/<device-name> -o yaml
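  2. Access the device logs. Search the local journal for the corresponding error context to see the unredacted failure. The search string in the following command is an example; substitute text that matches your own status message: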

    journalctl -u fleet-agent | grep "failed to reload systemd daemon"

API responses are limited to 250 characters. For the full diagnostic context—including raw Go error strings and detailed stack traces—refer to the local logs on the device.
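
Because the status message carries a timestamp, one way to narrow the journal search is to restrict it to a short window around that time. The following sketch builds the --since and --until options for journalctl from an epoch value, relying on the @<seconds> timestamp form described in systemd.time(7). The function name journal_window_args and the default two-minute margin are hypothetical. The commented usage also assumes that GNU date can parse the timestamp as it appears in your status message, which may not hold for every format.

```shell
# Hypothetical helper: print journalctl time-window options centered on an
# epoch timestamp. The second argument is the margin in seconds on each side
# of the timestamp (default 120).
journal_window_args() {
    center=$1
    margin=${2:-120}
    echo "--since=@$((center - margin)) --until=@$((center + margin))"
}

# Example usage on the device (commented out because it needs journald):
#   ts_epoch=$(date -d "<timestamp from the status message>" +%s)   # GNU date
#   journalctl -u fleet-agent --no-pager $(journal_window_args "$ts_epoch" 60)
```

The command substitution is left unquoted in the usage example on purpose so that the shell passes the two options to journalctl as separate arguments.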

1.2. Generating a device log bundle

Use the integrated flightctl-must-gather script directly on the device to generate a comprehensive bundle of diagnostic logs. This log bundle, in a standard .tar format, provides the necessary data to debug the device agent and assists in efficient troubleshooting and bug reporting.

  • Run the following command on the device and include the resulting .tar file in the bug report.

    Retrieving the .tar file from the device requires an SSH connection.

    sudo flightctl-must-gather

The device manifest returned by the flightctl get device command contains only references to external configuration and secret objects. The service replaces those references with the actual configuration and secret data only when the device agent queries it.

While this better protects potentially sensitive data, it also makes faulty configurations harder to troubleshoot. For this reason, a user can be authorized to query the effective configuration as the service renders it for the agent.

Procedure

  • To query the effective configuration, use the following command:

    flightctl get device/${device_name} --rendered | jq