Chapter 1. Troubleshooting Red Hat Edge Manager

When working with devices in Red Hat Edge Manager, troubleshooting begins with interpreting the structured status messages provided by the device. By identifying the specific phase and component where a failure occurred, you can quickly determine whether an issue is caused by local resource constraints, network connectivity, or configuration errors.

1.1. Troubleshooting device error codes

To improve security and performance, Red Hat Edge Manager uses structured error codes in device status responses. These codes replace verbose system logs with categorized, actionable summaries, ensuring sensitive data (like credentials) is never exposed in the API or UI.

1.1.1. Error message anatomy

Every error message follows a standardized format and is limited to 250 characters, which helps you quickly pinpoint the phase, component, and specific cause of a failure.

The error message format is as follows:

[timestamp] While <Phase>, <Component> failed [for "<Element>"]: <Category> issue - <STATUS_CODE>

| Field | Description | Examples |
| --- | --- | --- |
| Phase | The stage of the operation where the error occurred. | `Preparing`, `ApplyingUpdate`, `Rebooting`, `RollingBack` |
| Component | The specific system area affected. | `os`, `config`, `applications`, `systemd` |
| Element | The specific resource (file, service, or image). | `/etc/app.conf`, `fleet-agent.service`, `quay.io/app` |
| Category | The functional area of the failure. | `Network`, `Security`, `Resource` |
| Status Code | The standardized gRPC-based error code. | `UNAVAILABLE`, `PERMISSION_DENIED`, `INTERNAL` |
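
As a rough illustration of the format above, the following Python sketch splits a status message into its fields with a regular expression. The names `MESSAGE_PATTERN` and `parse_status_message`, the sample message, and the sample timestamp layout are assumptions made for illustration; they are not part of Red Hat Edge Manager, and real timestamps may look different.

```python
import re
from typing import Optional

# Matches the documented layout:
#   While <Phase>, <Component> failed [for "<Element>"]: <Category> issue - <STATUS_CODE>
# The element part is optional. Text before "While" is treated as the timestamp.
MESSAGE_PATTERN = re.compile(
    r'While (?P<phase>\w+), (?P<component>\w+) failed'
    r'(?: for "(?P<element>[^"]*)")?'
    r': (?P<category>\w+) issue - (?P<status_code>[A-Z_]+)'
)


def parse_status_message(message: str) -> Optional[dict]:
    """Return the message fields as a dict, or None if the layout does not match."""
    match = MESSAGE_PATTERN.search(message)
    if match is None:
        return None
    fields = match.groupdict()
    # Assumption: the timestamp is whatever precedes "While", possibly in brackets.
    timestamp = message[:match.start()].strip().strip("[]")
    fields["timestamp"] = timestamp or None
    return fields


# Hypothetical message assembled from the example values in the table above.
sample = ('[2025-01-01T12:00:30Z] While ApplyingUpdate, applications failed '
          'for "quay.io/app": Network issue - UNAVAILABLE')
print(parse_status_message(sample))
```

Because the pattern uses `search` rather than a full match, it still recovers the fields when extra text precedes the message, and it returns `None` instead of raising when the message is truncated before the status code.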

1.1.1.1. Error reference & resolution

Use the table below to identify the root cause of a status code and the recommended next steps for resolution.

| Category | Status Code | Common Causes | Recommended Action |
| --- | --- | --- | --- |
| Network | `UNAVAILABLE` / `DEADLINE_EXCEEDED` | DNS failure, registry unreachable, or connection timeout. Image non-existent or inaccessible due to registry permissions. | Check device internet connectivity and firewall rules for registry access. Verify the image name/tag and registry-level access permissions. |
| Security | `PERMISSION_DENIED` / `UNAUTHENTICATED` | Invalid credentials, expired tokens, or insufficient permissions. | Verify registry credentials and ensure the device identity is valid. |
| Configuration | `INVALID_ARGUMENT` / `FAILED_PRECONDITION` | Syntax errors in YAML/JSON or missing mandatory fields. Invalid element, token, or path format. | Validate your configuration spec against the schema. |
| Filesystem | `NOT_FOUND` / `ALREADY_EXISTS` | Missing files, directory conflicts, or path errors. | Verify the existence of required local resources or mount points. |
| Resource | `RESOURCE_EXHAUSTED` | Disk full, Out of Memory (OOM), or CPU throttling. | Check device telemetry for disk usage and memory pressure. |
| System | `INTERNAL` / `UNKNOWN` | Unexpected system faults or unclassified errors. | See Deep Dive Debugging below to correlate with journal logs. |
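
For scripted triage, the table above can be held as a simple lookup. The sketch below is one possible encoding in Python; `STATUS_CODE_GUIDE` and `triage` are hypothetical names, and the category and action strings are transcribed from the table rather than taken from any product API.

```python
from typing import Optional, Tuple

# One entry per table row: (category, status codes, recommended action).
_ROWS = [
    ("Network", ("UNAVAILABLE", "DEADLINE_EXCEEDED"),
     "Check device connectivity and firewall rules; verify image name/tag and registry access."),
    ("Security", ("PERMISSION_DENIED", "UNAUTHENTICATED"),
     "Verify registry credentials and ensure the device identity is valid."),
    ("Configuration", ("INVALID_ARGUMENT", "FAILED_PRECONDITION"),
     "Validate your configuration spec against the schema."),
    ("Filesystem", ("NOT_FOUND", "ALREADY_EXISTS"),
     "Verify the existence of required local resources or mount points."),
    ("Resource", ("RESOURCE_EXHAUSTED",),
     "Check device telemetry for disk usage and memory pressure."),
    ("System", ("INTERNAL", "UNKNOWN"),
     "Correlate the error with the journal logs on the device."),
]

# Flatten the rows into: status code -> (category, recommended action).
STATUS_CODE_GUIDE = {
    code: (category, action)
    for category, codes, action in _ROWS
    for code in codes
}


def triage(status_code: str) -> Optional[Tuple[str, str]]:
    """Return (category, action) for a status code, or None if it is not in the table."""
    return STATUS_CODE_GUIDE.get(status_code.strip().upper())


print(triage("RESOURCE_EXHAUSTED"))
```

Returning `None` for a code that is not in the table is a deliberate choice in this sketch: it avoids guessing a category for codes the table does not cover.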

1.1.1.2. Rollback and failed OS updates

If an OS update fails, the device automatically rolls back to the previous version. The phase may appear as RollingBack; when rollback completes, the update condition reason is Error. The device does not retry the failed version automatically. For how to recognize a rollback and what to do next, see Troubleshooting OS update rollback.

1.1.1.3. Deep dive debugging

While API status responses are sanitized for security, full error details—including stack traces and raw Go error chains—are preserved in the local device journal.

Procedure

If you encounter an UNKNOWN or INTERNAL error, or if the status message is truncated, you can map the status code to the detailed log:

Retrieve the Device Status, making sure to note the timestamp and component from the message field.
```
flightctl get device/<device_name> -o yaml
```
Access the device logs: Search the local journal for the corresponding error context to see the unredacted failure:
```
journalctl -u flightctl-agent | grep "failed to reload systemd daemon"
```
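
When the exact error string is not known in advance, narrowing the journal to a time window around the timestamp from the status message can help. The Python sketch below only builds such a command; it does not run it. It assumes the timestamp is in RFC 3339 form (for example `2025-01-01T12:00:30Z`), that the agent logs under the `flightctl-agent` unit, and that a 60-second margin is enough. The function name `journal_window_command` is hypothetical. `--since`, `--until`, and `--no-pager` are standard `journalctl` options.

```python
import shlex
from datetime import datetime, timedelta, timezone
from typing import List


def journal_window_command(timestamp: str,
                           unit: str = "flightctl-agent",
                           margin_seconds: int = 60) -> List[str]:
    """Build a journalctl command covering a window around the given timestamp."""
    # Accept a trailing "Z" on Python versions whose fromisoformat rejects it.
    moment = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    if moment.tzinfo is None:
        # Assumption: timestamps without an offset are in UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    moment = moment.astimezone(timezone.utc)
    margin = timedelta(seconds=margin_seconds)
    fmt = "%Y-%m-%d %H:%M:%S UTC"
    return [
        "journalctl", "-u", unit,
        "--since", (moment - margin).strftime(fmt),
        "--until", (moment + margin).strftime(fmt),
        "--no-pager",
    ]


# Print a copy-pasteable command line for a hypothetical timestamp.
print(shlex.join(journal_window_command("2025-01-01T12:00:30Z")))
```

Run the printed command on the device, widening `margin_seconds` if the window turns out to be too narrow.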

API responses are limited to 250 characters. For the full diagnostic context—including raw Go error strings and detailed stack traces—refer to the local logs on the device.

1.2. Troubleshooting OS update rollback

When an OS update fails, Red Hat Edge Manager uses greenboot to automatically roll back the device to the previous working OS version. This section helps you recognize when a rollback has occurred and decide what to do next.

1.2.1. Recognizing a rollback or failed update

Check the device status to see whether an update failed and the device rolled back:

Retrieve the device status:

flightctl get device/<device_name> -o yaml

In the output, check:
- status.updated.status: After a rollback, the device is typically OutOfDate (the device is running the previous OS version, not the version that was requested).
- status.conditions: Look for the Updating condition. If the condition’s reason is Error, the update failed and the device has rolled back to the pre-update OS and configuration. If the reason is RollingBack, the agent was still rolling back when it last reported.

The status.updated.info field may contain a short message about the last state transition.
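
The checks above can be sketched as a small Python function that works on the device status after it has been loaded into a dictionary. This is an illustration under assumptions: it assumes `status.conditions` is a list of objects with `type` and `reason` keys, and the function names and returned strings are hypothetical rather than part of the product.

```python
from typing import Optional


def updating_reason(device: dict) -> Optional[str]:
    """Return the reason of the Updating condition, or None if it is absent."""
    conditions = device.get("status", {}).get("conditions") or []
    for condition in conditions:
        if condition.get("type") == "Updating":
            return condition.get("reason")
    return None


def describe_update_state(device: dict) -> str:
    """Summarize whether the status indicates a rollback, per the checks above."""
    updated = device.get("status", {}).get("updated", {}).get("status")
    reason = updating_reason(device)
    if reason == "RollingBack":
        return "rollback was in progress when the agent last reported"
    if reason == "Error":
        return f"update failed and the device rolled back (updated.status={updated})"
    return f"no rollback indicated (updated.status={updated}, reason={reason})"


# Hypothetical, heavily trimmed device status for illustration.
device = {
    "status": {
        "updated": {"status": "OutOfDate"},
        "conditions": [{"type": "Updating", "reason": "Error"}],
    }
}
print(describe_update_state(device))
```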

1.2.2. Viewing greenboot and rollback logs

When troubleshooting a rollback, the most useful logs are from greenboot itself. On the device, use these commands to view them:

To view the greenboot health check results, run:

sudo journalctl -o cat -u greenboot-healthcheck.service

The following example shows journal output typical of a failed greenboot health check. Use it to pattern-match what you see on a device:

Running Required Health Check Scripts...
[20_check_flightctl_agent.sh] INFO: === flightctl-agent greenboot health check started ===
[20_check_flightctl_agent.sh] INFO: GRUB boot variables:
boot_success=0
boot_counter=2
...
time="..." level=error msg="health: Service check failed: service is not enabled (state: disabled)"
[20_check_flightctl_agent.sh] ERROR: flightctl-agent health check failed
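
If you collect this output from several devices, a short script can pull out the failing checks. The Python sketch below assumes that health check scripts log lines in the `[script] LEVEL: message` shape seen in the example above; other scripts may log differently. The names `SCRIPT_LINE` and `failed_checks` are hypothetical.

```python
import re
from typing import List, Tuple

# Lines such as: [20_check_flightctl_agent.sh] ERROR: flightctl-agent health check failed
SCRIPT_LINE = re.compile(r'^\[(?P<script>[^\]]+)\] (?P<level>[A-Z]+): (?P<text>.*)$')


def failed_checks(journal_text: str) -> List[Tuple[str, str]]:
    """Return (script, message) pairs for every ERROR line logged by a check script."""
    failures = []
    for line in journal_text.splitlines():
        match = SCRIPT_LINE.match(line.strip())
        if match and match.group("level") == "ERROR":
            failures.append((match.group("script"), match.group("text")))
    return failures


# The example journal output from above, reproduced as input.
sample_output = '''Running Required Health Check Scripts...
[20_check_flightctl_agent.sh] INFO: === flightctl-agent greenboot health check started ===
[20_check_flightctl_agent.sh] INFO: GRUB boot variables:
boot_success=0
boot_counter=2
...
time="..." level=error msg="health: Service check failed: service is not enabled (state: disabled)"
[20_check_flightctl_agent.sh] ERROR: flightctl-agent health check failed
'''
print(failed_checks(sample_output))
```

The sketch reports only which script failed. The reason for the failure is usually on the preceding `level=error` line, so read the surrounding journal output as well.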

To view pre-rollback diagnostic output (scripts that run before rollback), run:
```
sudo journalctl -o cat -u redboot-task-runner.service
```
To quickly check whether the last boot was declared successful by greenboot, inspect the GRUB environment on the device:
```
sudo grub2-editenv - list | grep ^boot_success
```
A value of boot_success=1 means greenboot declared the boot healthy. A value of 0 means either health checks are still running or the boot was declared failed.
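
The interpretation above can be expressed as a small Python helper that parses the `key=value` lines printed by `grub2-editenv - list`. The function names `parse_grub_env` and `boot_declared_healthy` are hypothetical, and the sketch only inspects text that you pass in; it does not run any command itself.

```python
def parse_grub_env(output: str) -> dict:
    """Parse key=value lines into a dict, ignoring lines without an equals sign."""
    env = {}
    for line in output.splitlines():
        key, separator, value = line.strip().partition("=")
        if separator:
            env[key] = value
    return env


def boot_declared_healthy(output: str) -> bool:
    """True only if boot_success is 1; 0 or a missing variable returns False."""
    return parse_grub_env(output).get("boot_success") == "1"


# Values taken from the example journal output earlier in this section.
print(boot_declared_healthy("boot_success=0\nboot_counter=2\n"))
```

A `False` result does not distinguish between health checks that are still running and a boot that was declared failed, so check the greenboot journal before drawing a conclusion.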

1.2.3. Enabling persistent journal storage

By default, the systemd journal service stores data in the volatile /run/log/journal directory, which does not persist across reboots. To retain greenboot and agent logs for post-rollback analysis, enable persistent storage.

Procedure

Create the journal configuration directory:

sudo mkdir -p /etc/systemd/journald.conf.d

Create the configuration file:

cat <<EOF | sudo tee /etc/systemd/journald.conf.d/flightctl.conf &>/dev/null
[Journal]
Storage=persistent
SystemMaxUse=1G
RuntimeMaxUse=1G
EOF

Edit the configuration file values for your size requirements. For example, adjust SystemMaxUse and RuntimeMaxUse in /etc/systemd/journald.conf.d/flightctl.conf.
Restart the journal service to apply the configuration:

sudo systemctl restart systemd-journald

1.2.4. Post-rollback recovery and diagnostics

Verify the device is running: The device should be online and running the previous OS version. Confirm that status.summary.status is Online or Degraded and that status.os.image matches the previous (working) image.
Investigate the failure: Use the device status message and, if you have access, the device logs. Prefer the greenboot journal output (see Viewing greenboot and rollback logs); you can also check the agent journal (for example, journalctl -u flightctl-agent.service) to determine why the update failed. Common causes include health check failures after reboot, network or registry issues, or resource constraints. See Troubleshooting device error codes for error categories and recommended actions.
Fix and try a new version: Address the underlying issue (for example, fix the OS image or configuration, or resolve network or resource problems). When ready, update the device spec to a new OS image version or a corrected image so the agent can attempt an update again.
Note
The agent does not retry a failed version. It marks the failed version and skips it in future reconciliation. Pushing the same OS image again without change will not trigger a retry; you must push a new image version (different digest).

1.2.5. When to escalate

Consider escalating or opening a support case if:

The device does not come back online after a rollback.
Rollbacks happen repeatedly for the same or different OS versions.
The device status remains in RollingBack or Error for an extended period with no recovery.
You need to force a retry of a previously failed version and the product does not provide a supported way to do so.

1.3. Generating a device log bundle

Use the integrated flightctl-must-gather script directly on the device to generate a comprehensive bundle of diagnostic logs. This log bundle, in a standard .tar format, provides the necessary data to debug the device agent and assists in efficient troubleshooting and bug reporting.

Run the following command on the device, then attach the resulting .tar file to your bug report.
Copying the .tar file off the device requires an SSH connection.
```
sudo flightctl-must-gather
```

1.4. Viewing a device’s effective target configuration

The device manifest returned by the flightctl get device command contains only references to external configuration and secret objects. The service replaces these references with the actual configuration and secret data only when the device agent queries it.

While this better protects potentially sensitive data, it also makes faulty configurations harder to troubleshoot. For this reason, a user can be authorized to query the effective configuration as the service renders it for the agent.

Procedure

To query the effective configuration, use the following command:
```
flightctl get device/${device_name} --rendered | jq
```
