
Chapter 1. Troubleshooting Red Hat Edge Manager

When working with devices in Red Hat Edge Manager, troubleshooting begins with interpreting the structured status messages provided by the device. By identifying the specific phase and component where a failure occurred, you can quickly determine whether an issue is caused by local resource constraints, network connectivity, or configuration errors.

1.1. Troubleshooting device error codes

To improve security and performance, Red Hat Edge Manager uses structured error codes in device status responses. These codes replace verbose system logs with categorized, actionable summaries, ensuring sensitive data (like credentials) is never exposed in the API or UI.

1.1.1. Error message anatomy

Every error message follows a standardized format, limited to 250 characters, that helps you quickly pinpoint the phase, component, and specific cause of a failure.

The error message format is as follows:

```
[timestamp] While <Phase>, <Component> failed [for "<Element>"]: <Category> issue - <STATUS_CODE>
```


| Field | Description | Examples |
|---|---|---|
| Phase | The stage of the operation where the error occurred. | `Preparing`, `ApplyingUpdate`, `Rebooting`, `RollingBack` |
| Component | The specific system area affected. | `os`, `config`, `applications`, `systemd` |
| Element | The specific resource (file, service, or image). | `/etc/app.conf`, `fleet-agent.service`, `quay.io/app` |
| Category | The functional area of the failure. | `Network`, `Security`, `Resource` |
| Status Code | The standardized gRPC-based error code. | `UNAVAILABLE`, `PERMISSION_DENIED`, `INTERNAL` |
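Because the layout is fixed, a script can split a message into these fields. The following minimal bash sketch assumes the message is a single line in exactly the layout shown above and treats the bracketed `for "<Element>"` part as optional; the function name `parse_status_message` is hypothetical and not part of the product.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a status message into its fields, assuming the layout
#   [timestamp] While <Phase>, <Component> failed [for "<Element>"]: <Category> issue - <STATUS_CODE>
# Prints one "key=value" line per field; returns 1 if the message does not match.
parse_status_message() {
  local msg="$1"
  # Groups: 1=timestamp 2=phase 3=component 5=element (optional) 6=category 7=code
  local re='^\[([^]]+)] While ([A-Za-z]+), ([A-Za-z0-9_-]+) failed( for "([^"]*)")?: ([A-Za-z]+) issue - ([A-Z_]+)$'
  if [[ $msg =~ $re ]]; then
    printf 'timestamp=%s\n' "${BASH_REMATCH[1]}"
    printf 'phase=%s\n' "${BASH_REMATCH[2]}"
    printf 'component=%s\n' "${BASH_REMATCH[3]}"
    printf 'element=%s\n' "${BASH_REMATCH[5]}"   # empty when no element is given
    printf 'category=%s\n' "${BASH_REMATCH[6]}"
    printf 'code=%s\n' "${BASH_REMATCH[7]}"
  else
    return 1
  fi
}
```

Pass it the text of the `message` field from `flightctl get device/<device_name> -o yaml`. It prints lines such as `phase=ApplyingUpdate` and returns a non-zero status for messages that do not follow the layout.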

1.1.1.1. Error reference & resolution

Use the table below to identify the root cause of a status code and the recommended next steps for resolution.


| Category | Status Code | Common Causes | Recommended Action |
|---|---|---|---|
| Network | `UNAVAILABLE` / `DEADLINE_EXCEEDED` | DNS failure, registry unreachable, or connection timeout. Image non-existent or inaccessible due to registry permissions. | Check device internet connectivity and firewall rules for registry access. Verify the image name/tag and registry-level access permissions. |
| Security | `PERMISSION_DENIED` / `UNAUTHENTICATED` | Invalid credentials, expired tokens, or insufficient permissions. | Verify registry credentials and ensure the device identity is valid. |
| Configuration | `INVALID_ARGUMENT` / `FAILED_PRECONDITION` | Syntax errors in YAML/JSON or missing mandatory fields. Invalid element, token, or path format. | Validate your configuration spec against the schema. |
| Filesystem | `NOT_FOUND` / `ALREADY_EXISTS` | Missing files, directory conflicts, or path errors. | Verify the existence of required local resources or mount points. |
| Resource | `RESOURCE_EXHAUSTED` | Disk full, out of memory (OOM), or CPU throttling. | Check device telemetry for disk usage and memory pressure. |
| System | `INTERNAL` / `UNKNOWN` | Unexpected system faults or unclassified errors. | See Deep dive debugging below to correlate with journal logs. |
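For scripted triage, the table above can be expressed as a lookup. This minimal bash sketch mirrors the table rows; the function name `status_code_category` is hypothetical, and the actions are abbreviated from the table.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a status code to its category and a short
# recommended action, following the error reference table.
# Prints "<Category>: <action>"; returns 1 for codes not in the table.
status_code_category() {
  case "$1" in
    UNAVAILABLE|DEADLINE_EXCEEDED)
      echo "Network: check connectivity, firewall rules, image name/tag, and registry access" ;;
    PERMISSION_DENIED|UNAUTHENTICATED)
      echo "Security: verify registry credentials and the device identity" ;;
    INVALID_ARGUMENT|FAILED_PRECONDITION)
      echo "Configuration: validate the configuration spec against the schema" ;;
    NOT_FOUND|ALREADY_EXISTS)
      echo "Filesystem: verify required local resources and mount points" ;;
    RESOURCE_EXHAUSTED)
      echo "Resource: check disk usage and memory pressure" ;;
    INTERNAL|UNKNOWN)
      echo "System: correlate with the device journal" ;;
    *)
      return 1 ;;
  esac
}
```

For example, `status_code_category DEADLINE_EXCEEDED` prints the Network row's action.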

1.1.1.2. Rollback and failed OS updates

If an OS update fails, the device automatically rolls back to the previous version. While the rollback is in progress, the phase can appear as `RollingBack`; after the rollback completes, the reason of the update condition is `Error`. The device does not retry the failed version automatically. For how to recognize a rollback and what to do next, see Troubleshooting OS update rollback.

1.1.1.3. Deep dive debugging

API status responses are sanitized for security, but the full error details, including stack traces and raw Go error chains, are preserved in the local device journal.

Procedure

If you encounter an `UNKNOWN` or `INTERNAL` error, or if the status message is truncated, map the status message to the detailed log:

Retrieve the device status and note the timestamp and component in the `message` field:
```
flightctl get device/<device_name> -o yaml
```
Access the device logs: on the device, search the local journal for the corresponding error context to see the unredacted failure. For example:
```
sudo journalctl -u flightctl-agent | grep "failed to reload systemd daemon"
```

Because API responses are limited to 250 characters, long status messages can be truncated. For the full diagnostic context, including raw Go error strings and detailed stack traces, refer to the local logs on the device.
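To correlate a status message with the journal, you can restrict the journal to entries from the time of the failure onward. The following minimal bash sketch prints a `journalctl` command for that purpose. It assumes the bracketed timestamp is an RFC 3339 UTC value such as `2025-01-01T10:00:00Z` (the timestamp format is not specified above), and it defaults to the `flightctl-agent.service` unit mentioned in the post-rollback section; the function name `agent_journal_cmd` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn the bracketed timestamp of a status message into a
# journalctl query for the agent unit. Assumes RFC 3339 timestamps in UTC
# (for example 2025-01-01T10:00:00Z); other formats are rejected.
agent_journal_cmd() {
  local ts="$1" unit="${2:-flightctl-agent.service}"
  local re='^([0-9]{4}-[0-9]{2}-[0-9]{2})T([0-9]{2}:[0-9]{2}:[0-9]{2})(\.[0-9]+)?Z$'
  if [[ $ts =~ $re ]]; then
    # journalctl --since accepts "YYYY-MM-DD HH:MM:SS UTC"; fractional seconds
    # are dropped so entries from the same second are included.
    printf 'journalctl -u %s --since "%s %s UTC" --no-pager\n' \
      "$unit" "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
  else
    echo "unsupported timestamp: $ts" >&2
    return 1
  fi
}
```

Run the printed command on the device (with `sudo` if required). If the relevant context was logged shortly before the reported time, move the `--since` value earlier by hand.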

1.2. Troubleshooting OS update rollback
リンクのコピー

Learn how to recognize when a device has rolled back after a failed OS update and what to do next.

When an OS update fails, Red Hat Edge Manager uses greenboot to automatically roll back the device to the previous working OS version.

1.2.1. Recognizing a rollback or failed update

To check whether an update failed and the device rolled back, retrieve the device status:

```
flightctl get device/<device_name> -o yaml
```

In the output, check the following fields:
- `status.updated.status`: After a rollback, the device is typically `OutOfDate`, because it is running the previous OS version rather than the requested version.
- `status.conditions`: Look for the `Updating` condition. If its reason is `Error`, the update failed and the device has rolled back to the pre-update OS and configuration. If its reason is `RollingBack`, the agent was in the process of rolling back when it last reported.

The `status.updated.info` field might contain a short message about the last state transition.
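The checks above can be combined into a small decision helper. This minimal bash sketch takes the two values you read from the device status and prints a verdict; the function name `interpret_update_status` is hypothetical, and it covers only the reasons described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: interpret the status fields described above.
#   $1 = value of status.updated.status    (for example OutOfDate)
#   $2 = reason of the Updating condition  (for example Error)
# Prints a one-line verdict.
interpret_update_status() {
  local updated="$1" reason="$2"
  case "$reason" in
    Error)
      echo "update failed; device rolled back (updated status: ${updated:-unknown})" ;;
    RollingBack)
      echo "rollback in progress at last report" ;;
    *)
      echo "no rollback indicated (reason: ${reason:-none}, updated status: ${updated:-unknown})" ;;
  esac
}
```

If your `flightctl` version supports JSON output (`-o json`), you can extract the inputs with `jq` filters such as `.status.updated.status` and `.status.conditions[] | select(.type == "Updating") | .reason`.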

1.2.2. Viewing greenboot and rollback logs

When troubleshooting a rollback, the most useful logs are from greenboot itself. On the device, use these commands to view them:

To view the greenboot health check results, run:

```
sudo journalctl -o cat -u greenboot-healthcheck.service
```

The following example shows journal output typical of a failed greenboot health check. Use it to pattern-match what you see on a device:

```
Running Required Health Check Scripts...
[20_check_flightctl_agent.sh] INFO: === flightctl-agent greenboot health check started ===
[20_check_flightctl_agent.sh] INFO: GRUB boot variables:
boot_success=0
boot_counter=2
...
time="..." level=error msg="health: Service check failed: service is not enabled (state: disabled)"
[20_check_flightctl_agent.sh] ERROR: flightctl-agent health check failed
```
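When the journal is long, filtering it for failures can save time. This minimal bash sketch reads health check output, such as the example above, on standard input. One function prints the scripts that logged `ERROR` lines; the other prints the text of `level=error` messages. The function names are hypothetical, and the patterns assume the line formats shown in the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each health check script that logged an ERROR
# line, once, in order of first appearance.
# Assumes lines like: [20_check_flightctl_agent.sh] ERROR: ...
failed_greenboot_scripts() {
  sed -n 's/^\[\([^]]*\)] ERROR:.*/\1/p' | awk '!seen[$0]++'
}

# Hypothetical helper: print the msg="..." text of level=error lines.
greenboot_error_messages() {
  sed -n 's/.*level=error msg="\([^"]*\)".*/\1/p'
}
```

On the device, for example: `sudo journalctl -o cat -u greenboot-healthcheck.service | failed_greenboot_scripts`.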

To view pre-rollback diagnostic output (scripts that run before rollback), run:
```
sudo journalctl -o cat -u redboot-task-runner.service
```
To quickly check whether the last boot was declared successful by greenboot, inspect the GRUB environment on the device:
```
sudo grub2-editenv - list | grep ^boot_success
```
A value of `boot_success=1` means greenboot declared the boot healthy. A value of `0` means either that health checks are still running or that the boot was declared failed.
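To use this check in a script, the following minimal bash sketch interprets `grub2-editenv - list` output read on standard input. The function name `boot_success_status` and its output labels are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report greenboot's verdict from the boot_success
# variable in `grub2-editenv - list` output read on stdin.
#   1 -> healthy; 0 -> checks still running or boot declared failed.
# Returns 2 if boot_success is absent.
boot_success_status() {
  local val
  val=$(sed -n 's/^boot_success=//p' | tail -n 1)
  case "$val" in
    1) echo "healthy" ;;
    0) echo "pending-or-failed" ;;
    *) echo "unknown"; return 2 ;;
  esac
}
```

On the device: `sudo grub2-editenv - list | boot_success_status`.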

1.2.3. Enabling persistent journal storage

By default, the systemd journal service stores data in the volatile /run/log/journal directory, which does not persist across reboots. To retain greenboot and agent logs for post-rollback analysis, enable persistent storage.

Procedure

Create the journal configuration directory:

```
sudo mkdir -p /etc/systemd/journald.conf.d
```

Create the configuration file:

```
cat <<EOF | sudo tee /etc/systemd/journald.conf.d/flightctl.conf &>/dev/null
[Journal]
Storage=persistent
SystemMaxUse=1G
RuntimeMaxUse=1G
EOF
```

Adjust the `SystemMaxUse` and `RuntimeMaxUse` values in `/etc/systemd/journald.conf.d/flightctl.conf` to fit your storage requirements.

Restart the journal service to apply the configuration:

```
sudo systemctl restart systemd-journald
```
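After restarting the journal service, you can confirm the result. This minimal bash sketch checks that the drop-in file created in this procedure requests persistent storage and that `/var/log/journal` exists. The function name `check_persistent_journal` is hypothetical, and the optional root argument exists only so the check can run against a test directory tree.

```shell
#!/usr/bin/env bash
# Hypothetical check: the flightctl.conf drop-in requests persistent storage
# and the persistent journal directory exists.
#   $1 = filesystem root (default /)
check_persistent_journal() {
  local root="${1:-/}"
  root="${root%/}"   # "/" becomes "", so paths below start with a single slash
  local conf="$root/etc/systemd/journald.conf.d/flightctl.conf"
  if ! grep -qx 'Storage=persistent' "$conf" 2>/dev/null; then
    echo "missing Storage=persistent in $conf"
    return 1
  fi
  if [ ! -d "$root/var/log/journal" ]; then
    echo "no $root/var/log/journal directory yet"
    return 1
  fi
  echo "persistent journal configured"
}
```

If the directory is missing, `sudo journalctl --flush` asks journald to move runtime data to persistent storage. After the next reboot, `journalctl --list-boots` should also list earlier boots, which indicates that logs survive reboots.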

1.2.4. Post-rollback recovery and diagnostics

- Verify that the device is running: The device should be online and running the previous OS version. Confirm that `status.summary.status` is `Online` or `Degraded` and that `status.os.image` matches the previous (working) image.
- Investigate the failure: Use the device status message and, if you have access, the device logs. Start with the greenboot journal output (see Viewing greenboot and rollback logs); you can also check the agent journal (for example, `journalctl -u flightctl-agent.service`) to determine why the update failed. Common causes include health check failures after reboot, network or registry issues, and resource constraints. See Troubleshooting device error codes for error categories and recommended actions.
- Fix the issue and try a new version: Address the underlying issue (for example, fix the OS image or configuration, or resolve network or resource problems). When ready, update the device spec to a new or corrected OS image version so that the agent can attempt the update again.

Note: The agent does not retry a failed version. It marks the failed version and skips it in future reconciliation. Pushing the same OS image again without changes does not trigger a retry; you must push a new image version with a different digest.
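The first verification step can be scripted once you know which image the device ran before the update. This minimal bash sketch compares the values you read from `status.summary.status` and `status.os.image` against the expected previous image; the function name `verify_rolled_back` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check: after a rollback the device should be Online or Degraded
# and run the previous (working) image.
#   $1 = status.summary.status, $2 = status.os.image, $3 = expected previous image
verify_rolled_back() {
  local summary="$1" current="$2" expected="$3"
  case "$summary" in
    Online|Degraded) ;;
    *) echo "device not healthy after rollback: summary status is ${summary:-empty}"; return 1 ;;
  esac
  if [ "$current" != "$expected" ]; then
    echo "device runs $current, expected previous image $expected"
    return 1
  fi
  echo "device recovered on $current"
}
```

A non-zero return from either check is a signal to continue with the investigation step or, if the device does not come back online, to consider escalation.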

1.2.5. When to escalate

Consider escalating or opening a support case if:

- The device does not come back online after a rollback.
- Rollbacks happen repeatedly for the same or different OS versions.
- The device status remains in `RollingBack` or `Error` for an extended period with no recovery.
- You need to force a retry of a previously failed version and the product does not provide a supported way to do so.

1.3. Generating a device log bundle

Run the integrated `flightctl-must-gather` script directly on the device to generate a comprehensive bundle of diagnostic logs. The bundle, a standard `.tar` archive, contains the data needed to debug the device agent and makes troubleshooting and bug reporting more efficient.

Run the following command on the device, then include the resulting `.tar` file in the bug report. Copying the file off the device requires an SSH connection.
```
sudo flightctl-must-gather
```

1.4. Viewing a device’s effective target configuration

The device manifest returned by the `flightctl get device` command contains only references to external configuration and secret objects. The service replaces these references with the actual configuration and secret data only when the device agent queries it.

This better protects potentially sensitive data, but it also makes faulty configurations harder to troubleshoot. For this reason, a user can be authorized to query the effective configuration that the service renders for the agent.

Procedure

To query the effective configuration, use the following command:
```
flightctl get device/${device_name} --rendered | jq
```
