Chapter 5. The greenboot health check framework

Greenboot is the generic health check framework for the systemd service on rpm-ostree systems such as Red Hat Enterprise Linux for Edge (RHEL for Edge). This framework is included in MicroShift installations with the microshift-greenboot and greenboot-default-health-checks RPM packages.

Greenboot health checks run at various times to assess system health and automate a rollback to the last healthy state in the event of software trouble, for example:

Default health check scripts run each time the system starts.
In addition to the default health checks, you can write, install, and configure application health check scripts that also run every time the system starts.
Greenboot can reduce your risk of being locked out of edge devices during updates and prevent a significant interruption of service if an update fails.
When a failure is detected, the system boots into the last known working configuration by using the rpm-ostree rollback capability. This automation is especially useful for edge devices where direct serviceability is limited or non-existent.

A MicroShift application health check script is included in the microshift-greenboot RPM. The greenboot-default-health-checks RPM includes health check scripts verifying that DNS and ostree services are accessible. You can also create your own health check scripts for the workloads you are running, for example a script that verifies that an application has started.

Note

Rollback is not possible in the case of an update failure on a system not using rpm-ostree. This is true even though health checks might run.

5.1. How greenboot uses directories to run scripts

Health check scripts run from four /etc/greenboot directories. These scripts run in alphabetical order. Keep this in mind when you configure the scripts for your workloads.

When the system starts, greenboot runs the scripts in the required.d and wanted.d directories. Depending on the outcome of those scripts, greenboot continues the startup or attempts a rollback as follows:

System as expected: When all of the scripts in the required.d directory are successfully run, greenboot runs any scripts present in the /etc/greenboot/green.d directory.
System trouble: If any of the scripts in the required.d directory fail, greenboot runs any prerollback scripts present in the red.d directory, then restarts the system.
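
The flow described above can be illustrated with a short sketch. This is not the greenboot implementation: the run_scripts and evaluate_boot function names are hypothetical, the sketch assumes that only files ending in .sh are run, and it starts each script with bash so that you can try it against a scratch directory instead of /etc/greenboot. It prints the outcome instead of restarting the system.

```shell
#!/bin/bash
# Illustrative sketch only; run_scripts and evaluate_boot are hypothetical
# names and this is not the greenboot implementation.

# Run every *.sh file in a directory in alphabetical (glob) order.
# Returns 0 only if every script exits with 0.
run_scripts() {
    local dir="$1"
    local rc=0
    local script
    for script in "$dir"/*.sh; do
        [ -f "$script" ] || continue
        # Script output goes to stderr here; greenboot sends it to the system log.
        if ! bash "$script" >&2; then
            echo "FAILED: $script" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Print "green" or "red" for the greenboot directory tree rooted at $1.
evaluate_boot() {
    local root="${1:-/etc/greenboot}"
    if ! run_scripts "$root/check/required.d"; then
        # A required check failed: run the prerollback scripts.
        # The real framework restarts the system after this step.
        run_scripts "$root/red.d" || true
        echo "red"
        return 1
    fi
    # Failures in wanted.d are reported but do not change the outcome.
    run_scripts "$root/check/wanted.d" || echo "wanted check failed" >&2
    run_scripts "$root/green.d" || true
    echo "green"
}
```

To experiment, create a temporary directory that contains check/required.d, check/wanted.d, green.d, and red.d, add a few scripts that exit with 0 or 1, and pass the directory path to evaluate_boot.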

Note

Greenboot redirects script and health check output to the system log. When you are logged in, a daily message provides the overall system health output.

5.1.1. Greenboot directories details

Returning a nonzero exit code from any script means that script has failed. Greenboot restarts the system a few times to retry the scripts before attempting to roll back to the previous version.

/etc/greenboot/check/required.d contains the health checks that must not fail.
- If the scripts fail, greenboot retries them three times by default. You can configure the number of retries in the /etc/greenboot/greenboot.conf file by setting the GREENBOOT_MAX_BOOTS parameter to the desired number of retries.
- After all retries fail, greenboot automatically initiates a rollback if one is available. If a rollback is not available, the system log output shows that manual intervention is required.
- The 40_microshift_running_check.sh health check script for MicroShift is installed into this directory.
/etc/greenboot/check/wanted.d contains health scripts that are allowed to fail without causing the system to be rolled back.
- If any of these scripts fail, greenboot logs the failure but does not initiate a rollback.
/etc/greenboot/green.d contains scripts that run after greenboot has declared the start successful.
/etc/greenboot/red.d contains scripts that run after greenboot has declared the startup as failed, including the 40_microshift_pre_rollback.sh prerollback script. This script is executed right before a system rollback. The script performs MicroShift pod and OVN-Kubernetes cleanup to avoid potential conflicts after the system is rolled back to a previous version.
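
To see which retry limit is in effect, you can read the parameter back from the configuration file. The following is a minimal sketch: the greenboot_max_boots function name is hypothetical, and the fallback value of 3 reflects the default number of retries described above.

```shell
# Hypothetical helper: print the GREENBOOT_MAX_BOOTS value from a greenboot
# configuration file, or 3 (the documented default) when it is not set.
greenboot_max_boots() {
    local conf="${1:-/etc/greenboot/greenboot.conf}"
    local value=""
    if [ -r "$conf" ]; then
        # Use the last assignment in the file and strip optional quotes.
        value=$(sed -n 's/^GREENBOOT_MAX_BOOTS=//p' "$conf" | tail -n 1 | tr -d "\"'")
    fi
    echo "${value:-3}"
}

greenboot_max_boots
```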

5.2. The MicroShift health check script

The 40_microshift_running_check.sh health check script validates only the core MicroShift services. Install your customized workload health check scripts in the greenboot directories to ensure successful application operations after system updates. Scripts run in alphabetical order.
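
A workload health check might look like the following sketch. Everything specific in it is an assumption made for illustration: the 50_myapp_check.sh file name, the myapp.service unit, the APP_UNIT and APP_WAIT_SEC variables, and the function names are all hypothetical. The sketch waits for a systemd unit to become active and reports failure if the unit is not active within the timeout.

```shell
#!/bin/bash
# Hypothetical workload health check, for example
# /etc/greenboot/check/required.d/50_myapp_check.sh.
# "myapp.service" is a placeholder unit name, not part of MicroShift.

# Retry a command once per second until it succeeds or the timeout expires.
wait_for() {
    local timeout_sec="$1"
    shift
    local elapsed=0
    until "$@"; do
        if [ "$elapsed" -ge "$timeout_sec" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Succeeds when the workload's systemd unit is active.
app_is_active() {
    systemctl is-active --quiet "${APP_UNIT:-myapp.service}"
}

# Returns 0 when the workload is healthy and 1 otherwise.
check_workload() {
    if wait_for "${APP_WAIT_SEC:-300}" app_is_active; then
        echo "${APP_UNIT:-myapp.service} is active"
        return 0
    fi
    echo "${APP_UNIT:-myapp.service} did not become active in time" >&2
    return 1
}
```

An installed script would end with a line such as `check_workload || exit 1`, because greenboot treats any nonzero exit code as a failure. Because scripts run in alphabetical order, the numeric prefix in the file name determines whether the script runs before or after 40_microshift_running_check.sh.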

MicroShift health checks are listed in the following table:

Table 5.1. Validation statuses and outcome for MicroShift
Validation	Pass	Fail
Check that the script runs with `root` permissions	Next	`exit 0`
Check that the `microshift.service` is enabled	Next	`exit 0`
Wait for the `microshift.service` to be active (!failed)	Next	`exit 1`
Wait for Kubernetes API health endpoints to be working and receiving traffic	Next	`exit 1`
Wait for any pod to start	Next	`exit 1`
For each core namespace, wait for images to be pulled	Next	`exit 1`
For each core namespace, wait for pods to be ready	Next	`exit 1`
For each core namespace, check if pods are not restarting	`exit 0`	`exit 1`

5.2.1. Validation wait period

The wait period in each validation is five minutes by default. After the wait period, if the validation has not succeeded, it is declared a failure. This wait period is incrementally increased by the base wait period after each boot in the verification loop.

You can override the base wait period by setting the MICROSHIFT_WAIT_TIMEOUT_SEC environment variable in the /etc/greenboot/greenboot.conf configuration file. For example, to change the wait time to three minutes, set MICROSHIFT_WAIT_TIMEOUT_SEC=180.
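
As a worked example of how the wait period grows, the following sketch prints the wait period for the first three boot attempts. It assumes that boot attempt N waits N times the base period, which is one reading of the incremental increase described above; the wait_timeout_for_boot function name is hypothetical and is not part of MicroShift.

```shell
# Hypothetical helper: wait period in seconds for a given boot attempt,
# assuming that attempt N waits N times the base period.
wait_timeout_for_boot() {
    local base="${MICROSHIFT_WAIT_TIMEOUT_SEC:-300}"
    local attempt="$1"
    echo $(( base * attempt ))
}

for attempt in 1 2 3; do
    echo "boot attempt $attempt: $(wait_timeout_for_boot "$attempt")s"
done
```

Under this assumption, the default base of 300 seconds gives 300, 600, and 900 seconds, and a base of 180 seconds gives 180, 360, and 540 seconds.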

5.3. Enabling systemd journal service data persistency

The default configuration of the systemd journal service stores the data in the volatile /run/log/journal directory. To view system logs across system starts and restarts, you must enable log persistence and set limits on the maximum journal data size.

Procedure

Make the directory by running the following command:
```
sudo mkdir -p /etc/systemd/journald.conf.d
```

Create the configuration file by running the following command:

cat <<EOF | sudo tee /etc/systemd/journald.conf.d/microshift.conf &>/dev/null
[Journal]
Storage=persistent
SystemMaxUse=1G
RuntimeMaxUse=1G
EOF

Edit the configuration file values for your size requirements.

5.4. Updates and third-party workloads

Health checks are especially useful after an update. You can examine the output of greenboot health checks to determine whether the update was declared valid and whether the system is working properly.

Health check scripts for updates are installed into the /etc/greenboot/check/required.d directory and are automatically executed during each system start. Exiting scripts with a nonzero status means the system start is declared as failed.

Important

Wait until after an update is declared valid before starting third-party workloads. If a rollback is performed after workloads start, you can lose data. Some third-party workloads create or update data on a device before an update is complete. Upon rollback, the file system reverts to its state before the update.

5.5. Checking the results of an update

After a successful start, greenboot sets the boot_success variable to 1 in GRUB. You can view the overall status of system health checks after an update by using the following procedure.

Procedure

To access the overall status of system health checks, run the following command:
```
sudo grub2-editenv - list | grep ^boot_success
```

Example output for a successful system start

boot_success=1

5.6. Accessing health check output in the system log

You can manually access the output of health checks in the system log by using the following procedure.

Procedure

To access the results of a health check, run the following command:
```
sudo journalctl -o cat -u greenboot-healthcheck.service
```

Example output of a failed health check

...
...
Running Required Health Check Scripts...
STARTED
GRUB boot variables:
boot_success=0
boot_indeterminate=0
boot_counter=2
...
...
Waiting 300s for MicroShift service to be active and not failed
FAILURE
...
...
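
If journal persistence is enabled, you can also inspect the health check output from an earlier boot, for example the boot that failed before a rollback. The following sketch assumes that a failed validation is marked by a line that reads FAILURE, as in the example output; the count_failures function name is hypothetical.

```shell
# Hypothetical helper: count the FAILURE marker lines in health check output
# read from standard input.
count_failures() {
    # grep exits with 1 when nothing matches, so ignore its exit status.
    grep -c '^FAILURE$' || true
}

# Example use on a real system, reading the previous boot (-b -1):
#   sudo journalctl -b -1 -o cat -u greenboot-healthcheck.service | count_failures
```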

5.7. Accessing prerollback health check output in the system log

You can access the output of health check scripts in the system log. For example, check the results of a prerollback script using the following procedure.

Procedure

To access the results of a prerollback script, run the following command:
```
sudo journalctl -o cat -u redboot-task-runner.service
```

Example output of a prerollback script

...
...
Running Red Scripts...
STARTED
GRUB boot variables:
boot_success=0
boot_indeterminate=0
boot_counter=0
The ostree status:
* rhel c0baa75d9b585f3dd989a9cf05f647eb7ca27ee0dbd4b94fe8c93ed3a4b9e4a5.0
    Version: 9.1
    origin: <unknown origin type>
  rhel 6869c1347b0e0ba1bbf0be750cdf32da5138a1fcbc5a4c6325ab9eb647b64663.0 (rollback)
    Version: 9.1
    origin refspec: edge:rhel/9/x86_64/edge
System rollback imminent - preparing MicroShift for a clean start
Stopping MicroShift services
Removing MicroShift pods
Killing conmon, pause and OVN processes
Removing OVN configuration
Finished greenboot Failure Scripts Runner.
Cleanup succeeded
Script '40_microshift_pre_rollback.sh' SUCCESS
FINISHED
redboot-task-runner.service: Deactivated successfully.

5.8. Checking updates with a health check script

Access the output of greenboot health check scripts in the system log after an update by using the following procedure.

Procedure

To access the result of update checks, run the following command:
```
sudo grub2-editenv - list | grep ^boot_success
```

Example output for a successful update

boot_success=1

If your command returns boot_success=0, either the greenboot health check is still running or the update failed.
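
If you want to use this result in a script, you can classify it as in the following minimal sketch. The update_status function name and the status strings that it prints are hypothetical. The function reads the output of the grub2-editenv command from standard input, so it can be tried without root access.

```shell
# Hypothetical helper: classify the update result from the output of
# "grub2-editenv - list" read on standard input.
update_status() {
    local value
    value=$(sed -n 's/^boot_success=//p')
    case "$value" in
        1) echo "valid" ;;
        0) echo "pending or failed" ;;
        *) echo "unknown" ;;
    esac
}

# Example use on a real system:
#   sudo grub2-editenv - list | update_status
```

A value of 0 alone cannot distinguish a health check that is still running from a failed update, so the sketch reports both cases with the same status.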
