Troubleshooting
Troubleshooting common issues
Chapter 1. Checking which version you have installed
To begin troubleshooting, determine which version of Red Hat build of MicroShift you have installed.
1.1. Checking the version using the command-line interface
To begin troubleshooting, you must know your MicroShift version. One way to get this information is by using the CLI.
Procedure
Run the following command to check the version information:
$ microshift version

Example output

Red Hat build of MicroShift Version: 4.14-0.microshift-e6980e25
Base OCP Version: 4.14
1.2. Checking the MicroShift version using the API
To begin troubleshooting, you must know your MicroShift version. One way to get this information is by using the API.
Procedure
To get the version number using the OpenShift CLI (oc), view the kube-public/microshift-version config map by running the following command:

$ oc get configmap -n kube-public microshift-version -o yaml
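The config map stores the version fields under its data section. As a minimal sketch, assuming a version key is present in that data (the key name is an assumption, so check the YAML output above for the exact keys), you can print only the version string with a JSONPath query:

$ oc get configmap -n kube-public microshift-version -o jsonpath='{.data.version}{"\n"}'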
Chapter 2. Troubleshooting data backup and restore
To troubleshoot failed data backups and restorations, check the basics first, such as data paths, storage configuration, and storage capacity.
2.1. Backing up data failed
Data backups are automatic on rpm-ostree systems. If you are not using an rpm-ostree system and attempted to create a manual backup, the following reasons can cause the backup to fail:
- Not waiting several minutes after a system start before stopping MicroShift. The system must complete health checks and any other background processes before a backup can succeed.
- If MicroShift stopped running because of an error, you cannot perform a backup of the data.
  - Make sure the system is healthy.
  - Stop it in a healthy state before attempting a backup.
- If you do not have sufficient storage for the data, the backup fails. Ensure that you have enough storage for the MicroShift data.
- If you do not have sufficient permissions, a backup can fail. Ensure you have the correct user permissions to create a backup and perform the required configurations.
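After you address these points, the following commands are a minimal sketch of taking a manual backup on a system that is not rpm-ostree. The sketch assumes that your MicroShift version provides the microshift backup subcommand and that the target directory has enough free space; the backup path is illustrative:

$ df -h /var/lib                                   # confirm there is enough free space for the backup
$ sudo systemctl stop microshift                   # stop MicroShift while it is in a healthy state
$ sudo microshift backup /var/lib/microshift-backups/my-manual-backup   # assumes this subcommand exists in your version; path is illustrative
$ sudo systemctl start microshift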
2.2. Backup logs
- Logs print to the console during manual backups.
- Logs are automatically generated for rpm-ostree system automated backups as part of the MicroShift journal logs. You can check the logs by running the following command:

$ sudo journalctl -u microshift
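If the journal output is long, you can narrow it to backup-related messages. A minimal sketch, assuming the relevant log lines mention the word backup:

$ sudo journalctl -u microshift | grep -i backup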
2.3. Restoring data failed
The restoration of data can fail for many reasons, including storage and permission issues. Mismatched data versions can cause failures when MicroShift restarts.
2.3.1. RPM-OSTree-based systems data restore failed
Data restorations are automatic on rpm-ostree systems, but can fail for reasons such as the following:
- The only backups that are restored on rpm-ostree systems are backups from the current deployment or a rollback deployment. Backups are not taken on an unhealthy system.
- Only the latest backups that have corresponding deployments are retained. Outdated backups that do not have a matching deployment are automatically removed.
- Data is usually not restored from a newer version of MicroShift.
- Ensure that the data you are restoring follows the same versioning pattern as the update path. For example, if the destination version of MicroShift is older than the version of the MicroShift data you are currently using, the restoration can fail.
2.3.2. RPM-based manual data restore failed
If you are using an RPM system that is not rpm-ostree and tried to restore a manual backup, the following reasons can cause the restoration to fail:
- If MicroShift stopped running because of an error, you cannot restore data.
  - Make sure the system is healthy.
  - Start it in a healthy state before attempting to restore data.
- If you do not have enough storage space allocated for the incoming data, the restoration fails.
  - Make sure that your current system storage is configured to accept the restored data.
- You are attempting to restore data from a newer version of MicroShift.
  - Ensure that the data you are restoring follows the same versioning pattern as the update path. For example, if the destination version of MicroShift is older than the version of the MicroShift data you are attempting to use, the restoration can fail.
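If these conditions are met, a manual restore follows the same pattern as a manual backup, in reverse. The following is a minimal sketch, assuming the microshift restore subcommand is available in your version and that the backup was created by the same or an older MicroShift version; the backup path is illustrative:

$ sudo systemctl stop microshift                   # MicroShift must not be running during a restore
$ sudo microshift restore /var/lib/microshift-backups/my-manual-backup   # assumes this subcommand exists in your version; path is illustrative
$ sudo systemctl start microshift
$ oc get pods -A                                   # verify that workloads come back up after the restore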
2.4. Storage migration failed
Storage migration failures are typically caused by substantial changes in custom resources (CRs) from one MicroShift version to the next. If a storage migration fails, there is usually an unresolvable discrepancy between versions that requires manual review.
Chapter 3. Troubleshooting a cluster
To begin troubleshooting a Red Hat build of MicroShift cluster, first check the status of the cluster.
3.1. Checking the status of a cluster
You can check the status of a MicroShift cluster or see active pods by running a simple command. The following procedure gives three commands you can use to check cluster status. You can run one, two, or all of the commands to retrieve the information you need to troubleshoot the cluster.
Procedure
You can check the system status, which returns the cluster status, by running the following command:
$ sudo systemctl status microshift

If MicroShift is failing to start, this command returns the logs from the previous run.
Optional: You can view the logs by running the following command:
$ sudo journalctl -u microshift
The default configuration of the systemd journal service stores data in a volatile directory. To persist system logs across system starts and restarts, enable log persistence and set limits on the maximum journal data size.
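As a minimal sketch of enabling persistent journal storage with a size cap, assuming you manage journald through a drop-in file (the file name and the 1G cap are illustrative values):

$ sudo mkdir -p /etc/systemd/journald.conf.d
$ sudo tee /etc/systemd/journald.conf.d/10-persistent.conf >/dev/null <<'EOF'
[Journal]
Storage=persistent
SystemMaxUse=1G
EOF
$ sudo systemctl restart systemd-journald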
Optional: If MicroShift is running, you can see active pods by entering the following command:
$ oc get pods -A
Chapter 4. Troubleshoot etcd
To troubleshoot etcd and improve performance, configure the memory allowance for the service.
4.1. Configuring the memoryLimitMB value to set parameters for the etcd server
By default, etcd uses as much memory as necessary to handle the load on the system. In some memory-constrained systems, it might be necessary to limit the amount of memory etcd is allowed to use at a given time.
Procedure
Edit the /etc/microshift/config.yaml file to set the memoryLimitMB value:

etcd:
  memoryLimitMB: 128

Note: The minimum permissible value for memoryLimitMB on MicroShift is 128 MB. Values close to the minimum value are more likely to impact etcd performance. The lower the limit, the longer etcd takes to respond to queries. If the limit is too low or the etcd usage is high, queries time out.
Verification
After modifying the memoryLimitMB value in /etc/microshift/config.yaml, restart MicroShift by running the following command:

$ sudo systemctl restart microshift

Verify that the new memoryLimitMB value is in use by running the following command:

$ systemctl show --property=MemoryHigh microshift-etcd.scope
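With the 128 MB limit from the example configuration, the reported value is the limit expressed in bytes. The following is an illustrative sketch of the expected output, assuming the limit is applied as 128 MiB (exact formatting can vary by systemd version):

MemoryHigh=134217728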
Chapter 5. Troubleshoot updates
To troubleshoot MicroShift updates, use the following guide.
You can only update MicroShift from one minor version to the next in sequence. For example, you must update 4.14 to 4.15.
5.1. Troubleshooting MicroShift updates
In some cases, MicroShift might fail to update. In these events, it is helpful to understand failure types and how to troubleshoot them.
5.1.1. Update path is blocked by MicroShift version sequence
MicroShift requires serial updates. Attempting to update MicroShift by skipping a minor version fails:
- For example, if your current version is 4.14.5, but you try to update from that version to 4.16.0, the message executable (4.16.0) is too recent compared to existing data (4.14.5): version difference is 2, maximum allowed difference is 1 appears and MicroShift fails to start.

In this example, you must first update 4.14.5 to a 4.15 version, and then you can update to 4.16.0.
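Before updating, you can compare the installed executable version with the version recorded in the MicroShift data directory to confirm where you are in the sequence. A minimal sketch, assuming the data version marker is stored at /var/lib/microshift/version (this path is an assumption):

$ microshift version                    # version of the installed executable
$ sudo cat /var/lib/microshift/version  # assumed location of the data version marker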
5.1.2. Update path is blocked by version incompatibility
RPM dependency errors result if a MicroShift update is incompatible with the version of Red Hat Enterprise Linux for Edge (RHEL for Edge) or Red Hat Enterprise Linux (RHEL).
Check the following compatibility table:
Red Hat Device Edge release compatibility matrix
The two products of Red Hat Device Edge work together as a single solution for device-edge computing. To successfully pair your products, use the verified releases together for each as listed in the following table:
| RHEL for Edge Version | MicroShift Version | MicroShift Release Status | MicroShift Supported Updates |
| 9.2, 9.3 | 4.14 | Generally Available | 4.14.0→4.14.z and 4.14→4.15 |
| 9.2 | 4.13 | Technology Preview | None |
| 8.7 | 4.12 | Developer Preview | None |
Check the following update paths:
Red Hat build of MicroShift update paths
- Generally Available Version 4.14.0 to 4.14.z on RHEL for Edge 9.2
- Generally Available Version 4.14.0 to 4.14.z on RHEL 9.2
5.1.3. OSTree update failed
If you updated on an OSTree system, the greenboot health check automatically logs and acts on system health. A failure can be indicated by a system rollback by greenboot. In cases where the update failed, but greenboot did not complete a system rollback, you can troubleshoot using the RHEL for Edge documentation linked in the "Additional resources" section that follows this content.
- Checking the greenboot logs manually
Manually check the greenboot logs to verify system health by running the following command:
$ sudo systemctl restart --no-block greenboot-healthcheck && sudo journalctl -fu greenboot-healthcheck
5.1.4. Manual RPM update failed
If you updated by using RPMs on a non-OSTree system, an update failure can be indicated by greenboot, but the health checks are only informative. Checking the system logs is the next step in troubleshooting a manual RPM update failure. You can use greenboot and sos report to check both the MicroShift update and the host system.
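For example, the following commands are a minimal sketch of collecting that information on the host; the --batch flag only suppresses interactive prompts:

$ sudo journalctl -u greenboot-healthcheck   # review the informative health check results
$ sudo sos report --batch                    # collect a host diagnostic archive for review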
5.2. Checking journal logs after updates
In some cases, MicroShift might fail to update. In these events, it is helpful to understand failure types and how to troubleshoot them. The journal logs can assist in diagnosing update failures.
The default configuration of the systemd journal service stores data in a volatile directory. To persist system logs across system starts and restarts, enable log persistence and set limits on the maximum journal data size.
Procedure
Check the MicroShift journal logs by running the following command:
$ sudo journalctl -u microshift

Check the greenboot journal logs by running the following command:

$ sudo journalctl -u greenboot-healthcheck

Check the journal logs for a boot of a specific service by running the following command:

$ sudo journalctl --boot <boot> -u <service-name>

Examining the comprehensive logs of a specific boot uses two steps. First list the boots, then select the one you want from the list you obtained:

List the boots present in the journal logs by running the following command:

$ sudo journalctl --list-boots

Check the journal logs for the boot you want by running the following command:

$ sudo journalctl --boot <-my-boot-number>
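For example, to read the MicroShift logs from the boot immediately before the current one, you can pass the relative index -1, because the current boot is listed as 0 in the --list-boots output:

$ sudo journalctl --boot -1 -u microshift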
5.3. Checking the status of greenboot health checks
Check the status of greenboot health checks before making changes to the system or during troubleshooting. You can use any of the following commands to help you ensure that greenboot scripts have finished running.
Procedure
To see a report of health check status, use the following command:
$ systemctl show --property=SubState --value greenboot-healthcheck.service

- An output of start means that greenboot checks are still running.
- An output of exited means that checks have passed and greenboot has exited. Greenboot runs the scripts in the green.d directory when the system is in a healthy state.
- An output of failed means that checks have not passed. Greenboot runs the scripts in the red.d directory when the system is in this state and might restart the system.
To see a report showing the numerical exit code of the service, where 0 means success and non-zero values mean a failure occurred, use the following command:

$ systemctl show --property=ExecMainStatus --value greenboot-healthcheck.service
$ systemctl show --property=ExecMainStatus --value greenboot-healthcheck.serviceCopy to Clipboard Copied! Toggle word wrap Toggle overflow To see a report showing a message about boot status, such as
Boot Status is GREEN - Health Check SUCCESS, use the following command:cat /run/motd.d/boot-status
$ cat /run/motd.d/boot-statusCopy to Clipboard Copied! Toggle word wrap Toggle overflow
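If you script changes to the system, you can wait for the health checks to finish before proceeding. The following is a minimal sketch that polls the SubState value described above until it is no longer start:

$ until [ "$(systemctl show --property=SubState --value greenboot-healthcheck.service)" != "start" ]; do sleep 5; done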
Chapter 6. Checking audit logs
You can use audit logs to identify pod security violations.
6.1. Identifying pod security violations through audit logs
You can identify pod security admission violations on a workload by viewing the server audit logs. The following procedure shows you how to access the audit logs and parse them to find pod security admission violations in a workload.
Prerequisites
- You have installed jq.
- You have access to the cluster as a user with the cluster-admin role.
Procedure
To retrieve the node name, run the following command, and use the value in place of <node_name> in the commands that follow:

$ oc get node -o jsonpath='{.items[0].metadata.name}'

To view the audit logs, run the following command:
$ oc adm node-logs <node_name> --path=kube-apiserver/

To parse the affected audit logs, enter the following command:
$ oc adm node-logs <node_name> --path=kube-apiserver/audit.log \
  | jq -r 'select((.annotations["pod-security.kubernetes.io/audit-violations"] != null) and (.objectRef.resource=="pods")) | .objectRef.namespace + " " + .objectRef.name + " " + .objectRef.resource' \
  | sort | uniq -c
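As an illustrative variation on the same query, assuming the same audit log layout, you can also print the recorded violation message to see why each pod was flagged:

$ oc adm node-logs <node_name> --path=kube-apiserver/audit.log \
  | jq -r 'select((.annotations["pod-security.kubernetes.io/audit-violations"] != null) and (.objectRef.resource=="pods")) | .objectRef.namespace + "/" + .objectRef.name + ": " + .annotations["pod-security.kubernetes.io/audit-violations"]' \
  | sort -u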
Chapter 7. Responsive restarts and security certificates
MicroShift responds to system configuration changes and restarts after alterations are detected, including IP address changes, clock adjustments, and security certificate age.
7.1. IP address changes or clock adjustments
MicroShift depends on device IP addresses and system-wide clock settings remaining consistent during its runtime. However, these settings can occasionally change on edge devices, for example through DHCP or Network Time Protocol (NTP) updates.
When such changes occur, some MicroShift components can stop functioning properly. To mitigate this situation, MicroShift monitors the IP address and system time and restarts if a change to either setting is detected.
The threshold for clock changes is a time adjustment of greater than 10 seconds in either direction. Smaller drifts from the regular time adjustments performed by the NTP service do not cause a restart.
7.2. Security certificate lifetime
MicroShift certificates are separated into two basic groups:
- Short-lived certificates having certificate validity of one year.
- Long-lived certificates having certificate validity of 10 years.
Most server or leaf certificates are short-lived.
Examples of long-lived certificates include the client certificate for system:admin user authentication and the certificate of the signer of the kube-apiserver external serving certificate.
7.2.1. Certificate rotation
Certificates that are expired or close to their expiration dates need to be rotated to ensure continued MicroShift operation. When MicroShift restarts for any reason, certificates that are close to expiring are rotated. A certificate that is set to expire imminently, or has expired, can cause an automatic MicroShift restart to perform a rotation.
If the rotated certificate is a MicroShift certificate authority (CA), then all of the certificates it signed are also rotated. If you created any custom CAs, make sure that you rotate them manually.
7.2.1.1. Short-term certificates
The following situations describe MicroShift actions during short-term certificate lifetimes:
No rotation:
- When a short-term certificate is up to 5 months old, no rotation occurs.
Rotation at restart:
- When a short-term certificate is 5 to 8 months old, it is rotated when MicroShift starts or restarts.
Automatic restart for rotation:
- When a short-term certificate is more than 8 months old, MicroShift can automatically restart to rotate and apply a new certificate.
7.2.1.2. Long-term certificates
The following situations describe MicroShift actions during long-term certificate lifetimes:
No rotation:
- When a long-term certificate is up to 8.5 years old, no rotation occurs.
Rotation at restart:
- When a long-term certificate is 8.5 to 9 years old, it is rotated when MicroShift starts or restarts.
Automatic restart for rotation:
- When a long-term certificate is more than 9 years old, MicroShift might automatically restart so that it can rotate and apply a new certificate.