Troubleshooting
Troubleshooting common issues
Chapter 1. Checking which version you have installed
To begin troubleshooting, determine which version of Red Hat build of MicroShift you have installed.
1.1. Checking the version using the command-line interface
To begin troubleshooting, you must know your MicroShift version. One way to get this information is by using the CLI.
Procedure
Run the following command to check the version information:
$ microshift version

Example output

Red Hat build of MicroShift Version: 4.14-0.microshift-e6980e25
Base OCP Version: 4.14
1.2. Checking the MicroShift version using the API
To begin troubleshooting, you must know your MicroShift version. One way to get this information is by using the API.
Procedure
To get the version number using the OpenShift CLI (oc), view the kube-public/microshift-version config map by running the following command:

$ oc get configmap -n kube-public microshift-version -o yaml
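The config map stores the version fields under its data section. As a minimal sketch, assuming a version key is present in that data (the key name is an assumption, so check the YAML output above for the exact keys), you can print only the version string with a JSONPath query:

$ oc get configmap -n kube-public microshift-version -o jsonpath='{.data.version}{"\n"}'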
Chapter 2. Troubleshooting data backup and restore
To troubleshoot failed data backups and restorations, check the basics first, such as data paths, storage configuration, and storage capacity.
2.1. Backing up data failed
Data backups are automatic on rpm-ostree systems. If you are not using an rpm-ostree system and attempted to create a manual backup, the following reasons can cause the backup to fail:
- Not waiting several minutes after a system start before stopping MicroShift. The system must complete health checks and any other background processes before a backup can succeed.
- If MicroShift stopped running because of an error, you cannot perform a backup of the data.
  - Make sure the system is healthy.
  - Stop it in a healthy state before attempting a backup.
- If you do not have sufficient storage for the data, the backup fails. Ensure that you have enough storage for the MicroShift data.
- If you do not have sufficient permissions, a backup can fail. Ensure you have the correct user permissions to create a backup and perform the required configurations.
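After you address these points, the following commands are a minimal sketch of taking a manual backup on a system that is not rpm-ostree. The sketch assumes that your MicroShift version provides the microshift backup subcommand and that the target directory has enough free space; the backup path is illustrative:

$ df -h /var/lib                                   # confirm there is enough free space for the backup
$ sudo systemctl stop microshift                   # stop MicroShift while it is in a healthy state
$ sudo microshift backup /var/lib/microshift-backups/my-manual-backup   # assumes this subcommand exists in your version; path is illustrative
$ sudo systemctl start microshift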
2.2. Backup logs
- Logs print to the console during manual backups.
- Logs are automatically generated for rpm-ostree system automated backups as part of the MicroShift journal logs. You can check the logs by running the following command:

$ sudo journalctl -u microshift
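If the journal output is long, you can narrow it to backup-related messages. A minimal sketch, assuming the relevant log lines mention the word backup:

$ sudo journalctl -u microshift | grep -i backup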
2.3. Restoring data failed
The restoration of data can fail for many reasons, including storage and permission issues. Mismatched data versions can cause failures when MicroShift restarts.
2.3.1. RPM-OSTree-based systems data restore failed
Data restorations are automatic on rpm-ostree systems, but can fail for reasons such as the following:
- The only backups that are restored on rpm-ostree systems are backups from the current deployment or a rollback deployment. Backups are not taken on an unhealthy system.
- Only the latest backups that have corresponding deployments are retained. Outdated backups that do not have a matching deployment are automatically removed.
- Data is usually not restored from a newer version of MicroShift.
- Ensure that the data you are restoring follows the same versioning pattern as the update path. For example, if the destination version of MicroShift is older than the version of the MicroShift data you are currently using, the restoration can fail.
2.3.2. RPM-based manual data restore failed
If you are using an RPM system that is not rpm-ostree and tried to restore a manual backup, the following reasons can cause the restoration to fail:
- If MicroShift stopped running because of an error, you cannot restore data.
  - Make sure the system is healthy.
  - Start it in a healthy state before attempting to restore data.
- If you do not have enough storage space allocated for the incoming data, the restoration fails.
  - Make sure that your current system storage is configured to accept the restored data.
- You are attempting to restore data from a newer version of MicroShift.
  - Ensure that the data you are restoring follows the same versioning pattern as the update path. For example, if the destination version of MicroShift is older than the version of the MicroShift data you are attempting to use, the restoration can fail.
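If these conditions are met, a manual restore follows the same pattern as a manual backup, in reverse. The following is a minimal sketch, assuming the microshift restore subcommand is available in your version and that the backup was created by the same or an older MicroShift version; the backup path is illustrative:

$ sudo systemctl stop microshift                   # MicroShift must not be running during a restore
$ sudo microshift restore /var/lib/microshift-backups/my-manual-backup   # assumes this subcommand exists in your version; path is illustrative
$ sudo systemctl start microshift
$ oc get pods -A                                   # verify that workloads come back up after the restore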
2.4. Storage migration failed
Storage migration failures are typically caused by substantial changes in custom resources (CRs) from one MicroShift version to the next. If a storage migration fails, there is usually an unresolvable discrepancy between versions that requires manual review.
Chapter 3. Troubleshooting a cluster
To begin troubleshooting a Red Hat build of MicroShift cluster, first check the status of the cluster.
3.1. Checking the status of a cluster
You can check the status of a MicroShift cluster or see active pods by running a simple command. The following procedure gives three commands you can use to check cluster status. You can run one, two, or all of the commands to retrieve the information you need to troubleshoot the cluster.
Procedure
You can check the system status, which returns the cluster status, by running the following command:
$ sudo systemctl status microshift

If MicroShift is failing to start, this command returns the logs from the previous run.
Optional: You can view the logs by running the following command:
$ sudo journalctl -u microshift
The default configuration of the systemd journal service stores data in a volatile directory. To persist system logs across system starts and restarts, enable log persistence and set limits on the maximum journal data size.
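As a minimal sketch of enabling persistent journal storage with a size cap, assuming you manage journald through a drop-in file (the file name and the 1G cap are illustrative values):

$ sudo mkdir -p /etc/systemd/journald.conf.d
$ sudo tee /etc/systemd/journald.conf.d/10-persistent.conf >/dev/null <<'EOF'
[Journal]
Storage=persistent
SystemMaxUse=1G
EOF
$ sudo systemctl restart systemd-journald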
Optional: If MicroShift is running, you can see active pods by entering the following command:
$ oc get pods -A
Chapter 4. Troubleshoot etcd
To troubleshoot etcd and improve performance, configure the memory allowance for the service.
4.1. Configuring the memoryLimitMB value to set parameters for the etcd server
By default, etcd uses as much memory as necessary to handle the load on the system. In some memory-constrained systems, it might be necessary to limit the amount of memory etcd is allowed to use at a given time.
Procedure
Edit the /etc/microshift/config.yaml file to set the memoryLimitMB value:

etcd:
  memoryLimitMB: 128

Note: The minimum permissible value for memoryLimitMB on MicroShift is 128 MB. Values close to the minimum value are more likely to impact etcd performance. The lower the limit, the longer etcd takes to respond to queries. If the limit is too low or the etcd usage is high, queries time out.
Verification
After modifying the memoryLimitMB value in /etc/microshift/config.yaml, restart MicroShift by running the following command:

$ sudo systemctl restart microshift

Verify that the new memoryLimitMB value is in use by running the following command:

$ systemctl show --property=MemoryHigh microshift-etcd.scope
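With the 128 MB limit from the example configuration, the reported value is the limit expressed in bytes. The following is an illustrative sketch of the expected output, assuming the limit is applied as 128 MiB (exact formatting can vary by systemd version):

MemoryHigh=134217728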
Chapter 5. Troubleshoot updates
To troubleshoot MicroShift updates, use the following guide.
You can only update MicroShift from one minor version to the next in sequence. For example, you must update 4.14 to 4.15.
5.1. Troubleshooting MicroShift updates
In some cases, MicroShift might fail to update. In these events, it is helpful to understand failure types and how to troubleshoot them.
5.1.1. Update path is blocked by MicroShift version sequence
MicroShift requires serial updates. Attempting to update MicroShift by skipping a minor version fails:
- For example, if your current version is 4.14.5, but you try to update from that version to 4.16.0, the message executable (4.16.0) is too recent compared to existing data (4.14.5): version difference is 2, maximum allowed difference is 1 appears and MicroShift fails to start.

In this example, you must first update 4.14.5 to a 4.15 version, and then you can update to 4.16.0.
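Before updating, you can compare the installed executable version with the version recorded in the MicroShift data directory to confirm where you are in the sequence. A minimal sketch, assuming the data version marker is stored at /var/lib/microshift/version (this path is an assumption):

$ microshift version                    # version of the installed executable
$ sudo cat /var/lib/microshift/version  # assumed location of the data version marker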
5.1.2. Update path is blocked by version incompatibility
RPM dependency errors result if a MicroShift update is incompatible with the version of Red Hat Enterprise Linux for Edge (RHEL for Edge) or Red Hat Enterprise Linux (RHEL).
Check the following compatibility table:
Red Hat Device Edge release compatibility matrix
The two products of Red Hat Device Edge work together as a single solution for device-edge computing. To successfully pair your products, use the verified releases together for each as listed in the following table:
| RHEL for Edge Version | MicroShift Version | MicroShift Release Status | MicroShift Supported Updates |
| 9.2, 9.3 | 4.14 | Generally Available | 4.14.0→4.14.z and 4.14→4.15 |
| 9.2 | 4.13 | Technology Preview | None |
| 8.7 | 4.12 | Developer Preview | None |
Check the following update paths:
Red Hat build of MicroShift update paths
- Generally Available Version 4.14.0 to 4.14.z on RHEL for Edge 9.2
- Generally Available Version 4.14.0 to 4.14.z on RHEL 9.2
5.1.3. OSTree update failed
If you updated on an OSTree system, the greenboot health check automatically logs and acts on system health. A failure can be indicated by a system rollback by greenboot. In cases where the update failed, but greenboot did not complete a system rollback, you can troubleshoot using the RHEL for Edge documentation linked in the "Additional resources" section that follows this content.
- Checking the greenboot logs manually
Manually check the greenboot logs to verify system health by running the following command:
$ sudo systemctl restart --no-block greenboot-healthcheck && sudo journalctl -fu greenboot-healthcheck
5.1.4. Manual RPM update failed
If you updated by using RPMs on a non-OSTree system, an update failure can be indicated by greenboot, but the health checks are only informative. Checking the system logs is the next step in troubleshooting a manual RPM update failure. You can use greenboot and sos report to check both the MicroShift update and the host system.
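For example, the following commands are a minimal sketch of collecting that information on the host; the --batch flag only suppresses interactive prompts:

$ sudo journalctl -u greenboot-healthcheck   # review the informative health check results
$ sudo sos report --batch                    # collect a host diagnostic archive for review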
5.2. Checking journal logs after updates
In some cases, MicroShift might fail to update. In these events, it is helpful to understand failure types and how to troubleshoot them. The journal logs can assist in diagnosing update failures.
The default configuration of the systemd journal service stores data in a volatile directory. To persist system logs across system starts and restarts, enable log persistence and set limits on the maximum journal data size.
Procedure
Check the MicroShift journal logs by running the following command:
$ sudo journalctl -u microshift

Check the greenboot journal logs by running the following command:

$ sudo journalctl -u greenboot-healthcheck

Check the journal logs for a boot of a specific service by running the following command:

$ sudo journalctl --boot <boot> -u <service-name>

Examining the comprehensive logs of a specific boot uses two steps. First list the boots, then select the one you want from the list you obtained:

List the boots present in the journal logs by running the following command:

$ sudo journalctl --list-boots

Check the journal logs for the boot you want by running the following command:

$ sudo journalctl --boot <-my-boot-number>
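For example, to read the MicroShift logs from the boot immediately before the current one, you can pass the relative index -1, because the current boot is listed as 0 in the --list-boots output:

$ sudo journalctl --boot -1 -u microshift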
5.3. Checking the status of greenboot health checks
Check the status of greenboot health checks before making changes to the system or during troubleshooting. You can use any of the following commands to help you ensure that greenboot scripts have finished running.
Procedure
To see a report of health check status, use the following command:
$ systemctl show --property=SubState --value greenboot-healthcheck.service

- An output of start means that greenboot checks are still running.
- An output of exited means that checks have passed and greenboot has exited. Greenboot runs the scripts in the green.d directory when the system is in a healthy state.
- An output of failed means that checks have not passed. Greenboot runs the scripts in the red.d directory when the system is in this state and might restart the system.
To see a report showing the numerical exit code of the service, where 0 means success and non-zero values mean a failure occurred, use the following command:

$ systemctl show --property=ExecMainStatus --value greenboot-healthcheck.service
$ systemctl show --property=ExecMainStatus --value greenboot-healthcheck.serviceCopy to Clipboard Copied! Toggle word wrap Toggle overflow To see a report showing a message about boot status, such as
Boot Status is GREEN - Health Check SUCCESS, use the following command:cat /run/motd.d/boot-status
$ cat /run/motd.d/boot-statusCopy to Clipboard Copied! Toggle word wrap Toggle overflow
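If you script changes to the system, you can wait for the health checks to finish before proceeding. The following is a minimal sketch that polls the SubState value described above until it is no longer start:

$ until [ "$(systemctl show --property=SubState --value greenboot-healthcheck.service)" != "start" ]; do sleep 5; done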
Chapter 6. Checking audit logs
You can use audit logs to identify pod security violations.
6.1. Identifying pod security violations through audit logs
You can identify pod security admission violations on a workload by viewing the server audit logs. The following procedure shows you how to access the audit logs and parse them to find pod security admission violations in a workload.
Prerequisites
- You have installed jq.
- You have access to the cluster as a user with the cluster-admin role.
Procedure
To retrieve the node name, run the following command, and use the value in place of <node_name> in the commands that follow:

$ oc get node -o jsonpath='{.items[0].metadata.name}'

To view the audit logs, run the following command:
$ oc adm node-logs <node_name> --path=kube-apiserver/

To parse the affected audit logs, enter the following command:
$ oc adm node-logs <node_name> --path=kube-apiserver/audit.log \
  | jq -r 'select((.annotations["pod-security.kubernetes.io/audit-violations"] != null) and (.objectRef.resource=="pods")) | .objectRef.namespace + " " + .objectRef.name + " " + .objectRef.resource' \
  | sort | uniq -c
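As an illustrative variation on the same query, assuming the same audit log layout, you can also print the recorded violation message to see why each pod was flagged:

$ oc adm node-logs <node_name> --path=kube-apiserver/audit.log \
  | jq -r 'select((.annotations["pod-security.kubernetes.io/audit-violations"] != null) and (.objectRef.resource=="pods")) | .objectRef.namespace + "/" + .objectRef.name + ": " + .annotations["pod-security.kubernetes.io/audit-violations"]' \
  | sort -u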
Chapter 7. Responsive restarts and security certificates
MicroShift responds to system configuration changes and restarts after alterations are detected, including IP address changes, clock adjustments, and security certificate age.
7.1. IP address changes or clock adjustments
MicroShift depends on device IP addresses and system-wide clock settings remaining consistent during its runtime. However, these settings can occasionally change on edge devices, for example through DHCP or Network Time Protocol (NTP) updates.
When such changes occur, some MicroShift components can stop functioning properly. To mitigate this situation, MicroShift monitors the IP address and system time and restarts if a change to either setting is detected.
The threshold for clock changes is a time adjustment of greater than 10 seconds in either direction. Smaller drifts from the regular time adjustments performed by the NTP service do not cause a restart.
7.2. Security certificate lifetime
MicroShift certificates are separated into two basic groups:
- Short-lived certificates having certificate validity of one year.
- Long-lived certificates having certificate validity of 10 years.
Most server or leaf certificates are short-lived.
Examples of long-lived certificates include the client certificate for system:admin user authentication and the certificate of the signer of the kube-apiserver external serving certificate.
7.2.1. Certificate rotation
Certificates that are expired or close to their expiration dates need to be rotated to ensure continued MicroShift operation. When MicroShift restarts for any reason, certificates that are close to expiring are rotated. A certificate that is set to expire imminently, or has expired, can cause an automatic MicroShift restart to perform a rotation.
If the rotated certificate is a MicroShift certificate authority (CA), then all of the certificates it signed are also rotated. If you created any custom CAs, make sure that you rotate them manually.
7.2.1.1. Short-term certificates
The following situations describe MicroShift actions during short-term certificate lifetimes:
No rotation:
- When a short-term certificate is up to 5 months old, no rotation occurs.
Rotation at restart:
- When a short-term certificate is 5 to 8 months old, it is rotated when MicroShift starts or restarts.
Automatic restart for rotation:
- When a short-term certificate is more than 8 months old, MicroShift can automatically restart to rotate and apply a new certificate.
7.2.1.2. Long-term certificates
The following situations describe MicroShift actions during long-term certificate lifetimes:
No rotation:
- When a long-term certificate is up to 8.5 years old, no rotation occurs.
Rotation at restart:
- When a long-term certificate is 8.5 to 9 years old, it is rotated when MicroShift starts or restarts.
Automatic restart for rotation:
- When a long-term certificate is more than 9 years old, MicroShift might automatically restart so that it can rotate and apply a new certificate.