1.4. Troubleshooting issues
A cluster administrator can monitor and troubleshoot the following OpenShift Container Platform component issues:
Installation issues: OpenShift Container Platform installation proceeds through various stages. You can perform the following:
- Monitor the installation stages.
- Determine at which stage installation issues occur.
- Investigate multiple installation issues.
- Gather logs from a failed installation.
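The installation tasks above can be sketched as a few shell helpers. This is a minimal sketch, not a complete procedure: the function names and the INSTALLER variable are hypothetical, the directory argument stands for your own installation directory, and the bootstrap log gathering shown assumes an installer-provisioned cluster where the installer can locate the hosts by itself.

```shell
# Path to the installer binary (assumption: override it if the binary
# is not on PATH, for example INSTALLER=./openshift-install).
INSTALLER="${INSTALLER:-openshift-install}"

# Follow the bootstrap stage. $1: installation directory.
watch_bootstrap() {
    "$INSTALLER" wait-for bootstrap-complete --dir "$1" --log-level=info
}

# Follow the remainder of the installation. $1: installation directory.
watch_install() {
    "$INSTALLER" wait-for install-complete --dir "$1" --log-level=info
}

# Show overall cluster version progress and the state of each cluster Operator.
cluster_progress() {
    oc get clusterversion
    oc get clusteroperators
}

# Collect logs from a failed bootstrap. $1: installation directory.
gather_bootstrap_logs() {
    "$INSTALLER" gather bootstrap --dir "$1"
}
```

Running watch_bootstrap first and watch_install second helps show at which stage a failure occurs, and cluster_progress indicates which Operators have not yet become available.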
Node issues: A cluster administrator can verify and troubleshoot node-related issues by reviewing the status, resource usage, and configuration of a node. You can query the following:
- Kubelet’s status on a node.
- Cluster node journal logs.
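The node queries above might look like the following sketch. The function names are hypothetical, and the node name passed as the first argument is a placeholder for a name reported by your own cluster. The helpers assume that you are logged in with cluster-admin privileges.

```shell
# List all nodes with their status, roles, and versions.
list_nodes() {
    oc get nodes -o wide
}

# Show CPU and memory usage per node.
node_usage() {
    oc adm top nodes
}

# Report whether the kubelet systemd unit is active. $1: node name.
kubelet_status() {
    oc debug "node/$1" -- chroot /host systemctl is-active kubelet
}

# Fetch the kubelet journal logs from one node. $1: node name.
kubelet_logs() {
    oc adm node-logs "$1" -u kubelet
}
```

For example, calling kubelet_status with a node name from list_nodes starts a debug pod on that node and runs systemctl against the host file system.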
CRI-O issues: A cluster administrator can verify CRI-O container runtime engine status on each cluster node. If you experience container runtime issues, perform the following:
- Gather CRI-O journald unit logs.
- Clean CRI-O storage.
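As a rough sketch of the status check and log gathering, the helpers below query the crio systemd unit. The function names are hypothetical, and the node name and role arguments are placeholders. Cleaning CRI-O storage is a disruptive, multi-step procedure and is deliberately not sketched here.

```shell
# Report whether the crio systemd unit is active. $1: node name.
crio_status() {
    oc debug "node/$1" -- chroot /host systemctl is-active crio
}

# Fetch the CRI-O journald unit logs from one node. $1: node name.
crio_logs() {
    oc adm node-logs "$1" -u crio
}

# Fetch the CRI-O journald unit logs from every node that has a given
# role. $1: role, for example master or worker.
crio_logs_by_role() {
    oc adm node-logs --role="$1" -u crio
}
```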
Operating system issues: OpenShift Container Platform runs on Red Hat Enterprise Linux CoreOS. If you experience operating system issues, you can investigate kernel crashes by completing the following:
- Enable kdump.
- Test the kdump configuration.
- Analyze a core dump.
Network issues: To troubleshoot Open vSwitch issues, a cluster administrator can perform the following:
- Configure the Open vSwitch log level temporarily.
- Configure the Open vSwitch log level permanently.
- Display Open vSwitch logs.
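A minimal sketch of the temporary log level change and log display follows. The function names are hypothetical and the node name is a placeholder. The sketch assumes that ovs-appctl is available on the host, and it raises only the syslog destination to debug level. A level set this way does not survive a node reboot. The permanent change is not sketched here.

```shell
# List the current log level of each Open vSwitch module. $1: node name.
ovs_log_levels() {
    oc debug "node/$1" -- chroot /host ovs-appctl vlog/list
}

# Temporarily raise the syslog log level to debug. $1: node name.
ovs_set_syslog_debug() {
    oc debug "node/$1" -- chroot /host ovs-appctl vlog/set syslog:dbg
}

# Display the ovs-vswitchd journal logs from one node. $1: node name.
ovs_logs() {
    oc adm node-logs "$1" -u ovs-vswitchd
}
```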
Operator issues: A cluster administrator can do the following to resolve Operator issues:
- Verify Operator subscription status.
- Check Operator pod health.
- Gather Operator logs.
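The three Operator checks above could be sketched as follows. The function names are hypothetical, and the namespace, subscription, and pod arguments are placeholders for values from your own cluster.

```shell
# List Operator subscriptions in a namespace. $1: Operator namespace.
list_subscriptions() {
    oc get subs -n "$1"
}

# Show the conditions of one subscription.
# $1: Operator namespace, $2: subscription name.
subscription_status() {
    oc describe sub "$2" -n "$1"
}

# List Operator pods and their status. $1: Operator namespace.
operator_pods() {
    oc get pods -n "$1"
}

# Gather logs from one Operator pod. $1: Operator namespace, $2: pod name.
operator_logs() {
    oc logs "pod/$2" -n "$1"
}
```

The Conditions section in the subscription_status output is typically the first place to look for catalog source or resolution failures.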
Pod issues: A cluster administrator can troubleshoot pod-related issues by reviewing the status of a pod and completing the following:
- Review pod and container logs.
- Start debug pods with root access.
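As an illustration of the pod tasks above, the following sketch wraps the relevant oc commands. The function names are hypothetical, and the pod, container, resource, and namespace arguments are placeholders.

```shell
# Show the status and recent events of a pod. $1: pod name, $2: namespace.
pod_status() {
    oc describe pod "$1" -n "$2"
}

# Review the logs of one container in a pod.
# $1: pod name, $2: container name, $3: namespace.
container_logs() {
    oc logs "$1" -c "$2" -n "$3"
}

# Start a debug pod with root access, based on an existing resource.
# $1: resource, for example deployment/my-deployment, $2: namespace.
debug_as_root() {
    oc debug "$1" --as-root -n "$2"
}
```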
Source-to-Image (S2I) issues: A cluster administrator can observe the S2I stages to determine where in the S2I process a failure occurred. Gather the following to resolve S2I issues:
- Source-to-Image diagnostic data.
- Application diagnostic data to investigate application failure.
Storage issues: A multi-attach storage error occurs when mounting the volume on a new node is not possible because the failed node cannot unmount the attached volume. A cluster administrator can do the following to resolve multi-attach storage issues:
- Enable multiple attachments by using RWX volumes.
- Recover or delete the failed node when using an RWO volume.
Monitoring issues: A cluster administrator can follow the procedures on the troubleshooting page for monitoring. If the metrics for your user-defined projects are unavailable or if Prometheus is consuming a lot of disk space, do the following:
- Investigate why user-defined metrics are unavailable.
- Determine why Prometheus is consuming a lot of disk space.
OpenShift CLI (oc) issues: Investigate OpenShift CLI (oc) issues by increasing the log level.
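Increasing the log level might be wrapped as in the following sketch. The function name is hypothetical. The wrapper appends the --loglevel flag to whatever oc command you pass in. Higher levels produce more detail, including API request information.

```shell
# Run any oc command at a chosen log level.
# $1: log level, remaining arguments: the oc command to run.
oc_with_loglevel() {
    level=$1   # note: plain sh has no local variables, so this is global
    shift
    oc "$@" --loglevel "$level"
}
```

For example, oc_with_loglevel 10 whoami runs oc whoami --loglevel 10. Be careful when sharing high-level output, because it can contain sensitive request details.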