Chapter 6. Troubleshooting alerts and errors in OpenShift Container Storage
6.1. Resolving alerts and errors
Red Hat OpenShift Container Storage can detect and automatically resolve a number of common failure scenarios. However, some problems require administrator intervention.
To see which errors are currently firing, check one of the following locations:
- Monitoring → Alerting → Firing option
- Home → Overview → Cluster tab
- Storage → Overview → Block and File tab
- Storage → Overview → Object tab
Copy the displayed error and search for it in the following section to find its severity and resolution:
Name:
Message:
Description: Severity: Warning Resolution: Fix Procedure: Inspect the user interface and log, and verify if an update is in progress.
|
Name:
Message:
Description: Severity: Warning Resolution: Fix Procedure: Inspect the user interface and log, and verify if an update is in progress.
|
Name:
Message:
Description: Severity: Critical Resolution: Fix Procedure: Remove unnecessary data or expand the cluster. |
Name:
Fixed:
Description: Severity: Warning Resolution: Fix Procedure: Remove unnecessary data or expand the cluster. |
Name:
Message:
Description: Severity: Warning Resolution: Workaround Procedure: Resolving NooBaa Bucket Error State |
Name:
Message:
Description: Severity: Warning Resolution: Fix Procedure: Resolving NooBaa Bucket Error State |
Name:
Message:
Description: Severity: Warning Resolution: Fix Procedure: Resolving NooBaa Bucket Error State |
Name:
Message:
Description: Severity: Warning Resolution: Fix |
Name:
Message:
Description: Severity: Warning Resolution: Fix |
Name:
Message:
Description: Severity: Warning Resolution: Fix |
Name:
Message:
Description: Severity: Warning Resolution: Fix |
Name:
Message:
Description: Severity: Warning Resolution: Workaround Procedure: Resolving NooBaa Bucket Error State |
Name:
Message:
Description: Severity: Warning Resolution: Fix |
Name:
Message:
Description: Severity: Warning Resolution: Fix |
Name:
Message:
Description: Severity: Warning Resolution: Fix |
Name:
Message:
Description: `Minimum required replicas for storage metadata service not available. Might affect the working of storage cluster.` Severity: Warning Resolution: Contact Red Hat support Procedure:
|
Name:
Message:
Description: Severity: Critical Resolution: Contact Red Hat support Procedure:
|
Name:
Message:
Description: Severity: Critical Resolution: Contact Red Hat support Procedure:
|
Name:
Message:
Description: Severity: Critical Resolution: Contact Red Hat support Procedure:
|
Name:
Message:
Description: Severity: Warning Resolution: Contact Red Hat support Procedure:
|
Name:
Message:
Description: Severity: Warning Resolution: Contact Red Hat support |
Name:
Message:
Description: Severity: Critical Resolution: Contact Red Hat support |
Name:
Message:
Description: Severity: Critical Resolution: Contact Red Hat support |
Name:
Message:
Description: Severity: Warning Resolution: Contact Red Hat support |
Name:
Message:
Description: Severity: Warning Resolution: Contact Red Hat support |
Name:
Message:
Description: Severity: Critical Resolution: Contact Red Hat support |
Name:
Message:
Description: Severity: Critical Resolution: Contact Red Hat support Procedure:
|
Name:
Message:
Description: Severity: Critical Resolution: Contact Red Hat support |
6.2. Resolving NooBaa Bucket Error State
Procedure
- Log in to OpenShift Web Console and click Storage → Overview → Object.
- In the Details card, click the link under the System Name field.
- In the left pane, click the Buckets option and search for the bucket in error state. If the bucket in error state is a namespace bucket, be sure to click the Namespace Buckets pane.
- Click its Bucket Name. The error encountered in the bucket is displayed.
Depending on the specific error of the bucket, perform one or both of the following:
For space-related errors:
- In the left pane, click Resources option.
- Click on the resource in error state.
- Scale the resource by adding more agents.
For resource health errors:
- In the left pane, click Resources option.
- Click on the resource in error state.
- Connectivity error means the backing service is not available and needs to be restored.
- For access/permissions errors, update the connection’s Access Key and Secret Key.
6.3. Resolving NooBaa Bucket Exceeding Quota State
To resolve the A NooBaa Bucket Is In Exceeding Quota State error, perform one of the following:
- Clean up some of the data on the bucket.
- Increase the bucket quota by performing the following steps:
  - Log in to OpenShift Web Console and click Storage → Overview → Object.
  - In the Details card, click the link under the System Name field.
  - In the left pane, click the Buckets option and search for the bucket in error state.
  - Click its Bucket Name. The error encountered in the bucket is displayed.
  - Click Bucket Policies → Edit Quota and increase the quota.
6.4. Resolving NooBaa Bucket Capacity or Quota State
Procedure
- Log in to OpenShift Web Console and click Storage → Overview → Object.
- In the Details card, click the link under the System Name field.
- In the left pane, click the Resources option and search for the PV pool resource.
- For the PV pool resource with low capacity status, click its Resource Name.
- Edit the pool configuration and increase the number of agents.
6.5. Recovering pods
When a node (say NODE1) goes into the NotReady state because of some issue, the hosted pods that use a PVC with ReadWriteOnce (RWO) access mode try to move to a second node (say NODE2) but get stuck due to a multi-attach error. In such a case, you can recover MON, OSD, and application pods by using the following steps.
Procedure
- Power off NODE1 (from the AWS or vSphere side) and ensure that NODE1 is completely down.
- Force delete the pods on NODE1 by using the following command:
  $ oc delete pod <pod-name> --grace-period=0 --force
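When several pods are stuck on the failed node, the force delete step above can be scripted. The following is a minimal sketch, not part of the documented procedure: the helper name `force_delete_cmds` is hypothetical, and the example assumes `oc` is logged in and a `NODE1` shell variable holds the node name. The helper only prints the delete commands so that they can be reviewed before they are run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "namespace pod-name" pairs on stdin and prints
# one force-delete command per pod, so the list can be reviewed before running.
force_delete_cmds() {
  while read -r ns pod; do
    # Skip blank or incomplete lines.
    [ -n "$ns" ] && [ -n "$pod" ] || continue
    printf 'oc delete pod %s -n %s --grace-period=0 --force\n' "$pod" "$ns"
  done
}

# Example (assumes NODE1 holds the node name and oc is logged in):
#   oc get pods --all-namespaces --field-selector spec.nodeName="$NODE1" \
#     --no-headers -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name \
#     | force_delete_cmds
```

Printing the commands instead of running them keeps the destructive step under the administrator's control. Pipe the reviewed output to `sh` only after confirming that NODE1 is completely down.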
6.6. Recovering from EBS volume detach
When an OSD or MON elastic block storage (EBS) volume where the OSD disk resides is detached from the worker Amazon EC2 instance, the volume gets reattached automatically within one or two minutes. However, the OSD pod gets into a CrashLoopBackOff state. To recover and bring back the pod to Running state, you must restart the EC2 instance.
6.7. Enabling and disabling debug logs for rook-ceph-operator
Enable the debug logs for the rook-ceph-operator to obtain information about failures that help in troubleshooting issues.
Procedure
- Enabling the debug logs
  - Edit the configmap of the rook-ceph-operator.
    $ oc edit configmap rook-ceph-operator-config
  - Add the ROOK_LOG_LEVEL: DEBUG parameter in the rook-ceph-operator-config yaml file to enable the debug logs for rook-ceph-operator.
    …
    data:
      # The logging level for the operator: INFO | DEBUG
      ROOK_LOG_LEVEL: DEBUG
    Now, the rook-ceph-operator logs consist of the debug information.
- Disabling the debug logs
  - Edit the configmap of the rook-ceph-operator.
    $ oc edit configmap rook-ceph-operator-config
  - Add the ROOK_LOG_LEVEL: INFO parameter in the rook-ceph-operator-config yaml file to disable the debug logs for rook-ceph-operator.
    …
    data:
      # The logging level for the operator: INFO | DEBUG
      ROOK_LOG_LEVEL: INFO
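As a non-interactive alternative to `oc edit`, the same parameter could be set with `oc patch`. The following is a minimal sketch under stated assumptions: the helper name `rook_log_patch` is hypothetical, and the `openshift-storage` namespace in the example is an assumption that may differ in your deployment. The helper builds the JSON merge patch and rejects any value other than the two levels named in the configmap comment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds the JSON merge patch that sets ROOK_LOG_LEVEL.
# Only the two levels mentioned in the configmap comment are accepted.
rook_log_patch() {
  case "$1" in
    DEBUG|INFO) printf '{"data":{"ROOK_LOG_LEVEL":"%s"}}\n' "$1" ;;
    *) echo "usage: rook_log_patch DEBUG|INFO" >&2; return 1 ;;
  esac
}

# Example (the namespace is an assumption; adjust to where the operator runs):
#   oc patch configmap rook-ceph-operator-config -n openshift-storage \
#     --type merge -p "$(rook_log_patch DEBUG)"
```

Running the example with `INFO` instead of `DEBUG` would revert the log level in the same way.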