7.3. Troubleshooting 2-site stretch cluster with Arbiter
Administrators can use this information to troubleshoot and fix a 2-site stretch cluster with arbiter environment.
- Problem
After performing a complete zone failure and recovery, the workload pods are sometimes stuck in the ContainerCreating state with any of the following errors:
- MountDevice failed to create newCsiDriverClient: driver name openshift-storage.rbd.csi.ceph.com not found in the list of registered CSI drivers
- MountDevice failed for volume <volume_name> : rpc error: code = Aborted desc = an operation with the given Volume ID <volume_id> already exists
- MountVolume.SetUp failed for volume <volume_name> : rpc error: code = Internal desc = staging path <path> for volume <volume_id> is not a mountpoint
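To find the affected pods and the nodes they are scheduled on, the pod list can be filtered by container state. The following is a minimal sketch, not part of the product tooling: the helper name `stuck_pods` is hypothetical, and it assumes the input is the JSON pod list printed by `oc get pods --all-namespaces -o json`.

```python
import json


def stuck_pods(pod_list_json):
    """Return (namespace, pod, node) for each pod that has at least one
    container waiting with reason ContainerCreating.

    pod_list_json is assumed to be the text of a Kubernetes PodList in JSON.
    """
    stuck = []
    for pod in json.loads(pod_list_json).get("items", []):
        statuses = pod.get("status", {}).get("containerStatuses") or []
        # Collect the waiting reason of every container (None if not waiting).
        reasons = [
            s.get("state", {}).get("waiting", {}).get("reason") for s in statuses
        ]
        if "ContainerCreating" in reasons:
            meta = pod["metadata"]
            node = pod.get("spec", {}).get("nodeName", "")
            stuck.append((meta["namespace"], meta["name"], node))
    return stuck
```

The node names in the result identify where the workarounds in the Resolution need to be applied. The specific error message for a pod still has to be read from its events, for example with `oc describe pod`.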
- Resolution
If the workload pods are stuck with any of the errors listed above, perform the following workarounds:
For ceph-fs workloads stuck in ContainerCreating:
- Restart the nodes where the stuck pods are scheduled
- Delete these stuck pods
- Verify that the new pods are running
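The delete and verify steps for ceph-fs workloads could be scripted along the following lines. This is a hedged sketch: the helper name `cephfs_recovery_commands` is hypothetical, it only builds the `oc` command lines rather than running them, and it assumes the affected nodes have already been restarted and that a controller (for example a Deployment) recreates the deleted pods.

```python
import shlex


def cephfs_recovery_commands(stuck):
    """Build oc command lines for the delete and verify steps.

    stuck is an iterable of (namespace, pod_name) pairs for pods that are
    stuck in ContainerCreating. Run the commands only after the nodes
    hosting those pods have been restarted.
    """
    stuck = list(stuck)
    # Delete each stuck pod so that its controller schedules a new one.
    commands = [
        shlex.join(["oc", "delete", "pod", name, "-n", ns]) for ns, name in stuck
    ]
    # One listing per affected namespace to verify the new pods are Running.
    for ns in sorted({ns for ns, _ in stuck}):
        commands.append(shlex.join(["oc", "get", "pods", "-n", ns, "-o", "wide"]))
    return commands
```

Printing the commands before running them keeps the operator in control of which pods are deleted.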
For ceph-rbd workloads stuck in ContainerCreating that do not self-recover after some time:
- Restart the csi-rbd plugin pods on the nodes where the stuck pods are scheduled
- Verify that the new pods are running
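One way to restart the csi-rbd plugin pods on specific nodes is to delete them and let them be recreated. The sketch below only builds the `oc` command lines. It rests on assumptions that are not stated in this section: that the plugin pods run in the `openshift-storage` namespace, that they carry the label `app=csi-rbdplugin`, and that they are managed by a DaemonSet that recreates deleted pods. Check the actual namespace and label in your cluster before using it. The helper name `rbd_plugin_restart_commands` is hypothetical.

```python
import shlex


def rbd_plugin_restart_commands(
    nodes,
    namespace="openshift-storage",   # assumed namespace of the plugin pods
    selector="app=csi-rbdplugin",    # assumed label; verify in your cluster
):
    """Build oc command lines that delete the csi-rbd plugin pod on each
    given node, followed by one listing to check the recreated pods."""
    commands = []
    for node in sorted(set(nodes)):
        commands.append(shlex.join([
            "oc", "delete", "pod", "-n", namespace, "-l", selector,
            "--field-selector", f"spec.nodeName={node}",
        ]))
    # Listing of the plugin pods with their nodes for verification.
    commands.append(shlex.join(
        ["oc", "get", "pods", "-n", namespace, "-l", selector, "-o", "wide"]
    ))
    return commands
```

After the plugin pods are recreated, the stuck workload pods are expected to proceed past ContainerCreating, which can be confirmed by listing the workload pods again.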