OpenShift Container Storage is now OpenShift Data Foundation starting with version 4.9.
Chapter 5. Recovering applications with RWO storage
Applications that use ReadWriteOnce (RWO) storage have a known behavior described in a Kubernetes issue. Because of this behavior, if a data zone fails, any application pods in that zone that mount RWO volumes (for example, cephrbd-based volumes) become stuck in Terminating status after 6-8 minutes and are not recreated on the active zone without manual intervention.
Check for OpenShift Container Platform nodes with a status of NotReady. These nodes may have an issue that prevents them from communicating with the OpenShift control plane, yet they may still be performing I/O operations against persistent volumes in spite of this communication issue.
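As a minimal sketch of that check, the helper below filters the output of `oc get nodes` for nodes whose status includes NotReady. The function name `list_notready_nodes` is hypothetical, and the sketch assumes the default column layout of `oc get nodes` (NAME, STATUS, ROLES, AGE, VERSION).

```shell
# Hypothetical helper: print the names of nodes whose STATUS contains NotReady.
# Assumes the default `oc get nodes` columns: NAME STATUS ROLES AGE VERSION.
list_notready_nodes() {
  # --no-headers drops the header row so awk only sees node rows.
  # The regex also matches combined states such as NotReady,SchedulingDisabled.
  oc get nodes --no-headers | awk '$2 ~ /NotReady/ { print $1 }'
}
```

Running `list_notready_nodes` in a shell that is logged in to the cluster prints one node name per line, or nothing if every node is Ready.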
If two pods are concurrently writing to the same RWO volume, there is a risk of data corruption. Some measure must be taken to ensure that processes on the NotReady node are terminated or blocked until they can be terminated.
- Using an out-of-band management system to power off a node, with confirmation, would be an example of ensuring process termination.
- Withdrawing a network route that is used by nodes at a failed site to communicate with storage would be another solution.
Note: Before restoring service to the failed zone or nodes, there must be confirmation that all pods with persistent volumes have terminated successfully.
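One way to see which pods the API server still associates with a failed node is to filter pods by node name. The sketch below is an illustration only: the function name `pods_on_node` is hypothetical, and it assumes the default column layout of `oc get pods --all-namespaces` (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE). It reports what the control plane knows and does not by itself prove that processes on an unreachable node have stopped.

```shell
# Hypothetical helper: print "namespace/name status" for each pod bound to a node.
# Assumes the default columns: NAMESPACE NAME READY STATUS RESTARTS AGE.
pods_on_node() {
  node="$1"
  # The spec.nodeName field selector limits the listing to pods on that node.
  oc get pods --all-namespaces --no-headers \
    --field-selector "spec.nodeName=${node}" |
    awk '{ print $1 "/" $2, $4 }'
}
```

For example, `pods_on_node <NODENAME>` lists every pod on that node together with its status, which makes the pods stuck in Terminating easy to spot.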
To get the Terminating pods to recreate on the active zone, you can either force delete the pod or delete the finalizer on the associated PV. Once either of these two actions is completed, the application pod should recreate on the active zone and successfully mount its RWO storage.
- Force delete the pod
Force deletions do not wait for confirmation from the kubelet that the Pod has been terminated.
$ oc delete pod <PODNAME> --grace-period=0 --force --namespace <NAMESPACE>
<PODNAME>: Is the name of the pod
<NAMESPACE>: Is the project namespace
- Deleting the finalizer on the associated PV
Find the associated PV for the Persistent Volume Claim (PVC) that is mounted by the Terminating pod and delete the finalizer using the oc patch command.
$ oc patch -n openshift-storage pv/<PV_NAME> -p '{"metadata":{"finalizers":[]}}' --type=merge
<PV_NAME>: Is the name of the PV
An easy way to find the associated PV is to describe the Terminating pod. If you see a multi-attach warning, it should have the PV names in the warning (for example, pvc-0595a8d2-683f-443b-aee0-6e547f5f5a7c).
$ oc describe pod <PODNAME> --namespace <NAMESPACE>
<PODNAME>: Is the name of the pod
<NAMESPACE>: Is the project namespace
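If the pod description shows no multi-attach warning, the PV can also be found by following the pod's PVC references. The sketch below is one possible approach, not part of the documented procedure: the function name `pvs_for_pod` is hypothetical, and it relies on the standard `-o jsonpath` output option to read the claim names from the pod spec and the bound volume name from each PVC.

```shell
# Hypothetical helper: print the PV bound to each PVC that a pod mounts.
pvs_for_pod() {
  pod="$1"
  ns="$2"
  # Claim names are emitted space-separated; volumes without a PVC are skipped.
  for claim in $(oc get pod "$pod" -n "$ns" \
      -o jsonpath='{.spec.volumes[*].persistentVolumeClaim.claimName}'); do
    # spec.volumeName on the PVC holds the name of the bound PV.
    oc get pvc "$claim" -n "$ns" -o jsonpath='{.spec.volumeName}'
    echo  # jsonpath output has no trailing newline
  done
}
```

Calling `pvs_for_pod <PODNAME> <NAMESPACE>` prints one PV name per line. Each name can then be used as <PV_NAME> in the oc patch command.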