OpenShift Container Storage is now OpenShift Data Foundation starting with version 4.9.
Este conteúdo não está disponível no idioma selecionado.
Chapter 10. Restoring the monitor pods in OpenShift Container Storage
Restore the monitor pods if all three of them go down, and when OpenShift Container Storage is not able to recover the monitor pods automatically.
Procedure
Scale down the
rook-ceph-operatorandocs operatordeployments.oc scale deployment rook-ceph-operator --replicas=0 -n openshift-storage
# oc scale deployment rook-ceph-operator --replicas=0 -n openshift-storageCopy to Clipboard Copied! Toggle word wrap Toggle overflow oc scale deployment ocs-operator --replicas=0 -n openshift-storage
# oc scale deployment ocs-operator --replicas=0 -n openshift-storageCopy to Clipboard Copied! Toggle word wrap Toggle overflow Create a backup of all deployments in the
openshift-storagenamespace.mkdir backup
# mkdir backupCopy to Clipboard Copied! Toggle word wrap Toggle overflow cd backup
# cd backupCopy to Clipboard Copied! Toggle word wrap Toggle overflow oc project openshift-storage
# oc project openshift-storageCopy to Clipboard Copied! Toggle word wrap Toggle overflow for d in $(oc get deployment|awk -F' ' '{print $1}'|grep -v NAME); do echo $d;oc get deployment $d -o yaml > oc_get_deployment.${d}.yaml; done# for d in $(oc get deployment|awk -F' ' '{print $1}'|grep -v NAME); do echo $d;oc get deployment $d -o yaml > oc_get_deployment.${d}.yaml; doneCopy to Clipboard Copied! Toggle word wrap Toggle overflow Patch the OSD deployments to remove the
livenessProbeparameter, and run it with the command parameter assleep.for i in $(oc get deployment -l app=rook-ceph-osd -oname);do oc patch ${i} -n openshift-storage --type='json' -p '[{"op":"remove", "path":"/spec/template/spec/containers/0/livenessProbe"}]' ; oc patch ${i} -n openshift-storage -p '{"spec": {"template": {"spec": {"containers": [{"name": "osd", "command": ["sleep", "infinity"], "args": []}]}}}}' ; done# for i in $(oc get deployment -l app=rook-ceph-osd -oname);do oc patch ${i} -n openshift-storage --type='json' -p '[{"op":"remove", "path":"/spec/template/spec/containers/0/livenessProbe"}]' ; oc patch ${i} -n openshift-storage -p '{"spec": {"template": {"spec": {"containers": [{"name": "osd", "command": ["sleep", "infinity"], "args": []}]}}}}' ; doneCopy to Clipboard Copied! Toggle word wrap Toggle overflow Retrieve the
monstorecluster map from all the OSDs.Create the
recover_mon.shscript.Copy to Clipboard Copied! Toggle word wrap Toggle overflow Run the
recover_mon.shscript.chmod +x recover_mon.sh
# chmod +x recover_mon.shCopy to Clipboard Copied! Toggle word wrap Toggle overflow ./recover_mon.sh
# ./recover_mon.shCopy to Clipboard Copied! Toggle word wrap Toggle overflow
Patch the MON deployments, and run it with the command parameter as
sleep.Edit the MON deployments.
for i in $(oc get deployment -l app=rook-ceph-mon -oname);do oc patch ${i} -n openshift-storage -p '{"spec": {"template": {"spec": {"containers": [{"name": "mon", "command": ["sleep", "infinity"], "args": []}]}}}}'; done# for i in $(oc get deployment -l app=rook-ceph-mon -oname);do oc patch ${i} -n openshift-storage -p '{"spec": {"template": {"spec": {"containers": [{"name": "mon", "command": ["sleep", "infinity"], "args": []}]}}}}'; doneCopy to Clipboard Copied! Toggle word wrap Toggle overflow Patch the MON deployments to increase the
initialDelaySeconds.oc get deployment rook-ceph-mon-a -o yaml | sed "s/initialDelaySeconds: 10/initialDelaySeconds: 2000/g" | oc replace -f -
# oc get deployment rook-ceph-mon-a -o yaml | sed "s/initialDelaySeconds: 10/initialDelaySeconds: 2000/g" | oc replace -f -Copy to Clipboard Copied! Toggle word wrap Toggle overflow oc get deployment rook-ceph-mon-b -o yaml | sed "s/initialDelaySeconds: 10/initialDelaySeconds: 2000/g" | oc replace -f -
# oc get deployment rook-ceph-mon-b -o yaml | sed "s/initialDelaySeconds: 10/initialDelaySeconds: 2000/g" | oc replace -f -Copy to Clipboard Copied! Toggle word wrap Toggle overflow oc get deployment rook-ceph-mon-c -o yaml | sed "s/initialDelaySeconds: 10/initialDelaySeconds: 2000/g" | oc replace -f -
# oc get deployment rook-ceph-mon-c -o yaml | sed "s/initialDelaySeconds: 10/initialDelaySeconds: 2000/g" | oc replace -f -Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Copy the previously retrieved
monstoreto the mon-a pod.oc cp /tmp/monstore/ $(oc get po -l app=rook-ceph-mon,mon=a -oname |sed 's/pod\///g'):/tmp/
# oc cp /tmp/monstore/ $(oc get po -l app=rook-ceph-mon,mon=a -oname |sed 's/pod\///g'):/tmp/Copy to Clipboard Copied! Toggle word wrap Toggle overflow Navigate into the MON pod and change the ownership of the retrieved
monstore.oc rsh $(oc get po -l app=rook-ceph-mon,mon=a -oname)
# oc rsh $(oc get po -l app=rook-ceph-mon,mon=a -oname)Copy to Clipboard Copied! Toggle word wrap Toggle overflow chown -R ceph:ceph /tmp/monstore
# chown -R ceph:ceph /tmp/monstoreCopy to Clipboard Copied! Toggle word wrap Toggle overflow Set the appropriate capabilities before rebuilding the MON DB.
oc rsh $(oc get po -l app=rook-ceph-mon,mon=a -oname)
# oc rsh $(oc get po -l app=rook-ceph-mon,mon=a -oname)Copy to Clipboard Copied! Toggle word wrap Toggle overflow cp /etc/ceph/keyring-store/keyring /tmp/keyring
# cp /etc/ceph/keyring-store/keyring /tmp/keyringCopy to Clipboard Copied! Toggle word wrap Toggle overflow Copy to Clipboard Copied! Toggle word wrap Toggle overflow Identify the keyring of all the other Ceph daemons (MGR, MDS, RGW, Crash, CSI and CSI provisioners) from its respective secrets.
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example keyring file,
/etc/ceph/ceph.client.admin.keyring:Copy to Clipboard Copied! Toggle word wrap Toggle overflow Important-
For
client.csirelated keyring, add the defaultcapsafter fetching the key from its respective OpenShift Container Storage secret. - OSD keyring is added automatically post recovery.
-
For
Navigate into the mon-a pod.
Verify that the
monstorehasmonmap.ceph-monstore-tool /tmp/monstore get monmap -- --out /tmp/monmap
# ceph-monstore-tool /tmp/monstore get monmap -- --out /tmp/monmapCopy to Clipboard Copied! Toggle word wrap Toggle overflow monmaptool /tmp/monmap --print
# monmaptool /tmp/monmap --printCopy to Clipboard Copied! Toggle word wrap Toggle overflow If the
monmapis missing then create a new monmap.monmaptool --create --add <mon-a-id> <mon-a-ip> --add <mon-b-id> <mon-b-ip> --add <mon-c-id> <mon-c-ip> --enable-all-features --clobber /root/monmap --fsid <fsid>
# monmaptool --create --add <mon-a-id> <mon-a-ip> --add <mon-b-id> <mon-b-ip> --add <mon-c-id> <mon-c-ip> --enable-all-features --clobber /root/monmap --fsid <fsid>Copy to Clipboard Copied! Toggle word wrap Toggle overflow <mon-a-id>- Is the ID of the mon-a pod
<mon-a-ip>- Is the IP address of the mon-a pod
<mon-b-id>- Is the ID of the mon-b pod
<mon-b-ip>- Is the IP address of the mon-b pod
<mon-c-id>- Is the ID of the mon-c pod
<mon-c-ip>- Is the IP address of the mon-c pod
<fsid>- Is the file system ID
Verify the
monmap.monmaptool /root/monmap --print
# monmaptool /root/monmap --printCopy to Clipboard Copied! Toggle word wrap Toggle overflow Import the
monmap.ImportantUse the previously created keyring file.
ceph-monstore-tool /tmp/monstore rebuild -- --keyring /tmp/keyring --monmap /root/monmap
# ceph-monstore-tool /tmp/monstore rebuild -- --keyring /tmp/keyring --monmap /root/monmapCopy to Clipboard Copied! Toggle word wrap Toggle overflow chown -R ceph:ceph /tmp/monstore
# chown -R ceph:ceph /tmp/monstoreCopy to Clipboard Copied! Toggle word wrap Toggle overflow Create a backup of the old
store.dbfile.mv /var/lib/ceph/mon/ceph-a/store.db /var/lib/ceph/mon/ceph-a/store.db.corrupted
# mv /var/lib/ceph/mon/ceph-a/store.db /var/lib/ceph/mon/ceph-a/store.db.corruptedCopy to Clipboard Copied! Toggle word wrap Toggle overflow mv /var/lib/ceph/mon/ceph-b/store.db /var/lib/ceph/mon/ceph-b/store.db.corrupted
# mv /var/lib/ceph/mon/ceph-b/store.db /var/lib/ceph/mon/ceph-b/store.db.corruptedCopy to Clipboard Copied! Toggle word wrap Toggle overflow mv /var/lib/ceph/mon/ceph-c/store.db /var/lib/ceph/mon/ceph-c/store.db.corrupted
# mv /var/lib/ceph/mon/ceph-c/store.db /var/lib/ceph/mon/ceph-c/store.db.corruptedCopy to Clipboard Copied! Toggle word wrap Toggle overflow Copy the rebuild
store.dbfile to themonstoredirectory.mv /tmp/monstore/store.db /var/lib/ceph/mon/ceph-a/store.db
# mv /tmp/monstore/store.db /var/lib/ceph/mon/ceph-a/store.dbCopy to Clipboard Copied! Toggle word wrap Toggle overflow chown -R ceph:ceph /var/lib/ceph/mon/ceph-a/store.db
# chown -R ceph:ceph /var/lib/ceph/mon/ceph-a/store.dbCopy to Clipboard Copied! Toggle word wrap Toggle overflow After rebuilding the
monstoredirectory, copy thestore.dbfile from local to the rest of the MON pods.oc cp $(oc get po -l app=rook-ceph-mon,mon=a -oname | sed 's/pod\///g'):/var/lib/ceph/mon/ceph-a/store.db /tmp/store.db
# oc cp $(oc get po -l app=rook-ceph-mon,mon=a -oname | sed 's/pod\///g'):/var/lib/ceph/mon/ceph-a/store.db /tmp/store.dbCopy to Clipboard Copied! Toggle word wrap Toggle overflow oc cp /tmp/store.db $(oc get po -l app=rook-ceph-mon,mon=<id> -oname | sed 's/pod\///g'):/var/lib/ceph/mon/ceph-<id>
# oc cp /tmp/store.db $(oc get po -l app=rook-ceph-mon,mon=<id> -oname | sed 's/pod\///g'):/var/lib/ceph/mon/ceph-<id>Copy to Clipboard Copied! Toggle word wrap Toggle overflow <id>- Is the ID of the MON pod
Navigate into the rest of the MON pods and change the ownership of the copied
monstore.oc rsh $(oc get po -l app=rook-ceph-mon,mon=<id> -oname)
# oc rsh $(oc get po -l app=rook-ceph-mon,mon=<id> -oname)Copy to Clipboard Copied! Toggle word wrap Toggle overflow chown -R ceph:ceph /var/lib/ceph/mon/ceph-<id>/store.db
# chown -R ceph:ceph /var/lib/ceph/mon/ceph-<id>/store.dbCopy to Clipboard Copied! Toggle word wrap Toggle overflow <id>- Is the ID of the MON pod
Revert the patched changes.
For MON deployments:
oc replace --force -f <mon-deployment.yaml>
# oc replace --force -f <mon-deployment.yaml>Copy to Clipboard Copied! Toggle word wrap Toggle overflow <mon-deployment.yaml>- Is the MON deployment yaml file
For OSD deployments:
oc replace --force -f <osd-deployment.yaml>
# oc replace --force -f <osd-deployment.yaml>Copy to Clipboard Copied! Toggle word wrap Toggle overflow <osd-deployment.yaml>- Is the OSD deployment yaml file
For MGR deployments:
oc replace --force -f <mgr-deployment.yaml>
# oc replace --force -f <mgr-deployment.yaml>Copy to Clipboard Copied! Toggle word wrap Toggle overflow <mgr-deployment.yaml>Is the MGR deployment yaml file
ImportantEnsure that the MON, MGR and OSD pods are up and running.
Scale up the
rook-ceph-operatorandocs-operatordeployments.oc -n openshift-storage scale deployment rook-ceph-operator --replicas=1
# oc -n openshift-storage scale deployment rook-ceph-operator --replicas=1Copy to Clipboard Copied! Toggle word wrap Toggle overflow oc -n openshift-storage scale deployment ocs-operator --replicas=1
# oc -n openshift-storage scale deployment ocs-operator --replicas=1Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Restoring the CephFS
Check the Ceph status to confirm that CephFS is running.
ceph -s
# ceph -s
Example output:
If the filesystem is offline or MDS service is missing, perform the following steps:
Scale down the
rook-ceph-operatorandocs operatordeployments.oc scale deployment rook-ceph-operator --replicas=0 -n openshift-storage
# oc scale deployment rook-ceph-operator --replicas=0 -n openshift-storageCopy to Clipboard Copied! Toggle word wrap Toggle overflow oc scale deployment ocs-operator --replicas=0 -n openshift-storage
# oc scale deployment ocs-operator --replicas=0 -n openshift-storageCopy to Clipboard Copied! Toggle word wrap Toggle overflow Patch the MDS deployments to remove the
livenessProbeparameter and run it with the command parameter assleep.for i in $(oc get deployment -l app=rook-ceph-mds -oname);do oc patch ${i} -n openshift-storage --type='json' -p '[{"op":"remove", "path":"/spec/template/spec/containers/0/livenessProbe"}]' ; oc patch ${i} -n openshift-storage -p '{"spec": {"template": {"spec": {"containers": [{"name": "mds", "command": ["sleep", "infinity"], "args": []}]}}}}' ; done# for i in $(oc get deployment -l app=rook-ceph-mds -oname);do oc patch ${i} -n openshift-storage --type='json' -p '[{"op":"remove", "path":"/spec/template/spec/containers/0/livenessProbe"}]' ; oc patch ${i} -n openshift-storage -p '{"spec": {"template": {"spec": {"containers": [{"name": "mds", "command": ["sleep", "infinity"], "args": []}]}}}}' ; doneCopy to Clipboard Copied! Toggle word wrap Toggle overflow Recover the CephFS.
ceph fs reset ocs-storagecluster-cephfilesystem --yes-i-really-mean-it
# ceph fs reset ocs-storagecluster-cephfilesystem --yes-i-really-mean-itCopy to Clipboard Copied! Toggle word wrap Toggle overflow If the
resetcommand fails, force create the default filesystem with the data and metadata pools, and then reset it.NoteThe
resetcommand might fail if thecephfilesystemis missing.ceph fs new ocs-storagecluster-cephfilesystem ocs-storagecluster-cephfilesystem-metadata ocs-storagecluster-cephfilesystem-data0 --force
# ceph fs new ocs-storagecluster-cephfilesystem ocs-storagecluster-cephfilesystem-metadata ocs-storagecluster-cephfilesystem-data0 --forceCopy to Clipboard Copied! Toggle word wrap Toggle overflow ceph fs reset ocs-storagecluster-cephfilesystem --yes-i-really-mean-it
# ceph fs reset ocs-storagecluster-cephfilesystem --yes-i-really-mean-itCopy to Clipboard Copied! Toggle word wrap Toggle overflow Replace the MDS deployments.
oc replace --force -f oc_get_deployment.rook-ceph-mds-ocs-storagecluster-cephfilesystem-a.yaml
# oc replace --force -f oc_get_deployment.rook-ceph-mds-ocs-storagecluster-cephfilesystem-a.yamlCopy to Clipboard Copied! Toggle word wrap Toggle overflow oc replace --force -f oc_get_deployment.rook-ceph-mds-ocs-storagecluster-cephfilesystem-b.yaml
# oc replace --force -f oc_get_deployment.rook-ceph-mds-ocs-storagecluster-cephfilesystem-b.yamlCopy to Clipboard Copied! Toggle word wrap Toggle overflow Scale up the
rook-ceph-operatorandocs-operatordeployments.oc scale deployment rook-ceph-operator --replicas=1 -n openshift-storage
# oc scale deployment rook-ceph-operator --replicas=1 -n openshift-storageCopy to Clipboard Copied! Toggle word wrap Toggle overflow oc scale deployment ocs-operator --replicas=1 -n openshift-storage
# oc scale deployment ocs-operator --replicas=1 -n openshift-storageCopy to Clipboard Copied! Toggle word wrap Toggle overflow Check the CephFS status.
ceph fs status
# ceph fs statusCopy to Clipboard Copied! Toggle word wrap Toggle overflow The status should be active.
If the application pods attached to the deployments which were using the CephFS Persistent Volume Claims (PVCs) get stuck in
CreateContainerErrorstate post restoring the CephFS, restart the application pods.oc -n <namespace> delete pods <cephfs-app-pod>
# oc -n <namespace> delete pods <cephfs-app-pod>Copy to Clipboard Copied! Toggle word wrap Toggle overflow <namespace>- Is the project namespace
<cephfs-app-pod>- Is the name of the CephFS application pod
- If new CephFS or RBD PVCs are not getting bound, restart all the pods related to Ceph CSI.
Restoring the Multicloud Object Gateway
Post restoring the MON pods, check the Multicloud Object Gateway (MCG) status, it should be active, and the backingstore and bucketclass should be in Ready state. If not, restart all the MCG related pods and check the MCG status to confirm that the MCG is back up and running.
Check the MCG status.
noobaa status -n openshift-storage
noobaa status -n openshift-storageCopy to Clipboard Copied! Toggle word wrap Toggle overflow Restart all the pods related to the MCG.
oc delete pods <noobaa-operator> -n openshift-storage
# oc delete pods <noobaa-operator> -n openshift-storageCopy to Clipboard Copied! Toggle word wrap Toggle overflow oc delete pods <noobaa-core> -n openshift-storage
# oc delete pods <noobaa-core> -n openshift-storageCopy to Clipboard Copied! Toggle word wrap Toggle overflow oc delete pods <noobaa-endpoint> -n openshift-storage
# oc delete pods <noobaa-endpoint> -n openshift-storageCopy to Clipboard Copied! Toggle word wrap Toggle overflow oc delete pods <noobaa-db> -n openshift-storage
# oc delete pods <noobaa-db> -n openshift-storageCopy to Clipboard Copied! Toggle word wrap Toggle overflow <noobaa-operator>- Is the name of the MCG operator
<noobaa-core>- Is the name of the MCG core pod
<noobaa-endpoint>- Is the name of the MCG endpoint
<noobaa-db>- Is the name of the MCG db pod
If the RADOS Object Gateway (RGW) is configured, restart the pod.
oc delete pods <rgw-pod> -n openshift-storage
# oc delete pods <rgw-pod> -n openshift-storageCopy to Clipboard Copied! Toggle word wrap Toggle overflow <rgw-pod>- Is the name of the RGW pod