Chapter 10. Restoring the monitor pods in OpenShift Container Storage
Restore the monitor pods if all three of them go down and OpenShift Container Storage is not able to recover them automatically.
Procedure
Scale down the rook-ceph-operator and ocs-operator deployments.
# oc scale deployment rook-ceph-operator --replicas=0 -n openshift-storage
# oc scale deployment ocs-operator --replicas=0 -n openshift-storage
Create a backup of all deployments in the openshift-storage namespace.
# mkdir backup
# cd backup
# oc project openshift-storage
# for d in $(oc get deployment|awk -F' ' '{print $1}'|grep -v NAME); do echo $d; oc get deployment $d -o yaml > oc_get_deployment.${d}.yaml; done
Patch the OSD deployments to remove the livenessProbe parameter, and run them with the command parameter set to sleep.
# for i in $(oc get deployment -l app=rook-ceph-osd -oname);do oc patch ${i} -n openshift-storage --type='json' -p '[{"op":"remove", "path":"/spec/template/spec/containers/0/livenessProbe"}]' ; oc patch ${i} -n openshift-storage -p '{"spec": {"template": {"spec": {"containers": [{"name": "osd", "command": ["sleep", "infinity"], "args": []}]}}}}' ; done
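Optionally, you can confirm that the OSD pods have restarted and are running with the patched spec before continuing. This check is not part of the original procedure.
# oc get pods -l app=rook-ceph-osd -n openshift-storage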
Retrieve the monstore cluster map from all the OSDs.
Create the recover_mon.sh script.
#!/bin/bash
ms=/tmp/monstore

rm -rf $ms
mkdir $ms

for osd_pod in $(oc get po -l app=rook-ceph-osd -oname -n openshift-storage); do

    echo "Starting with pod: $osd_pod"

    podname=$(echo $osd_pod|sed 's/pod\///g')
    oc exec $osd_pod -- rm -rf $ms
    oc cp $ms $podname:$ms

    rm -rf $ms
    mkdir $ms

    echo "pod in loop: $osd_pod ; done deleting local dirs"

    oc exec $osd_pod -- ceph-objectstore-tool --type bluestore --data-path /var/lib/ceph/osd/ceph-$(oc get $osd_pod -ojsonpath='{ .metadata.labels.ceph_daemon_id }') --op update-mon-db --no-mon-config --mon-store-path $ms

    echo "Done with COT on pod: $osd_pod"

    oc cp $podname:$ms $ms

    echo "Finished pulling COT data from pod: $osd_pod"

done
Run the recover_mon.sh script.
# chmod +x recover_mon.sh
# ./recover_mon.sh
Patch the MON deployments, and run them with the command parameter set to sleep.
Edit the MON deployments.
# for i in $(oc get deployment -l app=rook-ceph-mon -oname);do oc patch ${i} -n openshift-storage -p '{"spec": {"template": {"spec": {"containers": [{"name": "mon", "command": ["sleep", "infinity"], "args": []}]}}}}'; done
Patch the MON deployments to increase the initialDelaySeconds.
# oc get deployment rook-ceph-mon-a -o yaml | sed "s/initialDelaySeconds: 10/initialDelaySeconds: 2000/g" | oc replace -f -
# oc get deployment rook-ceph-mon-b -o yaml | sed "s/initialDelaySeconds: 10/initialDelaySeconds: 2000/g" | oc replace -f -
# oc get deployment rook-ceph-mon-c -o yaml | sed "s/initialDelaySeconds: 10/initialDelaySeconds: 2000/g" | oc replace -f -
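As an optional check that is not part of the original procedure, you can confirm that the probe delay was updated in each MON deployment.
# oc get deployment -l app=rook-ceph-mon -n openshift-storage -o yaml | grep initialDelaySeconds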
Copy the previously retrieved monstore to the mon-a pod.
# oc cp /tmp/monstore/ $(oc get po -l app=rook-ceph-mon,mon=a -oname |sed 's/pod\///g'):/tmp/
Navigate into the MON pod and change the ownership of the retrieved monstore.
# oc rsh $(oc get po -l app=rook-ceph-mon,mon=a -oname)
# chown -R ceph:ceph /tmp/monstore
Copy the keyring template file before rebuilding the mon db.
# oc rsh $(oc get po -l app=rook-ceph-mon,mon=a -oname)
# cp /etc/ceph/keyring-store/keyring /tmp/keyring
# cat /tmp/keyring
[mon.]
    key = AQCleqldWqm5IhAAgZQbEzoShkZV42RiQVffnA==
    caps mon = "allow *"
[client.admin]
    key = AQCmAKld8J05KxAArOWeRAw63gAwwZO5o75ZNQ==
    auid = 0
    caps mds = "allow *"
    caps mgr = "allow *"
    caps mon = "allow *"
    caps osd = "allow *"
Identify the keyring of all the other Ceph daemons (MGR, MDS, RGW, Crash, CSI and CSI provisioners) from their respective secrets.
# oc get secret rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-keyring -ojson | jq .data.keyring | xargs echo | base64 -d
[mds.ocs-storagecluster-cephfilesystem-a]
    key = AQB3r8VgAtr6OhAAVhhXpNKqRTuEVdRoxG4uRA==
    caps mon = "allow profile mds"
    caps osd = "allow *"
    caps mds = "allow"
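If you want to dump every keyring secret in one pass, the following loop is a rough sketch built from the same pipeline as the command above. It assumes that all the relevant secrets have names that contain keyring; verify the secret names in your cluster before relying on it.
# for s in $(oc get secrets -n openshift-storage -oname | grep keyring); do echo "=== ${s} ==="; oc get ${s} -n openshift-storage -ojson | jq -r '.data.keyring' | base64 -d; done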
Example keyring file, /etc/ceph/ceph.client.admin.keyring:
[mon.]
    key = AQDxTF1hNgLTNxAAi51cCojs01b4I5E6v2H8Uw==
    caps mon = "allow *"
[client.admin]
    key = AQDxTF1hpzguOxAA0sS8nN4udoO35OEbt3bqMQ==
    caps mds = "allow *"
    caps mgr = "allow *"
    caps mon = "allow *"
    caps osd = "allow *"
[mds.ocs-storagecluster-cephfilesystem-a]
    key = AQCKTV1horgjARAA8aF/BDh/4+eG4RCNBCl+aw==
    caps mds = "allow"
    caps mon = "allow profile mds"
    caps osd = "allow *"
[mds.ocs-storagecluster-cephfilesystem-b]
    key = AQCKTV1hN4gKLBAA5emIVq3ncV7AMEM1c1RmGA==
    caps mds = "allow"
    caps mon = "allow profile mds"
    caps osd = "allow *"
[client.rgw.ocs.storagecluster.cephobjectstore.a]
    key = AQCOkdBixmpiAxAA4X7zjn6SGTI9c1MBflszYA==
    caps mon = "allow rw"
    caps osd = "allow rwx"
[mgr.a]
    key = AQBOTV1hGYOEORAA87471+eIZLZtptfkcHvTRg==
    caps mds = "allow *"
    caps mon = "allow profile mgr"
    caps osd = "allow *"
[client.crash]
    key = AQBOTV1htO1aGRAAe2MPYcGdiAT+Oo4CNPSF1g==
    caps mgr = "allow rw"
    caps mon = "allow profile crash"
[client.csi-cephfs-node]
    key = AQBOTV1hiAtuBBAAaPPBVgh1AqZJlDeHWdoFLw==
    caps mds = "allow rw"
    caps mgr = "allow rw"
    caps mon = "allow r"
    caps osd = "allow rw tag cephfs *=*"
[client.csi-cephfs-provisioner]
    key = AQBNTV1hHu6wMBAAzNXZv36aZJuE1iz7S7GfeQ==
    caps mgr = "allow rw"
    caps mon = "allow r"
    caps osd = "allow rw tag cephfs metadata=*"
[client.csi-rbd-node]
    key = AQBNTV1h+LnkIRAAWnpIN9bUAmSHOvJ0EJXHRw==
    caps mgr = "allow rw"
    caps mon = "profile rbd"
    caps osd = "profile rbd"
[client.csi-rbd-provisioner]
    key = AQBNTV1hMNcsExAAvA3gHB2qaY33LOdWCvHG/A==
    caps mgr = "allow rw"
    caps mon = "profile rbd"
    caps osd = "profile rbd"
Important
- For the client.csi related keyrings, refer to the previous keyring file output and add the default caps after fetching the key from its respective OpenShift Container Storage secret.
- The OSD keyring is added automatically post recovery.
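The exact way you assemble /tmp/keyring is left to you; the following is only a rough sketch, reusing the MDS secret from the earlier example. It pulls the current file out of the mon-a pod, appends a decoded secret, lets you hand-edit the caps (for example, the default caps for the client.csi entries), and copies the result back. Adjust it for each daemon you need.
# MON_A=$(oc get po -l app=rook-ceph-mon,mon=a -oname | sed 's/pod\///g')
# oc cp ${MON_A}:/tmp/keyring keyring.tmp
# oc get secret rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-keyring -ojson | jq -r '.data.keyring' | base64 -d >> keyring.tmp
# vi keyring.tmp
# oc cp keyring.tmp ${MON_A}:/tmp/keyring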
Navigate into the mon-a pod, and verify that the monstore has the monmap.
Navigate into the mon-a pod.
# oc rsh $(oc get po -l app=rook-ceph-mon,mon=a -oname)
Verify that the monstore has the monmap.
# ceph-monstore-tool /tmp/monstore get monmap -- --out /tmp/monmap
# monmaptool /tmp/monmap --print
Optional: If the monmap is missing, create a new monmap, as shown in the filled-in example after the parameter descriptions.
# monmaptool --create --add <mon-a-id> <mon-a-ip> --add <mon-b-id> <mon-b-ip> --add <mon-c-id> <mon-c-ip> --enable-all-features --clobber /root/monmap --fsid <fsid>
<mon-a-id>
- Is the ID of the mon-a pod.
<mon-a-ip>
- Is the IP address of the mon-a pod.
<mon-b-id>
- Is the ID of the mon-b pod.
<mon-b-ip>
- Is the IP address of the mon-b pod.
<mon-c-id>
- Is the ID of the mon-c pod.
<mon-c-ip>
- Is the IP address of the mon-c pod.
<fsid>
- Is the file system ID (fsid) of the Ceph cluster.
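For illustration only, a filled-in command might look like the following. The mon IDs, IP addresses, and FSID below are example values, not values to copy; replace them with the details of your own cluster.
# monmaptool --create --add a 10.128.2.25 --add b 10.131.0.22 --add c 10.129.2.18 --enable-all-features --clobber /root/monmap --fsid f111402f-84d1-4e06-9fdb-c27607676e55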
Verify the monmap.
# monmaptool /root/monmap --print
Import the monmap.
Important
Use the previously created keyring file.
# ceph-monstore-tool /tmp/monstore rebuild -- --keyring /tmp/keyring --monmap /root/monmap
# chown -R ceph:ceph /tmp/monstore
Create a backup of the old store.db file.
# mv /var/lib/ceph/mon/ceph-a/store.db /var/lib/ceph/mon/ceph-a/store.db.corrupted
# mv /var/lib/ceph/mon/ceph-b/store.db /var/lib/ceph/mon/ceph-b/store.db.corrupted
# mv /var/lib/ceph/mon/ceph-c/store.db /var/lib/ceph/mon/ceph-c/store.db.corrupted
Copy the rebuilt store.db file from the monstore directory to the mon-a data directory.
# mv /tmp/monstore/store.db /var/lib/ceph/mon/ceph-a/store.db
# chown -R ceph:ceph /var/lib/ceph/mon/ceph-a/store.db
After rebuilding the monstore directory, copy the store.db file from mon-a to the local host, and then to the rest of the MON pods.
# oc cp $(oc get po -l app=rook-ceph-mon,mon=a -oname | sed 's/pod\///g'):/var/lib/ceph/mon/ceph-a/store.db /tmp/store.db
# oc cp /tmp/store.db $(oc get po -l app=rook-ceph-mon,mon=<id> -oname | sed 's/pod\///g'):/var/lib/ceph/mon/ceph-<id>
<id>
- Is the ID of the MON pod
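For example, with the default mon-a, mon-b, and mon-c pods, the copy to the remaining MONs can be written as a loop. This is a sketch that assumes mon-a already holds the rebuilt store.db and the other MONs are b and c.
# for id in b c; do oc cp /tmp/store.db $(oc get po -l app=rook-ceph-mon,mon=${id} -oname | sed 's/pod\///g'):/var/lib/ceph/mon/ceph-${id}; done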
Navigate into the rest of the MON pods and change the ownership of the copied store.db file.
# oc rsh $(oc get po -l app=rook-ceph-mon,mon=<id> -oname)
# chown -R ceph:ceph /var/lib/ceph/mon/ceph-<id>/store.db
<id>
- Is the ID of the MON pod
Revert the patched changes.
For MON deployments:
# oc replace --force -f <mon-deployment.yaml>
<mon-deployment.yaml>
- Is the MON deployment yaml file
For OSD deployments:
# oc replace --force -f <osd-deployment.yaml>
<osd-deployment.yaml>
- Is the OSD deployment yaml file
For MGR deployments:
# oc replace --force -f <mgr-deployment.yaml>
<mgr-deployment.yaml>
- Is the MGR deployment yaml file
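For example, assuming the backup files created earlier in this procedure with the oc_get_deployment.${d}.yaml naming, reverting the mon-a deployment might look like this; the exact file names depend on your deployment names.
# oc replace --force -f oc_get_deployment.rook-ceph-mon-a.yaml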
Important
Ensure that the MON, MGR and OSD pods are up and running.
Scale up the rook-ceph-operator and ocs-operator deployments.
# oc -n openshift-storage scale deployment rook-ceph-operator --replicas=1
# oc -n openshift-storage scale deployment ocs-operator --replicas=1
Verification steps
Check the Ceph status to confirm that CephFS is running.
# ceph -s
Example output:
cluster:
  id:     f111402f-84d1-4e06-9fdb-c27607676e55
  health: HEALTH_ERR
          1 filesystem is offline
          1 filesystem is online with fewer MDS than max_mds
          3 daemons have recently crashed

services:
  mon: 3 daemons, quorum b,c,a (age 15m)
  mgr: a(active, since 14m)
  mds: ocs-storagecluster-cephfilesystem:0
  osd: 3 osds: 3 up (since 15m), 3 in (since 2h)

data:
  pools:   3 pools, 96 pgs
  objects: 500 objects, 1.1 GiB
  usage:   5.5 GiB used, 295 GiB / 300 GiB avail
  pgs:     96 active+clean
Important
If the filesystem is offline or the MDS service is missing, you need to restore the CephFS. For more information, see Section 10.1, “Restoring the CephFS”.
Check the Multicloud Object Gateway (MCG) status. It should be active, and the backingstore and bucketclass should be in the Ready state.
# noobaa status -n openshift-storage
Important
If the MCG is not in the active state, and the backingstore and bucketclass are not in the Ready state, you need to restart all the MCG related pods. For more information, see Section 10.2, “Restoring the Multicloud Object Gateway”.
10.1. Restoring the CephFS
If the filesystem is offline or the MDS service is missing, you need to restore the CephFS.
Procedure
Scale down the rook-ceph-operator and ocs-operator deployments.
# oc scale deployment rook-ceph-operator --replicas=0 -n openshift-storage
# oc scale deployment ocs-operator --replicas=0 -n openshift-storage
Patch the MDS deployments to remove the livenessProbe parameter, and run them with the command parameter set to sleep.
# for i in $(oc get deployment -l app=rook-ceph-mds -oname);do oc patch ${i} -n openshift-storage --type='json' -p '[{"op":"remove", "path":"/spec/template/spec/containers/0/livenessProbe"}]' ; oc patch ${i} -n openshift-storage -p '{"spec": {"template": {"spec": {"containers": [{"name": "mds", "command": ["sleep", "infinity"], "args": []}]}}}}' ; done
Recover the CephFS.
# ceph fs reset ocs-storagecluster-cephfilesystem --yes-i-really-mean-it
If the reset command fails, force create the default filesystem with the data and metadata pools, and then reset it.
Note
The reset command might fail if the cephfilesystem is missing.
# ceph fs new ocs-storagecluster-cephfilesystem ocs-storagecluster-cephfilesystem-metadata ocs-storagecluster-cephfilesystem-data0 --force
# ceph fs reset ocs-storagecluster-cephfilesystem --yes-i-really-mean-it
Replace the MDS deployments.
# oc replace --force -f oc_get_deployment.rook-ceph-mds-ocs-storagecluster-cephfilesystem-a.yaml
# oc replace --force -f oc_get_deployment.rook-ceph-mds-ocs-storagecluster-cephfilesystem-b.yaml
Scale up the rook-ceph-operator and ocs-operator deployments.
# oc scale deployment rook-ceph-operator --replicas=1 -n openshift-storage
# oc scale deployment ocs-operator --replicas=1 -n openshift-storage
Check the CephFS status.
# ceph fs status
The status should be active.
If application pods that use the CephFS Persistent Volume Claims (PVCs) are stuck in the CreateContainerError state after the CephFS is restored, restart those application pods (a check for finding the affected pods follows the parameter descriptions).
# oc -n <namespace> delete pods <cephfs-app-pod>
<namespace>
- Is the project namespace
<cephfs-app-pod>
- Is the name of the CephFS application pod
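To find the affected pods, a simple check such as the following can help; it assumes the pods report CreateContainerError in the STATUS column of the pod listing.
# oc get pods -n <namespace> | grep CreateContainerError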
If new CephFS or RBD PVCs are not getting bound, restart all the pods related to Ceph CSI, as shown in the sketch below.
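A minimal sketch for restarting the Ceph CSI pods, assuming the default Rook CSI pod labels (csi-rbdplugin, csi-rbdplugin-provisioner, csi-cephfsplugin, and csi-cephfsplugin-provisioner), is shown below. Verify the labels in your cluster before running it.
# for l in csi-rbdplugin csi-rbdplugin-provisioner csi-cephfsplugin csi-cephfsplugin-provisioner; do oc delete pods -l app=${l} -n openshift-storage; done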
10.2. Restoring the Multicloud Object Gateway
If the Multicloud Object Gateway (MCG) is not in the active state, and the backingstore and bucketclass are not in the Ready state, you need to restart all the MCG related pods, and check the MCG status to confirm that the MCG is back up and running.
Procedure
Restart all the pods related to the MCG.
# oc delete pods <noobaa-operator> -n openshift-storage
# oc delete pods <noobaa-core> -n openshift-storage
# oc delete pods <noobaa-endpoint> -n openshift-storage
# oc delete pods <noobaa-db> -n openshift-storage
<noobaa-operator>
- Is the name of the MCG operator
<noobaa-core>
- Is the name of the MCG core pod
<noobaa-endpoint>
- Is the name of the MCG endpoint
<noobaa-db>
- Is the name of the MCG db pod
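To identify the pod names to pass to the commands above, you can list the MCG related pods first. This listing step is not part of the original procedure.
# oc get pods -n openshift-storage | grep noobaa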
If the RADOS Object Gateway (RGW) is configured, restart the pod.
# oc delete pods <rgw-pod> -n openshift-storage
<rgw-pod>
- Is the name of the RGW pod