Chapter 13. Restoring ceph-monitor quorum in OpenShift Data Foundation
In some circumstances, the ceph-mons might lose quorum. If the mons cannot form quorum again, there is a manual procedure to restore it. The only requirement is that at least one mon must be healthy. The following steps remove the unhealthy mons from quorum, enable you to form a quorum again with a single mon, and then bring the quorum back to the original size.
For example, if you have three mons and lose quorum, you need to remove the two bad mons from quorum, notify the good mon that it is the only mon in quorum, and then restart the good mon.
Procedure
Stop the rook-ceph-operator so that the mons are not failed over while you are modifying the monmap.

# oc -n openshift-storage scale deployment rook-ceph-operator --replicas=0

Inject a new monmap.

Warning
You must inject the monmap very carefully. If run incorrectly, your cluster could be permanently destroyed. The Ceph monmap keeps track of the mon quorum. The monmap is updated to contain only the healthy mon. In this example, the healthy mon is rook-ceph-mon-b, while the unhealthy mons are rook-ceph-mon-a and rook-ceph-mon-c.

- Take a backup of the current rook-ceph-mon-b Deployment:

  # oc -n openshift-storage get deployment rook-ceph-mon-b -o yaml > rook-ceph-mon-b-deployment.yaml

- Open the YAML file and copy the command and arguments from the mon container (see the containers list in the example below). This is needed for the monmap changes.
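  The exact flags vary from cluster to cluster; the following is only an illustrative sketch of the containers entry you would find in the backed-up deployment, with placeholder values for the fsid and the public address:

    # illustrative values only; copy the actual command and args from your own deployment
    containers:
    - args:
      - --fsid=41a537f2-f282-428e-989f-a9e07be32e47
      - --keyring=/etc/ceph/keyring-store/keyring
      - --log-to-stderr=true
      - --err-to-stderr=true
      - --mon-cluster-log-to-stderr=true
      - '--log-stderr-prefix=debug '
      - --default-log-to-file=false
      - --default-mon-cluster-log-to-file=false
      - --mon-host=$(ROOK_CEPH_MON_HOST)
      - --mon-initial-members=$(ROOK_CEPH_MON_INITIAL_MEMBERS)
      - --id=b
      - --foreground
      - --public-addr=10.100.13.242
      - --public-bind-addr=$(ROOK_POD_IP)
      command:
      - ceph-mon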
- Clean up the copied command and args fields to form a pastable command (an example is sketched after the following note).

  Note
  Make sure to remove the single quotes around the --log-stderr-prefix flag and the parentheses around the variables being passed (ROOK_CEPH_MON_HOST, ROOK_CEPH_MON_INITIAL_MEMBERS, and ROOK_POD_IP).
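  Continuing the illustrative example above (your flags will differ; paste the ones copied from your own deployment), the cleaned-up, pastable command would look roughly like this:

    ceph-mon \
        --fsid=41a537f2-f282-428e-989f-a9e07be32e47 \
        --keyring=/etc/ceph/keyring-store/keyring \
        --log-to-stderr=true \
        --err-to-stderr=true \
        --mon-cluster-log-to-stderr=true \
        --log-stderr-prefix=debug \
        --default-log-to-file=false \
        --default-mon-cluster-log-to-file=false \
        --mon-host=$ROOK_CEPH_MON_HOST \
        --mon-initial-members=$ROOK_CEPH_MON_INITIAL_MEMBERS \
        --id=b \
        --foreground \
        --public-addr=10.100.13.242 \
        --public-bind-addr=$ROOK_POD_IP

  Note that the single quotes and the $( ) wrappers from the YAML are gone.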
- Patch the rook-ceph-mon-b Deployment to stop this mon from running without deleting the mon pod:

  # oc -n openshift-storage patch deployment rook-ceph-mon-b --type='json' -p '[{"op":"remove", "path":"/spec/template/spec/containers/0/livenessProbe"}]'
  # oc -n openshift-storage patch deployment rook-ceph-mon-b -p '{"spec": {"template": {"spec": {"containers": [{"name": "mon", "command": ["sleep", "infinity"], "args": []}]}}}}'

Perform the following steps on the mon-b pod:

- Connect to the pod of a healthy mon and run the following commands:

  # oc -n openshift-storage exec -it <mon-pod> bash

- Set the variable:

  # monmap_path=/tmp/monmap

- Extract the monmap to a file by pasting the ceph-mon command from the good mon deployment and adding the --extract-monmap=${monmap_path} flag (see the sketch below).
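  For example, if the cleaned-up command is the one sketched earlier, the extraction takes this general form (the placeholder stands for the flags you copied from your own deployment):

    # ceph-mon <flags copied from the good mon deployment> --extract-monmap=${monmap_path}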
- Review the contents of the monmap:

  # monmaptool --print /tmp/monmap

- Remove the bad mons from the monmap:

  # monmaptool ${monmap_path} --rm <bad_mon>

  In this example, we remove mons a and c:

  # monmaptool ${monmap_path} --rm a
  # monmaptool ${monmap_path} --rm c

- Inject the modified monmap into the good mon by pasting the ceph-mon command and adding the --inject-monmap=${monmap_path} flag (see the sketch after this list).

- Exit the shell to continue.
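As with the extraction, the injection reuses the pasted command; a sketch of its general form, with the placeholder standing for your copied flags:

  # ceph-mon <flags copied from the good mon deployment> --inject-monmap=${monmap_path}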
Edit the Rook configmaps.

- Edit the configmap that the operator uses to track the mons:

  # oc -n openshift-storage edit configmap rook-ceph-mon-endpoints

- Verify that in the data element you see three mons such as the following (or more, depending on your mon count):

  data: a=10.100.35.200:6789;b=10.100.13.242:6789;c=10.100.35.12:6789

- Delete the bad mons from the list to end up with a single good mon. For example:

  data: b=10.100.13.242:6789

- Save the file and exit.
Now, you need to adapt a Secret which is used for the mons and other components.

- Set a value for the variable good_mon_id. For example:

  # good_mon_id=b

- You can use the oc patch command to patch the rook-ceph-config secret and update the two key/value pairs mon_host and mon_initial_members:

  # mon_host=$(oc -n openshift-storage get svc rook-ceph-mon-b -o jsonpath='{.spec.clusterIP}')
  # oc -n openshift-storage patch secret rook-ceph-config -p '{"stringData": {"mon_host": "[v2:'"${mon_host}"':3300,v1:'"${mon_host}"':6789]", "mon_initial_members": "'"${good_mon_id}"'"}}'

  Note
  If you are using hostNetwork: true, you need to replace the mon_host variable with the node IP the mon is pinned to (nodeSelector). This is because there is no rook-ceph-mon-* service created in that mode.
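  One way to obtain that node IP, assuming the good mon is rook-ceph-mon-b and its pod carries the usual Rook labels (the label selector below is an assumption, so verify it against your pods), is to read the pod's hostIP field; with hostNetwork: true the pod shares the node's network, so hostIP is the node IP:

    # label selector is an assumption; adjust to match your mon pod
    # mon_host=$(oc -n openshift-storage get pod -l app=rook-ceph-mon,mon=b -o jsonpath='{.items[0].status.hostIP}')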
Restart the mon.

You need to restart the good mon pod with the original ceph-mon command to pick up the changes.

- Use the oc replace command on the backup of the mon deployment YAML file:

  # oc replace --force -f rook-ceph-mon-b-deployment.yaml

  Note
  The --force option deletes the deployment and creates a new one.

Verify the status of the cluster.

The status should show one mon in quorum. If the status looks good, your cluster should be healthy again.
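For example, assuming the rook-ceph-tools toolbox pod is deployed in your cluster (an assumption; adjust if you check the status another way), you can inspect the Ceph status from it:

  # assumes the toolbox deployment with the app=rook-ceph-tools label exists
  # TOOLS_POD=$(oc -n openshift-storage get pod -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}')
  # oc -n openshift-storage exec -it ${TOOLS_POD} -- ceph status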
Delete the two mon deployments that are no longer expected to be in quorum. For example:

# oc delete deploy <rook-ceph-mon-1>
# oc delete deploy <rook-ceph-mon-2>

In this example, the deployments to be deleted are rook-ceph-mon-a and rook-ceph-mon-c.

Restart the operator.
Start the rook operator again to resume monitoring the health of the cluster.

Note
It is safe to ignore errors stating that a number of resources already exist.

# oc -n openshift-storage scale deployment rook-ceph-operator --replicas=1

The operator automatically adds more mons to increase the quorum size again, depending on the mon count.
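If you want to watch this happen, one sketch (assuming the mon deployments carry the standard app=rook-ceph-mon label, which you should verify in your cluster) is to list the mon deployments until the expected count is back:

  # label selector is an assumption; adjust if your deployments are labelled differently
  # oc -n openshift-storage get deployments -l app=rook-ceph-mon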