OpenShift Container Storage is now OpenShift Data Foundation starting with version 4.9.
Chapter 12. Restoring ceph-monitor quorum in OpenShift Data Foundation
In some circumstances, the ceph-mons might lose quorum. If the mons cannot form quorum again, you can restore it with a manual procedure. The only requirement is that at least one mon must be healthy. The following steps remove the unhealthy mons from quorum, enable you to form a quorum again with a single mon, and then bring the quorum back to the original size.
For example, if you have three mons and lose quorum, you need to remove the two bad mons from quorum, notify the good mon that it is the only mon in quorum, and then restart the good mon.
Procedure
Stop the rook-ceph-operator so that the mons are not failed over when you are modifying the monmap.
# oc -n openshift-storage scale deployment rook-ceph-operator --replicas=0
Inject a new monmap.
Warning: You must inject the monmap very carefully. If run incorrectly, your cluster could be permanently destroyed.
The Ceph monmap keeps track of the mon quorum. The monmap is updated to only contain the healthy mon. In this example, the healthy mon is rook-ceph-mon-b, while the unhealthy mons are rook-ceph-mon-a and rook-ceph-mon-c.
Take a backup of the current rook-ceph-mon-b Deployment:
# oc -n openshift-storage get deployment rook-ceph-mon-b -o yaml > rook-ceph-mon-b-deployment.yaml
Open the YAML file and copy the command and arguments from the mon container (see the containers list in the following example). This is needed for the monmap changes.
[...]
  containers:
  - args:
    - --fsid=41a537f2-f282-428e-989f-a9e07be32e47
    - --keyring=/etc/ceph/keyring-store/keyring
    - --log-to-stderr=true
    - --err-to-stderr=true
    - --mon-cluster-log-to-stderr=true
    - '--log-stderr-prefix=debug '
    - --default-log-to-file=false
    - --default-mon-cluster-log-to-file=false
    - --mon-host=$(ROOK_CEPH_MON_HOST)
    - --mon-initial-members=$(ROOK_CEPH_MON_INITIAL_MEMBERS)
    - --id=b
    - --setuser=ceph
    - --setgroup=ceph
    - --foreground
    - --public-addr=10.100.13.242
    - --setuser-match-path=/var/lib/ceph/mon/ceph-b/store.db
    - --public-bind-addr=$(ROOK_POD_IP)
    command:
    - ceph-mon
[...]
Clean up the copied command and args fields to form a pastable command as follows:
# ceph-mon \
    --fsid=41a537f2-f282-428e-989f-a9e07be32e47 \
    --keyring=/etc/ceph/keyring-store/keyring \
    --log-to-stderr=true \
    --err-to-stderr=true \
    --mon-cluster-log-to-stderr=true \
    --log-stderr-prefix=debug \
    --default-log-to-file=false \
    --default-mon-cluster-log-to-file=false \
    --mon-host=$ROOK_CEPH_MON_HOST \
    --mon-initial-members=$ROOK_CEPH_MON_INITIAL_MEMBERS \
    --id=b \
    --setuser=ceph \
    --setgroup=ceph \
    --foreground \
    --public-addr=10.100.13.242 \
    --setuser-match-path=/var/lib/ceph/mon/ceph-b/store.db \
    --public-bind-addr=$ROOK_POD_IP
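The cleanup of the copied args list can also be sketched as a small text filter. This is only an illustration, not part of the documented procedure: clean_mon_args is a hypothetical helper name, and it assumes the YAML list items are passed on standard input one per line. It drops the leading list dash, removes single quotes that wrap a whole item, trims trailing spaces, and rewrites $(VAR) as $VAR. Review the result by hand before pasting it anywhere.

```shell
# Hypothetical helper (not part of oc or Ceph): reads YAML "args" list items
# on stdin and prints one cleaned flag per line.
clean_mon_args() {
  sed -E \
    -e 's/^[[:space:]]*-[[:space:]]+//' \
    -e "s/^'(.*)'\$/\\1/" \
    -e 's/[[:space:]]+$//' \
    -e 's/\$\(([A-Za-z_][A-Za-z0-9_]*)\)/$\1/g'
}

# Example run on three items taken from the Deployment excerpt.
printf '%s\n' \
  "    - '--log-stderr-prefix=debug '" \
  '    - --mon-host=$(ROOK_CEPH_MON_HOST)' \
  '    - --id=b' | clean_mon_args
```

The filter only prints the flags. Joining them into a single ceph-mon command line with trailing backslashes is still a manual step.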
Note: Make sure to remove the single quotes around the --log-stderr-prefix flag and the parentheses around the variables being passed (ROOK_CEPH_MON_HOST, ROOK_CEPH_MON_INITIAL_MEMBERS, and ROOK_POD_IP).
Patch the rook-ceph-mon-b Deployment to stop this mon from running without deleting the mon pod.
# oc -n openshift-storage patch deployment rook-ceph-mon-b --type='json' -p '[{"op":"remove", "path":"/spec/template/spec/containers/0/livenessProbe"}]'
# oc -n openshift-storage patch deployment rook-ceph-mon-b -p '{"spec": {"template": {"spec": {"containers": [{"name": "mon", "command": ["sleep", "infinity"], "args": []}]}}}}'
Perform the following steps on the mon-b pod:
Connect to the pod of a healthy mon and run the following commands:
# oc -n openshift-storage exec -it <mon-pod> -- bash
Set the variable.
# monmap_path=/tmp/monmap
Extract the monmap to a file by pasting the ceph-mon command from the good mon deployment and adding the --extract-monmap=${monmap_path} flag.
# ceph-mon \
    --fsid=41a537f2-f282-428e-989f-a9e07be32e47 \
    --keyring=/etc/ceph/keyring-store/keyring \
    --log-to-stderr=true \
    --err-to-stderr=true \
    --mon-cluster-log-to-stderr=true \
    --log-stderr-prefix=debug \
    --default-log-to-file=false \
    --default-mon-cluster-log-to-file=false \
    --mon-host=$ROOK_CEPH_MON_HOST \
    --mon-initial-members=$ROOK_CEPH_MON_INITIAL_MEMBERS \
    --id=b \
    --setuser=ceph \
    --setgroup=ceph \
    --foreground \
    --public-addr=10.100.13.242 \
    --setuser-match-path=/var/lib/ceph/mon/ceph-b/store.db \
    --public-bind-addr=$ROOK_POD_IP \
    --extract-monmap=${monmap_path}
Review the contents of the monmap.
# monmaptool --print /tmp/monmap
Remove the bad mons from the monmap.
# monmaptool ${monmap_path} --rm <bad_mon>
In this example, we remove mon a and mon c:
# monmaptool ${monmap_path} --rm a
# monmaptool ${monmap_path} --rm c
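With more than two bad mons, typing one removal command per mon gets error-prone. The following dry-run sketch prints the monmaptool commands for a list of mon IDs so that they can be reviewed before anything is changed. print_monmap_rm is a hypothetical helper name, and the sketch runs nothing itself.

```shell
# Hypothetical dry-run helper: prints the removal commands, executes nothing.
# $1 is the monmap file; the remaining arguments are the bad mon IDs.
print_monmap_rm() {
  map=$1
  shift
  for bad_mon in "$@"; do
    printf 'monmaptool %s --rm %s\n' "$map" "$bad_mon"
  done
}

# Print the commands for the example in this chapter (bad mons a and c).
print_monmap_rm /tmp/monmap a c
```

Printing first and running second keeps the good mon out of harm's way: if its ID appears in the printed list, the mistake is visible before the monmap is touched.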
Inject the modified monmap into the good mon by pasting the ceph-mon command and adding the --inject-monmap=${monmap_path} flag as follows:
# ceph-mon \
    --fsid=41a537f2-f282-428e-989f-a9e07be32e47 \
    --keyring=/etc/ceph/keyring-store/keyring \
    --log-to-stderr=true \
    --err-to-stderr=true \
    --mon-cluster-log-to-stderr=true \
    --log-stderr-prefix=debug \
    --default-log-to-file=false \
    --default-mon-cluster-log-to-file=false \
    --mon-host=$ROOK_CEPH_MON_HOST \
    --mon-initial-members=$ROOK_CEPH_MON_INITIAL_MEMBERS \
    --id=b \
    --setuser=ceph \
    --setgroup=ceph \
    --foreground \
    --public-addr=10.100.13.242 \
    --setuser-match-path=/var/lib/ceph/mon/ceph-b/store.db \
    --public-bind-addr=$ROOK_POD_IP \
    --inject-monmap=${monmap_path}
- Exit the shell to continue.
Edit the Rook configmaps.
Edit the configmap that the operator uses to track the mons.
# oc -n openshift-storage edit configmap rook-ceph-mon-endpoints
Verify that in the data element you see three mons such as the following (or more depending on your mon count):
data: a=10.100.35.200:6789;b=10.100.13.242:6789;c=10.100.35.12:6789
Delete the bad mons from the list to end up with a single good mon. For example:
data: b=10.100.13.242:6789
- Save the file and exit.
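To double-check the value before typing it into the editor, the edit to the data element can be sketched as a string filter. keep_good_mon is a hypothetical helper name. It assumes the semicolon-separated id=ip:port format shown above and a mon ID made of plain letters, and it does not touch the configmap.

```shell
# Hypothetical helper: given the endpoints string ($1) and the ID of the
# healthy mon ($2), print only the entry that belongs to that mon.
keep_good_mon() {
  printf '%s\n' "$1" | tr ';' '\n' | grep -- "^$2="
}

# Example with the endpoints from this chapter; mon b is the healthy one.
keep_good_mon 'a=10.100.35.200:6789;b=10.100.13.242:6789;c=10.100.35.12:6789' b
```

Matching on the "id=" prefix rather than on the IP address avoids picking up an entry whose address merely contains the same digits.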
Now, you need to adapt a Secret which is used for the mons and other components.
Set a value for the variable good_mon_id.
For example:
# good_mon_id=b
You can use the oc patch command to patch the rook-ceph-config secret and update the two key/value pairs mon_host and mon_initial_members.
# mon_host=$(oc -n openshift-storage get svc rook-ceph-mon-b -o jsonpath='{.spec.clusterIP}')
# oc -n openshift-storage patch secret rook-ceph-config -p '{"stringData": {"mon_host": "[v2:'"${mon_host}"':3300,v1:'"${mon_host}"':6789]", "mon_initial_members": "'"${good_mon_id}"'"}}'
Note: If you are using hostNetwork: true, you need to replace the mon_host var with the node IP the mon is pinned to (nodeSelector). This is because there is no rook-ceph-mon-* service created in that “mode”.
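The nested quoting in the patch command above is easy to get wrong, especially when mon_host is set by hand for the hostNetwork case. As a sketch, the payload can be built and inspected separately before it is passed to oc patch. build_secret_patch is a hypothetical helper name; the JSON layout is copied from the command above.

```shell
# Hypothetical helper: print the JSON payload for the rook-ceph-config patch.
# $1 is the mon IP (service cluster IP or node IP), $2 is the good mon ID.
build_secret_patch() {
  printf '{"stringData": {"mon_host": "[v2:%s:3300,v1:%s:6789]", "mon_initial_members": "%s"}}\n' \
    "$1" "$1" "$2"
}

# Inspect the payload for the example values used in this chapter.
build_secret_patch 10.100.13.242 b

# Once the output looks right, it could be applied like this:
# oc -n openshift-storage patch secret rook-ceph-config \
#   -p "$(build_secret_patch "$mon_host" "$good_mon_id")"
```

Building the payload in one place means the same IP is guaranteed to appear in both the v2 and v1 addresses.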
Restart the mon.
You need to restart the good mon pod with the original ceph-mon command to pick up the changes.
Use the oc replace command on the backup of the mon deployment YAML file:
# oc replace --force -f rook-ceph-mon-b-deployment.yaml
Note: Option --force deletes the deployment and creates a new one.
Verify the status of the cluster.
The status should show one mon in quorum. If the status looks good, your cluster should be healthy again.
Delete the two mon deployments that are no longer expected to be in quorum.
For example:
# oc -n openshift-storage delete deploy <rook-ceph-mon-1>
# oc -n openshift-storage delete deploy <rook-ceph-mon-2>
In this example, the deployments to be deleted are rook-ceph-mon-a and rook-ceph-mon-c.
Restart the operator.
Start the rook operator again to resume monitoring the health of the cluster.
Note: It is safe to ignore the errors that a number of resources already exist.
# oc -n openshift-storage scale deployment rook-ceph-operator --replicas=1
The operator automatically adds more mons to increase the quorum size again depending on the mon count.