Chapter 6. Troubleshooting Ceph MDSs
As a storage administrator, you can troubleshoot the most common issues that can occur when using the Ceph Metadata Server (MDS). Common issues that you might encounter include:
- An MDS node failure requiring a new MDS deployment.
- An MDS node issue requiring redeployment of an MDS node.
6.1. Redeploying a Ceph MDS
Ceph Metadata Server (MDS) daemons are necessary for deploying a Ceph File System. If an MDS node in your cluster fails, you can redeploy a Ceph Metadata Server by removing an MDS server and adding a new or existing server. You can use either the command-line interface or an Ansible playbook to add or remove an MDS server.
6.1.1. Prerequisites
- A running Red Hat Ceph Storage cluster.
6.1.2. Removing a Ceph MDS using Ansible
To remove a Ceph Metadata Server (MDS) using Ansible, use the shrink-mds playbook.
If there is no replacement MDS to take over once the MDS is removed, the file system will become unavailable to clients. If that is not desirable, consider adding an additional MDS before removing the MDS you would like to take offline.
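Before you remove an MDS, you can check that at least one standby daemon is available to take over. The bash sketch below is hypothetical (count_standby_mds is not part of Ceph or ceph-ansible) and assumes that the ceph mds stat summary reports standby daemons as "N up:standby", for example cephfs:1 {0=node01=up:active} 1 up:standby.

```shell
#!/usr/bin/env bash
# count_standby_mds (hypothetical helper): print how many standby MDS daemons
# a `ceph mds stat` summary line reports. Assumes standbys appear as
# "N up:standby", e.g. "cephfs:1 {0=node01=up:active} 1 up:standby".
count_standby_mds() {
  local stat_line="$1"
  local n
  # Extract the first "N up:standby" token and keep only the number N.
  # "|| true" keeps a no-match from failing under "set -o pipefail".
  n=$(printf '%s\n' "$stat_line" | grep -oE '[0-9]+ up:standby' | head -n 1 | awk '{print $1}') || true
  # No match means no standby daemons were reported.
  echo "${n:-0}"
}

# Example use on a node with the ceph CLI and an admin keyring:
#   if [ "$(count_standby_mds "$(ceph mds stat)")" -lt 1 ]; then
#     echo "No standby MDS: add one before removing an MDS." >&2
#   fi
```

The function only reads text, so it can be tested against saved output without touching the cluster.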
Prerequisites
- At least one MDS node.
- A running Red Hat Ceph Storage cluster deployed by Ansible.
- Root or sudo access to an Ansible administration node.
Procedure
- Log in to the Ansible administration node.
- Change to the /usr/share/ceph-ansible directory:
Example
[ansible@admin ~]$ cd /usr/share/ceph-ansible
- Run the Ansible shrink-mds.yml playbook, and when prompted, type yes to confirm shrinking the cluster:
Syntax
ansible-playbook infrastructure-playbooks/shrink-mds.yml -e mds_to_kill=ID -i hosts
Replace ID with the ID of the MDS node you want to remove. You can remove only one Ceph MDS each time the playbook runs.
Example
[ansible@admin ceph-ansible]$ ansible-playbook infrastructure-playbooks/shrink-mds.yml -e mds_to_kill=node02 -i hosts
- As root or with sudo access, open and edit the /usr/share/ceph-ansible/hosts inventory file and remove the MDS node from the [mdss] section:
Syntax
[mdss]
MDS_NODE_NAME
MDS_NODE_NAME
Example
[mdss]
node01
node03
In this example, node02 was removed from the [mdss] list.
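If you prefer to script the inventory edit, the bash sketch below prints the inventory with one host removed from a single section, so you can review the result before replacing the file. The helper name remove_host_from_group is hypothetical, and it assumes a simple INI-style inventory with one host per line under each [group] header.

```shell
#!/usr/bin/env bash
# remove_host_from_group (hypothetical helper): print an INI-style Ansible
# inventory with HOST removed from the [GROUP] section only. Hosts listed in
# other sections are kept. Output goes to stdout; the file is not modified.
remove_host_from_group() {
  local inventory="$1" group="$2" host="$3"
  awk -v group="[$group]" -v host="$host" '
    # A "[...]" line starts a new section; remember whether it is the target.
    /^\[.*\]/ { in_group = ($0 == group) }
    # Inside the target section, skip the line whose first field is HOST.
    in_group && $1 == host { next }
    { print }
  ' "$inventory"
}

# Example use (review hosts.new, then move it into place as root):
#   remove_host_from_group /usr/share/ceph-ansible/hosts mdss node02 > hosts.new
```

Writing to a new file rather than editing in place keeps the original inventory intact until you have checked the change.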
Verification
Check the status of the MDS daemons:
Syntax
ceph fs dump
6.1.3. Removing a Ceph MDS using the command-line interface
You can manually remove a Ceph Metadata Server (MDS) using the command-line interface.
If there is no replacement MDS to take over once the current MDS is removed, the file system will become unavailable to clients. If that is not desirable, consider adding an MDS before removing the existing MDS.
Prerequisites
- The ceph-common package is installed.
- A running Red Hat Ceph Storage cluster.
- Root or sudo access to the MDS nodes.
Procedure
- Log in to the Ceph MDS node that you want to remove the MDS daemon from.
- Stop the Ceph MDS service:
Syntax
sudo systemctl stop ceph-mds@HOST_NAME
Replace HOST_NAME with the short name of the host where the daemon is running.
Example
[admin@node02 ~]$ sudo systemctl stop ceph-mds@node02
- Disable the MDS service if you are not redeploying an MDS to this node:
Syntax
sudo systemctl disable ceph-mds@HOST_NAME
Replace HOST_NAME with the short name of the host on which to disable the daemon.
Example
[admin@node02 ~]$ sudo systemctl disable ceph-mds@node02
- Remove the /var/lib/ceph/mds/ceph-MDS_ID directory on the MDS node:
Syntax
sudo rm -fr /var/lib/ceph/mds/ceph-MDS_ID
Replace MDS_ID with the ID of the MDS node that you want to remove the MDS daemon from.
Example
[admin@node02 ~]$ sudo rm -fr /var/lib/ceph/mds/ceph-node02
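The three removal steps above can also be run as one bash function. This is a sketch, not part of Red Hat Ceph Storage: remove_mds_cli, run, and the DRY_RUN variable are hypothetical, and the function assumes the MDS ID matches the short host name, as in the node02 examples. It always disables the service, so remove that line if you plan to redeploy an MDS to the same node. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# run (hypothetical helper): with DRY_RUN=1, print the command shell-quoted;
# otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    local line
    printf -v line '%q ' "$@"
    echo "${line% }"
  else
    "$@"
  fi
}

# remove_mds_cli (hypothetical helper): stop and disable the MDS daemon on this
# node, then delete its data directory. Assumes MDS ID == short host name.
remove_mds_cli() {
  local host="$1"
  if [ -z "$host" ]; then
    echo "usage: remove_mds_cli HOST_NAME" >&2
    return 2
  fi
  run sudo systemctl stop "ceph-mds@${host}" || return 1
  # Skip the next command if you intend to redeploy an MDS on this node.
  run sudo systemctl disable "ceph-mds@${host}" || return 1
  run sudo rm -fr "/var/lib/ceph/mds/ceph-${host}"
}

# Example: preview, then run for real on node02.
#   DRY_RUN=1 remove_mds_cli node02
#   remove_mds_cli node02
```

Each step stops the sequence on failure, so the data directory is not deleted if the daemon could not be stopped.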
Verification
Check the status of the MDS daemons:
Syntax
ceph fs dump
6.1.4. Adding a Ceph MDS using Ansible
Use the Ansible playbook to add a Ceph Metadata Server (MDS).
Prerequisites
- A running Red Hat Ceph Storage cluster deployed by Ansible.
- Root or sudo access to an Ansible administration node.
- New or existing servers that can be provisioned as MDS nodes.
Procedure
- Log in to the Ansible administration node.
- Change to the /usr/share/ceph-ansible directory:
Example
[ansible@admin ~]$ cd /usr/share/ceph-ansible
- As root or with sudo access, open and edit the /usr/share/ceph-ansible/hosts inventory file and add the MDS node under the [mdss] section:
Syntax
[mdss]
MDS_NODE_NAME
NEW_MDS_NODE_NAME
Replace NEW_MDS_NODE_NAME with the host name of the node where you want to install the MDS server.
Alternatively, you can colocate the MDS daemon with the OSD daemon on one node by adding the same node under the [osds] and [mdss] sections.
Example
[mdss]
node01
node03
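The inventory addition can be scripted in the same way. The bash sketch below prints the inventory with a host appended to the end of one section and leaves the inventory unchanged if the host is already listed there. The name add_host_to_group is hypothetical; it assumes the [group] section already exists and that hosts are listed one per line.

```shell
#!/usr/bin/env bash
# add_host_to_group (hypothetical helper): print an INI-style Ansible inventory
# with HOST appended to the end of the existing [GROUP] section. If HOST is
# already listed there, the inventory is printed unchanged. Output goes to stdout.
add_host_to_group() {
  local inventory="$1" group="$2" host="$3"
  awk -v group="[$group]" -v host="$host" '
    # Emit HOST (once) at the end of the target section, then any blank lines
    # that were held back so HOST stays next to the other section entries.
    function flush(   i) {
      if (in_group && !added) { print host; added = 1 }
      for (i = 0; i < blanks; i++) print ""
      blanks = 0
    }
    /^\[.*\]/ { flush(); in_group = ($0 == group); print; next }
    in_group && /^[ \t]*$/ { blanks++; next }   # hold blank lines inside the section
    { for (i = 0; i < blanks; i++) print ""; blanks = 0 }
    in_group && $1 == host { added = 1 }        # HOST is already present
    { print }
    END { flush() }
  ' "$inventory"
}

# Example use (review hosts.new, then move it into place as root):
#   add_host_to_group /usr/share/ceph-ansible/hosts mdss node03 > hosts.new
```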
- As the ansible user, run the Ansible playbook to provision the MDS node:
Bare-metal deployments:
[ansible@admin ceph-ansible]$ ansible-playbook site.yml --limit mdss -i hosts
Container deployments:
[ansible@admin ceph-ansible]$ ansible-playbook site-container.yml --limit mdss -i hosts
After the Ansible playbook has finished running, the new Ceph MDS node appears in the storage cluster.
Verification
Check the status of the MDS daemons:
Syntax
ceph fs dump
Alternatively, you can use the ceph mds stat command to check whether the MDS daemons are active or on standby:
Syntax
ceph mds stat
Example
[ansible@admin ceph-ansible]$ ceph mds stat
cephfs:1 {0=node01=up:active} 1 up:standby
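To check a specific daemon from a script, the bash sketch below tests whether a named MDS holds an active rank. The helper mds_is_active is hypothetical and assumes active ranks are printed as RANK=NAME=up:active inside the braces, as in the example output. In that output, standby daemons are only counted, not named, so this check cannot confirm that a particular standby is up; ceph fs dump lists standby daemons by name.

```shell
#!/usr/bin/env bash
# mds_is_active (hypothetical helper): succeed (exit status 0) if the MDS named
# NAME holds an active rank in a `ceph mds stat` summary line. Assumes active
# ranks are printed as RANK=NAME=up:active, e.g. "{0=node01=up:active}".
mds_is_active() {
  local name="$1" stat_line="$2"
  case "$stat_line" in
    *"=${name}=up:active"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use on a node with the ceph CLI and an admin keyring:
#   if mds_is_active node01 "$(ceph mds stat)"; then echo "node01 is active"; fi
```

Matching on the surrounding = signs keeps a name such as node0 from matching node01.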
6.1.5. Adding a Ceph MDS using the command-line interface
You can manually add a Ceph Metadata Server (MDS) using the command-line interface.
Prerequisites
- The ceph-common package is installed.
- A running Red Hat Ceph Storage cluster.
- Root or sudo access to the MDS nodes.
- New or existing servers that can be provisioned as MDS nodes.
Procedure
- Add a new MDS node by logging in to the node and creating the MDS data directory:
Syntax
sudo mkdir /var/lib/ceph/mds/ceph-MDS_ID
Replace MDS_ID with the ID of the MDS node that you want to add the MDS daemon to.
Example
[admin@node03 ~]$ sudo mkdir /var/lib/ceph/mds/ceph-node03
- If this is a new MDS node and you are using Cephx authentication, create the authentication key:
Syntax
sudo ceph auth get-or-create mds.MDS_ID mon 'profile mds' mgr 'profile mds' mds 'allow *' osd 'allow *' -o /var/lib/ceph/mds/ceph-MDS_ID/keyring
Replace MDS_ID with the ID of the MDS node to deploy the MDS daemon on. Use the -o option rather than a shell redirection (>): with sudo, a redirection is performed by your unprivileged shell and cannot write into the root-owned directory created in the previous step.
Example
[admin@node03 ~]$ sudo ceph auth get-or-create mds.node03 mon 'profile mds' mgr 'profile mds' mds 'allow *' osd 'allow *' -o /var/lib/ceph/mds/ceph-node03/keyring
Note
Cephx authentication is enabled by default. See the Cephx authentication link in the Additional Resources section for more information about Cephx authentication.
- Start the MDS daemon:
Syntax
sudo systemctl start ceph-mds@HOST_NAME
Replace HOST_NAME with the short name of the host on which to start the daemon.
Example
[admin@node03 ~]$ sudo systemctl start ceph-mds@node03
- Enable the MDS service:
Syntax
sudo systemctl enable ceph-mds@HOST_NAME
Replace HOST_NAME with the short name of the host on which to enable the service.
Example
[admin@node03 ~]$ sudo systemctl enable ceph-mds@node03
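As with removal, the manual add steps can be combined into one bash function. This sketch is hypothetical (add_mds_cli, run, and DRY_RUN are not part of Ceph), assumes the MDS ID matches the short host name as in the node03 examples, and assumes Cephx is enabled so a key is needed. It uses mkdir -p so that a rerun does not fail on an existing directory. With DRY_RUN=1 it prints each command, quoted with printf %q so that it can be pasted back into a shell.

```shell
#!/usr/bin/env bash
# run (hypothetical helper): with DRY_RUN=1, print the command shell-quoted;
# otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    local line
    printf -v line '%q ' "$@"
    echo "${line% }"
  else
    "$@"
  fi
}

# add_mds_cli (hypothetical helper): create the data directory and Cephx key
# for mds.HOST, then start and enable the daemon. Assumes MDS ID == short host name.
add_mds_cli() {
  local host="$1"
  if [ -z "$host" ]; then
    echo "usage: add_mds_cli HOST_NAME" >&2
    return 2
  fi
  local dir="/var/lib/ceph/mds/ceph-${host}"
  run sudo mkdir -p "$dir" || return 1
  # -o makes ceph (running as root) write the keyring, instead of a shell redirect.
  run sudo ceph auth get-or-create "mds.${host}" \
    mon 'profile mds' mgr 'profile mds' mds 'allow *' osd 'allow *' \
    -o "${dir}/keyring" || return 1
  run sudo systemctl start "ceph-mds@${host}" || return 1
  run sudo systemctl enable "ceph-mds@${host}"
}

# Example: preview, then run for real on node03.
#   DRY_RUN=1 add_mds_cli node03
#   add_mds_cli node03
```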
Verification
Check the status of the MDS daemons:
Syntax
ceph fs dump
Alternatively, you can use the ceph mds stat command to check whether the MDS daemons are active or on standby:
Syntax
ceph mds stat
Example
[ansible@admin ceph-ansible]$ ceph mds stat
cephfs:1 {0=node01=up:active} 1 up:standby