Chapter 6. Troubleshooting Ceph MDSs
As a storage administrator, you can troubleshoot the most common issues that occur when using the Ceph Metadata Server (MDS). Some of the common errors that you might encounter are:
- An MDS node failure requiring a new MDS deployment.
- An MDS node issue requiring redeployment of an MDS node.
6.1. Redeploying a Ceph MDS
Ceph Metadata Server (MDS) daemons are necessary for deploying a Ceph File System. If an MDS node in your cluster fails, you can redeploy a Ceph Metadata Server by removing an MDS server and adding a new or existing server. You can use the command-line interface or Ansible playbook to add or remove an MDS server.
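Before removing or adding an MDS, it can help to record the current layout of active and standby daemons so you can confirm the change afterward. A minimal pre-check, assuming you run it from a node that holds the client admin keyring (the host name here is illustrative):
Example
# List each file system, its active ranks, and the available standby daemons
[admin@node01 ~]$ ceph fs status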
6.1.1. Prerequisites
- A running Red Hat Ceph Storage cluster.
6.1.2. Removing a Ceph MDS using Ansible
To remove a Ceph Metadata Server (MDS) using Ansible, use the shrink-mds playbook.
If there is no replacement MDS to take over once the MDS is removed, the file system will become unavailable to clients. If that is not desirable, consider adding an additional MDS before removing the MDS you would like to take offline.
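You can verify that a standby daemon is available before running the playbook. In the ceph mds stat output, the trailing count, such as 1 up:standby, shows how many standby daemons can take over; the host name in this sketch is illustrative:
Example
# The trailing "1 up:standby" shows one standby daemon is available
[admin@node01 ~]$ ceph mds stat
cephfs:1 {0=node01=up:active} 1 up:standby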
Prerequisites
- At least one MDS node.
- A running Red Hat Ceph Storage cluster deployed by Ansible.
- Root or sudo access to an Ansible administration node.
Procedure
- Log in to the Ansible administration node.
- Change to the /usr/share/ceph-ansible directory:
Example
[ansible@admin ~]$ cd /usr/share/ceph-ansible
- Run the Ansible shrink-mds.yml playbook, and when prompted, type yes to confirm shrinking the cluster:
Syntax
ansible-playbook infrastructure-playbooks/shrink-mds.yml -e mds_to_kill=ID -i hosts
Replace ID with the ID of the MDS node you want to remove. You can remove only one Ceph MDS each time the playbook runs.
Example
[ansible@admin ceph-ansible]$ ansible-playbook infrastructure-playbooks/shrink-mds.yml -e mds_to_kill=node02 -i hosts
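Because each run removes a single MDS, taking several MDS daemons out of the cluster means one playbook run per ID. A minimal loop sketch with illustrative node IDs; passing ireallymeanit=yes is assumed here to pre-answer the confirmation prompt, so verify that variable name against your version of the playbook before relying on it:
Example
# Remove two MDS daemons, one shrink-mds.yml run per ID
[ansible@admin ceph-ansible]$ for id in node02 node04; do
    ansible-playbook infrastructure-playbooks/shrink-mds.yml \
        -e mds_to_kill=$id -e ireallymeanit=yes -i hosts
done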
- As root or with sudo access, open and edit the /usr/share/ceph-ansible/hosts inventory file and remove the MDS node under the [mdss] section:
Syntax
[mdss]
MDS_NODE_NAME
MDS_NODE_NAME
Example
[mdss]
node01
node03
In this example, node02 was removed from the [mdss] list.
Verification
Check the status of the MDS daemons:
Syntax
ceph fs dump
Example
[ansible@admin ceph-ansible]$ ceph fs dump
[mds.node01 {0:115304} state up:active seq 5 addr [v2:172.25.250.10:6800/695510951,v1:172.25.250.10:6801/695510951]]

Standby daemons:

[mds.node03 {-1:144437} state up:standby seq 2 addr [v2:172.25.250.11:6800/172950087,v1:172.25.250.11:6801/172950087]]
Additional Resources
- For more information on installing Red Hat Ceph Storage, see the Red Hat Ceph Storage Installation Guide.
- See the Adding a Ceph MDS using Ansible section in the Red Hat Ceph Storage Troubleshooting Guide for more details on adding an MDS using Ansible.
6.1.3. Removing a Ceph MDS using the command-line interface
You can manually remove a Ceph Metadata Server (MDS) using the command-line interface.
If there is no replacement MDS to take over once the current MDS is removed, the file system will become unavailable to clients. If that is not desirable, consider adding an MDS before removing the existing MDS.
Prerequisites
- The ceph-common package is installed.
- A running Red Hat Ceph Storage cluster.
- Root or sudo access to the MDS nodes.
Procedure
- Log into the Ceph MDS node that you want to remove the MDS daemon from.
- Stop the Ceph MDS service:
Syntax
sudo systemctl stop ceph-mds@HOST_NAME
Replace HOST_NAME with the short name of the host where the daemon is running.
Example
[admin@node02 ~]$ sudo systemctl stop ceph-mds@node02
- Disable the MDS service if you are not redeploying an MDS to this node:
Syntax
sudo systemctl disable ceph-mds@HOST_NAME
Replace HOST_NAME with the short name of the host to disable the daemon.
Example
[admin@node02 ~]$ sudo systemctl disable ceph-mds@node02
- Remove the /var/lib/ceph/mds/ceph-MDS_ID directory on the MDS node:
Syntax
sudo rm -fr /var/lib/ceph/mds/ceph-MDS_ID
Replace MDS_ID with the ID of the MDS node that you want to remove the MDS daemon from.
Example
[admin@node02 ~]$ sudo rm -fr /var/lib/ceph/mds/ceph-node02
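These steps do not remove the daemon's Cephx key. If the node will not host an MDS again, you can optionally delete the key so the old daemon cannot rejoin the cluster. This is an extra cleanup step beyond the procedure above, assuming the default mds.MDS_ID key name; run it from a node that holds the client admin keyring:
Example
# Delete the Cephx key for the removed MDS daemon
[admin@node01 ~]$ sudo ceph auth del mds.node02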
Verification
Check the status of the MDS daemons:
Syntax
ceph fs dump
Example
[ansible@admin ceph-ansible]$ ceph fs dump
[mds.node01 {0:115304} state up:active seq 5 addr [v2:172.25.250.10:6800/695510951,v1:172.25.250.10:6801/695510951]]

Standby daemons:

[mds.node03 {-1:144437} state up:standby seq 2 addr [v2:172.25.250.11:6800/172950087,v1:172.25.250.11:6801/172950087]]
Additional Resources
- For more information on installing Red Hat Ceph Storage, see the Red Hat Ceph Storage Installation Guide.
- See the Adding a Ceph MDS using the command line interface section in the Red Hat Ceph Storage Troubleshooting Guide for more details on adding an MDS using the command line interface.
6.1.4. Adding a Ceph MDS using Ansible
Use the Ansible playbook to add a Ceph Metadata Server (MDS).
Prerequisites
- A running Red Hat Ceph Storage cluster deployed by Ansible.
- Root or sudo access to an Ansible administration node.
- New or existing servers that can be provisioned as MDS nodes.
Procedure
- Log in to the Ansible administration node.
- Change to the /usr/share/ceph-ansible directory:
Example
[ansible@admin ~]$ cd /usr/share/ceph-ansible
- As root or with sudo access, open and edit the /usr/share/ceph-ansible/hosts inventory file and add the MDS node under the [mdss] section:
Syntax
[mdss]
MDS_NODE_NAME
NEW_MDS_NODE_NAME
Replace NEW_MDS_NODE_NAME with the host name of the node where you want to install the MDS server.
Alternatively, you can colocate the MDS daemon with the OSD daemon on one node by adding the same node under the [osds] and [mdss] sections.
Example
[mdss]
node01
node03
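If you choose the colocation option described above, the inventory might look like the following sketch, where node01 carries both an OSD and an MDS daemon (the [osds] entries here are illustrative):
Example
[osds]
node01
node02

[mdss]
node01
node03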
- As the ansible user, run the Ansible playbook to provision the MDS node:
Bare-metal deployments:
[ansible@admin ceph-ansible]$ ansible-playbook site.yml --limit mdss -i hosts
Container deployments:
[ansible@admin ceph-ansible]$ ansible-playbook site-container.yml --limit mdss -i hosts
After the Ansible playbook has finished running, the new Ceph MDS node appears in the storage cluster.
Verification
Check the status of the MDS daemons:
Syntax
ceph fs dump
Example
[ansible@admin ceph-ansible]$ ceph fs dump
[mds.node01 {0:115304} state up:active seq 5 addr [v2:172.25.250.10:6800/695510951,v1:172.25.250.10:6801/695510951]]

Standby daemons:

[mds.node03 {-1:144437} state up:standby seq 2 addr [v2:172.25.250.11:6800/172950087,v1:172.25.250.11:6801/172950087]]
Alternatively, you can use the ceph mds stat command to check if the MDS is in an active state:
Syntax
ceph mds stat
Example
[ansible@admin ceph-ansible]$ ceph mds stat
cephfs:1 {0=node01=up:active} 1 up:standby
Additional Resources
- For more information on installing Red Hat Ceph Storage, see the Red Hat Ceph Storage Installation Guide.
- See the Removing a Ceph MDS using Ansible section in the Red Hat Ceph Storage Troubleshooting Guide for more details on removing an MDS using Ansible.
6.1.5. Adding a Ceph MDS using the command-line interface
You can manually add a Ceph Metadata Server (MDS) using the command-line interface.
Prerequisites
- The ceph-common package is installed.
- A running Red Hat Ceph Storage cluster.
- Root or sudo access to the MDS nodes.
- New or existing servers that can be provisioned as MDS nodes.
Procedure
- Add a new MDS node by logging in to the node and creating the MDS data directory:
Syntax
sudo mkdir /var/lib/ceph/mds/ceph-MDS_ID
Replace MDS_ID with the ID of the MDS node that you want to add the MDS daemon to.
Example
[admin@node03 ~]$ sudo mkdir /var/lib/ceph/mds/ceph-node03
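Note that the ceph-mds daemon typically runs as the ceph user, while sudo mkdir creates a root-owned directory. Depending on your deployment, you may need to hand the directory to the ceph user; a hedged example:
Example
# Give the MDS data directory to the user the daemon runs as
[admin@node03 ~]$ sudo chown ceph:ceph /var/lib/ceph/mds/ceph-node03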
- If this is a new MDS node and you are using Cephx authentication, create the authentication key:
Syntax
sudo ceph auth get-or-create mds.MDS_ID mon 'profile mds' mgr 'profile mds' mds 'allow *' osd 'allow *' > /var/lib/ceph/mds/ceph-MDS_ID/keyring
Replace MDS_ID with the ID of the MDS node to deploy the MDS daemon on.
Example
[admin@node03 ~]$ sudo ceph auth get-or-create mds.node03 mon 'profile mds' mgr 'profile mds' mds 'allow *' osd 'allow *' > /var/lib/ceph/mds/ceph-node03/keyring
Note: Cephx authentication is enabled by default. See the Cephx authentication link in the Additional Resources section for more information about Cephx authentication.
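One shell detail worth noting about the command above: with sudo, the output redirection (>) is performed by your unprivileged shell, not by root, so it can fail with a permission error if the target directory is not writable by your user. A workaround sketch is to run the whole command line in a root shell and then make sure the keyring ends up owned by the ceph user:
Example
# Run the command and the redirection together as root
[admin@node03 ~]$ sudo sh -c "ceph auth get-or-create mds.node03 mon 'profile mds' mgr 'profile mds' mds 'allow *' osd 'allow *' > /var/lib/ceph/mds/ceph-node03/keyring"
# Ensure the daemon can read its own keyring
[admin@node03 ~]$ sudo chown -R ceph:ceph /var/lib/ceph/mds/ceph-node03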
- Start the MDS daemon:
Syntax
sudo systemctl start ceph-mds@HOST_NAME
Replace HOST_NAME with the short name of the host to start the daemon.
Example
[admin@node03 ~]$ sudo systemctl start ceph-mds@node03
- Enable the MDS service:
Syntax
sudo systemctl enable ceph-mds@HOST_NAME
Replace HOST_NAME with the short name of the host to enable the service.
Example
[admin@node03 ~]$ sudo systemctl enable ceph-mds@node03
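The previous two steps can also be combined; systemd's --now flag starts the unit at the same time it is enabled:
Example
# Enable the MDS service and start it in one step
[admin@node03 ~]$ sudo systemctl enable --now ceph-mds@node03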
Verification
Check the status of the MDS daemons:
Syntax
ceph fs dump
Example
[admin@mon]$ ceph fs dump
[mds.node01 {0:115304} state up:active seq 5 addr [v2:172.25.250.10:6800/695510951,v1:172.25.250.10:6801/695510951]]

Standby daemons:

[mds.node03 {-1:144437} state up:standby seq 2 addr [v2:172.25.250.11:6800/172950087,v1:172.25.250.11:6801/172950087]]
Alternatively, you can use the ceph mds stat command to check if the MDS is in an active state:
Syntax
ceph mds stat
Example
[ansible@admin ceph-ansible]$ ceph mds stat
cephfs:1 {0=node01=up:active} 1 up:standby
Additional Resources
- For more information on installing Red Hat Ceph Storage, see the Red Hat Ceph Storage Installation Guide.
- For more information on Cephx authentication, see the Red Hat Ceph Storage Configuration Guide.
- See the Removing a Ceph MDS using the command line interface section in the Red Hat Ceph Storage Troubleshooting Guide for more details on removing an MDS using the command line interface.