Chapter 6. Troubleshooting Ceph MDSs
As a storage administrator, you can troubleshoot the most common issues that can occur when using the Ceph Metadata Server (MDS). Some common errors that you might encounter include:
- An MDS node failure requiring a new MDS deployment.
- An MDS node issue requiring redeployment of an MDS node.
6.1. Redeploying a Ceph MDS
Ceph Metadata Server (MDS) daemons are necessary for deploying a Ceph File System. If an MDS node in your cluster fails, you can redeploy a Ceph Metadata Server by removing an MDS server and adding a new or existing server. You can use the command-line interface or an Ansible playbook to add or remove an MDS server.
6.1.1. Prerequisites
- A running Red Hat Ceph Storage cluster.
6.1.2. Removing a Ceph MDS using Ansible
To remove a Ceph Metadata Server (MDS) using Ansible, use the shrink-mds playbook.
If there is no replacement MDS to take over once the MDS is removed, the file system will become unavailable to clients. If that is not desirable, consider adding an additional MDS before removing the MDS you would like to take offline.
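One way to act on this caveat is to confirm that a standby exists before removing anything. The following is a minimal sketch, not part of the product tooling: count_standby_mds is a hypothetical helper that assumes admin access to the ceph CLI and parses the ceph mds stat summary line (for example, cephfs:1 {0=node01=up:active} 1 up:standby):

```shell
# Hypothetical helper: count standby MDS daemons from the 'ceph mds stat'
# summary line, e.g. "cephfs:1 {0=node01=up:active} 1 up:standby".
# Sketch only; assumes admin access to the ceph CLI.
count_standby_mds() {
    ceph mds stat | grep -oE '[0-9]+ up:standby' | awk '{s += $1} END {print s + 0}'
}
```

If the count is zero, consider adding an MDS before taking one offline.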
Prerequisites
- At least one MDS node.
- A running Red Hat Ceph Storage cluster deployed by Ansible.
- Root or sudo access to an Ansible administration node.
Procedure
- Log in to the Ansible administration node.
- Change to the /usr/share/ceph-ansible directory:

  Example

  [ansible@admin ~]$ cd /usr/share/ceph-ansible

- Run the Ansible shrink-mds.yml playbook, and when prompted, type yes to confirm shrinking the cluster:

  Syntax

  ansible-playbook infrastructure-playbooks/shrink-mds.yml -e mds_to_kill=ID -i hosts

  Replace ID with the ID of the MDS node you want to remove. You can remove only one Ceph MDS each time the playbook runs.

  Example

  [ansible@admin ceph-ansible]$ ansible-playbook infrastructure-playbooks/shrink-mds.yml -e mds_to_kill=node02 -i hosts

- As root or with sudo access, open and edit the /usr/share/ceph-ansible/hosts inventory file and remove the MDS node under the [mdss] section:

  Syntax

  [mdss]
  MDS_NODE_NAME
  MDS_NODE_NAME

  Example

  [mdss]
  node01
  node03

  In this example, node02 was removed from the [mdss] list.
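Because the playbook removes only one Ceph MDS per run, removing several daemons means re-running it once per ID. A minimal wrapper sketch, assuming the current directory is /usr/share/ceph-ansible; shrink_mds_list is a hypothetical helper and the host IDs in the usage line are illustrative:

```shell
# Hypothetical helper: run shrink-mds.yml once per MDS ID, stopping on the
# first failure. Assumes the current directory is /usr/share/ceph-ansible.
shrink_mds_list() {
    local id
    for id in "$@"; do
        ansible-playbook infrastructure-playbooks/shrink-mds.yml \
            -e "mds_to_kill=${id}" -i hosts || return 1
    done
}
```

For example, shrink_mds_list node02 node04 would run the playbook twice, once per node.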
Verification
Check the status of the MDS daemons:
Syntax
ceph fs dump

Example

[ansible@admin ceph-ansible]$ ceph fs dump
[mds.node01 {0:115304} state up:active seq 5 addr [v2:172.25.250.10:6800/695510951,v1:172.25.250.10:6801/695510951]]

Standby daemons:

[mds.node03 {-1:144437} state up:standby seq 2 addr [v2:172.25.250.11:6800/172950087,v1:172.25.250.11:6801/172950087]]
6.1.3. Removing a Ceph MDS using the command-line interface
You can manually remove a Ceph Metadata Server (MDS) using the command-line interface.
If there is no replacement MDS to take over once the current MDS is removed, the file system will become unavailable to clients. If that is not desirable, consider adding an MDS before removing the existing MDS.
Prerequisites
- The ceph-common package is installed.
- A running Red Hat Ceph Storage cluster.
- Root or sudo access to the MDS nodes.
Procedure
- Log in to the Ceph MDS node that you want to remove the MDS daemon from.
- Stop the Ceph MDS service:

  Syntax

  sudo systemctl stop ceph-mds@HOST_NAME

  Replace HOST_NAME with the short name of the host where the daemon is running.

  Example

  [admin@node02 ~]$ sudo systemctl stop ceph-mds@node02

- Disable the MDS service if you are not redeploying MDS to this node:

  Syntax

  sudo systemctl disable ceph-mds@HOST_NAME

  Replace HOST_NAME with the short name of the host to disable the daemon on.

  Example

  [admin@node02 ~]$ sudo systemctl disable ceph-mds@node02

- Remove the /var/lib/ceph/mds/ceph-MDS_ID directory on the MDS node:

  Syntax

  sudo rm -fr /var/lib/ceph/mds/ceph-MDS_ID

  Replace MDS_ID with the ID of the MDS node that you want to remove the MDS daemon from.

  Example

  [admin@node02 ~]$ sudo rm -fr /var/lib/ceph/mds/ceph-node02
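The manual steps above can be collected into one sketch. remove_mds is a hypothetical helper, not a packaged tool; it assumes systemd-managed ceph-mds units and is run locally on the node being removed:

```shell
# Hypothetical helper: tear down the MDS daemon on one node, mirroring the
# manual steps above. HOST is the node's short host name; run on that node.
remove_mds() {
    local host="$1"
    sudo systemctl stop "ceph-mds@${host}"
    sudo systemctl disable "ceph-mds@${host}"   # skip if redeploying MDS here
    sudo rm -rf "/var/lib/ceph/mds/ceph-${host}"
}
```

For example, remove_mds node02 runs the stop, disable, and cleanup steps in order.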
Verification
Check the status of the MDS daemons:
Syntax
ceph fs dump

Example

[ansible@admin ceph-ansible]$ ceph fs dump
[mds.node01 {0:115304} state up:active seq 5 addr [v2:172.25.250.10:6800/695510951,v1:172.25.250.10:6801/695510951]]

Standby daemons:

[mds.node03 {-1:144437} state up:standby seq 2 addr [v2:172.25.250.11:6800/172950087,v1:172.25.250.11:6801/172950087]]
6.1.4. Adding a Ceph MDS using Ansible
Use the Ansible playbook to add a Ceph Metadata Server (MDS).
Prerequisites
- A running Red Hat Ceph Storage cluster deployed by Ansible.
- Root or sudo access to an Ansible administration node.
- New or existing servers that can be provisioned as MDS nodes.
Procedure
- Log in to the Ansible administration node.
- Change to the /usr/share/ceph-ansible directory:

  Example

  [ansible@admin ~]$ cd /usr/share/ceph-ansible

- As root or with sudo access, open and edit the /usr/share/ceph-ansible/hosts inventory file and add the MDS node under the [mdss] section:

  Syntax

  [mdss]
  MDS_NODE_NAME
  NEW_MDS_NODE_NAME

  Replace NEW_MDS_NODE_NAME with the host name of the node where you want to install the MDS server.

  Alternatively, you can colocate the MDS daemon with an OSD daemon on one node by adding the same node under the [osds] and [mdss] sections.

  Example

  [mdss]
  node01
  node03

- As the ansible user, run the Ansible playbook to provision the MDS node:

  Bare-metal deployments:

  [ansible@admin ceph-ansible]$ ansible-playbook site.yml --limit mdss -i hosts

  Container deployments:

  [ansible@admin ceph-ansible]$ ansible-playbook site-container.yml --limit mdss -i hosts

  After the Ansible playbook has finished running, the new Ceph MDS node appears in the storage cluster.
Verification
Check the status of the MDS daemons:
Syntax
ceph fs dump

Example

[ansible@admin ceph-ansible]$ ceph fs dump
[mds.node01 {0:115304} state up:active seq 5 addr [v2:172.25.250.10:6800/695510951,v1:172.25.250.10:6801/695510951]]

Standby daemons:

[mds.node03 {-1:144437} state up:standby seq 2 addr [v2:172.25.250.11:6800/172950087,v1:172.25.250.11:6801/172950087]]

Alternatively, you can use the ceph mds stat command to check if the MDS is in an active state:

Syntax

ceph mds stat

Example

[ansible@admin ceph-ansible]$ ceph mds stat
cephfs:1 {0=node01=up:active} 1 up:standby
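A newly provisioned daemon may take a moment to register in the MDS map. The following is a polling sketch rather than a supported command; wait_for_active_mds is a hypothetical helper with illustrative retry defaults, and it assumes admin access to the ceph CLI:

```shell
# Hypothetical helper: poll 'ceph mds stat' until an active MDS appears.
# TRIES and INTERVAL (seconds) are illustrative defaults, not product settings.
wait_for_active_mds() {
    local tries="${1:-12}" interval="${2:-5}"
    local i
    for i in $(seq "$tries"); do
        if ceph mds stat | grep -q 'up:active'; then
            return 0    # an MDS rank is active
        fi
        sleep "$interval"
    done
    return 1            # gave up waiting
}
```

For example, wait_for_active_mds 12 5 checks once every 5 seconds for up to a minute.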
6.1.5. Adding a Ceph MDS using the command-line interface
You can manually add a Ceph Metadata Server (MDS) using the command-line interface.
Prerequisites
- The ceph-common package is installed.
- A running Red Hat Ceph Storage cluster.
- Root or sudo access to the MDS nodes.
- New or existing servers that can be provisioned as MDS nodes.
Procedure
- Add a new MDS node by logging in to the node and creating an MDS mount point:

  Syntax

  sudo mkdir /var/lib/ceph/mds/ceph-MDS_ID

  Replace MDS_ID with the ID of the MDS node that you want to add the MDS daemon to.

  Example

  [admin@node03 ~]$ sudo mkdir /var/lib/ceph/mds/ceph-node03

- If this is a new MDS node, create the authentication key if you are using Cephx authentication:

  Syntax

  sudo ceph auth get-or-create mds.MDS_ID mon 'profile mds' mgr 'profile mds' mds 'allow *' osd 'allow *' > /var/lib/ceph/mds/ceph-MDS_ID/keyring

  Replace MDS_ID with the ID of the MDS node to deploy the MDS daemon on.

  Example

  [admin@node03 ~]$ sudo ceph auth get-or-create mds.node03 mon 'profile mds' mgr 'profile mds' mds 'allow *' osd 'allow *' > /var/lib/ceph/mds/ceph-node03/keyring

  Note: Cephx authentication is enabled by default. See the Cephx authentication link in the Additional Resources section for more information about Cephx authentication.

- Start the MDS daemon:

  Syntax

  sudo systemctl start ceph-mds@HOST_NAME

  Replace HOST_NAME with the short name of the host to start the daemon on.

  Example

  [admin@node03 ~]$ sudo systemctl start ceph-mds@node03

- Enable the MDS service:

  Syntax

  sudo systemctl enable ceph-mds@HOST_NAME

  Replace HOST_NAME with the short name of the host to enable the service on.

  Example

  [admin@node03 ~]$ sudo systemctl enable ceph-mds@node03
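The manual steps above can also be collected into one sketch. add_mds is a hypothetical helper, not a packaged tool; the second argument overrides the data directory base purely to make the sketch easy to dry-run, and cephx is assumed to be enabled:

```shell
# Hypothetical helper: provision an MDS daemon on one node, mirroring the
# manual steps above. HOST is the short host name; run on that node.
# BASE defaults to the standard data directory and is overridable for testing.
add_mds() {
    local host="$1"
    local base="${2:-/var/lib/ceph/mds}"
    sudo mkdir -p "${base}/ceph-${host}"
    # As in the manual step, the redirect runs as the invoking user; use a
    # root shell instead if the directory is root-owned.
    sudo ceph auth get-or-create "mds.${host}" \
        mon 'profile mds' mgr 'profile mds' mds 'allow *' osd 'allow *' \
        > "${base}/ceph-${host}/keyring"
    sudo systemctl start "ceph-mds@${host}"
    sudo systemctl enable "ceph-mds@${host}"
}
```

For example, add_mds node03 creates the data directory and keyring, then starts and enables the daemon.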
Verification
Check the status of the MDS daemons:
Syntax
ceph fs dump

Example

[admin@mon]$ ceph fs dump
[mds.node01 {0:115304} state up:active seq 5 addr [v2:172.25.250.10:6800/695510951,v1:172.25.250.10:6801/695510951]]

Standby daemons:

[mds.node03 {-1:144437} state up:standby seq 2 addr [v2:172.25.250.11:6800/172950087,v1:172.25.250.11:6801/172950087]]

Alternatively, you can use the ceph mds stat command to check if the MDS is in an active state:

Syntax

ceph mds stat

Example

[ansible@admin ceph-ansible]$ ceph mds stat
cephfs:1 {0=node01=up:active} 1 up:standby