Operations Guide
Operational tasks for Red Hat Ceph Storage
Chapter 1. Managing the storage cluster size
As a storage administrator, you can manage the storage cluster size by adding or removing Ceph Monitors or OSDs as storage capacity expands or shrinks.
If you are bootstrapping a storage cluster for the first time, see the Red Hat Ceph Storage 3 Installation Guide for Red Hat Enterprise Linux or Ubuntu.
1.1. Prerequisites
- A running Red Hat Ceph Storage cluster.
1.2. Ceph Monitors
Ceph monitors are light-weight processes that maintain a master copy of the cluster map. All Ceph clients contact a Ceph monitor and retrieve the current copy of the cluster map, enabling clients to bind to a pool and read and write data.
Ceph monitors use a variation of the Paxos protocol to establish consensus about maps and other critical information across the cluster. Due to the nature of Paxos, Ceph requires a majority of the monitors to be running in order to establish a quorum, and thus establish consensus.
Red Hat requires at least three monitors on separate hosts to receive support for a production cluster.
Red Hat recommends deploying an odd number of monitors. An odd number of monitors has higher resiliency to failures than an even number of monitors. For example, to maintain a quorum in a two-monitor deployment, Ceph cannot tolerate any failures; with three monitors, one failure; with four monitors, one failure; with five monitors, two failures. This is why an odd number is advisable. In summary, Ceph needs a majority of monitors to be running and able to communicate with each other, for example two out of three, three out of four, and so on.
For an initial deployment of a multi-node Ceph storage cluster, Red Hat requires three monitors, increasing the number two at a time if a valid need for more than three monitors exists.
Since monitors are light-weight, it is possible to run them on the same host as OpenStack nodes. However, Red Hat recommends running monitors on separate hosts.
Red Hat does NOT support collocating Ceph Monitors and OSDs on the same node. Doing this can have a negative impact on storage cluster performance.
Red Hat ONLY supports collocating Ceph services in containerized environments.
When you remove monitors from a storage cluster, consider that Ceph monitors use the Paxos protocol to establish a consensus about the master storage cluster map. You must have a sufficient number of monitors to establish a quorum.
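To see which monitors currently form the quorum before adding or removing monitors, you can query the monitor status. This is a minimal illustration, assuming the ceph CLI and admin keyring are available on the node:
[root@monitor ~]# ceph mon stat
[root@monitor ~]# ceph quorum_status --format json-pretty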
Additional Resources
- See the Red Hat Ceph Storage Supported configurations Knowledgebase article for all the supported Ceph configurations.
1.2.1. Preparing a new Ceph Monitor node
When adding a new Ceph Monitor to a storage cluster, deploy it on a separate node. The node hardware must be uniform for all monitor nodes in the storage cluster.
Prerequisites
- Network connectivity.
- Having root access to the new node.
- Review the Requirements for Installing Red Hat Ceph Storage chapter in the Installation Guide for Red Hat Enterprise Linux or Ubuntu.
Procedure
- Add the new node to the server rack.
- Connect the new node to the network.
- Install either Red Hat Enterprise Linux 7 or Ubuntu 16.04 on the new node.
Install NTP and configure a reliable time source:
[root@monitor ~]# yum install ntp
If using a firewall, open TCP port 6789:
Red Hat Enterprise Linux
[root@monitor ~]# firewall-cmd --zone=public --add-port=6789/tcp
[root@monitor ~]# firewall-cmd --zone=public --add-port=6789/tcp --permanent
Ubuntu
iptables -I INPUT 1 -i $NIC_NAME -p tcp -s $IP_ADDR/$NETMASK_PREFIX --dport 6789 -j ACCEPT
Ubuntu example
[user@monitor ~]$ sudo iptables -I INPUT 1 -i enp6s0 -p tcp -s 192.168.0.11/24 --dport 6789 -j ACCEPT
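As an optional check that is not part of the original procedure, you can confirm that time synchronization is working and that the monitor port is open; the commands below assume ntpd and firewalld are in use:
[root@monitor ~]# ntpq -p
[root@monitor ~]# firewall-cmd --list-ports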
1.2.2. Adding a Ceph Monitor using Ansible
Red Hat recommends adding two monitors at a time to maintain an odd number of monitors. For example, if you have three monitors in the storage cluster, Red Hat recommends expanding it to five monitors.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Having root access to the new nodes.
Procedure
Add the new Ceph Monitor nodes to the /etc/ansible/hosts Ansible inventory file, under a [mons] section, as in the example below.
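Example (an illustrative sketch; the host names are placeholders for your existing and new monitor nodes):
[mons]
monitor01
monitor02
monitor03
monitor04
monitor05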
Verify that Ansible can contact the Ceph nodes:
# ansible all -m ping
Change directory to the Ansible configuration directory:
# cd /usr/share/ceph-ansible
Run the Ansible playbook:
$ ansible-playbook site.yml
If adding new monitors to a containerized deployment of Ceph, run the site-docker.yml playbook:
$ ansible-playbook site-docker.yml
- After the Ansible playbook finishes, the new monitor nodes will be in the storage cluster.
1.2.3. Adding a Ceph Monitor using the command-line interface
Red Hat recommends adding two monitors at a time to maintain an odd number of monitors. For example, if you have three monitors in the storage cluster, Red Hat recommends expanding to five monitors.
Red Hat recommends only running one Ceph monitor daemon per node.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Having root access to a running Ceph Monitor node and to the new monitor nodes.
Procedure
Add the Red Hat Ceph Storage 3 monitor repository.
Red Hat Enterprise Linux
[root@monitor ~]# subscription-manager repos --enable=rhel-7-server-rhceph-3-mon-els-rpms
Ubuntu
[user@monitor ~]$ sudo bash -c 'umask 0077; echo deb https://$CUSTOMER_NAME:$CUSTOMER_PASSWORD@rhcs.download.redhat.com/3-updates/Tools $(lsb_release -sc) main | tee /etc/apt/sources.list.d/Tools.list'
[user@monitor ~]$ sudo bash -c 'wget -O - https://www.redhat.com/security/fd431d51.txt | apt-key add -'
Install the ceph-mon package on the new Ceph Monitor nodes:
Red Hat Enterprise Linux
[root@monitor ~]# yum install ceph-mon
Ubuntu
[user@monitor ~]$ sudo apt-get install ceph-mon
To ensure the storage cluster identifies the monitor on start or restart, add the monitor’s IP address to the Ceph configuration file.
Add the new monitors to the [mon] or [global] section of the Ceph configuration file on an existing monitor node in the storage cluster. Update the mon_host setting, which is a list of DNS-resolvable host names or IP addresses, separated by "," or ";" or " ". Optionally, you can also create a specific section in the Ceph configuration file for the new monitor nodes:
Syntax
[mon]
mon host = $MONITOR_IP:$PORT $MONITOR_IP:$PORT ... $NEW_MONITOR_IP:$PORT
or
[mon.$MONITOR_ID]
host = $MONITOR_ID
mon addr = $MONITOR_IP
To make the monitors part of the initial quorum group, you must also add the host name to the mon_initial_members parameter in the [global] section of the Ceph configuration file, as in the example below.
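Example (an illustrative sketch; the host names and IP addresses are placeholders, with node4 as the new monitor):
[global]
mon_initial_members = node1,node2,node3,node4
mon_host = 192.168.0.1,192.168.0.2,192.168.0.3,192.168.0.4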
Important: Production storage clusters REQUIRE at least three monitors set in mon_initial_members and mon_host to ensure high availability. If a storage cluster with only one initial monitor adds two more monitors, but does not add them to mon_initial_members and mon_host, the failure of the initial monitor will cause the storage cluster to lock up. If the monitors you are adding are replacing monitors that are part of mon_initial_members and mon_host, the new monitors must be added to mon_initial_members and mon_host too.
Copy the updated Ceph configuration file to all Ceph nodes and Ceph clients:
Syntax
scp /etc/ceph/$CLUSTER_NAME.conf $TARGET_NODE_NAME:/etc/ceph
Example
[root@monitor ~]# scp /etc/ceph/ceph.conf node4:/etc/ceph
Create the monitor’s data directory on the new monitor nodes:
Syntax
mkdir /var/lib/ceph/mon/$CLUSTER_NAME-$MONITOR_ID
Example
[root@monitor ~]# mkdir /var/lib/ceph/mon/ceph-node4
Create a temporary directory on a running monitor node and on the new monitor nodes to keep the files needed for this procedure. This directory should be different from the monitor’s default directory created in the previous step, and can be removed after all the steps are completed:
Syntax
mkdir $TEMP_DIRECTORY
Example
[root@monitor ~]# mkdir /tmp/ceph
Copy the admin key from a running monitor node to the new monitor nodes so that you can run ceph commands:
Syntax
scp /etc/ceph/$CLUSTER_NAME.client.admin.keyring $TARGET_NODE_NAME:/etc/ceph
Example
[root@monitor ~]# scp /etc/ceph/ceph.client.admin.keyring node4:/etc/ceph
From a running monitor node, retrieve the monitor keyring:
Syntax
ceph auth get mon. -o /$TEMP_DIRECTORY/$KEY_FILE_NAME
Example
[root@monitor ~]# ceph auth get mon. -o /tmp/ceph/ceph_keyring.out
From a running monitor node, retrieve the monitor map:
Syntax
ceph mon getmap -o /$TEMP_DIRECTORY/$MONITOR_MAP_FILE
Example
[root@monitor ~]# ceph mon getmap -o /tmp/ceph/ceph_mon_map.out
Copy the collected monitor data to the new monitor nodes:
Syntax
scp -r /tmp/ceph $TARGET_NODE_NAME:/tmp/ceph
Example
[root@monitor ~]# scp -r /tmp/ceph node4:/tmp/ceph
Prepare the new monitors' data directory from the data you collected earlier. You must specify the path to the monitor map to retrieve quorum information from the monitors, along with their fsid. You must also specify a path to the monitor keyring:
Syntax
ceph-mon -i $MONITOR_ID --mkfs --monmap /$TEMP_DIRECTORY/$MONITOR_MAP_FILE --keyring /$TEMP_DIRECTORY/$KEY_FILE_NAME
Example
[root@monitor ~]# ceph-mon -i node4 --mkfs --monmap /tmp/ceph/ceph_mon_map.out --keyring /tmp/ceph/ceph_keyring.out
For storage clusters with custom names, add the following line to the /etc/sysconfig/ceph file:
Red Hat Enterprise Linux
echo "CLUSTER=<custom_cluster_name>" >> /etc/sysconfig/ceph
[root@monitor ~]# echo "CLUSTER=<custom_cluster_name>" >> /etc/sysconfig/cephCopy to Clipboard Copied! Toggle word wrap Toggle overflow Ubuntu
sudo echo "CLUSTER=<custom_cluster_name>" >> /etc/default/ceph
[user@monitor ~]$ sudo echo "CLUSTER=<custom_cluster_name>" >> /etc/default/cephCopy to Clipboard Copied! Toggle word wrap Toggle overflow Update the owner and group permissions on the new monitor nodes:
Syntax
chown -R $OWNER:$GROUP $DIRECTORY_PATH
Example
[root@monitor ~]# chown -R ceph:ceph /var/lib/ceph/mon
[root@monitor ~]# chown -R ceph:ceph /var/log/ceph
[root@monitor ~]# chown -R ceph:ceph /var/run/ceph
[root@monitor ~]# chown -R ceph:ceph /etc/ceph
Enable and start the ceph-mon process on the new monitor nodes:
Syntax
systemctl enable ceph-mon.target
systemctl enable ceph-mon@$MONITOR_ID
systemctl start ceph-mon@$MONITOR_ID
Example
[root@monitor ~]# systemctl enable ceph-mon.target
[root@monitor ~]# systemctl enable ceph-mon@node4
[root@monitor ~]# systemctl start ceph-mon@node4
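To confirm that the new monitor has joined the quorum, you can check the cluster status from any monitor node; an illustrative check:
[root@monitor ~]# ceph -s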
Additional Resources
- See the Enabling the Red Hat Ceph Storage Repositories section in the Installation Guide for Red Hat Enterprise Linux or Ubuntu.
1.2.4. Removing a Ceph Monitor using Ansible
To remove a Ceph Monitor with Ansible, use the shrink-mon.yml playbook.
Prerequisites
- An Ansible administration node.
- A running Red Hat Ceph Storage cluster deployed by Ansible.
Procedure
Change to the /usr/share/ceph-ansible/ directory:
[user@admin ~]$ cd /usr/share/ceph-ansible
Copy the shrink-mon.yml playbook from the infrastructure-playbooks directory to the current directory:
[root@admin ceph-ansible]# cp infrastructure-playbooks/shrink-mon.yml .
Run the shrink-mon.yml playbook for either normal or containerized deployments of Red Hat Ceph Storage:
[user@admin ceph-ansible]$ ansible-playbook shrink-mon.yml -e mon_to_kill=<hostname> -u <ansible-user>
Replace:
- <hostname> with the short host name of the Monitor node. To remove more Monitors, separate their host names with a comma.
- <ansible-user> with the name of the Ansible user.
For example, to remove a Monitor that is located on a node with the monitor1 host name:
[user@admin ceph-ansible]$ ansible-playbook shrink-mon.yml -e mon_to_kill=monitor1 -u user
- Remove the Monitor entry from all Ceph configuration files in the cluster.
Ensure that the Monitor has been successfully removed:
[root@monitor ~]# ceph -s
Additional Resources
- For more information on installing Red Hat Ceph Storage, see the Installation Guide for Red Hat Enterprise Linux or Ubuntu.
1.2.5. Removing a Ceph Monitor using the command-line interface
Removing a Ceph Monitor involves removing a ceph-mon daemon from the storage cluster and updating the storage cluster map.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Having root access to the monitor node.
Procedure
Stop the monitor service:
Syntax
systemctl stop ceph-mon@$MONITOR_ID
Example
[root@monitor ~]# systemctl stop ceph-mon@node3
Remove the monitor from the storage cluster:
Syntax
ceph mon remove $MONITOR_ID
Example
[root@monitor ~]# ceph mon remove node3
Remove the monitor entry from the Ceph configuration file, by default /etc/ceph/ceph.conf.
Redistribute the Ceph configuration file to all remaining Ceph nodes in the storage cluster:
Syntax
scp /etc/ceph/$CLUSTER_NAME.conf $USER_NAME@$TARGET_NODE_NAME:/etc/ceph/
Example
[root@monitor ~]# scp /etc/ceph/ceph.conf root@node1:/etc/ceph/
Containers only. Disable the monitor service:
Note: Perform steps 5-9 only if using containers.
Syntax
systemctl disable ceph-mon@$MONITOR_ID
Example
[root@monitor ~]# systemctl disable ceph-mon@node3
Containers only. Remove the service from systemd:
[root@monitor ~]# rm /etc/systemd/system/ceph-mon@.service
Containers only. Reload the systemd manager configuration:
[root@monitor ~]# systemctl daemon-reload
Containers only. Reset the state of the failed monitor unit:
[root@monitor ~]# systemctl reset-failed
Containers only. Remove the ceph-mon RPM:
[root@monitor ~]# docker exec node3 yum remove ceph-mon
Archive the monitor data:
Syntax
mv /var/lib/ceph/mon/$CLUSTER_NAME-$MONITOR_ID /var/lib/ceph/mon/removed-$CLUSTER_NAME-$MONITOR_ID
Example
[root@monitor ~]# mv /var/lib/ceph/mon/ceph-node3 /var/lib/ceph/mon/removed-ceph-node3
Delete the monitor data:
Syntax
rm -r /var/lib/ceph/mon/$CLUSTER_NAME-$MONITOR_ID
Example
[root@monitor ~]# rm -r /var/lib/ceph/mon/ceph-node3
Additional Resources
- See the Knowledgebase solution How to re-deploy Ceph Monitor in a director deployed Ceph cluster for more information.
1.2.6. Removing a Ceph Monitor from an unhealthy storage cluster
This procedure removes a ceph-mon daemon from an unhealthy storage cluster. An unhealthy storage cluster is one that has placement groups persistently in a state other than active + clean.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Having root access to the monitor node.
- At least one running Ceph Monitor node.
Procedure
Identify a surviving monitor and log in to that node:
[root@monitor ~]# ceph mon dump
[root@monitor ~]# ssh $MONITOR_HOST_NAME
Stop the ceph-mon daemon and extract a copy of the monmap file:
Syntax
systemctl stop ceph-mon@$MONITOR_ID
ceph-mon -i $MONITOR_ID --extract-monmap $TEMPORARY_PATH
Example
[root@monitor ~]# systemctl stop ceph-mon@node1
[root@monitor ~]# ceph-mon -i node1 --extract-monmap /tmp/monmap
Remove the non-surviving monitor(s):
Syntax
monmaptool $TEMPORARY_PATH --rm $MONITOR_ID
Example
[root@monitor ~]# monmaptool /tmp/monmap --rm node2
Inject the monitor map, with the monitor(s) removed, into the surviving monitor:
Syntax
ceph-mon -i $MONITOR_ID --inject-monmap $TEMPORARY_PATH
Example
[root@monitor ~]# ceph-mon -i node1 --inject-monmap /tmp/monmap
1.3. Ceph OSDs
When a Red Hat Ceph Storage cluster is up and running, you can add OSDs to the storage cluster at runtime.
A Ceph OSD generally consists of one ceph-osd daemon for one storage drive and its associated journal within a node. If a node has multiple storage drives, then map one ceph-osd daemon for each drive.
Red Hat recommends checking the capacity of a cluster regularly to see if it is reaching the upper end of its storage capacity. As a storage cluster reaches its near full ratio, add one or more OSDs to expand the storage cluster’s capacity.
When you want to reduce the size of a Red Hat Ceph Storage cluster or replace the hardware, you can also remove an OSD at runtime. If the node has multiple storage drives, you might also need to remove one of the ceph-osd daemons for that drive. Generally, it is a good idea to check the capacity of the storage cluster to see if you are reaching the upper end of its capacity. Ensure that when you remove an OSD, the storage cluster is not at its near full ratio.
Do not let a storage cluster reach the full ratio before adding an OSD. OSD failures that occur after the storage cluster reaches the near full ratio can cause the storage cluster to exceed the full ratio. Ceph blocks write access to protect the data until you resolve the storage capacity issues. Do not remove OSDs without considering the impact on the full ratio first.
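To see how close the storage cluster is to its near full and full ratios before adding or removing OSDs, you can review the overall utilization and the configured ratios; an illustrative check:
[root@mon ~]# ceph df
[root@mon ~]# ceph osd dump | grep ratio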
1.3.1. Ceph OSD node configuration
Ceph OSDs and their supporting hardware should be similarly configured as a storage strategy for the pool(s) that will use the OSDs. Ceph prefers uniform hardware across pools for a consistent performance profile. For best performance, consider a CRUSH hierarchy with drives of the same type or size. See the Storage Strategies guide for more details.
If you add drives of dissimilar size, then you will need to adjust their weights accordingly. When you add the OSD to the CRUSH map, consider the weight for the new OSD. Hard drive capacity grows approximately 40% per year, so newer OSD nodes might have larger hard drives than older nodes in the storage cluster, that is, they might have a greater weight.
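If a newly added OSD ends up with a weight that does not reflect its drive size, you can adjust it after the fact; a minimal sketch, where the OSD ID and weight are placeholders:
[root@mon ~]# ceph osd crush reweight osd.4 3.6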
Before doing a new installation, review the Requirements for Installing Red Hat Ceph Storage chapter in the Installation Guide for Red Hat Enterprise Linux or Ubuntu.
1.3.2. Mapping a container OSD ID to a drive
Sometimes it is necessary to identify which drive a containerized OSD is using. For example, if an OSD has an issue you might need to know which drive it uses to verify the drive status. Also, for a non-containerized OSD you reference the OSD ID to start and stop it, but to start and stop a containerized OSD you must reference the drive it uses.
Prerequisites
- A running Red Hat Ceph Storage cluster in a containerized environment.
- Having root access to the container host.
Procedure
Find a container name. For example, to identify the drive associated with osd.5, open a terminal on the container node where osd.5 is running, and then run docker ps to list all containers.
Use docker exec to run ceph-volume lvm list on any OSD container name from the previous output (a sketch of these commands follows below).
From this output you can see that osd.5 is associated with /dev/sdb.
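A minimal sketch of these commands, assuming an OSD container named ceph-osd-node4-sdb (the container name is a placeholder; use the names reported by docker ps):
[root@osd ~]# docker ps
[root@osd ~]# docker exec ceph-osd-node4-sdb ceph-volume lvm list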
Additional Resources
- See Replacing a failed OSD disk for more information.
1.3.3. Adding a Ceph OSD using Ansible with the same disk topology
For Ceph OSDs with the same disk topology, Ansible will add the same number of OSDs as other OSD nodes using the same device paths specified in the devices: section of the /usr/share/ceph-ansible/group_vars/osds file.
The new Ceph OSD node(s) will have the same configuration as the rest of the OSDs.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Review the Requirements for Installing Red Hat Ceph Storage chapter in the Installation Guide for Red Hat Enterprise Linux or Ubuntu.
- Having root access to the new nodes.
- The same number of OSD data drives as other OSD nodes in the storage cluster.
Procedure
Add the Ceph OSD node(s) to the /etc/ansible/hosts file, under the [osds] section:
Example
[osds]
...
osd06
$NEW_OSD_NODE_NAME
Verify that Ansible can reach the Ceph nodes:
[user@admin ~]$ ansible all -m ping
Navigate to the Ansible configuration directory:
[user@admin ~]$ cd /usr/share/ceph-ansible
Copy the add-osd.yml file to the /usr/share/ceph-ansible/ directory:
[user@admin ceph-ansible]$ sudo cp infrastructure-playbooks/add-osd.yml .
Run the Ansible playbook for either normal or containerized deployments of Ceph:
[user@admin ceph-ansible]$ ansible-playbook add-osd.yml
Note: When adding an OSD, if the playbook fails with PGs were not reported as active+clean, configure the following variables in the all.yml file to adjust the retries and delay:
# OSD handler checks
handler_health_osd_check_retries: 50
handler_health_osd_check_delay: 30
1.3.4. Adding a Ceph OSD using Ansible with different disk topologies
For Ceph OSDs with different disk topologies, there are two approaches for adding the new OSD node(s) to an existing storage cluster.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Review the Requirements for Installing Red Hat Ceph Storage chapter in the Installation Guide for Red Hat Enterprise Linux or Ubuntu.
- Having root access to the new nodes.
Procedure
First Approach
Add the new Ceph OSD node(s) to the /etc/ansible/hosts file, under the [osds] section:
Example
[osds]
...
osd06
$NEW_OSD_NODE_NAME
Create a new file for each new Ceph OSD node added to the storage cluster, under the /etc/ansible/host_vars/ directory:
Syntax
touch /etc/ansible/host_vars/$NEW_OSD_NODE_NAME
Example
[root@admin ~]# touch /etc/ansible/host_vars/osd07
Edit the new file, and add the devices: and dedicated_devices: sections to the file. Under each of these sections add a -, a space, then the full path to the block device names for this OSD node, as in the example below.
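Example (an illustrative sketch; the device paths are placeholders and must match the disk topology of the new OSD node):
devices:
  - /dev/sdc
  - /dev/sdd
  - /dev/sde
  - /dev/sdf
dedicated_devices:
  - /dev/sda
  - /dev/sda
  - /dev/sdb
  - /dev/sdb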
Verify that Ansible can reach all the Ceph nodes:
[user@admin ~]$ ansible all -m ping
Change directory to the Ansible configuration directory:
[user@admin ~]$ cd /usr/share/ceph-ansible
Copy the add-osd.yml file to the /usr/share/ceph-ansible/ directory:
[user@admin ceph-ansible]$ sudo cp infrastructure-playbooks/add-osd.yml .
Run the Ansible playbook:
[user@admin ceph-ansible]$ ansible-playbook add-osd.yml
Second Approach
Add the new OSD node name to the /etc/ansible/hosts file, and use the devices and dedicated_devices options, specifying the different disk topology:
Example
[osds]
...
osd07 devices="['/dev/sdc', '/dev/sdd', '/dev/sde', '/dev/sdf']" dedicated_devices="['/dev/sda', '/dev/sda', '/dev/sdb', '/dev/sdb']"
Verify that Ansible can reach all the Ceph nodes:
[user@admin ~]$ ansible all -m ping
Change directory to the Ansible configuration directory:
[user@admin ~]$ cd /usr/share/ceph-ansible
Copy the add-osd.yml file to the /usr/share/ceph-ansible/ directory:
[user@admin ceph-ansible]$ sudo cp infrastructure-playbooks/add-osd.yml .
Run the Ansible playbook:
[user@admin ceph-ansible]$ ansible-playbook add-osd.yml
1.3.5. Adding a Ceph OSD using the command-line interface
Here is the high-level workflow for manually adding an OSD to a Red Hat Ceph Storage cluster:
- Install the ceph-osd package and create a new OSD instance
- Prepare and mount the OSD data and journal drives
- Add the new OSD node to the CRUSH map
- Update the owner and group permissions
- Enable and start the ceph-osd daemon
The ceph-disk command is deprecated. The ceph-volume command is now the preferred method for deploying OSDs from the command-line interface. Currently, the ceph-volume command only supports the lvm plugin. Red Hat will provide examples throughout this guide using both commands as a reference, allowing time for storage administrators to convert any custom scripts that rely on ceph-disk to ceph-volume instead.
See the Red Hat Ceph Storage Administration Guide, for more information on using the ceph-volume command.
For custom storage cluster names, use the --cluster $CLUSTER_NAME option with the ceph and ceph-osd commands.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Review the Requirements for Installing Red Hat Ceph Storage chapter in the Installation Guide for Red Hat Enterprise Linux or Ubuntu.
- Having root access to the new nodes.
Procedure
Enable the Red Hat Ceph Storage 3 OSD software repository.
Red Hat Enterprise Linux
[root@osd ~]# subscription-manager repos --enable=rhel-7-server-rhceph-3-osd-els-rpms
Ubuntu
[user@osd ~]$ sudo bash -c 'umask 0077; echo deb https://customername:customerpasswd@rhcs.download.redhat.com/3-updates/Tools $(lsb_release -sc) main | tee /etc/apt/sources.list.d/Tools.list'
[user@osd ~]$ sudo bash -c 'wget -O - https://www.redhat.com/security/fd431d51.txt | apt-key add -'
Create the /etc/ceph/ directory:
# mkdir /etc/ceph
On the new OSD node, copy the Ceph administration keyring and configuration files from one of the Ceph Monitor nodes:
Syntax
scp $USER_NAME@$MONITOR_HOST_NAME:/etc/ceph/$CLUSTER_NAME.client.admin.keyring /etc/ceph
scp $USER_NAME@$MONITOR_HOST_NAME:/etc/ceph/$CLUSTER_NAME.conf /etc/ceph
Example
[root@osd ~]# scp root@node1:/etc/ceph/ceph.client.admin.keyring /etc/ceph/
[root@osd ~]# scp root@node1:/etc/ceph/ceph.conf /etc/ceph/
Install the ceph-osd package on the new Ceph OSD node:
Red Hat Enterprise Linux
[root@osd ~]# yum install ceph-osd
Ubuntu
[user@osd ~]$ sudo apt-get install ceph-osd
Decide if you want to collocate a journal or use a dedicated journal for the new OSDs.
Note: The --filestore option is required.
For OSDs with a collocated journal:
Syntax
ceph-disk --setuser ceph --setgroup ceph prepare --filestore /dev/$DEVICE_NAME
Examples
[root@osd ~]# ceph-disk --setuser ceph --setgroup ceph prepare --filestore /dev/sda
For OSDs with a dedicated journal:
Syntax
ceph-disk --setuser ceph --setgroup ceph prepare --filestore /dev/$DEVICE_NAME /dev/$JOURNAL_DEVICE_NAME
or
ceph-volume lvm prepare --filestore --data /dev/$DEVICE_NAME --journal /dev/$JOURNAL_DEVICE_NAME
Examples
[root@osd ~]# ceph-disk --setuser ceph --setgroup ceph prepare --filestore /dev/sda /dev/sdb
[root@osd ~]# ceph-volume lvm prepare --filestore --data /dev/vg00/lvol1 --journal /dev/sdb
Set the noup option:
[root@osd ~]# ceph osd set noup
Activate the new OSD:
Syntax
ceph-disk activate /dev/$DEVICE_NAME
or
ceph-volume lvm activate --filestore $OSD_ID $OSD_FSID
Example
[root@osd ~]# ceph-disk activate /dev/sda
[root@osd ~]# ceph-volume lvm activate --filestore 0 6cc43680-4f6e-4feb-92ff-9c7ba204120e
Add the OSD to the CRUSH map:
Syntax
ceph osd crush add $OSD_ID $WEIGHT [$BUCKET_TYPE=$BUCKET_NAME ...]
Example
[root@osd ~]# ceph osd crush add 4 1 host=node4
Note: If you specify more than one bucket, the command places the OSD into the most specific bucket out of those you specified, and it moves the bucket underneath any other buckets you specified.
Note: You can also edit the CRUSH map manually. See the Editing a CRUSH map section in the Storage Strategies guide for Red Hat Ceph Storage 3.
Important: If you specify only the root bucket, then the OSD attaches directly to the root, but the CRUSH rules expect OSDs to be inside of the host bucket.
Unset the noup option:
[root@osd ~]# ceph osd unset noup
Update the owner and group permissions for the newly created directories:
Syntax
chown -R $OWNER:$GROUP $PATH_TO_DIRECTORY
Example
[root@osd ~]# chown -R ceph:ceph /var/lib/ceph/osd
[root@osd ~]# chown -R ceph:ceph /var/log/ceph
[root@osd ~]# chown -R ceph:ceph /var/run/ceph
[root@osd ~]# chown -R ceph:ceph /etc/ceph
If you use clusters with custom names, then add the following line to the appropriate file:
Red Hat Enterprise Linux
echo "CLUSTER=$CLUSTER_NAME" >> /etc/sysconfig/ceph
[root@osd ~]# echo "CLUSTER=$CLUSTER_NAME" >> /etc/sysconfig/cephCopy to Clipboard Copied! Toggle word wrap Toggle overflow Ubuntu
sudo echo "CLUSTER=$CLUSTER_NAME" >> /etc/default/ceph
[user@osd ~]$ sudo echo "CLUSTER=$CLUSTER_NAME" >> /etc/default/cephCopy to Clipboard Copied! Toggle word wrap Toggle overflow Replace
$CLUSTER_NAMEwith the custom cluster name.To ensure that the new OSD is
upand ready to receive data, enable and start the OSD service:Syntax
systemctl enable ceph-osd@$OSD_ID
systemctl start ceph-osd@$OSD_ID
Example
[root@osd ~]# systemctl enable ceph-osd@4
[root@osd ~]# systemctl start ceph-osd@4
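To confirm that the new OSD is up and in after the service starts, you can check the OSD tree; an illustrative check:
[root@osd ~]# ceph osd tree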
1.3.6. Removing a Ceph OSD using Ansible
At times, you might need to scale down the capacity of a Red Hat Ceph Storage cluster. To remove an OSD from a Red Hat Ceph Storage cluster using Ansible, and depending on which OSD scenario is used, run the shrink-osd.yml or shrink-osd-ceph-disk.yml playbook. If osd_scenario is set to collocated or non-collocated, then use the shrink-osd-ceph-disk.yml playbook. If osd_scenario is set to lvm, then use the shrink-osd.yml playbook.
Removing an OSD from the storage cluster will destroy all the data contained on that OSD.
Prerequisites
- A running Red Hat Ceph Storage deployed by Ansible.
- A running Ansible administration node.
- Root-level access to the Ansible administration node.
Procedure
Change to the /usr/share/ceph-ansible/ directory:
[user@admin ~]$ cd /usr/share/ceph-ansible
Copy the admin keyring from /etc/ceph/ on the Ceph Monitor node to the node that includes the OSD that you want to remove.
Copy the appropriate playbook from the infrastructure-playbooks directory to the current directory:
[root@admin ceph-ansible]# cp infrastructure-playbooks/shrink-osd.yml .
or
[root@admin ceph-ansible]# cp infrastructure-playbooks/shrink-osd-ceph-disk.yml .
For bare-metal or containers deployments, run the appropriate Ansible playbook:
Syntax
ansible-playbook shrink-osd.yml -e osd_to_kill=$ID -u $ANSIBLE_USER
or
ansible-playbook shrink-osd-ceph-disk.yml -e osd_to_kill=$ID -u $ANSIBLE_USER
Replace:
- $ID with the ID of the OSD. To remove more OSDs, separate the OSD IDs with a comma.
- $ANSIBLE_USER with the name of the Ansible user.
Example
[user@admin ceph-ansible]$ ansible-playbook shrink-osd.yml -e osd_to_kill=1 -u user
or
[user@admin ceph-ansible]$ ansible-playbook shrink-osd-ceph-disk.yml -e osd_to_kill=1 -u user
Verify that the OSD has been successfully removed:
[root@mon ~]# ceph osd tree
Additional Resources
- See the Red Hat Ceph Storage Installation Guide for Red Hat Enterprise Linux or Ubuntu for details.
1.3.7. Removing a Ceph OSD using the command-line interface
Removing an OSD from a storage cluster involves updating the cluster map, removing its authentication key, removing the OSD from the OSD map, and removing the OSD from the ceph.conf file. If the node has multiple drives, you might need to remove an OSD for each drive by repeating this procedure.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Enough available OSDs so that the storage cluster is not at its near full ratio.
- Having root access to the OSD node.
Procedure
Disable and stop the OSD service:
Syntax
systemctl disable ceph-osd@$OSD_ID
systemctl stop ceph-osd@$OSD_ID
Example
[root@osd ~]# systemctl disable ceph-osd@4
[root@osd ~]# systemctl stop ceph-osd@4
Once the OSD is stopped, it is down.
Remove the OSD from the storage cluster:
Syntax
ceph osd out $OSD_ID
Example
[root@osd ~]# ceph osd out 4
Important: Once the OSD is out, Ceph will start rebalancing and copying data to other OSDs in the storage cluster. Red Hat recommends waiting until the storage cluster becomes active+clean before proceeding to the next step. To observe the data migration, run the following command:
[root@monitor ~]# ceph -w
Remove the OSD from the CRUSH map so that it no longer receives data.
Syntax
ceph osd crush remove $OSD_NAME
Example
[root@osd ~]# ceph osd crush remove osd.4
Note: You can also decompile the CRUSH map, remove the OSD from the device list, remove the device as an item in the host bucket, or remove the host bucket. If it is in the CRUSH map and you intend to remove the host, recompile the map and set it (see the sketch below). See the Storage Strategies Guide for details.
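A minimal sketch of that manual CRUSH map workflow, assuming the map is exported to /tmp (the file names are placeholders):
[root@mon ~]# ceph osd getcrushmap -o /tmp/crushmap.bin
[root@mon ~]# crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt
Edit /tmp/crushmap.txt to remove the OSD, and the host bucket if needed, then recompile and set the map:
[root@mon ~]# crushtool -c /tmp/crushmap.txt -o /tmp/crushmap.new
[root@mon ~]# ceph osd setcrushmap -i /tmp/crushmap.new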
Remove the OSD authentication key:
Syntax
ceph auth del osd.$OSD_ID
Example
[root@osd ~]# ceph auth del osd.4
Remove the OSD:
Syntax
ceph osd rm $OSD_ID
Example
[root@osd ~]# ceph osd rm 4
Edit the storage cluster’s configuration file, by default /etc/ceph/ceph.conf, and remove the OSD entry, if it exists:
Example
[osd.4]
host = $HOST_NAME
Remove the reference to the OSD in the /etc/fstab file, if the OSD was added manually.
Copy the updated configuration file to the /etc/ceph/ directory of all other nodes in the storage cluster:
Syntax
scp /etc/ceph/$CLUSTER_NAME.conf $USER_NAME@$HOST_NAME:/etc/ceph/
Example
[root@osd ~]# scp /etc/ceph/ceph.conf root@node4:/etc/ceph/
1.3.8. Replacing a journal using the command-line interface
The procedure to replace a journal when the journal and data devices are on the same physical device, for example when using osd_scenario: collocated, requires replacing the whole OSD. However, on an OSD where the journal is on a separate physical device from the data device, for example when using osd_scenario: non-collocated, you can replace just the journal device.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- A new partition or storage device.
Procedure
Set the cluster to noout to prevent backfilling:
[root@osd1 ~]# ceph osd set noout
Stop the OSD where the journal will be changed:
[root@osd1 ~]# systemctl stop ceph-osd@$OSD_ID
Flush the journal on the OSD:
[root@osd1 ~]# ceph-osd -i $OSD_ID --flush-journal
Remove the old journal partition to prevent partition UUID collision with the new partition:
sgdisk --delete=$OLD_PART_NUM -- $OLD_DEV_PATH
Replace:
- $OLD_PART_NUM with the partition number of the old journal device.
- $OLD_DEV_PATH with the path to the old journal device.
Example
[root@osd1 ~]# sgdisk --delete=1 -- /dev/sda
Create the new journal partition on the new device. This sgdisk command will use the next available partition number automatically:
sgdisk --new=0:0:$JOURNAL_SIZE -- $NEW_DEV_PATH
Replace:
- $JOURNAL_SIZE with the journal size appropriate for the environment, for example 10240M.
- $NEW_DEV_PATH with the path to the device to be used for the new journal.
Note: The minimum default size for a journal is 5 GB. Values over 10 GB are typically not needed. For additional details, contact Red Hat Support.
Example
[root@osd1 ~]# sgdisk --new=0:0:10240M -- /dev/sda
Set the proper parameters on the new partition:
sgdisk --change-name=0:"ceph journal" --partition-guid=0:$OLD_PART_UUID --typecode=0:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- $NEW_DEV_PATH
Replace:
- $OLD_PART_UUID with the UUID in the journal_uuid file for the relevant OSD. For example, for OSD 0 use the UUID in /var/lib/ceph/osd/ceph-0/journal_uuid.
- $NEW_DEV_PATH with the path to the device to be used for the new journal.
Example
[root@osd1 ~]# sgdisk --change-name=0:"ceph journal" --partition-guid=0:a1279726-a32d-4101-880d-e8573bb11c16 --typecode=0:097c058d-0758-4199-a787-ce9bacb13f48 --mbrtogpt -- /dev/sda
After running the above sgdisk commands, the new journal partition is prepared for Ceph and the journal can be created on it.
Important: This command cannot be combined with the partition creation command due to limitations in sgdisk causing the partition to not be created correctly.
Create the new journal:
[root@osd1 ~]# ceph-osd -i $OSD_ID --mkjournal
Start the OSD:
[root@osd1 ~]# systemctl start ceph-osd@$OSD_ID
Remove the noout flag on the OSDs:
[root@osd1 ~]# ceph osd unset noout
Confirm the journal is associated with the correct device:
[root@osd1 ~]# ceph-disk list
1.3.9. Observing the data migration
When you add an OSD to the CRUSH map or remove one from it, Ceph begins rebalancing the data by migrating placement groups to the new or existing OSD(s).
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Recently added or removed an OSD.
Procedure
To observe the data migration:
[root@monitor ~]# ceph -w
- Watch as the placement group states change from active+clean to active, some degraded objects, and finally active+clean when the migration completes.
- To exit the utility, press Ctrl + C.
1.4. Recalculating the placement groups
Placement groups (PGs) define the spread of any pool data across the available OSDs. A placement group is built on the redundancy algorithm to be used. For 3-way replication, the redundancy is defined to use three different OSDs. For erasure-coded pools, the number of OSDs to use is defined by the number of chunks.
When defining a pool, the number of placement groups determines the granularity with which the data is spread across all available OSDs. The higher the number, the better the equalization of capacity load. However, because handling the placement groups is also important when data must be reconstructed, the number should be chosen carefully upfront. To support the calculation, a tool is available to produce agile environments.
During the lifetime of a storage cluster, a pool may grow above the initially anticipated limits. With a growing number of drives, a recalculation is recommended. The number of placement groups per OSD should be around 100. When adding more OSDs to the storage cluster, the number of PGs per OSD lowers over time. Starting with 120 drives in the storage cluster and setting the pg_num of the pool to 4000 results in 100 PGs per OSD, given a replication factor of three. Over time, when growing to ten times the number of OSDs, the number of PGs per OSD goes down to only ten. Because a small number of PGs per OSD tends to produce unevenly distributed capacity, consider adjusting the PGs per pool.
Adjusting the number of placement groups can be done online. Recalculating is not only a recalculation of the PG numbers, but also involves data relocation, which is a lengthy process. However, data availability is maintained at all times.
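As a sketch of the arithmetic behind these numbers, the PGs per OSD are roughly (pg_num × replica count) ÷ number of OSDs, so 4000 × 3 ÷ 120 ≈ 100. To inspect and adjust a pool's placement group count online, you can use the following commands; the pool name and target value are placeholders:
[root@mon ~]# ceph osd pool get <pool-name> pg_num
[root@mon ~]# ceph osd pool set <pool-name> pg_num 8192
[root@mon ~]# ceph osd pool set <pool-name> pgp_num 8192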
Very high numbers of PGs per OSD should be avoided, because reconstruction of all PGs on a failed OSD starts at once. A high number of IOPS is required to perform reconstruction in a timely manner, which might not be available. This would lead to deep I/O queues and high latency, rendering the storage cluster unusable, or result in long healing times.
Additional Resources
- See the PG calculator for calculating the values by a given use case.
- See the Erasure Code Pools chapter in the Red Hat Ceph Storage Strategies Guide for more information.
1.5. Using the Ceph Manager balancer module
The balancer is a module for Ceph Manager that optimizes the placement of placement groups, or PGs, across OSDs in order to achieve a balanced distribution, either automatically or in a supervised fashion.
Prerequisites
- A running Red Hat Ceph Storage cluster
Start the balancer
Ensure the balancer module is enabled:
[root@mon ~]# ceph mgr module enable balancer
Turn on the balancer module:
[root@mon ~]# ceph balancer on
Modes
There are currently two supported balancer modes:
crush-compat: The CRUSH compat mode uses the compat weight-set feature, introduced in Ceph Luminous, to manage an alternative set of weights for devices in the CRUSH hierarchy. The normal weights should remain set to the size of the device to reflect the target amount of data that you want to store on the device. The balancer then optimizes the weight-set values, adjusting them up or down in small increments in order to achieve a distribution that matches the target distribution as closely as possible. Because PG placement is a pseudorandom process, there is a natural amount of variation in the placement; by optimizing the weights, the balancer counteracts that natural variation.
This mode is fully backwards compatible with older clients. When an OSDMap and CRUSH map are shared with older clients, the balancer presents the optimized weights as the real weights.
The primary restriction of this mode is that the balancer cannot handle multiple CRUSH hierarchies with different placement rules if the subtrees of the hierarchy share any OSDs. Because this configuration makes managing space utilization on the shared OSDs difficult, it is generally not recommended. As such, this restriction is normally not an issue.
upmap: Starting with Luminous, the OSDMap can store explicit mappings for individual OSDs as exceptions to the normal CRUSH placement calculation. These upmap entries provide fine-grained control over the PG mapping. This balancer mode optimizes the placement of individual PGs in order to achieve a balanced distribution. In most cases, this distribution is "perfect," with an equal number of PGs on each OSD (+/-1 PG, as they might not divide evenly).
Important: Using upmap requires that all clients be running Red Hat Ceph Storage 3.x or later, and Red Hat Enterprise Linux 7.5 or later.
To allow use of this feature, you must tell the cluster that it only needs to support luminous or later clients with:
[root@admin ~]# ceph osd set-require-min-compat-client luminous
This command fails if any pre-luminous clients or daemons are connected to the monitors.
Due to a known issue, kernel CephFS clients report themselves as jewel clients. To work around this issue, use the --yes-i-really-mean-it flag:
[root@admin ~]# ceph osd set-require-min-compat-client luminous --yes-i-really-mean-it
You can check what client versions are in use with:
[root@admin ~]# ceph features
Warning: In Red Hat Ceph Storage 3.x, the upmap feature is only supported for use by the Ceph Manager balancer module for balancing of PGs as the cluster is used. Manual rebalancing of PGs using the upmap feature is not supported in Red Hat Ceph Storage 3.x.
The default mode is crush-compat. The mode can be changed with:
[root@mon ~]# ceph balancer mode upmap
or:
[root@mon ~]# ceph balancer mode crush-compat
Status
The current status of the balancer can be checked at any time with:
[root@mon ~]# ceph balancer status
Automatic balancing
By default, when turning on the balancer module, automatic balancing is used:
[root@mon ~]# ceph balancer on
The balancer can be turned back off again with:
[root@mon ~]# ceph balancer off
This will use the crush-compat mode, which is backward compatible with older clients and will make small changes to the data distribution over time to ensure that OSDs are equally utilized.
Throttling
No adjustments will be made to the PG distribution if the cluster is degraded, for example, if an OSD has failed and the system has not yet healed itself.
When the cluster is healthy, the balancer throttles its changes such that the percentage of PGs that are misplaced, or need to be moved, is below a threshold of 5% by default. This percentage can be adjusted using the max_misplaced setting. For example, to increase the threshold to 7%:
[root@mon ~]# ceph config-key set mgr/balancer/max_misplaced .07
Supervised optimization
The balancer operation is broken into a few distinct phases:
- Building a plan
- Evaluating the quality of the data distribution, either for the current PG distribution, or the PG distribution that would result after executing a plan
- Executing the plan
To evaluate and score the current distribution:
[root@mon ~]# ceph balancer eval
To evaluate the distribution for a single pool:
[root@mon ~]# ceph balancer eval <pool-name>
To see greater detail for the evaluation:
[root@mon ~]# ceph balancer eval-verbose ...
To generate a plan using the currently configured mode:
[root@mon ~]# ceph balancer optimize <plan-name>
Replace <plan-name> with a custom plan name.
To see the contents of a plan:
[root@mon ~]# ceph balancer show <plan-name>
To discard old plans:
[root@mon ~]# ceph balancer rm <plan-name>
To see currently recorded plans, use the status command:
[root@mon ~]# ceph balancer status
To calculate the quality of the distribution that would result after executing a plan:
[root@mon ~]# ceph balancer eval <plan-name>
To execute the plan:
[root@mon ~]# ceph balancer execute <plan-name>
Note: Only execute the plan if it is expected to improve the distribution. After execution, the plan is discarded.
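For example, a complete supervised balancing session might look like the following minimal sketch; the plan name myplan is a hypothetical placeholder:
[root@mon ~]# ceph balancer eval
[root@mon ~]# ceph balancer optimize myplan
[root@mon ~]# ceph balancer show myplan
[root@mon ~]# ceph balancer eval myplan
[root@mon ~]# ceph balancer execute myplan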
1.6. Additional Resources
- See the Placement Groups (PGs) chapter in the Red Hat Ceph Storage Strategies Guide for more information.
Chapter 2. Handling a disk failure
As a storage administrator, you will have to deal with a disk failure at some point over the lifetime of the storage cluster. Testing and simulating a disk failure before a real failure happens will ensure you are ready for when the real thing does happen.
Here is the high-level workflow for replacing a failed disk:
- Find the failed OSD.
- Take the OSD out.
- Stop the OSD daemon on the node.
- Check Ceph’s status.
- Remove the OSD from the CRUSH map.
- Delete the OSD authorization.
- Remove the OSD from the storage cluster.
- Unmount the filesystem on the node.
- Replace the failed drive.
- Add the OSD back to the storage cluster.
- Check Ceph’s status.
2.1. Prerequisites
- A running Red Hat Ceph Storage cluster.
- A failed disk.
2.2. Disk failures
Ceph is designed for fault tolerance, which means Ceph can operate in a degraded state without losing data. Ceph can still operate even if a data storage drive fails. The degraded state means the extra copies of the data stored on other OSDs will backfill automatically to other OSDs in the storage cluster. When an OSD gets marked down, this can mean the drive has failed.
When a drive fails, initially the OSD status will be down, but still in the storage cluster. Networking issues can also mark an OSD as down even if it is really up. First check for any network issues in the environment. If the networking checks out okay, then it is likely the OSD drive has failed.
Modern servers typically deploy with hot-swappable drives allowing you to pull a failed drive and replace it with a new one without bringing down the node. However, with Ceph you will also have to remove the software-defined part of the OSD.
2.2.1. Replacing a failed OSD disk
The general procedure for replacing an OSD involves removing the OSD from the storage cluster, replacing the drive and then recreating the OSD.
When replacing the BlueStore block.db disk that contains the BlueStore OSD’s database partitions, Red Hat only supports re-deploying all OSDs using Ansible. A corrupt block.db file will impact all OSDs that are included in that block.db file.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- A failed disk.
Procedure
Check storage cluster health:
# ceph health
Identify the OSD location in the CRUSH hierarchy:
# ceph osd tree | grep -i down
On the OSD node, try to start the OSD:
# systemctl start ceph-osd@$OSD_ID
If the command indicates that the OSD is already running, there might be a heartbeat or networking issue. If you cannot restart the OSD, then the drive might have failed.
Note: If the OSD is down, then the OSD will eventually get marked out. This is normal behavior for Ceph Storage. When the OSD gets marked out, other OSDs with copies of the failed OSD’s data will begin backfilling to ensure that the required number of copies exist within the storage cluster. While the storage cluster is backfilling, the cluster will be in a degraded state.
For containerized deployments of Ceph, try to start the OSD container by referencing the drive associated with the OSD:
# systemctl start ceph-osd@$OSD_DRIVE
If the command indicates that the OSD is already running, there might be a heartbeat or networking issue. If you cannot restart the OSD, then the drive might have failed.
Note: The drive associated with the OSD can be determined by Mapping a container OSD ID to a drive.
Check the failed OSD’s mount point:
Note: For containerized deployments of Ceph, if the OSD is down the container will be down and the OSD drive will be unmounted, so you cannot run df to check its mount point. Use another method to determine if the OSD drive has failed. For example, run smartctl on the drive from the container node.
# df -h
If you cannot restart the OSD, you can check the mount point. If the mount point no longer appears, then you can try remounting the OSD drive and restarting the OSD. If you cannot restore the mount point, then you might have a failed OSD drive.
Using the smartctl utility can help determine if the drive is healthy. For example:
# yum install smartmontools
# smartctl -H /dev/$DRIVE
If the drive has failed, you will need to replace it.
Stop the OSD process:
# systemctl stop ceph-osd@$OSD_ID
If using FileStore, then flush the journal to disk:
# ceph-osd -i $OSD_ID --flush-journal
For containerized deployments of Ceph, stop the OSD container by referencing the drive associated with the OSD:
# systemctl stop ceph-osd@$OSD_DRIVE
Take the OSD out of the storage cluster:
# ceph osd out $OSD_ID
Ensure the failed OSD is backfilling:
# ceph -w
Remove the OSD from the CRUSH Map:
# ceph osd crush remove osd.$OSD_ID
Note: This step is only needed if you are permanently removing the OSD and not redeploying it.
Remove the OSD’s authentication keys:
# ceph auth del osd.$OSD_ID
Verify that the keys for the OSD are not listed:
# ceph auth list
Remove the OSD from the storage cluster:
# ceph osd rm osd.$OSD_ID
Unmount the failed drive path:
Note: For containerized deployments of Ceph, if the OSD is down the container will be down and the OSD drive will be unmounted. In this case there is nothing to unmount and this step can be skipped.
# umount /var/lib/ceph/osd/$CLUSTER_NAME-$OSD_ID
Replace the physical drive. Refer to the hardware vendor’s documentation for the node. If the drive is hot swappable, simply replace the failed drive with a new drive. If the drive is NOT hot swappable and the node contains multiple OSDs, you MIGHT need to bring the node down to replace the physical drive. If you need to bring the node down temporarily, you might set the cluster to noout to prevent backfilling:
# ceph osd set noout
Once you replace the drive and bring the node and its OSDs back online, remove the noout setting:
# ceph osd unset noout
Allow the new drive to appear under the /dev/ directory and make a note of the drive path before proceeding further.
- Find the OSD drive and format the disk.
Recreate the OSD:
- Using Ansible.
- Using the command-line interface.
Check the CRUSH hierarchy to ensure it is accurate:
# ceph osd tree
If you are not satisfied with the location of the OSD in the CRUSH hierarchy, you might move it with the move command:
# ceph osd crush move $BUCKET_TO_MOVE $BUCKET_TYPE=$PARENT_BUCKET
- Verify the OSD is online.
2.2.2. Replacing an OSD drive while retaining the OSD ID
When replacing a failed OSD drive, you can keep the original OSD ID and CRUSH map entry.
The ceph-volume lvm commands default to BlueStore for OSDs. To use FileStore OSDs, use the --filestore, --data, and --journal options.
See the Preparing the OSD Data and Journal Drives section for more details.
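If you need a FileStore OSD instead, a command of the following form could be used; this is a sketch, and the device variables are hypothetical placeholders:
ceph-volume lvm create --filestore --osd-id $OSD_ID --data $DATA_DEVICE --journal $JOURNAL_DEVICE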
Prerequisites
- A running Red Hat Ceph Storage cluster.
- A failed disk.
Procedure
Destroy the OSD:
ceph osd destroy $OSD_ID --yes-i-really-mean-it
Example
$ ceph osd destroy 1 --yes-i-really-mean-it
Optionally, if the replacement disk was used previously, then you need to zap the disk:
ceph-volume lvm zap $DEVICE
Example
$ ceph-volume lvm zap /dev/sdb
Create the new OSD with the existing OSD ID:
ceph-volume lvm create --osd-id $OSD_ID --data $DEVICE
Example
$ ceph-volume lvm create --osd-id 1 --data /dev/sdb
2.3. Simulating a disk failure
There are two disk failure scenarios: hard and soft. A hard failure means replacing the disk. Soft failure might be an issue with the device driver or some other software component.
In the case of a soft failure, replacing the disk might not be needed. If replacing a disk, then steps need to be followed to remove the failed disk and add the replacement disk to Ceph. To simulate a soft disk failure, the best approach is to delete the device. Choose a device and delete it from the system.
echo 1 > /sys/block/$DEVICE/device/delete
Example
[root@ceph1 ~]# echo 1 > /sys/block/sdb/device/delete
In the Ceph OSD log on the OSD node, you can see that Ceph detected the failure and started the recovery process automatically.
Example
Looking at the OSD tree, you can also see that the disk is offline.
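To confirm, you can list the OSD tree and look for the OSD reported as down, as done earlier in this guide; the host prompt mirrors the example above:
[root@ceph1 ~]# ceph osd tree | grep -i down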
Chapter 3. Handling a node failure
As a storage administrator, you might experience a whole node failing within the storage cluster, and handling a node failure is similar to handling a disk failure. With a node failure, instead of Ceph recovering PGs (placement groups) for only one disk, all PGs on the disks within that node must be recovered. Ceph will detect that the OSDs are all down and automatically start the recovery process, known as self-healing.
There are three node failure scenarios. Here is the high-level workflow for each scenario when replacing a node:
Replacing the node, but using the root and Ceph OSD disks from the failed node.
- Disable backfilling.
- Replace the node, taking the disks from old node, and adding them to the new node.
- Enable backfilling.
Replacing the node, reinstalling the operating system, and using the Ceph OSD disks from the failed node.
- Disable backfilling.
- Create a backup of the Ceph configuration.
Replace the node and add the Ceph OSD disks from the failed node.
- Configure disks as JBOD.
- Install the operating system.
- Restore the Ceph configuration.
- Run ceph-ansible.
- Enable backfilling.
Replacing the node, reinstalling the operating system, and using all new Ceph OSD disks.
- Disable backfilling.
- Remove all OSDs on the failed node from the storage cluster.
- Create a backup of the Ceph configuration.
Replace the node and add new Ceph OSD disks.
- Configure disks as JBOD.
- Install the operating system.
- Run ceph-ansible.
- Enable backfilling.
3.1. Prerequisites
- A running Red Hat Ceph Storage cluster.
- A failed node.
3.2. Considerations before adding or removing a node
One of the outstanding features of Ceph is the ability to add or remove Ceph OSD nodes at run time. This means you can resize the storage cluster capacity or replace hardware without taking down the storage cluster. The ability to serve Ceph clients while the cluster is in a degraded state also has operational benefits; for example, you can add, remove, or replace hardware during regular business hours, rather than working overtime or weekends. However, adding and removing Ceph OSD nodes can have a significant impact on performance, and you must consider the performance impact of adding, removing, or replacing hardware on the storage cluster before you act.
From a capacity perspective, removing a node removes the OSDs contained within the node and effectively reduces the capacity of the storage cluster. Adding a node adds the OSDs contained within the node, and effectively expands the capacity of the storage cluster. Whether you are expanding or reducing the storage cluster capacity, adding or removing Ceph OSD nodes will induce backfilling as the cluster rebalances. During that rebalancing time period, Ceph uses additional resources which can impact storage cluster performance.
Imagine a storage cluster that contains Ceph nodes where each node has four OSDs. In a storage cluster of four nodes, with 16 OSDs, removing a node removes 4 OSDs and cuts capacity by 25%. In a storage cluster of three nodes, with 12 OSDs, adding a node adds 4 OSDs and increases capacity by 33%.
In a production Ceph storage cluster, a Ceph OSD node has a particular hardware configuration that facilitates a particular type of storage strategy. For more details, see Storage Strategies guide for Red Hat Ceph Storage 3.
Since a Ceph OSD node is part of a CRUSH hierarchy, the performance impact of adding or removing a node typically affects the performance of pools that use that CRUSH hierarchy, that is, the CRUSH ruleset.
3.3. Performance considerations
The following factors typically have an impact on a storage cluster’s performance when adding or removing Ceph OSD nodes:
Current Client Load on Affected Pools:
Ceph clients place load on the I/O interface to Ceph; namely, load on a pool. A pool maps to a CRUSH ruleset. The underlying CRUSH hierarchy allows Ceph to place data across failure domains. If the underlying Ceph OSD node involves a pool under high client loads, the client load may have a significant impact on recovery time and impact performance. More specifically, since write operations require data replication for durability, write-intensive client loads will increase the time for the storage cluster to recover.
Capacity Added or Removed:
Generally, the capacity you are adding or removing as a percentage of the overall cluster will have an impact on the storage cluster’s time to recover. Additionally, the storage density of the node you add or remove may have an impact on the time to recover. For example, a node with 36 OSDs will typically take longer to recover compared to a node with 12 OSDs. When removing nodes, you MUST ensure that you have sufficient spare capacity so that you will not reach the full ratio or near full ratio. If the storage cluster reaches the full ratio, Ceph will suspend write operations to prevent data loss.
Pools and CRUSH Ruleset:
A Ceph OSD node maps to at least one Ceph CRUSH hierarchy, and the hierarchy maps to at least one pool. Each pool that uses the CRUSH hierarchy (ruleset) where you add or remove a Ceph OSD node will experience a performance impact.
Pool Type and Durability:
Replication pools tend to use more network bandwidth to replicate deep copies of the data, whereas erasure coded pools tend to use more CPU to calculate k+m coding chunks. The more copies of the data, for example, a larger replication size, or the more k+m chunks, the longer it will take for the storage cluster to recover.
Total Throughput Characteristics:
Drives, controllers and network interface cards all have throughput characteristics that may impact the recovery time. Generally, nodes with higher throughput characteristics, for example, 10 Gbps and SSDs will recover faster than nodes with lower throughput characteristics, for example, 1 Gbps and SATA drives.
3.4. Recommendations for adding or removing nodes
The failure of a node might preclude removing one OSD at a time before replacing the node. When circumstances allow it, Red Hat recommends reducing the negative performance impact of adding or removing Ceph OSD nodes by adding or removing one OSD at a time within a node and allowing the cluster to recover before proceeding to the next OSD. For details on removing an OSD:
- Using Ansible.
- Using the command-line interface.
When adding a Ceph node, Red Hat also recommends adding one OSD at a time. For details on adding an OSD:
- Using Ansible.
- Using the command-line interface.
When adding or removing Ceph OSD nodes, consider that other ongoing processes will have an impact on performance too. To reduce the impact on client I/O, Red Hat recommends the following:
Calculate capacity:
Before removing a Ceph OSD node, ensure that the storage cluster can backfill the contents of all its OSDs WITHOUT reaching the full ratio. Reaching the full ratio will cause the cluster to refuse write operations.
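For example, you might compare the current utilization against the full and near-full ratios before proceeding; these are the same capacity commands used in the procedures later in this chapter:
[root@monitor ~]# ceph df
[root@monitor ~]# ceph osd df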
Temporarily Disable Scrubbing:
Scrubbing is essential to ensuring the durability of the storage cluster’s data; however, it is resource intensive. Before adding or removing a Ceph OSD node, disable scrubbing and deep scrubbing and let the current scrubbing operations complete before proceeding, for example:
ceph osd set noscrub
ceph osd set nodeep-scrub
Once you have added or removed a Ceph OSD node and the storage cluster has returned to an active+clean state, unset the noscrub and nodeep-scrub settings.
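For example:
ceph osd unset noscrub
ceph osd unset nodeep-scrub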
Limit Backfill and Recovery:
If you have reasonable data durability, for example, osd pool default size = 3 and osd pool default min size = 2, there is nothing wrong with operating in a degraded state. You can tune the storage cluster for the fastest possible recovery time, but this will impact Ceph client I/O performance significantly. To maintain the highest Ceph client I/O performance, limit the backfill and recovery operations and allow them to take longer, for example:
osd_max_backfills = 1
osd_recovery_max_active = 1
osd_recovery_op_priority = 1
You can also set sleep and delay parameters such as osd_recovery_sleep.
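For example, assuming you prefer to inject these values at runtime rather than edit the Ceph configuration file, you could use the same injectargs mechanism shown in the procedures below; the sleep value here is only an illustration:
ceph tell osd.* injectargs --osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-sleep 0.1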
Finally, if you are expanding the size of the storage cluster, you may need to increase the number of placement groups. If you determine that you need to expand the number of placement groups, Red Hat recommends making incremental increases in the number of placement groups. Increasing the number of placement groups by a significant number will cause performance to degrade considerably.
3.5. Adding a Ceph OSD node
To expand the capacity of the Red Hat Ceph Storage cluster, add an OSD node.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- A provisioned node with a network connection.
- Installation of Red Hat Enterprise Linux 7 or Ubuntu 16.04.
- Review the Requirements for Installing Red Hat Ceph Storage chapter in the Installation Guide for Red Hat Enterprise Linux or Ubuntu.
Procedure
- Verify that other nodes in the storage cluster can reach the new node by its short host name.
Temporarily disable scrubbing:
[root@monitor ~]# ceph osd set noscrub
[root@monitor ~]# ceph osd set nodeep-scrub
Limit the backfill and recovery features:
Syntax
ceph tell $DAEMON_TYPE.* injectargs --$OPTION_NAME $VALUE [--$OPTION_NAME $VALUE]
Example
[root@monitor ~]# ceph tell osd.* injectargs --osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1
Add the new node to the CRUSH Map:
Syntax
ceph osd crush add-bucket $BUCKET_NAME $BUCKET_TYPE
Example
[root@monitor ~]# ceph osd crush add-bucket node2 host
Add an OSD for each disk on the node to the storage cluster.
- Using Ansible.
- Using the command-line interface.
Important: When adding an OSD node to a Red Hat Ceph Storage cluster, Red Hat recommends adding one OSD at a time within the node and allowing the cluster to recover to an active+clean state before proceeding to the next OSD.
Additional Resources
- See the Setting a Specific Configuration Setting at Runtime section in the Red Hat Ceph Storage Configuration Guide for more details.
- See the Add a Bucket and Move a Bucket sections in the Red Hat Ceph Storage Storage Strategies Guide for details on placing the node at an appropriate location in the CRUSH hierarchy.
3.6. Removing a Ceph OSD node
To reduce the capacity of a storage cluster, remove an OSD node.
Before removing a Ceph OSD node, ensure that the storage cluster can backfill the contents of all OSDs WITHOUT reaching the full ratio. Reaching the full ratio will cause the cluster to refuse write operations.
Prerequisites
- A running Red Hat Ceph Storage cluster.
Procedure
Check storage cluster’s capacity:
[root@monitor ~]# ceph df
[root@monitor ~]# rados df
[root@monitor ~]# ceph osd df
Temporarily disable scrubbing:
[root@monitor ~]# ceph osd set noscrub
[root@monitor ~]# ceph osd set nodeep-scrub
Limit the backfill and recovery features:
Syntax
ceph tell $DAEMON_TYPE.* injectargs --$OPTION_NAME $VALUE [--$OPTION_NAME $VALUE]
Example
[root@monitor ~]# ceph tell osd.* injectargs --osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1
Remove each OSD on the node from the storage cluster:
- Using Ansible.
- Using the command-line interface.
Important: When removing an OSD node from the storage cluster, Red Hat recommends removing one OSD at a time within the node and allowing the cluster to recover to an active+clean state before proceeding to the next OSD.
After removing an OSD, verify that the storage cluster is not approaching the near-full ratio:
[root@monitor ~]# ceph -s
[root@monitor ~]# ceph df
- Repeat this step until all OSDs on the node are removed from the storage cluster.
Once all OSDs are removed, remove the host bucket from the CRUSH map:
Syntax
ceph osd crush rm $BUCKET_NAME
Example
[root@monitor ~]# ceph osd crush rm node2
Additional Resources
- See the Setting a Specific Configuration Setting at Runtime section in the Red Hat Ceph Storage Configuration Guide for more details.
3.7. Simulating a node failure
To simulate a hard node failure, power off the node and reinstall the operating system.
Prerequisites
- A healthy running Red Hat Ceph Storage cluster.
Procedure
Check the storage capacity to understand the impact of removing the node from the storage cluster:
# ceph df
# rados df
# ceph osd df
Optionally, disable recovery and backfilling:
# ceph osd set noout
# ceph osd set noscrub
# ceph osd set nodeep-scrub
- Shutdown the node.
If the host name will change, then remove the node from the CRUSH map:
[root@ceph1 ~]# ceph osd crush rm ceph3
Check the status of the cluster:
[root@ceph1 ~]# ceph -s
- Reinstall the operating system on the node.
Add an Ansible user and SSH keys:
From the administration node, copy the SSH keys for the ansible user:
[ansible@admin ~]$ ssh-copy-id ceph3
From the administration node, re-run the Ansible playbook:
[ansible@admin ~]$ cd /usr/share/ceph-ansible
[ansible@admin ~]$ ansible-playbook site.yml
Example Output
PLAY RECAP ********************************************************************
ceph1 : ok=368 changed=2 unreachable=0 failed=0
ceph2 : ok=284 changed=0 unreachable=0 failed=0
ceph3 : ok=284 changed=15 unreachable=0 failed=0
Optionally, enable recovery and backfilling:
[root@ceph3 ~]# ceph osd unset noout
[root@ceph3 ~]# ceph osd unset noscrub
[root@ceph3 ~]# ceph osd unset nodeep-scrub
Check Ceph’s health:
Additional Resources
For more information on installing Red Hat Ceph Storage:
Chapter 4. Handling a data center failure
Red Hat Ceph Storage can withstand catastrophic failures to the infrastructure, such as losing one of three data centers in a stretch cluster. For the standard object store use case, configuring all three data centers can be done independently with replication set up between them. In this scenario, the cluster configuration in each of the data centers might be different, reflecting the local capabilities and dependencies.
A logical structure of the placement hierarchy should be considered. A proper CRUSH map can be used, reflecting the hierarchical structure of the failure domains within the infrastructure. Using logical hierarchical definitions improves the reliability of the storage cluster, versus using the standard hierarchical definitions. Failure domains are defined in the CRUSH map. The default CRUSH map contains all nodes in a flat hierarchy.
In a three data center environment, for example, with a stretch cluster, the placement of nodes should be managed in a way that one data center can go down while the storage cluster stays up and running. Consider which failure domain a node resides in when using 3-way replication for the data. In the case of an outage of one data center, it is possible that some data is left with only one copy. When this scenario happens, there are two options:
- Leave the data in read-only status with the standard settings.
- Live with only one copy for the duration of the outage.
With the standard settings, and because of the randomness of data placement across the nodes, not all the data will be affected, but some data can have only one copy and the storage cluster would revert to read-only mode.
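Whether a pool keeps serving writes with a single surviving copy is governed by the pool's min_size setting. As a sketch, and assuming a replicated pool named data as a hypothetical example, you could temporarily allow I/O with one remaining copy during the outage and restore the default afterwards:
ceph osd pool set data min_size 1
ceph osd pool set data min_size 2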
In the example below, the resulting map is derived from the initial setup of the cluster with 6 OSD nodes. In this example, all nodes have only one disk and hence one OSD. All of the nodes are arranged under the default root, that is, the standard root of the hierarchy tree. Because a higher weight is assigned to two of the OSDs, these OSDs receive more chunks of data than the other OSDs. These nodes were introduced later with bigger disks than the initial OSD disks. This does not affect the ability of the data placement to withstand a failure of a group of nodes.
Standard CRUSH map
Using logical hierarchical definitions to group the nodes into the same data center achieves data placement maturity. Possible definition types of root, datacenter, rack, row, and host allow the reflection of the failure domains for the three data center stretch cluster:
- Nodes ceph-node1 and ceph-node2 reside in data center 1 (DC1)
- Nodes ceph-node3 and ceph-node5 reside in data center 2 (DC2)
- Nodes ceph-node4 and ceph-node6 reside in data center 3 (DC3)
- All data centers belong to the same structure (allDC)
Since all OSDs in a host belong to the host definition, there is no change needed. All the other assignments can be adjusted during runtime of the storage cluster by:
Defining the bucket structure with the following commands:
ceph osd crush add-bucket allDC root
ceph osd crush add-bucket DC1 datacenter
ceph osd crush add-bucket DC2 datacenter
ceph osd crush add-bucket DC3 datacenter
Moving the nodes into the appropriate place within this structure by modifying the CRUSH map:
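As a sketch, and using the data center and host names from this example, the moves could be expressed with the ceph osd crush move command as follows:
ceph osd crush move DC1 root=allDC
ceph osd crush move DC2 root=allDC
ceph osd crush move DC3 root=allDC
ceph osd crush move ceph-node1 datacenter=DC1
ceph osd crush move ceph-node2 datacenter=DC1
ceph osd crush move ceph-node3 datacenter=DC2
ceph osd crush move ceph-node5 datacenter=DC2
ceph osd crush move ceph-node4 datacenter=DC3
ceph osd crush move ceph-node6 datacenter=DC3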
Within this structure, any new hosts can be added, as well as new disks. By placing the OSDs at the right place in the hierarchy, the CRUSH algorithm places redundant pieces into different failure domains within the structure.
The above example results in the following:
The listing above shows the resulting CRUSH map by displaying the OSD tree. It is now easy to see how the hosts belong to a data center and how all data centers belong to the same top-level structure, while clearly distinguishing between locations.
Placing the data in the proper locations according to the map works properly only within a healthy cluster. Misplacement might happen when some OSDs are not available. Those misplacements are corrected automatically once it is possible to do so.
Additional Resources
- See the CRUSH administration chapter in the Red Hat Ceph Storage Storage Strategies Guide for more information.