Operations Guide
Operational tasks for Red Hat Ceph Storage
Abstract
Chapter 1. Managing the storage cluster size
As a storage administrator, you can manage the storage cluster size by adding or removing Ceph Monitors or OSDs as storage capacity expands or shrinks. You can manage the storage cluster size by using Ceph Ansible, or by using the command-line interface (CLI).
1.1. Prerequisites
- A running Red Hat Ceph Storage cluster.
- Root-level access to the Ceph Monitor and OSD nodes.
1.2. Ceph Monitors
Ceph Monitors are lightweight processes that maintain a master copy of the storage cluster map. All Ceph clients contact a Ceph monitor and retrieve the current copy of the storage cluster map, enabling clients to bind to a pool and read and write data.
Ceph Monitors use a variation of the Paxos protocol to establish consensus about maps and other critical information across the storage cluster. Due to the nature of Paxos, Ceph requires a majority of monitors running to establish a quorum, thus establishing consensus.
Red Hat requires at least three monitors on separate hosts to receive support for a production cluster.
Red Hat recommends deploying an odd number of monitors. An odd number of Ceph Monitors has higher resiliency to failures than an even number. For example, to maintain a quorum, a two-monitor deployment cannot tolerate any failures; a three-monitor deployment can tolerate one failure; a four-monitor deployment can tolerate one failure; and a five-monitor deployment can tolerate two failures. This is why an odd number is advisable. In short, Ceph needs a majority of monitors to be running and able to communicate with each other: two out of three, three out of four, and so on.
For an initial deployment of a multi-node Ceph storage cluster, Red Hat requires three monitors, increasing the number two at a time if a valid need for more than three monitors exists.
Since Ceph Monitors are lightweight, it is possible to run them on the same host as OpenStack nodes. However, Red Hat recommends running monitors on separate hosts.
Red Hat does NOT support collocating Ceph Monitors and OSDs on the same node. Doing so can have a negative impact on storage cluster performance.
Red Hat ONLY supports collocating Ceph services in containerized environments.
When you remove monitors from a storage cluster, consider that Ceph Monitors use the Paxos protocol to establish a consensus about the master storage cluster map. You must have a sufficient number of Ceph Monitors to establish a quorum.
Additional Resources
- See the Red Hat Ceph Storage Supported configurations Knowledgebase article for all the supported Ceph configurations.
1.2.1. Preparing a new Ceph Monitor node
Before you prepare a new Ceph Monitor node for deployment, review the Requirements for Installing Red Hat Ceph Storage chapter in the Red Hat Ceph Storage Installation Guide.
Deploy each new Ceph Monitor on a separate node. All Ceph Monitor nodes in the storage cluster must run on the same hardware.
Prerequisites
- Network connectivity.
- Root-level access to the new node.
Procedure
- Add the new node to the server rack.
- Connect the new node to the network.
- Install the latest version of Red Hat Enterprise Linux 7 or Red Hat Enterprise Linux 8.
- For Red Hat Enterprise Linux 7, install ntp and configure a reliable time source:
[root@mon ~]# yum install ntp
- For Red Hat Enterprise Linux 8, install chrony and configure a reliable time source:
[root@mon ~]# dnf install chrony
- If using a firewall, open TCP port 6789:
[root@mon ~]# firewall-cmd --zone=public --add-port=6789/tcp
[root@mon ~]# firewall-cmd --zone=public --add-port=6789/tcp --permanent
Additional Resources
- For more information about chrony, refer to Red Hat Enterprise Linux 8 Configuring basic system settings.
1.2.2. Adding a Ceph Monitor using Ansible
Red Hat recommends adding two Ceph Monitors at a time to maintain an odd number of monitors. For example, if you have three Ceph Monitors in the storage cluster, Red Hat recommends that you expand the number of monitors to five.
Prerequisites
- Root-level access to the new nodes.
- An Ansible administration node.
- A running Red Hat Ceph Storage cluster deployed by Ansible.
Procedure
Add the new Ceph Monitor nodes to the /etc/ansible/hosts Ansible inventory file, under a [mons] section:
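For instance, the section might look like the following, where the host names are illustrative and should be replaced with the short host names of your monitor nodes:
[mons]
existing-mon01
existing-mon02
existing-mon03
new-mon01
new-mon02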
Verify that Ansible can contact the Ceph nodes:
[root@admin ~]# ansible all -m ping
Change directory to the Ansible configuration directory:
[root@admin ~]# cd /usr/share/ceph-ansible
You can add a Ceph Monitor using either of the following steps:
For both bare-metal and containers deployments, run the add-mon.yml playbook from infrastructure-playbooks:
[root@admin ceph-ansible]# ansible-playbook -vvvv -i hosts infrastructure-playbooks/add-mon.yml
Alternatively, as the ansible user, run either the site playbook or the site-container playbook:
Bare-metal deployments:
Example
[ansible@admin ceph-ansible]$ ansible-playbook -vvvv -i hosts site.yml --limit mons
Container deployments:
Example
[ansible@admin ceph-ansible]$ ansible-playbook -vvvv -i hosts site-container.yml --limit mons
After the Ansible playbook has finished running, the new Ceph Monitor nodes appear in the storage cluster.
Update the configuration file:
Bare-metal deployments:
Example
[user@admin ceph-ansible]$ ansible-playbook -vvvv -i hosts site.yml --tags ceph_update_config
Container deployments:
Example
[user@admin ceph-ansible]$ ansible-playbook -vvvv -i hosts site-container.yml --tags ceph_update_config
1.2.3. Adding a Ceph Monitor using the command-line interface
Red Hat recommends adding two Ceph Monitors at a time to maintain an odd number of monitors. For example, if you have three Ceph Monitors in the storage cluster, Red Hat recommends that you expand the number of monitors to five.
Red Hat recommends running only one Ceph Monitor daemon per node.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Root-level access to a running Ceph Monitor node and to the new monitor nodes.
Procedure
Add the Red Hat Ceph Storage 4 monitor repository.
Red Hat Enterprise Linux 7
[root@mon ~]# subscription-manager repos --enable=rhel-7-server-rhceph-4-mon-rpms
Red Hat Enterprise Linux 8
[root@mon ~]# subscription-manager repos --enable=rhceph-4-mon-for-rhel-8-x86_64-rpms
Install the ceph-mon package on the new Ceph Monitor nodes:
Red Hat Enterprise Linux 7
[root@mon ~]# yum install ceph-mon
Red Hat Enterprise Linux 8
[root@mon ~]# dnf install ceph-mon
Edit the mon_host settings list in the [mon] section of the Ceph configuration file on a running node in the storage cluster. Add the IP address of the new Ceph Monitor node to the mon_host settings list:
Syntax
[mon]
mon_host = MONITOR_IP:PORT MONITOR_IP:PORT ... NEW_MONITOR_IP:PORT
Instead of adding the new Ceph Monitor’s IP address to the [mon] section of the Ceph configuration file, you can create a specific section in the file for the new monitor nodes:
Syntax
[mon.MONITOR_ID]
host = MONITOR_ID
mon_addr = MONITOR_IP
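For example, a section for a new monitor node might look like the following; the host name and IP address are illustrative:
[mon.node4]
host = node4
mon_addr = 192.168.0.14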
Note
The mon_host settings list is a list of DNS-resolvable host names or IP addresses, separated by ",", ";", or " ". This list ensures that the storage cluster identifies the new monitor node during a start or restart.
Important
The mon_initial_members setting lists the initial quorum group of Ceph Monitor nodes. If one member of that group fails, another node in that group becomes the initial monitor node. To ensure high availability for production storage clusters, list at least three monitor nodes in the mon_initial_members and mon_host sections of the Ceph configuration file. This prevents the storage cluster from locking up if the initial monitor node fails. If the monitor nodes you are adding are replacing monitors that were part of mon_initial_members and mon_host, add the new monitors to both sections as well.
To make the monitors part of the initial quorum group, add the host name to the mon_initial_members parameter in the [global] section of the Ceph configuration file.
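For example, the [global] section might then include entries like the following; the host names and IP addresses are illustrative:
[global]
mon_initial_members = node1,node2,node3,node4
mon_host = 192.168.0.11,192.168.0.12,192.168.0.13,192.168.0.14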
Copy the updated Ceph configuration file to all Ceph nodes and Ceph clients:
Syntax
scp /etc/ceph/CLUSTER_NAME.conf TARGET_NODE_NAME:/etc/ceph
Example
[root@mon ~]# scp /etc/ceph/ceph.conf node4:/etc/ceph
Create the monitor’s data directory on the new monitor nodes:
Syntax
mkdir /var/lib/ceph/mon/CLUSTER_NAME-MONITOR_ID
Example
[root@mon ~]# mkdir /var/lib/ceph/mon/ceph-node4
Create temporary directories on a running Ceph Monitor node and on the new monitor nodes, and keep the files needed for this procedure in those directories. The temporary directory on each node should be different from the node’s default directory. It can be removed after all the steps are completed:
Syntax
mkdir TEMP_DIRECTORY_PATH_NAME
Example
[root@mon ~]# mkdir /tmp/ceph
Copy the admin key from a running Ceph Monitor node to the new Ceph Monitor nodes so that you can run ceph commands:
Syntax
scp /etc/ceph/CLUSTER_NAME.client.admin.keyring TARGET_NODE_NAME:/etc/ceph
Example
[root@mon ~]# scp /etc/ceph/ceph.client.admin.keyring node4:/etc/ceph
From a running Ceph Monitor node, retrieve the monitor keyring:
Syntax
ceph auth get mon. -o TEMP_DIRECTORY_PATH_NAME/KEY_FILE_NAME
Example
[root@mon ~]# ceph auth get mon. -o /tmp/ceph/ceph_keyring.out
From a running Ceph Monitor node, retrieve the monitor map:
Syntax
ceph mon getmap -o TEMP_DIRECTORY_PATH_NAME/MONITOR_MAP_FILE
Example
[root@mon ~]# ceph mon getmap -o /tmp/ceph/ceph_mon_map.out
Copy the collected Ceph Monitor data to the new Ceph Monitor nodes:
Syntax
scp -r /tmp/ceph TARGET_NODE_NAME:/tmp/ceph
Example
[root@mon ~]# scp -r /tmp/ceph node4:/tmp/ceph
Prepare the data directory for the new monitors from the data you collected earlier. Specify the path to the monitor map to retrieve quorum information from the monitors, along with their fsids. Specify a path to the monitor keyring:
Syntax
ceph-mon -i MONITOR_ID --mkfs --monmap TEMP_DIRECTORY_PATH_NAME/MONITOR_MAP_FILE --keyring TEMP_DIRECTORY_PATH_NAME/KEY_FILE_NAME
Example
[root@mon ~]# ceph-mon -i node4 --mkfs --monmap /tmp/ceph/ceph_mon_map.out --keyring /tmp/ceph/ceph_keyring.out
For storage clusters with custom names, add the following line to the /etc/sysconfig/ceph file:
Syntax
echo "CLUSTER=CUSTOM_CLUSTER_NAME" >> /etc/sysconfig/ceph
echo "CLUSTER=CUSTOM_CLUSTER_NAME" >> /etc/sysconfig/ceph
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
echo "CLUSTER=example" >> /etc/sysconfig/ceph
[root@mon ~]# echo "CLUSTER=example" >> /etc/sysconfig/ceph
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Update the owner and group permissions on the new monitor nodes:
Syntax
chown -R OWNER:GROUP DIRECTORY_PATH
Example
[root@mon ~]# chown -R ceph:ceph /var/lib/ceph/mon
[root@mon ~]# chown -R ceph:ceph /var/log/ceph
[root@mon ~]# chown -R ceph:ceph /var/run/ceph
[root@mon ~]# chown -R ceph:ceph /etc/ceph
Enable and start the ceph-mon process on the new monitor nodes:
Syntax
systemctl enable ceph-mon.target
systemctl enable ceph-mon@MONITOR_ID
systemctl start ceph-mon@MONITOR_ID
Example
[root@mon ~]# systemctl enable ceph-mon.target
[root@mon ~]# systemctl enable ceph-mon@node4
[root@mon ~]# systemctl start ceph-mon@node4
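Optionally, you can confirm that the new monitor has joined the quorum, for example with the ceph -s or ceph mon stat commands (output not shown here):
[root@mon ~]# ceph -s
[root@mon ~]# ceph mon stat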
Additional Resources
- See the Enabling the Red Hat Ceph Storage Repositories section in the Red Hat Ceph Storage Installation Guide.
1.2.4. Configuring monitor election strategy
The monitor election strategy handles net splits and failures. You can configure the monitor election strategy in three different modes:
- classic - This is the default mode, in which the lowest-ranked monitor is voted for, based on the elector module between the two sites.
- disallow - This mode lets you mark monitors as disallowed, in which case they will participate in the quorum and serve clients, but cannot be an elected leader. This lets you add monitors to a list of disallowed leaders. If a monitor is in the disallowed list, it will always defer to another monitor.
- connectivity - This mode is mainly used to resolve network discrepancies. It evaluates connection scores provided by each monitor for its peers and elects the most connected and reliable monitor to be the leader. This mode is designed to handle net splits, which may happen if your cluster is stretched across multiple data centers or is otherwise susceptible. This mode incorporates connection score ratings and elects the monitor with the best score.
Red Hat recommends that you stay in the classic mode unless you require features in the other modes.
Before constructing the cluster, change the election_strategy to classic, disallow, or connectivity with the following command:
Syntax
ceph mon set election_strategy {classic|disallow|connectivity}
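For example, to switch to the connectivity mode (the choice of mode here is illustrative; pick the one appropriate for your environment):
[root@mon ~]# ceph mon set election_strategy connectivity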
1.2.5. Removing a Ceph Monitor using Ansible
To remove a Ceph Monitor with Ansible, use the shrink-mon.yml playbook.
Prerequisites
- An Ansible administration node.
- A running Red Hat Ceph Storage cluster deployed by Ansible.
Procedure
Check if the monitor is ok-to-stop:
Syntax
ceph mon ok-to-stop MONITOR_ID
Example
[root@mon ~]# ceph mon ok-to-stop node03
Change to the /usr/share/ceph-ansible/ directory:
[user@admin ~]$ cd /usr/share/ceph-ansible
For bare-metal and containers deployments, run the shrink-mon.yml Ansible playbook:
Syntax
ansible-playbook infrastructure-playbooks/shrink-mon.yml -e mon_to_kill=NODE_NAME -u ANSIBLE_USER_NAME -i hosts
Replace:
- NODE_NAME with the short host name of the Ceph Monitor node. You can remove only one Ceph Monitor each time the playbook runs.
- ANSIBLE_USER_NAME with the name of the Ansible user.
Example
[user@admin ceph-ansible]$ ansible-playbook infrastructure-playbooks/shrink-mon.yml -e mon_to_kill=node03 -u user -i hosts
- Manually remove the corresponding entry from the Ansible inventory file /etc/ansible/hosts.
Run the ceph-ansible playbook.
Bare-metal deployments:
Example
[user@admin ceph-ansible]$ ansible-playbook site.yml --tags ceph_update_config -i hosts
Container deployments:
Example
[user@admin ceph-ansible]$ ansible-playbook site-container.yml --tags ceph_update_config -i hosts
Ensure that the Ceph Monitor has been successfully removed:
[root@mon ~]# ceph -s
Additional Resources
- For more information on installing Red Hat Ceph Storage, see the Red Hat Ceph Storage Installation Guide.
- See the Configuring Ansible’s inventory location section in the Red Hat Ceph Storage Installation Guide for more details on the Ansible inventory configuration.
1.2.6. Removing a Ceph Monitor using the command-line interface
Removing a Ceph Monitor involves removing a ceph-mon daemon from the storage cluster and updating the storage cluster map.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Root-level access to the monitor node.
Procedure
Check if the monitor is ok-to-stop:
Syntax
ceph mon ok-to-stop HOSTNAME
Example
[root@mon ~]# ceph mon ok-to-stop node03
Stop the Ceph Monitor service:
Syntax
systemctl stop ceph-mon@MONITOR_ID
Example
[root@mon ~]# systemctl stop ceph-mon@node3
Remove the Ceph Monitor from the storage cluster:
Syntax
ceph mon remove MONITOR_ID
Example
[root@mon ~]# ceph mon remove node3
- Remove the Ceph Monitor entry from the Ceph configuration file. The default location for the configuration file is /etc/ceph/ceph.conf.
Redistribute the Ceph configuration file to all remaining Ceph nodes in the storage cluster:
Syntax
scp /etc/ceph/CLUSTER_NAME.conf USER_NAME@TARGET_NODE_NAME:/etc/ceph/
Example
[root@mon ~]# scp /etc/ceph/ceph.conf root@node3:/etc/ceph/
For a containers deployment, disable and remove the Ceph Monitor service:
Disable the Ceph Monitor service:
Syntax
systemctl disable ceph-mon@MONITOR_ID
Example
[root@mon ~]# systemctl disable ceph-mon@node3
Remove the service from systemd:
[root@mon ~]# rm /etc/systemd/system/ceph-mon@.service
Reload the systemd manager configuration:
[root@mon ~]# systemctl daemon-reload
Reset the state of the failed Ceph Monitor node:
[root@mon ~]# systemctl reset-failed
Optional: Archive the Ceph Monitor data:
Syntax
mv /var/lib/ceph/mon/CLUSTER_NAME-MONITOR_ID /var/lib/ceph/mon/removed-CLUSTER_NAME-MONITOR_ID
Example
[root@mon ~]# mv /var/lib/ceph/mon/ceph-node3 /var/lib/ceph/mon/removed-ceph-node3
Optional: Delete the Ceph Monitor data:
Syntax
rm -r /var/lib/ceph/mon/CLUSTER_NAME-MONITOR_ID
Example
[root@mon ~]# rm -r /var/lib/ceph/mon/ceph-node3
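Optionally, verify that the removed monitor is no longer listed in the monitor map, for example with ceph mon stat (output not shown here):
[root@mon ~]# ceph mon stat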
1.2.7. Removing a Ceph Monitor from an unhealthy storage cluster
You can remove a ceph-mon daemon from an unhealthy storage cluster that has placement groups persistently not in the active+clean state.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Root-level access to the Ceph Monitor node.
- At least one running Ceph Monitor node.
Procedure
Log into a surviving Ceph Monitor node:
Syntax
ssh root@MONITOR_NODE_NAME
Example
[root@admin ~]# ssh root@mon2
Stop the ceph-mon daemon and extract a copy of the monmap file:
Syntax
systemctl stop ceph-mon@MONITOR_ID
ceph-mon -i SHORT_HOSTNAME --extract-monmap TEMP_PATH
Example
[root@mon2 ~]# systemctl stop ceph-mon@mon2
[root@mon2 ~]# ceph-mon -i mon2 --extract-monmap /tmp/monmap
Remove the non-surviving Ceph Monitor(s):
Syntax
monmaptool TEMPORARY_PATH --rm MONITOR_ID
Example
[root@mon2 ~]# monmaptool /tmp/monmap --rm mon1
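Optionally, you can inspect the modified monitor map before injecting it, for example with the monmaptool --print option (output not shown here):
[root@mon2 ~]# monmaptool --print /tmp/monmap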
Inject the surviving monitor map with the removed monitor(s) into the surviving Ceph Monitor:
Syntax
ceph-mon -i SHORT_HOSTNAME --inject-monmap TEMP_PATH
Example
[root@mon2 ~]# ceph-mon -i mon2 --inject-monmap /tmp/monmap
Start only the surviving monitors, and verify that the monitors form a quorum:
Example
[root@mon2 ~]# ceph -s
- Optional: Archive the removed Ceph Monitor’s data directory in the /var/lib/ceph/mon directory.
1.3. Ceph Managers
The Ceph Manager daemon (ceph-mgr) runs alongside the monitor daemons to provide additional monitoring and interfaces to external monitoring and management systems. The ceph-mgr daemon is required for normal operations. By default, the Ceph Manager daemon requires no additional configuration beyond ensuring it is running. If there is no mgr daemon running, you see a health warning to that effect, and some of the other information in the output of ceph status is missing or stale until a Ceph Manager is started.
1.3.1. Adding a Ceph Manager using Ansible
Usually, the Ansible automation utility installs the Ceph Manager daemon (ceph-mgr) when you deploy the Red Hat Ceph Storage cluster. If the Ceph Manager service or the daemon is down, you can redeploy the ceph-mgr daemon using Ansible. You can remove the manager daemon and add a new or an existing node to deploy the Ceph Manager daemon. Red Hat recommends colocating the Ceph Manager and Ceph Monitor daemons on the same node.
Prerequisites
- A running Red Hat Ceph Storage cluster deployed by Ansible.
- Root or sudo access to an Ansible administration node.
- New or existing nodes to deploy the Ceph Manager daemons.
Procedure
- Log in to the Ansible administration node.
Navigate to the /usr/share/ceph-ansible/ directory:
Example
[ansible@admin ~]$ cd /usr/share/ceph-ansible/
As root or with sudo access, open and edit the /usr/share/ceph-ansible/hosts inventory file and add the Ceph Manager node under the [mgrs] section:
Syntax
[mgrs]
CEPH_MANAGER_NODE_NAME
CEPH_MANAGER_NODE_NAME
Replace CEPH_MANAGER_NODE_NAME with the host name of the node where you want to install the Ceph Manager daemon.
As the ansible user, run the Ansible playbook:
Bare-metal deployments:
[ansible@admin ceph-ansible]$ ansible-playbook site.yml --limit mgrs -i hosts
Container deployments:
[ansible@admin ceph-ansible]$ ansible-playbook site-container.yml --limit mgrs -i hosts
After the Ansible playbook has finished running, the new Ceph Manager daemon node appears in the storage cluster.
Verification
On the monitor node, check the status of the storage cluster:
Syntax
ceph -s
Example
[root@ceph-2 ~]# ceph -s
  mgr: ceph-3(active, since 2h), standbys: ceph-1, ceph-2
1.3.2. Removing a Ceph Manager using Ansible
You can use the shrink-mgr playbook to remove the Ceph Manager daemons. This playbook removes a Ceph Manager from your cluster.
Prerequisites
- A running Red Hat Ceph Storage cluster deployed by Ansible.
- Root or sudo access to an Ansible administration node.
- Admin access to the Ansible administration node.
Procedure
- As an admin user, log in to the Ansible administration node.
Navigate to the /usr/share/ceph-ansible/ directory:
[admin@admin ~]$ cd /usr/share/ceph-ansible/
For bare-metal and containers deployments, run the shrink-mgr.yml Ansible playbook:
Syntax
ansible-playbook infrastructure-playbooks/shrink-mgr.yml -e mgr_to_kill=NODE_NAME -u ANSIBLE_USER_NAME -i hosts
Replace:
- NODE_NAME with the short host name of the Ceph Manager node. You can remove only one Ceph Manager each time the playbook runs.
- ANSIBLE_USER_NAME with the name of the Ansible user.
Example
[admin@admin ceph-ansible]$ ansible-playbook infrastructure-playbooks/shrink-mgr.yml -e mgr_to_kill=ceph-2 -u admin -i hosts
As a root user, edit the /usr/share/ceph-ansible/hosts inventory file and remove the Ceph Manager node under the [mgrs] section:
Syntax
[mgrs]
CEPH_MANAGER_NODE_NAME
CEPH_MANAGER_NODE_NAME
Example
[mgrs]
ceph-1
ceph-3
In this example, ceph-2 was removed from the [mgrs] list.
Verification
On the monitor node, check the status of the storage cluster:
Syntax
ceph -s
Example
[root@ceph-2 ~]# ceph -s
  mgr: ceph-3(active, since 112s), standbys: ceph-1
1.4. Ceph MDSs
The Ceph Metadata Server (MDS) node runs the MDS daemon (ceph-mds), which manages metadata related to files stored on the Ceph File System (CephFS). The MDS provides POSIX-compliant, shared file-system metadata management, including ownership, time stamps, and mode. The MDS uses RADOS (Reliable Autonomic Distributed Object Storage) to store metadata.
The MDS enables CephFS to interact with the Ceph Object Store, mapping an inode to an object and the location where Ceph stores the data within a tree. Clients accessing a CephFS file system first make a request to an MDS, which provides the information needed to get file content from the correct OSDs.
1.4.1. Adding a Ceph MDS using Ansible
Use the Ansible playbook to add a Ceph Metadata Server (MDS).
Prerequisites
- A running Red Hat Ceph Storage cluster deployed by Ansible.
- Root or sudo access to an Ansible administration node.
- New or existing servers that can be provisioned as MDS nodes.
Procedure
- Log in to the Ansible administration node.
Change to the /usr/share/ceph-ansible directory:
Example
[ansible@admin ~]$ cd /usr/share/ceph-ansible
As root or with sudo access, open and edit the /usr/share/ceph-ansible/hosts inventory file and add the MDS node under the [mdss] section:
Syntax
[mdss]
MDS_NODE_NAME
NEW_MDS_NODE_NAME
Replace NEW_MDS_NODE_NAME with the host name of the node where you want to install the MDS server.
Alternatively, you can colocate the MDS daemon with the OSD daemon on one node by adding the same node under the [osds] and [mdss] sections.
Example
[mdss]
node01
node03
As the ansible user, run the Ansible playbook to provision the MDS node:
Bare-metal deployments:
[ansible@admin ceph-ansible]$ ansible-playbook site.yml --limit mdss -i hosts
Container deployments:
[ansible@admin ceph-ansible]$ ansible-playbook site-container.yml --limit mdss -i hosts
After the Ansible playbook has finished running, the new Ceph MDS node appears in the storage cluster.
Verification
Check the status of the MDS daemons:
Syntax
ceph fs dump
Alternatively, you can use the ceph mds stat command to check if the MDS is in an active state:
Syntax
ceph mds stat
Example
[ansible@admin ceph-ansible]$ ceph mds stat
cephfs:1 {0=node01=up:active} 1 up:standby
1.4.2. Adding a Ceph MDS using the command-line interface
You can manually add a Ceph Metadata Server (MDS) using the command-line interface.
Prerequisites
- The ceph-common package is installed.
- A running Red Hat Ceph Storage cluster.
- Root or sudo access to the MDS nodes.
- New or existing servers that can be provisioned as MDS nodes.
Procedure
Add a new MDS node by logging into the node and creating an MDS data directory:
Syntax
sudo mkdir /var/lib/ceph/mds/ceph-MDS_ID
Replace MDS_ID with the ID of the MDS node that you want to add the MDS daemon to.
Example
[admin@node03 ~]$ sudo mkdir /var/lib/ceph/mds/ceph-node03
If this is a new MDS node, create the authentication key if you are using Cephx authentication:
Syntax
sudo ceph auth get-or-create mds.MDS_ID mon 'profile mds' mgr 'profile mds' mds 'allow *' osd 'allow *' > /var/lib/ceph/mds/ceph-MDS_ID/keyring
Replace MDS_ID with the ID of the MDS node to deploy the MDS daemon on.
Example
[admin@node03 ~]$ sudo ceph auth get-or-create mds.node03 mon 'profile mds' mgr 'profile mds' mds 'allow *' osd 'allow *' > /var/lib/ceph/mds/ceph-node03/keyring
Note
Cephx authentication is enabled by default. See the Cephx authentication link in the Additional Resources section for more information about Cephx authentication.
Start the MDS daemon:
Syntax
sudo systemctl start ceph-mds@HOST_NAME
Replace HOST_NAME with the short name of the host to start the daemon.
Example
[admin@node03 ~]$ sudo systemctl start ceph-mds@node03
Enable the MDS service:
Syntax
systemctl enable ceph-mds@HOST_NAME
Replace HOST_NAME with the short name of the host to enable the service.
Example
[admin@node03 ~]$ sudo systemctl enable ceph-mds@node03
Verification
Check the status of the MDS daemons:
Syntax
ceph fs dump
Alternatively, you can use the ceph mds stat command to check if the MDS is in an active state:
Syntax
ceph mds stat
Example
[ansible@admin ceph-ansible]$ ceph mds stat
cephfs:1 {0=node01=up:active} 1 up:standby
1.4.3. Removing a Ceph MDS using Ansible
To remove a Ceph Metadata Server (MDS) using Ansible, use the shrink-mds playbook.
If there is no replacement MDS to take over once the MDS is removed, the file system will become unavailable to clients. If that is not desirable, consider adding an additional MDS before removing the MDS you would like to take offline.
Prerequisites
- At least one MDS node.
- A running Red Hat Ceph Storage cluster deployed by Ansible.
- Root or sudo access to an Ansible administration node.
Procedure
- Log in to the Ansible administration node.
Change to the /usr/share/ceph-ansible directory:
Example
[ansible@admin ~]$ cd /usr/share/ceph-ansible
Run the Ansible shrink-mds.yml playbook, and when prompted, type yes to confirm shrinking the cluster:
Syntax
ansible-playbook infrastructure-playbooks/shrink-mds.yml -e mds_to_kill=ID -i hosts
Replace ID with the ID of the MDS node you want to remove. You can remove only one Ceph MDS each time the playbook runs.
Example
[ansible@admin ceph-ansible]$ ansible-playbook infrastructure-playbooks/shrink-mds.yml -e mds_to_kill=node02 -i hosts
As root or with sudo access, open and edit the /usr/share/ceph-ansible/hosts inventory file and remove the MDS node under the [mdss] section:
Syntax
[mdss]
MDS_NODE_NAME
MDS_NODE_NAME
Example
[mdss]
node01
node03
In this example, node02 was removed from the [mdss] list.
Verification
Check the status of the MDS daemons:
Syntax
ceph fs dump
1.4.4. Removing a Ceph MDS using the command-line interface
You can manually remove a Ceph Metadata Server (MDS) using the command-line interface.
If there is no replacement MDS to take over once the current MDS is removed, the file system will become unavailable to clients. If that is not desirable, consider adding an MDS before removing the existing MDS.
Prerequisites
- The ceph-common package is installed.
- A running Red Hat Ceph Storage cluster.
- Root or sudo access to the MDS nodes.
Procedure
- Log into the Ceph MDS node that you want to remove the MDS daemon from.
Stop the Ceph MDS service:
Syntax
sudo systemctl stop ceph-mds@HOST_NAME
Replace HOST_NAME with the short name of the host where the daemon is running.
Example
[admin@node02 ~]$ sudo systemctl stop ceph-mds@node02
Disable the MDS service if you are not redeploying MDS to this node:
Syntax
sudo systemctl disable ceph-mds@HOST_NAME
Replace HOST_NAME with the short name of the host to disable the daemon.
Example
[admin@node02 ~]$ sudo systemctl disable ceph-mds@node02
Remove the /var/lib/ceph/mds/ceph-MDS_ID directory on the MDS node:
Syntax
sudo rm -fr /var/lib/ceph/mds/ceph-MDS_ID
Replace MDS_ID with the ID of the MDS node that you want to remove the MDS daemon from.
Example
[admin@node02 ~]$ sudo rm -fr /var/lib/ceph/mds/ceph-node02
Verification
Check the status of the MDS daemons:
Syntax
ceph fs dump
1.5. Ceph OSDs
When a Red Hat Ceph Storage cluster is up and running, you can add OSDs to the storage cluster at runtime.
A Ceph OSD generally consists of one ceph-osd daemon for one storage drive and its associated journal within a node. If a node has multiple storage drives, then map one ceph-osd daemon for each drive.
Red Hat recommends checking the capacity of a cluster regularly to see if it is reaching the upper end of its storage capacity. As a storage cluster reaches its near full ratio, add one or more OSDs to expand the storage cluster’s capacity.
When you want to reduce the size of a Red Hat Ceph Storage cluster or replace the hardware, you can also remove an OSD at runtime. If the node has multiple storage drives, you might also need to remove one of the ceph-osd daemons for that drive. Generally, it’s a good idea to check the capacity of the storage cluster to see if you are reaching the upper end of its capacity. Ensure that when you remove an OSD the storage cluster is not at its near full ratio.
Do not let a storage cluster reach the full ratio before adding an OSD. OSD failures that occur after the storage cluster reaches the near full ratio can cause the storage cluster to exceed the full ratio. Ceph blocks write access to protect the data until you resolve the storage capacity issues. Do not remove OSDs without considering the impact on the full ratio first.
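You can check the configured ratios and the current usage before adding or removing OSDs, for example with ceph osd dump and ceph df (output not shown here):
[root@mon ~]# ceph osd dump | grep ratio
[root@mon ~]# ceph df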
1.5.1. Ceph OSD node configuration
Configure Ceph OSDs and their supporting hardware similarly as a storage strategy for the pool(s) that will use the OSDs. Ceph prefers uniform hardware across pools for a consistent performance profile. For best performance, consider a CRUSH hierarchy with drives of the same type or size.
If you add drives of dissimilar size, adjust their weights accordingly. When you add the OSD to the CRUSH map, consider the weight for the new OSD. Hard drive capacity grows approximately 40% per year, so newer OSD nodes might have larger hard drives than older nodes in the storage cluster, that is, they might have a greater weight.
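For example, if a newly added OSD is backed by a drive roughly twice the size of the existing drives, you might adjust its CRUSH weight after it is added; the OSD ID and weight below are illustrative:
[root@mon ~]# ceph osd crush reweight osd.12 3.63898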
Before doing a new installation, review the Requirements for Installing Red Hat Ceph Storage chapter in the Installation Guide.
Additional Resources
- See the Red Hat Ceph Storage Storage Strategies Guide for more details.
1.5.2. Mapping a container OSD ID to a drive
Sometimes, it is necessary to identify which drive a containerized OSD is using. For example, if an OSD has an issue you might need to know which drive it uses to verify the drive status. Also, for a non-containerized OSD you reference the OSD ID to start and stop it, but to start and stop a containerized OSD you reference the drive it uses.
The examples below are running on Red Hat Enterprise Linux 8. In Red Hat Enterprise Linux 8, podman is the default service and has replaced the older docker service. If you are running on Red Hat Enterprise Linux 7, then substitute podman with docker to execute the commands given.
Prerequisites
- A running Red Hat Ceph Storage cluster in a containerized environment.
- Having root access to the container node.
Procedure
Find a container name. For example, to identify the drive associated with osd.5, open a terminal on the container node where osd.5 is running, and then run podman ps to list all containers:
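Example (a minimal sketch; container names depend on your deployment, and the command output is not shown here):
[root@osd ~]# podman ps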
Use podman exec to run ceph-volume lvm list on any OSD container name from the previous output:
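Example (CONTAINER_NAME is a placeholder for an OSD container name taken from the podman ps output; the command output is not shown here):
[root@osd ~]# podman exec CONTAINER_NAME ceph-volume lvm list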
From this output you can see that osd.5 is associated with /dev/sdb.
Additional Resources
- See Replacing a failed OSD disk for more information.
1.5.3. Adding a Ceph OSD using Ansible with the same disk topology
For Ceph OSDs with the same disk topology, Ansible adds the same number of OSDs as other OSD nodes using the same device paths specified in the devices: section of the /usr/share/ceph-ansible/group_vars/osds.yml file.
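For reference, the devices: section in the osds.yml file is typically a YAML list of the data device paths; the paths below are illustrative:
devices:
  - /dev/sdb
  - /dev/sdc
  - /dev/sdd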
The new Ceph OSD nodes have the same configuration as the rest of the OSDs.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Review the Requirements for Installing Red Hat Ceph Storage chapter in the Red Hat Ceph Storage Installation Guide.
- Having root access to the new nodes.
- The same number of OSD data drives as other OSD nodes in the storage cluster.
Procedure
Add the Ceph OSD node(s) to the /etc/ansible/hosts file, under the [osds] section:
Syntax
[osds]
...
osd06
NEW_OSD_NODE_NAME
Verify that Ansible can reach the Ceph nodes:
[user@admin ~]$ ansible all -m ping
Navigate to the Ansible configuration directory:
[user@admin ~]$ cd /usr/share/ceph-ansible
For bare-metal and containers deployments, run the add-osd.yml Ansible playbook:
Note
For a new OSD host, you need to run either the site.yml or the site-container.yml playbook with the --limit option, because the node-exporter and ceph-crash services are not deployed on the node with the osds.yml playbook.
Example
[user@admin ceph-ansible]$ ansible-playbook infrastructure-playbooks/add-osd.yml -i hosts
For a new OSD host, run the site.yml or site-container.yml Ansible playbook:
Bare-metal deployments:
Syntax
ansible-playbook site.yml -i hosts --limit NEW_OSD_NODE_NAME
Example
[user@admin ceph-ansible]$ ansible-playbook site.yml -i hosts --limit node03
Container deployments:
Syntax
ansible-playbook site-container.yml -i hosts --limit NEW_OSD_NODE_NAME
Example
[user@admin ceph-ansible]$ ansible-playbook site-container.yml -i hosts --limit node03
When adding an OSD, if the playbook fails with PGs were not reported as active+clean, configure the following variables in the all.yml file to adjust the retries and delay:
# OSD handler checks
handler_health_osd_check_retries: 50
handler_health_osd_check_delay: 30
Additional Resources
- See the Configuring Ansible’s inventory location section in the Red Hat Ceph Storage Installation Guide for more details on the Ansible inventory configuration.
1.5.4. Adding a Ceph OSD using Ansible with different disk topologies
For Ceph OSDs with different disk topologies, there are two approaches for adding the new OSD node(s) to an existing storage cluster.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Review the Requirements for Installing Red Hat Ceph Storage chapter in the Red Hat Ceph Storage Installation Guide.
- Having root access to the new nodes.
Procedure
First Approach
Add the new Ceph OSD node(s) to the /etc/ansible/hosts file, under the [osds] section:
Example
[osds]
...
osd06
NEW_OSD_NODE_NAME
Create a new file for each new Ceph OSD node added to the storage cluster, under the /etc/ansible/host_vars/ directory:
Syntax
touch /etc/ansible/host_vars/NEW_OSD_NODE_NAME
Example
[root@admin ~]# touch /etc/ansible/host_vars/osd07
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Edit the new file, and add the
devices:
anddedicated_devices:
sections to the file. Under each of these sections add a-
, space, then the full path to the block device names for this OSD node:Example
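Example (the device paths are illustrative and must match the drives present on the new node):
devices:
  - /dev/sdc
  - /dev/sdd
dedicated_devices:
  - /dev/sda
  - /dev/sdb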
Verify that Ansible can reach all the Ceph nodes:
[user@admin ~]$ ansible all -m ping
Change directory to the Ansible configuration directory:
[user@admin ~]$ cd /usr/share/ceph-ansible
For bare-metal and containers deployments, run the add-osd.yml Ansible playbook:
Note
For a new OSD host, you need to run either the site.yml or the site-container.yml playbook with the --limit option, because the node-exporter and ceph-crash services are not deployed on the node with the osds.yml playbook.
Example
[user@admin ceph-ansible]$ ansible-playbook infrastructure-playbooks/add-osd.yml -i hosts
For a new OSD host, run the site.yml or site-container.yml Ansible playbook:
Bare-metal deployments:
Syntax
ansible-playbook site.yml -i hosts --limit NEW_OSD_NODE_NAME
Example
[user@admin ceph-ansible]$ ansible-playbook site.yml -i hosts --limit node03
Container deployments:
Syntax
ansible-playbook site-container.yml -i hosts --limit NEW_OSD_NODE_NAME
Example
[user@admin ceph-ansible]$ ansible-playbook site-container.yml -i hosts --limit node03
Second Approach
Add the new OSD node name to the /etc/ansible/hosts file, and use the devices and dedicated_devices options, specifying the different disk topology:
Example
[osds]
...
osd07 devices="['/dev/sdc', '/dev/sdd', '/dev/sde', '/dev/sdf']" dedicated_devices="['/dev/sda', '/dev/sda', '/dev/sdb', '/dev/sdb']"
Verify that Ansible can reach all the Ceph nodes:
[user@admin ~]$ ansible all -m ping
Change directory to the Ansible configuration directory:
[user@admin ~]$ cd /usr/share/ceph-ansible
For bare-metal and containers deployments, run the add-osd.yml Ansible playbook:
Note
For a new OSD host, you need to run either the site.yml or the site-container.yml playbook with the --limit option, because the node-exporter and ceph-crash services are not deployed on the node with the osds.yml playbook.
Example
[user@admin ceph-ansible]$ ansible-playbook infrastructure-playbooks/add-osd.yml -i hosts
For a new OSD host, run the site.yml or site-container.yml Ansible playbook:
Bare-metal deployments:
Syntax
ansible-playbook site.yml -i hosts --limit NEW_OSD_NODE_NAME
Example
[user@admin ceph-ansible]$ ansible-playbook site.yml -i hosts --limit node03
Container deployments:
Syntax
ansible-playbook site-container.yml -i hosts --limit NEW_OSD_NODE_NAME
Example
[user@admin ceph-ansible]$ ansible-playbook site-container.yml -i hosts --limit node03
Additional Resources
- See the Configuring Ansible’s inventory location section in the Red Hat Ceph Storage Installation Guide for more details on the Ansible inventory configuration.
1.5.5. Creating Ceph OSDs using ceph-volume
The create subcommand calls the prepare subcommand, and then calls the activate subcommand.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Root-level access to the Ceph OSD nodes.
If you prefer to have more control over the creation process, you can use the prepare and activate subcommands separately to create the OSD, instead of using create. You can use the two subcommands to gradually introduce new OSDs into a storage cluster, while avoiding having to rebalance large amounts of data. Both approaches work the same way, except that using the create subcommand causes the OSD to become up and in immediately after completion.
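As a rough sketch of the two-step approach, where the volume group, logical volume, OSD_ID, and OSD_FSID values are placeholders (ceph-volume lvm list reports the actual ID and fsid after preparation):
[root@osd ~]# ceph-volume lvm prepare --bluestore --data example_vg/data_lv
[root@osd ~]# ceph-volume lvm activate OSD_ID OSD_FSID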
Procedure
To create a new OSD:
Syntax
ceph-volume lvm create --bluestore --data VOLUME_GROUP/LOGICAL_VOLUME
Example
[root@osd ~]# ceph-volume lvm create --bluestore --data example_vg/data_lv
Additional Resources
- See the Preparing Ceph OSDs using `ceph-volume` section in the Red Hat Ceph Storage Administration Guide for more details.
- See the Activating Ceph OSDs using `ceph-volume` section in the Red Hat Ceph Storage Administration Guide for more details.
1.5.6. Using batch mode with ceph-volume
The batch subcommand automates the creation of multiple OSDs when single devices are provided.
The ceph-volume command decides the best method to use to create the OSDs, based on drive type. Ceph OSD optimization depends on the available devices:
- If all devices are traditional hard drives, batch creates one OSD per device.
- If all devices are solid state drives, batch creates two OSDs per device.
- If there is a mix of traditional hard drives and solid state drives, batch uses the traditional hard drives for data, and creates the largest possible journal (block.db) on the solid state drive.
The batch subcommand does not support the creation of a separate logical volume for the write-ahead-log (block.wal) device.
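To preview how batch would lay out the OSDs without making any changes, you can add the --report option; the device paths below are illustrative and the output is not shown here:
[root@osd ~]# ceph-volume lvm batch --report --bluestore /dev/sda /dev/sdb /dev/nvme0n1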
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Root-level access to the Ceph OSD nodes.
Procedure
To create OSDs on several drives:
Syntax
ceph-volume lvm batch --bluestore PATH_TO_DEVICE [PATH_TO_DEVICE]
Example
[root@osd ~]# ceph-volume lvm batch --bluestore /dev/sda /dev/sdb /dev/nvme0n1
Additional Resources
- See the Creating Ceph OSDs using `ceph-volume` section in the Red Hat Ceph Storage Administration Guide for more details.
1.5.7. Adding a Ceph OSD using the command-line interface Copy linkLink copied to clipboard!
Here is the high-level workflow for manually adding an OSD to a Red Hat Ceph Storage cluster:
- Install the ceph-osd package and create a new OSD instance.
- Prepare and mount the OSD data and journal drives.
- Create volume groups and logical volumes.
- Add the new OSD node to the CRUSH map.
- Update the owner and group permissions.
- Enable and start the ceph-osd daemon.
The ceph-disk command is deprecated. The ceph-volume command is now the preferred method for deploying OSDs from the command-line interface. Currently, the ceph-volume command only supports the lvm plugin. Red Hat will provide examples throughout this guide using both commands as a reference, allowing time for storage administrators to convert any custom scripts that rely on ceph-disk to ceph-volume instead.
For custom storage cluster names, use the --cluster CLUSTER_NAME option with the ceph and ceph-osd commands.
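For example, a hedged sketch, assuming a hypothetical custom cluster name of test123; the --cluster option simply points the command at /etc/ceph/test123.conf:
[root@osd ~]# ceph --cluster test123 osd tree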
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Review the Requirements for Installing Red Hat Ceph Storage chapter in the Red Hat Ceph Storage Installation Guide.
- Root-level access to the new nodes.
- Optional. If you do not want the ceph-volume utility to create a volume group and logical volumes automatically, create them manually; see the sketch below and the Configuring and managing logical volumes guide for Red Hat Enterprise Linux 8.
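A minimal sketch of creating the volume group and logical volume manually, assuming a hypothetical spare device /dev/sdc and the example_vg/data_lv names used elsewhere in this guide:
[root@osd ~]# pvcreate /dev/sdc
[root@osd ~]# vgcreate example_vg /dev/sdc
[root@osd ~]# lvcreate -l 100%FREE -n data_lv example_vg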
Procedure
Enable the Red Hat Ceph Storage 4 OSD software repository.
Red Hat Enterprise Linux 7
subscription-manager repos --enable=rhel-7-server-rhceph-4-osd-rpms
[root@osd ~]# subscription-manager repos --enable=rhel-7-server-rhceph-4-osd-rpms
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Red Hat Enterprise Linux 8
subscription-manager repos --enable=rhceph-4-osd-for-rhel-8-x86_64-rpms
[root@osd ~]# subscription-manager repos --enable=rhceph-4-osd-for-rhel-8-x86_64-rpms
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Create the
/etc/ceph/
directory:mkdir /etc/ceph
[root@osd ~]# mkdir /etc/ceph
Copy to Clipboard Copied! Toggle word wrap Toggle overflow On the new OSD node, copy the Ceph administration keyring and configuration files from one of the Ceph Monitor nodes:
Syntax
scp USER_NAME@MONITOR_HOST_NAME:/etc/ceph/CLUSTER_NAME.client.admin.keyring /etc/ceph
scp USER_NAME@MONITOR_HOST_NAME:/etc/ceph/CLUSTER_NAME.conf /etc/ceph
Example
[root@osd ~]# scp root@node1:/etc/ceph/ceph.client.admin.keyring /etc/ceph/
[root@osd ~]# scp root@node1:/etc/ceph/ceph.conf /etc/ceph/
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Install the
ceph-osd
package on the new Ceph OSD node:Red Hat Enterprise Linux 7
yum install ceph-osd
[root@osd ~]# yum install ceph-osd
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Red Hat Enterprise Linux 8
dnf install ceph-osd
[root@osd ~]# dnf install ceph-osd
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Prepare the OSDs.
To use previously created logical volumes:
Syntax
ceph-volume lvm prepare --bluestore --data VOLUME_GROUP/LOGICAL_VOLUME
ceph-volume lvm prepare --bluestore --data VOLUME_GROUP/LOGICAL_VOLUME
Copy to Clipboard Copied! Toggle word wrap Toggle overflow To specify a raw device for
ceph-volume
to create logical volumes automatically:Syntax
ceph-volume lvm prepare --bluestore --data /PATH_TO_DEVICE
ceph-volume lvm prepare --bluestore --data /PATH_TO_DEVICE
Copy to Clipboard Copied! Toggle word wrap Toggle overflow See the Preparing OSDs section for more details.
Set the
noup
option:ceph osd set noup
[root@osd ~]# ceph osd set noup
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Activate the new OSD:
Syntax
ceph-volume lvm activate --bluestore OSD_ID OSD_FSID
ceph-volume lvm activate --bluestore OSD_ID OSD_FSID
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
ceph-volume lvm activate --bluestore 4 6cc43680-4f6e-4feb-92ff-9c7ba204120e
[root@osd ~]# ceph-volume lvm activate --bluestore 4 6cc43680-4f6e-4feb-92ff-9c7ba204120e
Copy to Clipboard Copied! Toggle word wrap Toggle overflow See the Activating OSDs section for more details.
Note: You can prepare and activate OSDs with a single command. See the Creating OSDs section for details. Alternatively, you can specify multiple drives and create OSDs with a single command. See the Using batch mode section.
Add the OSD to the CRUSH map.
Syntax
ceph osd crush add OSD_ID WEIGHT [BUCKET_TYPE=BUCKET_NAME ...]
Example
[root@osd ~]# ceph osd crush add 4 1 host=node4
Note: If you specify more than one bucket, the command places the OSD into the most specific bucket out of those you specified, and it moves the bucket underneath any other buckets you specified.
Note: You can also edit the CRUSH map manually. See the Editing a CRUSH map section in the Red Hat Ceph Storage Storage Strategies Guide.
Important: If you specify only the root bucket, then the OSD attaches directly to the root, but the CRUSH rules expect OSDs to be inside of the host bucket.
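As an illustration of specifying more than one bucket, a hedged sketch assuming a hypothetical rack bucket named rack1 in addition to the host bucket; the OSD is placed in the host bucket, and the host bucket is moved under rack1:
[root@osd ~]# ceph osd crush add 4 1 rack=rack1 host=node4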
Unset the
noup
option:ceph osd unset noup
[root@osd ~]# ceph osd unset noup
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Update the owner and group permissions for the newly created directories:
Syntax
chown -R OWNER:GROUP PATH_TO_DIRECTORY
Example
[root@osd ~]# chown -R ceph:ceph /var/lib/ceph/osd
[root@osd ~]# chown -R ceph:ceph /var/log/ceph
[root@osd ~]# chown -R ceph:ceph /var/run/ceph
[root@osd ~]# chown -R ceph:ceph /etc/ceph
Copy to Clipboard Copied! Toggle word wrap Toggle overflow If you use storage clusters with custom names, then add the following line to the appropriate file:
echo "CLUSTER=CLUSTER_NAME" >> /etc/sysconfig/ceph
[root@osd ~]# echo "CLUSTER=CLUSTER_NAME" >> /etc/sysconfig/ceph
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Replace
CLUSTER_NAME
with the custom storage cluster name.To ensure that the new OSD is
up
and ready to receive data, enable and start the OSD service:Syntax
systemctl enable ceph-osd@OSD_ID systemctl start ceph-osd@OSD_ID
systemctl enable ceph-osd@OSD_ID systemctl start ceph-osd@OSD_ID
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
systemctl enable ceph-osd@4 systemctl start ceph-osd@4
[root@osd ~]# systemctl enable ceph-osd@4 [root@osd ~]# systemctl start ceph-osd@4
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Additional Resources
- See the Editing a CRUSH map section in the Red Hat Ceph Storage Storage Strategies Guide for more information.
-
See the Red Hat Ceph Storage Administration Guide, for more information on using the
ceph-volume
command.
1.5.8. Adding a Ceph OSD using the command-line interface in a containerized environment Copy linkLink copied to clipboard!
You can manually add a single or multiple Ceph OSD using the command-line interface in a containerized Red Hat Ceph Storage cluster.
Red Hat recommends the use of ceph-ansible
to add Ceph OSDs unless there is an exception or a specific use case where adding Ceph OSDs manually is required. If you are not sure, contact Red Hat Support.
Prerequisites
- A running Red Hat Ceph Storage cluster in a containerized environment.
- Root-level access to the container node.
- An existing OSD node.
The examples below are running on Red Hat Enterprise Linux 8. In Red Hat Enterprise Linux 8, podman is the default service and has replaced the older docker service. If you are running on Red Hat Enterprise Linux 7, then substitute podman with docker to execute the commands given.
Procedure
To create a single OSD, execute the
lvm prepare
command:Syntax
podman run --rm --net=host --privileged=true --pid=host --ipc=host -v /dev:/dev -v /etc/localtime:/etc/localtime:ro -v /var/lib/ceph:/var/lib/ceph:z -v /etc/ceph:/etc/ceph:z -v /var/run/ceph:/var/run/ceph:z -v /var/run/udev/:/var/run/udev/ -v /var/log/ceph:/var/log/ceph:z -v /run/lvm/:/run/lvm/ --entrypoint=ceph-volume PATH_TO_IMAGE --cluster CLUSTER_NAME lvm prepare --bluestore --data PATH_TO_DEVICE --no-systemd
podman run --rm --net=host --privileged=true --pid=host --ipc=host -v /dev:/dev -v /etc/localtime:/etc/localtime:ro -v /var/lib/ceph:/var/lib/ceph:z -v /etc/ceph:/etc/ceph:z -v /var/run/ceph:/var/run/ceph:z -v /var/run/udev/:/var/run/udev/ -v /var/log/ceph:/var/log/ceph:z -v /run/lvm/:/run/lvm/ --entrypoint=ceph-volume PATH_TO_IMAGE --cluster CLUSTER_NAME lvm prepare --bluestore --data PATH_TO_DEVICE --no-systemd
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
podman run --rm --net=host --privileged=true --pid=host --ipc=host -v /dev:/dev -v /etc/localtime:/etc/localtime:ro -v /var/lib/ceph:/var/lib/ceph:z -v /etc/ceph:/etc/ceph:z -v /var/run/ceph:/var/run/ceph:z -v /var/run/udev/:/var/run/udev/ -v /var/log/ceph:/var/log/ceph:z -v /run/lvm/:/run/lvm/ --entrypoint=ceph-volume registry.redhat.io/rhceph/rhceph-4-rhel8:latest --cluster ceph lvm prepare --bluestore --data /dev/sdh --no-systemd
[root@osd ~]# podman run --rm --net=host --privileged=true --pid=host --ipc=host -v /dev:/dev -v /etc/localtime:/etc/localtime:ro -v /var/lib/ceph:/var/lib/ceph:z -v /etc/ceph:/etc/ceph:z -v /var/run/ceph:/var/run/ceph:z -v /var/run/udev/:/var/run/udev/ -v /var/log/ceph:/var/log/ceph:z -v /run/lvm/:/run/lvm/ --entrypoint=ceph-volume registry.redhat.io/rhceph/rhceph-4-rhel8:latest --cluster ceph lvm prepare --bluestore --data /dev/sdh --no-systemd
Copy to Clipboard Copied! Toggle word wrap Toggle overflow The example prepares a single Bluestore Ceph OSD with data on
/dev/sdh
.NoteTo enable and start the OSD, execute the following commands:
Example
systemctl enable ceph-osd@4 systemctl start ceph-osd@4
[root@osd ~]# systemctl enable ceph-osd@4 [root@osd ~]# systemctl start ceph-osd@4
Copy to Clipboard Copied! Toggle word wrap Toggle overflow You can also use the following optional arguments:
- dmcrypt
- Description
- Enable encryption for the underlying OSD devices.
- block.db
- Description
- Path to a bluestore block.db logical volume or partition.
- block.wal
- Description
- Path to a bluestore block.wal logical volume or partition.
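As an illustration, a hedged sketch of the same prepare call with a separate block.db device added, assuming a hypothetical logical volume db-vg/db-lv set aside for the database:
[root@osd ~]# podman run --rm --net=host --privileged=true --pid=host --ipc=host -v /dev:/dev -v /etc/localtime:/etc/localtime:ro -v /var/lib/ceph:/var/lib/ceph:z -v /etc/ceph:/etc/ceph:z -v /var/run/ceph:/var/run/ceph:z -v /var/run/udev/:/var/run/udev/ -v /var/log/ceph:/var/log/ceph:z -v /run/lvm/:/run/lvm/ --entrypoint=ceph-volume registry.redhat.io/rhceph/rhceph-4-rhel8:latest --cluster ceph lvm prepare --bluestore --data /dev/sdh --block.db db-vg/db-lv --no-systemd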
To create multiple Ceph OSDs, execute the
lvm batch
command:Syntax
podman run --rm --net=host --privileged=true --pid=host --ipc=host -v /dev:/dev -v /etc/localtime:/etc/localtime:ro -v /var/lib/ceph:/var/lib/ceph:z -v /etc/ceph:/etc/ceph:z -v /var/run/ceph:/var/run/ceph:z -v /var/run/udev/:/var/run/udev/ -v /var/log/ceph:/var/log/ceph:z -v /run/lvm/:/run/lvm/ --entrypoint=ceph-volume PATH_TO_IMAGE --cluster CLUSTER_NAME lvm batch --bluestore --yes --prepare PATH_TO_DEVICE PATH_TO_DEVICE --no-systemd
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
podman run --rm --net=host --privileged=true --pid=host --ipc=host -v /dev:/dev -v /etc/localtime:/etc/localtime:ro -v /var/lib/ceph:/var/lib/ceph:z -v /etc/ceph:/etc/ceph:z -v /var/run/ceph:/var/run/ceph:z -v /var/run/udev/:/var/run/udev/ -v /var/log/ceph:/var/log/ceph:z -v /run/lvm/:/run/lvm/ --entrypoint=ceph-volume registry.redhat.io/rhceph/rhceph-4-rhel8:latest --cluster ceph lvm batch --bluestore --yes --prepare /dev/sde /dev/sdf --no-systemd
[root@osd ~]# podman run --rm --net=host --privileged=true --pid=host --ipc=host -v /dev:/dev -v /etc/localtime:/etc/localtime:ro -v /var/lib/ceph:/var/lib/ceph:z -v /etc/ceph:/etc/ceph:z -v /var/run/ceph:/var/run/ceph:z -v /var/run/udev/:/var/run/udev/ -v /var/log/ceph:/var/log/ceph:z -v /run/lvm/:/run/lvm/ --entrypoint=ceph-volume registry.redhat.io/rhceph/rhceph-4-rhel8:latest --cluster ceph lvm batch --bluestore --yes --prepare /dev/sde /dev/sdf --no-systemd
Copy to Clipboard Copied! Toggle word wrap Toggle overflow The example prepares multiple Bluestore Ceph OSDs with data on
/dev/sde
and/dev/sdf
.You can also use the following optional arguments:
- dmcrypt
- Description
- Enable encryption for the underlying OSD devices.
- db-devices
- Description
- Path to a bluestore block.db logical volume or partition.
- wal-devices
- Description
- Path to a bluestore block.wal logical volume or partition.
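As an illustration, a hedged sketch of the batch call above with a separate database device, assuming a hypothetical NVMe device /dev/nvme0n1 to hold the block.db volumes:
[root@osd ~]# podman run --rm --net=host --privileged=true --pid=host --ipc=host -v /dev:/dev -v /etc/localtime:/etc/localtime:ro -v /var/lib/ceph:/var/lib/ceph:z -v /etc/ceph:/etc/ceph:z -v /var/run/ceph:/var/run/ceph:z -v /var/run/udev/:/var/run/udev/ -v /var/log/ceph:/var/log/ceph:z -v /run/lvm/:/run/lvm/ --entrypoint=ceph-volume registry.redhat.io/rhceph/rhceph-4-rhel8:latest --cluster ceph lvm batch --bluestore --yes --prepare /dev/sde /dev/sdf --db-devices /dev/nvme0n1 --no-systemd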
1.5.9. Removing a Ceph OSD using Ansible Copy linkLink copied to clipboard!
At times, you might need to scale down the capacity of a Red Hat Ceph Storage cluster. To remove an OSD from a Red Hat Ceph Storage cluster using Ansible, run the shrink-osd.yml
playbook.
Removing an OSD from the storage cluster will destroy all the data contained on that OSD.
Before removing OSDs, verify that the cluster has enough space to re-balance.
Do not remove OSDs simultaneously unless you are sure the placement groups are in an active+clean
state and the OSDs do not contain replicas or erasure coding shards for the same objects. If unsure, contact Red Hat Support.
Prerequisites
- A running Red Hat Ceph Storage deployed by Ansible.
- A running Ansible administration node.
Procedure
Change to the
/usr/share/ceph-ansible/
directory.Syntax
cd /usr/share/ceph-ansible
[user@admin ~]$ cd /usr/share/ceph-ansible
Copy to Clipboard Copied! Toggle word wrap Toggle overflow -
Copy the admin keyring from
/etc/ceph/
on the Ceph Monitor node to the node that contains the OSD that you want to remove. Run the Ansible playbook for either normal or containerized deployments of Ceph:
Syntax
ansible-playbook infrastructure-playbooks/shrink-osd.yml -e osd_to_kill=ID -u ANSIBLE_USER_NAME -i hosts
ansible-playbook infrastructure-playbooks/shrink-osd.yml -e osd_to_kill=ID -u ANSIBLE_USER_NAME -i hosts
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Replace:
- ID with the ID of the OSD. To remove multiple OSDs, separate the OSD IDs with a comma.
- ANSIBLE_USER_NAME with the name of the Ansible user.
Example
ansible-playbook infrastructure-playbooks/shrink-osd.yml -e osd_to_kill=1 -u user -i hosts
[user@admin ceph-ansible]$ ansible-playbook infrastructure-playbooks/shrink-osd.yml -e osd_to_kill=1 -u user -i hosts
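To remove several OSDs in one run, a hedged sketch, assuming hypothetical OSD IDs 1, 2, and 4, passed as a comma-separated list as described above:
[user@admin ceph-ansible]$ ansible-playbook infrastructure-playbooks/shrink-osd.yml -e osd_to_kill=1,2,4 -u user -i hosts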
Copy to Clipboard Copied! Toggle word wrap Toggle overflow -
Verify that the OSD has been successfully removed:
Syntax
ceph osd tree
[root@mon ~]# ceph osd tree
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Additional Resources
- The Red Hat Ceph Storage Installation Guide.
- See the Configuring Ansible’s inventory location section in the Red Hat Ceph Storage Installation Guide for more details on the Ansible inventory configuration.
1.5.10. Removing a Ceph OSD using the command-line interface Copy linkLink copied to clipboard!
Removing an OSD from a storage cluster involves these steps:
- Updating the cluster map.
- Removing its authentication key.
- Removing the OSD from the OSD map.
- Removing the OSD from the ceph.conf file.
If the OSD node has multiple drives, you might need to remove an OSD for each drive by repeating this procedure for each OSD that you want to remove.
Prerequisites
- A running Red Hat Ceph Storage cluster.
-
Enough available OSDs so that the storage cluster is not at its
near full
ratio. - Root-level access to the OSD node.
Procedure
Disable and stop the OSD service:
Syntax
systemctl disable ceph-osd@OSD_ID systemctl stop ceph-osd@OSD_ID
systemctl disable ceph-osd@OSD_ID systemctl stop ceph-osd@OSD_ID
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
systemctl disable ceph-osd@4 systemctl stop ceph-osd@4
[root@osd ~]# systemctl disable ceph-osd@4 [root@osd ~]# systemctl stop ceph-osd@4
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Once the OSD is stopped, it is
down
.Remove the OSD from the storage cluster:
Syntax
ceph osd out OSD_ID
ceph osd out OSD_ID
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
ceph osd out 4
[root@osd ~]# ceph osd out 4
Important: Once the OSD is marked out, Ceph starts rebalancing and copying data to the remaining OSDs in the storage cluster. Red Hat recommends waiting until the storage cluster becomes active+clean before proceeding to the next step. To observe the data migration, run the following command:
Syntax
ceph -w
[root@mon ~]# ceph -w
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Remove the OSD from the CRUSH map so that it no longer receives data.
Syntax
ceph osd crush remove OSD_NAME
ceph osd crush remove OSD_NAME
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
ceph osd crush remove osd.4
[root@osd ~]# ceph osd crush remove osd.4
Note: To manually remove the OSD and the bucket that contains it, you can also decompile the CRUSH map, remove the OSD from the device list, remove the device as an item in the host bucket, or remove the host bucket. If it is in the CRUSH map and you intend to remove the host, recompile the map and set it. See the instructions for decompiling a CRUSH map in the Storage Strategies Guide for details.
Remove the OSD authentication key:
Syntax
ceph auth del osd.OSD_ID
ceph auth del osd.OSD_ID
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
ceph auth del osd.4
[root@osd ~]# ceph auth del osd.4
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Remove the OSD:
Syntax
ceph osd rm OSD_ID
ceph osd rm OSD_ID
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
ceph osd rm 4
[root@osd ~]# ceph osd rm 4
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Edit the storage cluster’s configuration file. The default name for the file is
/etc/ceph/ceph.conf
. Remove the OSD entry in the file, if it exists:Example
[osd.4]
host = HOST_NAME
Copy to Clipboard Copied! Toggle word wrap Toggle overflow -
Remove the reference to the OSD in the
/etc/fstab
file, if the OSD was added manually. Copy the updated configuration file to the
/etc/ceph/
directory of all other nodes in the storage cluster.Syntax
scp /etc/ceph/CLUSTER_NAME.conf USER_NAME@HOST_NAME:/etc/ceph/
scp /etc/ceph/CLUSTER_NAME.conf USER_NAME@HOST_NAME:/etc/ceph/
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
scp /etc/ceph/ceph.conf root@node4:/etc/ceph/
[root@osd ~]# scp /etc/ceph/ceph.conf root@node4:/etc/ceph/
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
1.5.11. Replacing a BlueStore database disk using the command-line interface Copy linkLink copied to clipboard!
When replacing the BlueStore DB device, block.db, that contains the BlueStore OSD’s internal metadata, Red Hat supports redeploying all OSDs using Ansible and the command-line interface (CLI). A corrupt block.db file impacts all OSDs that are included in that block.db file.
The procedure to replace the BlueStore block.db disk is to mark out each device in turn, wait for the data to replicate across the cluster, replace the OSD, and mark it back in again. You can retain the OSD_ID and recreate the OSD with the new block.db partition on the replaced disk. Although this is a simple procedure, it requires a lot of data migration.
If the block.db
device has multiple OSDs, then follow this procedure for each of the OSDs on the block.db
device. You can run ceph-volume lvm list
to see block.db
to block relationships.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- A storage device with a partition.
- Root-level access to all the nodes.
Procedure
Check current Ceph cluster status on the monitor node:
ceph status ceph df
[root@mon ~]# ceph status [root@mon ~]# ceph df
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Identify the failed OSDs to replace:
ceph osd tree | grep -i down
[root@mon ~]# ceph osd tree | grep -i down
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Stop and disable OSD service on OSD node:
Syntax
systemctl disable ceph-osd@OSD_ID systemctl stop ceph-osd@OSD_ID
systemctl disable ceph-osd@OSD_ID systemctl stop ceph-osd@OSD_ID
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
systemctl stop ceph-osd@1 systemctl disable ceph-osd@1
[root@osd1 ~]# systemctl stop ceph-osd@1 [root@osd1 ~]# systemctl disable ceph-osd@1
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Set OSD
out
on the monitor node:Syntax
ceph osd out OSD_ID
ceph osd out OSD_ID
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
ceph osd out 1
[root@mon ~]# ceph osd out 1
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Wait for the data to migrate off the OSD:
Syntax
while ! ceph osd safe-to-destroy OSD_ID ; do sleep 60 ; done
while ! ceph osd safe-to-destroy OSD_ID ; do sleep 60 ; done
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
while ! ceph osd safe-to-destroy 1 ; do sleep 60 ; done
[root@mon ~]# while ! ceph osd safe-to-destroy 1 ; do sleep 60 ; done
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Stop the OSD daemon on the OSD node:
Syntax
systemctl kill ceph-osd@OSD_ID
systemctl kill ceph-osd@OSD_ID
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
systemctl kill ceph-osd@1
[root@osd1 ~]# systemctl kill ceph-osd@1
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Make note of which device this OSD is using:
Syntax
mount | grep /var/lib/ceph/osd/ceph-OSD_ID
mount | grep /var/lib/ceph/osd/ceph-OSD_ID
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
mount | grep /var/lib/ceph/osd/ceph-1
[root@osd1 ~]# mount | grep /var/lib/ceph/osd/ceph-1
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Unmount mount point of the failed drive path on OSD node:
Syntax
umount /var/lib/ceph/osd/CLUSTER_NAME-OSD_ID
umount /var/lib/ceph/osd/CLUSTER_NAME-OSD_ID
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
[root@osd1 ~]# umount /var/lib/ceph/osd/ceph-1
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Set the
noout
andnorebalance
to avoid backfilling and re-balancing:ceph osd set noout ceph osd set norebalance
[root@mon ~]# ceph osd set noout [root@mon ~]# ceph osd set norebalance
Copy to Clipboard Copied! Toggle word wrap Toggle overflow -
Replace the physical drive. Refer to the hardware vendor’s documentation for the node. Allow the new drive to appear under the
/dev/
directory and make a note of the drive path before proceeding further. Destroy OSDs on the monitor node:
Syntax
ceph osd destroy OSD_ID --yes-i-really-mean-it
ceph osd destroy OSD_ID --yes-i-really-mean-it
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
ceph osd destroy 1 --yes-i-really-mean-it
[root@mon ~]# ceph osd destroy 1 --yes-i-really-mean-it
Copy to Clipboard Copied! Toggle word wrap Toggle overflow ImportantThis step destroys the contents of the device. Ensure the data on the device is not needed and the cluster is healthy.
Remove the logical volume manager on the OSD disk:
Syntax
lvremove /dev/VOLUME_GROUP/LOGICAL_VOLUME vgremove VOLUME_GROUP pvremove /dev/DEVICE
lvremove /dev/VOLUME_GROUP/LOGICAL_VOLUME vgremove VOLUME_GROUP pvremove /dev/DEVICE
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
lvremove /dev/data-vg1/data-lv1 vgremove data-vg1 pvremove /dev/sdb
[root@osd1 ~]# lvremove /dev/data-vg1/data-lv1 [root@osd1 ~]# vgremove data-vg1 [root@osd1 ~]# pvremove /dev/sdb
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Zap the OSD disk on OSD node:
Syntax
ceph-volume lvm zap DEVICE
ceph-volume lvm zap DEVICE
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
ceph-volume lvm zap /dev/sdb
[root@osd1 ~]# ceph-volume lvm zap /dev/sdb
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Recreate lvm on OSD disk:
Syntax
pvcreate /dev/DEVICE vgcreate VOLUME_GROUP /dev/DEVICE lvcreate -l SIZE -n LOGICAL_VOLUME VOLUME_GROUP
pvcreate /dev/DEVICE vgcreate VOLUME_GROUP /dev/DEVICE lvcreate -l SIZE -n LOGICAL_VOLUME VOLUME_GROUP
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
pvcreate /dev/sdb vgcreate data-vg1 /dev/sdb lvcreate -l 100%FREE -n data-lv1 data-vg1
[root@osd1 ~]# pvcreate /dev/sdb [root@osd1 ~]# vgcreate data-vg1 /dev/sdb [root@osd1 ~]# lvcreate -l 100%FREE -n data-lv1 data-vg1
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Create lvm on the new
block.db
disk:Syntax
pvcreate /dev/DEVICE
vgcreate VOLUME_GROUP_DATABASE /dev/DEVICE
lvcreate -l SIZE -n LOGICAL_VOLUME_DATABASE VOLUME_GROUP_DATABASE
Example
[root@osd1 ~]# pvcreate /dev/sdb
[root@osd1 ~]# vgcreate db-vg1 /dev/sdb
[root@osd1 ~]# lvcreate -l 100%FREE -n db-lv1 db-vg1
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Recreate the OSDs on the OSD node:
Syntax
ceph-volume lvm create --bluestore --osd-id OSD_ID --data VOLUME_GROUP/LOGICAL_VOLUME --block.db VOLUME_GROUP_DATABASE/LOGICAL_VOLUME_DATABASE
ceph-volume lvm create --bluestore --osd-id OSD_ID --data VOLUME_GROUP/LOGICAL_VOLUME --block.db VOLUME_GROUP_DATABASE/LOGICAL_VOLUME_DATABASE
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
ceph-volume lvm create --bluestore --osd-id 1 --data data-vg1/data-lv1 --block.db db-vg1/db-lv1
[root@osd1 ~]# ceph-volume lvm create --bluestore --osd-id 1 --data data-vg1/data-lv1 --block.db db-vg1/db-lv1
Note: Red Hat recommends using the same OSD_ID as the one destroyed in the previous steps.
Start and enable OSD service on OSD node:
Syntax
systemctl start ceph-osd@OSD_ID systemctl enable ceph-osd@OSD_ID
systemctl start ceph-osd@OSD_ID systemctl enable ceph-osd@OSD_ID
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
systemctl start ceph-osd@1 systemctl enable ceph-osd@1
[root@osd1 ~]# systemctl start ceph-osd@1 [root@osd1 ~]# systemctl enable ceph-osd@1
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check the CRUSH hierarchy to ensure OSD is in the cluster:
ceph osd tree
[root@mon ~]# ceph osd tree
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Unset noout and norebalance:
ceph osd unset noout ceph osd unset norebalance
[root@mon ~]# ceph osd unset noout [root@mon ~]# ceph osd unset norebalance
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Monitor cluster status until
HEALTH_OK
:watch -n2 ceph -s
[root@mon ~]# watch -n2 ceph -s
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Additional Resources
- See the Installing a Red Hat Ceph Storage cluster chapter in the Red Hat Ceph Storage Installation Guide for more information.
1.5.12. Observing the data migration Copy linkLink copied to clipboard!
When you add an OSD to, or remove an OSD from, the CRUSH map, Ceph begins rebalancing the data by migrating placement groups to the new or existing OSDs.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Recently added or removed an OSD.
Procedure
To observe the data migration:
ceph -w
[root@monitor ~]# ceph -w
Copy to Clipboard Copied! Toggle word wrap Toggle overflow -
Watch as the placement group states change from
active+clean
toactive, some degraded objects
, and finallyactive+clean
when migration completes. -
To exit the utility, press
Ctrl + C
.
1.6. Recalculating the placement groups Copy linkLink copied to clipboard!
Placement groups (PGs) define how the data of a pool is spread across the available OSDs. A placement group is built according to the redundancy algorithm in use. For 3-way replication, each placement group uses three different OSDs. For erasure-coded pools, the number of OSDs to use is defined by the number of chunks.
See the KnowledgeBase article How do I increase placement group (PG) count in a Ceph Cluster for additional details.
When you define a pool, the number of placement groups determines how finely the data is spread across all available OSDs. The higher the number, the better the capacity load can be equalized. However, because placement groups also have to be handled when data is reconstructed, it is important to choose the number carefully up front. To support the calculation, a tool is available to produce agile environments.
During the lifetime of a storage cluster, a pool may grow above the initially anticipated limits. With a growing number of drives, a recalculation is recommended. The number of placement groups per OSD should be around 100. When adding more OSDs to the storage cluster, the number of PGs per OSD lowers over time. Starting with 120 drives in the storage cluster and setting the pg_num of the pool to 4000 results in 100 PGs per OSD, given a replication factor of three. Over time, when growing to ten times the number of OSDs, the number of PGs per OSD goes down to only ten. Because a small number of PGs per OSD tends to result in unevenly distributed capacity, consider adjusting the PGs per pool.
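A quick sanity check of those figures, as a sketch using shell arithmetic and the approximation PGs per OSD = (pg_num x replica count) / number of OSDs:
echo $(( 4000 * 3 / 120 ))    # 100 PGs per OSD with 120 OSDs
echo $(( 4000 * 3 / 1200 ))   # 10 PGs per OSD after growing to 1200 OSDs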
Adjusting the number of placement groups can be done online. Recalculating is not only a recalculation of the PG numbers, but also involves data relocation, which is a lengthy process. However, data availability is maintained at all times.
Very high numbers of PGs per OSD should be avoided, because reconstruction of all PGs on a failed OSD starts at once. A high number of IOPS is required to perform reconstruction in a timely manner, which might not be available. This would lead to deep I/O queues and high latency, rendering the storage cluster unusable, or result in long healing times.
Additional Resources
- See the PG calculator for calculating the values by a given use case.
- See the Erasure Code Pools chapter in the Red Hat Ceph Storage Strategies Guide for more information.
1.7. Using the Ceph Manager balancer module Copy linkLink copied to clipboard!
The balancer is a module for Ceph Manager (ceph-mgr
) that optimizes the placement of placement groups (PGs) across OSDs in order to achieve a balanced distribution, either automatically or in a supervised fashion.
Modes
There are currently two supported balancer modes:
crush-compat: The CRUSH compat mode uses the compat weight-set feature, introduced in Ceph Luminous, to manage an alternative set of weights for devices in the CRUSH hierarchy. The normal weights should remain set to the size of the device to reflect the target amount of data that you want to store on the device. The balancer then optimizes the weight-set values, adjusting them up or down in small increments in order to achieve a distribution that matches the target distribution as closely as possible. Because PG placement is a pseudorandom process, there is a natural amount of variation in the placement; by optimizing the weights, the balancer counteracts that natural variation.
This mode is fully backwards compatible with older clients. When an OSDMap and CRUSH map are shared with older clients, the balancer presents the optimized weights as the real weights.
The primary restriction of this mode is that the balancer cannot handle multiple CRUSH hierarchies with different placement rules if the subtrees of the hierarchy share any OSDs. Because this configuration makes managing space utilization on the shared OSDs difficult, it is generally not recommended. As such, this restriction is normally not an issue.
upmap: Starting with Luminous, the OSDMap can store explicit mappings for individual OSDs as exceptions to the normal CRUSH placement calculation. These upmap entries provide fine-grained control over the PG mapping. This mode optimizes the placement of individual PGs in order to achieve a balanced distribution. In most cases, this distribution is "perfect", with an equal number of PGs on each OSD, +/-1 PG, as they might not divide evenly.
Important: To allow use of this feature, you must tell the cluster that it only needs to support luminous or later clients with the following command:
ceph osd set-require-min-compat-client luminous
[root@admin ~]# ceph osd set-require-min-compat-client luminous
Copy to Clipboard Copied! Toggle word wrap Toggle overflow This command fails if any pre-luminous clients or daemons are connected to the monitors.
Due to a known issue, kernel CephFS clients report themselves as jewel clients. To work around this issue, use the
--yes-i-really-mean-it
flag:ceph osd set-require-min-compat-client luminous --yes-i-really-mean-it
[root@admin ~]# ceph osd set-require-min-compat-client luminous --yes-i-really-mean-it
Copy to Clipboard Copied! Toggle word wrap Toggle overflow You can check what client versions are in use with:
ceph features
[root@admin ~]# ceph features
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Prerequisites
- A running Red Hat Ceph Storage cluster.
Procedure
Make sure that the balancer module is on:
Example
Copy to Clipboard Copied! Toggle word wrap Toggle overflow If the balancer module is not listed in the
always_on
orenabled
modules, enable it:Syntax
ceph mgr module enable balancer
ceph mgr module enable balancer
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Turn on the balancer module:
ceph balancer on
[root@mon ~]# ceph balancer on
Copy to Clipboard Copied! Toggle word wrap Toggle overflow The default mode is
crush-compat
. The mode can be changed with:ceph balancer mode upmap
[root@mon ~]# ceph balancer mode upmap
Copy to Clipboard Copied! Toggle word wrap Toggle overflow or
ceph balancer mode crush-compat
[root@mon ~]# ceph balancer mode crush-compat
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Status
The current status of the balancer can be checked at any time with:
ceph balancer status
[root@mon ~]# ceph balancer status
Automatic balancing
By default, when turning on the balancer module, automatic balancing is used:
ceph balancer on
[root@mon ~]# ceph balancer on
The balancer can be turned back off again with:
ceph balancer off
[root@mon ~]# ceph balancer off
This will use the crush-compat
mode, which is backward compatible with older clients and will make small changes to the data distribution over time to ensure that OSDs are equally utilized.
Throttling
No adjustments will be made to the PG distribution if the cluster is degraded, for example, if an OSD has failed and the system has not yet healed itself.
When the cluster is healthy, the balancer throttles its changes such that the percentage of PGs that are misplaced, or need to be moved, is below a threshold of 5% by default. This percentage can be adjusted using the target_max_misplaced_ratio
setting. For example, to increase the threshold to 7%:
Example
ceph config set mgr target_max_misplaced_ratio .07
[root@mon ~]# ceph config set mgr target_max_misplaced_ratio .07
For automatic balancing:
- Set the number of seconds to sleep in between runs of the automatic balancer:
Example
ceph config set mgr mgr/balancer/sleep_interval 60
[root@mon ~]# ceph config set mgr mgr/balancer/sleep_interval 60
- Set the time of day to begin automatic balancing in HHMM format:
Example
ceph config set mgr mgr/balancer/begin_time 0000
[root@mon ~]# ceph config set mgr mgr/balancer/begin_time 0000
- Set the time of day to finish automatic balancing in HHMM format:
Example
ceph config set mgr mgr/balancer/end_time 2359
[root@mon ~]# ceph config set mgr mgr/balancer/end_time 2359
-
Restrict automatic balancing to this day of the week or later. Uses the same conventions as crontab,
0
is Sunday,1
is Monday, and so on:
Example
ceph config set mgr mgr/balancer/begin_weekday 0
[root@mon ~]# ceph config set mgr mgr/balancer/begin_weekday 0
-
Restrict automatic balancing to this day of the week or earlier. This uses the same conventions as crontab,
0
is Sunday,1
is Monday, and so on:
Example
ceph config set mgr mgr/balancer/end_weekday 6
[root@mon ~]# ceph config set mgr mgr/balancer/end_weekday 6
-
Define the pool IDs to which the automatic balancing is limited. The default for this is an empty string, meaning all pools are balanced. The numeric pool IDs can be obtained with the ceph osd pool ls detail command:
Example
ceph config set mgr mgr/balancer/pool_ids 1,2,3
[root@mon ~]# ceph config set mgr mgr/balancer/pool_ids 1,2,3
Supervised optimization
The balancer operation is broken into a few distinct phases:
-
Building a
plan
. -
Evaluating the quality of the data distribution, either for the current PG distribution, or the PG distribution that would result after executing a
plan
. Executing the
plan
.To evaluate and score the current distribution:
ceph balancer eval
[root@mon ~]# ceph balancer eval
Copy to Clipboard Copied! Toggle word wrap Toggle overflow To evaluate the distribution for a single pool:
Syntax
ceph balancer eval POOL_NAME
ceph balancer eval POOL_NAME
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
ceph balancer eval rbd
[root@mon ~]# ceph balancer eval rbd
Copy to Clipboard Copied! Toggle word wrap Toggle overflow To see greater detail for the evaluation:
ceph balancer eval-verbose ...
[root@mon ~]# ceph balancer eval-verbose ...
Copy to Clipboard Copied! Toggle word wrap Toggle overflow To generate a plan using the currently configured mode:
Syntax
ceph balancer optimize PLAN_NAME
ceph balancer optimize PLAN_NAME
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Replace PLAN_NAME with a custom plan name.
Example
ceph balancer optimize rbd_123
[root@mon ~]# ceph balancer optimize rbd_123
Copy to Clipboard Copied! Toggle word wrap Toggle overflow To see the contents of a plan:
Syntax
ceph balancer show PLAN_NAME
ceph balancer show PLAN_NAME
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
ceph balancer show rbd_123
[root@mon ~]# ceph balancer show rbd_123
Copy to Clipboard Copied! Toggle word wrap Toggle overflow To discard old plans:
Syntax
ceph balancer rm PLAN_NAME
ceph balancer rm PLAN_NAME
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
ceph balancer rm rbd_123
[root@mon ~]# ceph balancer rm rbd_123
Copy to Clipboard Copied! Toggle word wrap Toggle overflow To see currently recorded plans use the status command:
ceph balancer status
[root@mon ~]# ceph balancer status
Copy to Clipboard Copied! Toggle word wrap Toggle overflow To calculate the quality of the distribution that would result after executing a plan:
Syntax
ceph balancer eval PLAN_NAME
ceph balancer eval PLAN_NAME
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
ceph balancer eval rbd_123
[root@mon ~]# ceph balancer eval rbd_123
Copy to Clipboard Copied! Toggle word wrap Toggle overflow To execute the plan:
Syntax
ceph balancer execute PLAN_NAME
ceph balancer execute PLAN_NAME
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
ceph balancer execute rbd_123
[root@mon ~]# ceph balancer execute rbd_123
Note: Only execute the plan if it is expected to improve the distribution. After execution, the plan will be discarded.
1.8. Using upmap to manually rebalance data on OSDs Copy linkLink copied to clipboard!
As a storage administrator, you can manually rebalance data on OSDs by moving selected placement groups (PGs) to specific OSDs. To perform manual rebalancing, turn off the Ceph Manager balancer module and use the upmap
mode to move the PGs.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Root-level access to all nodes in the storage cluster.
Procedure
Make sure that the balancer module is on:
Example
Copy to Clipboard Copied! Toggle word wrap Toggle overflow If the balancer module is not listed in the
always_on
orenabled
modules, enable it:Syntax
ceph mgr module enable balancer
ceph mgr module enable balancer
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Set the balancer mode to
upmap
:Syntax
ceph balancer mode upmap
ceph balancer mode upmap
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Turn off the balancer module:
Syntax
ceph balancer off
ceph balancer off
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check balancer status:
Example
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Set the
norebalance
flag for the OSDs:Syntax
ceph osd set norebalance
ceph osd set norebalance
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Use the
ceph pg dump pgs_brief
command to list the pools in your storage cluster and the space each consumes. Usegrep
to search for remapped pools.Example
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Move the PGs to the OSDs where you want them to reside. For example, to move PG 7.ac from OSDs 8 and 3 to OSDs 3 and 37:
Example
Note: Repeat this step to move each of the remapped PGs, one at a time.
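A hedged sketch of such a move, assuming the PG and OSD IDs mentioned above; with ceph osd pg-upmap-items, the arguments after the PG ID are read as pairs of source OSD followed by target OSD, so this maps osd.8 to osd.3 and osd.3 to osd.37:
[root@mon ~]# ceph osd pg-upmap-items 7.ac 8 3 3 37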
Use the
ceph pg dump pgs_brief
command again to check that the PGs move to theactive+clean
state:Example
Copy to Clipboard Copied! Toggle word wrap Toggle overflow The time it takes for the PGs to move to
active+clean
depends on the numbers of PGs and OSDs. In addition, the number of objects misplaced depends on the value set formgr target_max_misplaced_ratio
. A higher value set fortarget_max_misplaced_ratio
results in a greater number of misplaced objects; thus, it takes a longer time for all PGs to becomeactive+clean
.Unset the
norebalance
flag:Syntax
ceph osd unset norebalance
ceph osd unset norebalance
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Turn the balancer module back on:
Syntax
ceph balancer on
ceph balancer on
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Once you enable the balancer module, it slowly moves the PGs back to their intended OSDs according to the CRUSH rules for the storage cluster. The balancing process might take some time, but completes eventually.
1.9. Using the Ceph Manager alerts module Copy linkLink copied to clipboard!
You can use the Ceph Manager alerts module to send simple alert messages about the Red Hat Ceph Storage cluster’s health by email.
This module is not intended to be a robust monitoring solution. The fact that it runs as part of the Ceph cluster itself is fundamentally limiting, in that a failure of the ceph-mgr daemon prevents alerts from being sent. This module can, however, be useful for standalone clusters that run in environments where no monitoring infrastructure is available.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Root-level access to the Ceph Monitor node.
Procedure
Enable the alerts module:
Example
ceph mgr module enable alerts
[root@host01 ~]# ceph mgr module enable alerts
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Ensure the alerts module is enabled:
Example
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Configure the Simple Mail Transfer Protocol (SMTP):
Syntax
ceph config set mgr mgr/alerts/smtp_host SMTP_SERVER ceph config set mgr mgr/alerts/smtp_destination RECEIVER_EMAIL_ADDRESS ceph config set mgr mgr/alerts/smtp_sender SENDER_EMAIL_ADDRESS
ceph config set mgr mgr/alerts/smtp_host SMTP_SERVER ceph config set mgr mgr/alerts/smtp_destination RECEIVER_EMAIL_ADDRESS ceph config set mgr mgr/alerts/smtp_sender SENDER_EMAIL_ADDRESS
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
ceph config set mgr mgr/alerts/smtp_host smtp.example.com ceph config set mgr mgr/alerts/smtp_destination example@example.com ceph config set mgr mgr/alerts/smtp_sender example2@example.com
[root@host01 ~]# ceph config set mgr mgr/alerts/smtp_host smtp.example.com [root@host01 ~]# ceph config set mgr mgr/alerts/smtp_destination example@example.com [root@host01 ~]# ceph config set mgr mgr/alerts/smtp_sender example2@example.com
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Optional: By default, the alerts module uses SSL and port 465. To change that, set the
smtp_ssl
tofalse
:Syntax
ceph config set mgr mgr/alerts/smtp_ssl false ceph config set mgr mgr/alerts/smtp_port PORT_NUMBER
ceph config set mgr mgr/alerts/smtp_ssl false ceph config set mgr mgr/alerts/smtp_port PORT_NUMBER
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
ceph config set mgr mgr/alerts/smtp_ssl false ceph config set mgr mgr/alerts/smtp_port 587
[root@host01 ~]# ceph config set mgr mgr/alerts/smtp_ssl false [root@host01 ~]# ceph config set mgr mgr/alerts/smtp_port 587
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Authenticate to the SMTP server:
Syntax
ceph config set mgr mgr/alerts/smtp_user USERNAME ceph config set mgr mgr/alerts/smtp_password PASSWORD
ceph config set mgr mgr/alerts/smtp_user USERNAME ceph config set mgr mgr/alerts/smtp_password PASSWORD
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
ceph config set mgr mgr/alerts/smtp_user admin1234 ceph config set mgr mgr/alerts/smtp_password admin1234
[root@host01 ~]# ceph config set mgr mgr/alerts/smtp_user admin1234 [root@host01 ~]# ceph config set mgr mgr/alerts/smtp_password admin1234
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Optional: By default, SMTP
From
name isCeph
. To change that, set thesmtp_from_name
parameter:Syntax
ceph config set mgr mgr/alerts/smtp_from_name CLUSTER_NAME
ceph config set mgr mgr/alerts/smtp_from_name CLUSTER_NAME
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
ceph config set mgr mgr/alerts/smtp_from_name 'Ceph Cluster Test'
[root@host01 ~]# ceph config set mgr mgr/alerts/smtp_from_name 'Ceph Cluster Test'
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Optional: By default, the alerts module checks the storage cluster’s health every minute, and sends a message when there is a change in the cluster health status. To change the frequency, set the
interval
parameter:Syntax
ceph config set mgr mgr/alerts/interval INTERVAL
ceph config set mgr mgr/alerts/interval INTERVAL
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
ceph config set mgr mgr/alerts/interval "5m"
[root@host01 ~]# ceph config set mgr mgr/alerts/interval "5m"
Copy to Clipboard Copied! Toggle word wrap Toggle overflow In this example, the interval is set to 5 minutes.
Optional: Send an alert immediately:
Example
ceph alerts send
[root@host01 ~]# ceph alerts send
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
1.10. Using the Ceph manager crash module Copy linkLink copied to clipboard!
Using the Ceph manager crash module, you can collect information about daemon crashdumps and store it in the Red Hat Ceph Storage cluster for further analysis.
By default, daemon crashdumps are dumped in /var/lib/ceph/crash. You can configure the crashdump location with the crash dir option. Crash directories are named by time, date, and a randomly-generated UUID, and contain a metadata file named meta and a recent log file, with a crash_id that matches the directory name.
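As an illustration, a hypothetical crash directory for the crash ID used later in this section might look like this, with the meta and log files inside it:
[root@mon ~]# ls /var/lib/ceph/crash/2021-05-24T19:58:42.549073Z_b2382865-ea89-4be2-b46f-9a59af7b7a2d
log  meta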
You can use ceph-crash.service to submit these crashes automatically and persist them in the Ceph Monitors. The ceph-crash.service watches the crashdump directory and uploads new crashdumps with ceph crash post.
The RECENT_CRASH health message is one of the most common health messages in a Ceph cluster. This health message means that one or more Ceph daemons have crashed recently, and the crash has not yet been archived or acknowledged by the administrator. This might indicate a software bug, a hardware problem like a failing disk, or some other problem. The option mgr/crash/warn_recent_interval controls the time period of what recent means, which is two weeks by default. You can disable the warnings by running the following command:
Example
ceph config set mgr mgr/crash/warn_recent_interval 0
[root@mon ~]# ceph config set mgr mgr/crash/warn_recent_interval 0
The option mgr/crash/retain_interval
controls the period for which you want to retain the crash reports before they are automatically purged. The default for this option is one year.
Prerequisites
- A running Red Hat Ceph Storage cluster.
Procedure
Ensure the crash module is enabled:
Example
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Save a crash dump: The metadata file is a JSON blob stored in the crash dir as
meta
. You can invoke the ceph command-i -
option, which reads from stdin.Example
ceph crash post -i meta
[root@mon ~]# ceph crash post -i meta
Copy to Clipboard Copied! Toggle word wrap Toggle overflow List the timestamp or the UUID crash IDs for all the new and archived crash info:
Example
ceph crash ls
[root@mon ~]# ceph crash ls
Copy to Clipboard Copied! Toggle word wrap Toggle overflow List the timestamp or the UUID crash IDs for all the new crash information:
Example
ceph crash ls-new
[root@mon ~]# ceph crash ls-new
Copy to Clipboard Copied! Toggle word wrap Toggle overflow List the summary of saved crash information grouped by age:
Example
Copy to Clipboard Copied! Toggle word wrap Toggle overflow View the details of the saved crash:
Syntax
ceph crash info CRASH_ID
ceph crash info CRASH_ID
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Remove saved crashes older than KEEP days: Here, KEEP must be an integer.
Syntax
ceph crash prune KEEP
ceph crash prune KEEP
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
ceph crash prune 60
[root@mon ~]# ceph crash prune 60
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Archive a crash report so that it is no longer considered for the
RECENT_CRASH
health check and does not appear in thecrash ls-new
output. It appears in thecrash ls
.Syntax
ceph crash archive CRASH_ID
ceph crash archive CRASH_ID
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
ceph crash archive 2021-05-24T19:58:42.549073Z_b2382865-ea89-4be2-b46f-9a59af7b7a2d
[root@mon ~]# ceph crash archive 2021-05-24T19:58:42.549073Z_b2382865-ea89-4be2-b46f-9a59af7b7a2d
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Archive all crash reports:
Example
ceph crash archive-all
[root@mon ~]# ceph crash archive-all
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Remove the crash dump:
Syntax
ceph crash rm CRASH_ID
ceph crash rm CRASH_ID
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
ceph crash rm 2021-05-24T19:58:42.549073Z_b2382865-ea89-4be2-b46f-9a59af7b7a2d
[root@mon ~]# ceph crash rm 2021-05-24T19:58:42.549073Z_b2382865-ea89-4be2-b46f-9a59af7b7a2d
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
1.11. Migrating RBD mirroring daemons Copy linkLink copied to clipboard!
For two-way Block Device (RBD) mirroring configured using the command-line interface in a bare-metal storage cluster, the cluster does not migrate RBD mirroring automatically. Migrate the RBD mirror daemons from the CLI to ceph-ansible before upgrading the storage cluster or converting the cluster to containerized.
Prerequisites
- A running non-containerized, bare-metal Red Hat Ceph Storage cluster.
- Access to the Ansible administration node.
- An ansible user account.
- Sudo access to the ansible user account.
Procedure
Create a user on the Ceph client node:
Syntax
ceph auth get client.PRIMARY_CLUSTER_NAME -o /etc/ceph/ceph.PRIMARY_CLUSTER_NAME.keyring
ceph auth get client.PRIMARY_CLUSTER_NAME -o /etc/ceph/ceph.PRIMARY_CLUSTER_NAME.keyring
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
ceph auth get client.rbd-mirror.site-a -o /etc/ceph/ceph.client.rbd-mirror.site-a.keyring
[root@rbd-client-site-a ~]# ceph auth get client.rbd-mirror.site-a -o /etc/ceph/ceph.client.rbd-mirror.site-a.keyring
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Change the username in the
auth
file in/etc/ceph
directory:Example
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Import the
auth
file to add relevant permissions:Syntax
ceph auth import -i PATH_TO_KEYRING
ceph auth import -i PATH_TO_KEYRING
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
ceph auth import -i /etc/ceph/ceph.client.rbd-mirror.rbd-client-site-a.keyring
[root@rbd-client-site-a ~]# ceph auth import -i /etc/ceph/ceph.client.rbd-mirror.rbd-client-site-a.keyring
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check the service name of the RBD mirror node:
Example
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Add the rbd-mirror node to the
/etc/ansible/hosts
file:Example
[rbdmirrors] ceph.client.rbd-mirror.rbd-client-site-a
[rbdmirrors] ceph.client.rbd-mirror.rbd-client-site-a
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
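A hedged sketch of the usual follow-up, assuming the default site.yml playbook used elsewhere in this guide, so that ceph-ansible takes over management of the mirror daemon on the newly added host:
[user@admin ceph-ansible]$ ansible-playbook site.yml --limit rbdmirrors -i hosts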
1.12. Additional Resources Copy linkLink copied to clipboard!
- See the Red Hat Ceph Storage Installation Guide for details on installing the Red Hat Ceph Storage product.
- See the Placement Groups (PGs) chapter in the Red Hat Ceph Storage Strategies Guide for more information.
- See the Red Hat Enterprise Linux 8 Configuring and Managing Logical Volumes guide for more details.
Chapter 2. Handling a disk failure Copy linkLink copied to clipboard!
As a storage administrator, you will have to deal with a disk failure at some point over the lifetime of the storage cluster. Testing and simulating a disk failure before a real failure happens ensures that you are ready when the real thing does happen.
Here is the high-level workflow for replacing a failed disk:
- Find the failed OSD.
- Take the OSD out.
- Stop the OSD daemon on the node.
- Check Ceph’s status.
- Remove the OSD from the CRUSH map.
- Delete the OSD authorization.
- Remove the OSD from the storage cluster.
- Unmount the filesystem on the node.
- Replace the failed drive.
- Add the OSD back to the storage cluster.
- Check Ceph’s status.
2.1. Prerequisites Copy linkLink copied to clipboard!
- A running Red Hat Ceph Storage cluster.
- A failed disk.
2.2. Disk failures Copy linkLink copied to clipboard!
Ceph is designed for fault tolerance, which means Ceph can operate in a degraded state without losing data. Ceph can still operate even if a data storage drive fails. The degraded state means the extra copies of the data stored on other OSDs will backfill automatically to other OSDs in the storage cluster. When an OSD gets marked down, this can mean the drive has failed.
When a drive fails, initially the OSD status will be down
, but still in
the storage cluster. Networking issues can also mark an OSD as down
even if it is really up
. First check for any network issues in the environment. If the networking checks out okay, then it is likely the OSD drive has failed.
Modern servers typically deploy with hot-swappable drives allowing you to pull a failed drive and replace it with a new one without bringing down the node. However, with Ceph you will also have to remove the software-defined part of the OSD.
2.3. Simulating a disk failure Copy linkLink copied to clipboard!
There are two disk failure scenarios: hard and soft. A hard failure means replacing the disk. Soft failure might be an issue with the device driver or some other software component.
In the case of a soft failure, replacing the disk might not be needed. If you replace a disk, follow the steps to remove the failed disk and add the replacement disk to Ceph. To simulate a soft disk failure, the best approach is to delete the device: choose a device and delete it from the system.
Prerequisites
- A healthy, running Red Hat Ceph Storage cluster.
- Root-level access to the Ceph OSD node.
Procedure
Remove the block device from sysfs:
Syntax
echo 1 > /sys/block/BLOCK_DEVICE/device/delete
Example
[root@osd ~]# echo 1 > /sys/block/sdb/device/delete
In the Ceph OSD log on the OSD node, you can see that Ceph detected the failure and started the recovery process automatically.
Example
Looking at the Ceph OSD disk tree, you can also see that the disk is offline.
Example
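After simulating the failure, you might want to bring the deleted device back to continue testing. A minimal sketch, assuming a SCSI device and that host0 is the correct host adapter for the device (adjust for your system), is to rescan the SCSI bus and then confirm the OSD state:
[root@osd ~]# echo "- - -" > /sys/class/scsi_host/host0/scan
[root@mon ~]# ceph osd tree | grep -i down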
2.4. Replacing a failed OSD disk
The general procedure for replacing an OSD involves removing the OSD from the storage cluster, replacing the drive and then recreating the OSD.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- A failed disk.
Procedure
Check storage cluster health:
[root@mon ~]# ceph health
Identify the OSD location in the CRUSH hierarchy:
[root@mon ~]# ceph osd tree | grep -i down
On the OSD node, try to start the OSD:
Syntax
systemctl start ceph-osd@OSD_ID
If the command indicates that the OSD is already running, there might be a heartbeat or networking issue. If you cannot restart the OSD, then the drive might have failed.
Note: If the OSD is down, then the OSD will eventually get marked out. This is normal behavior for Ceph Storage. When the OSD gets marked out, other OSDs with copies of the failed OSD's data begin backfilling to ensure that the required number of copies exists within the storage cluster. While the storage cluster is backfilling, the cluster is in a degraded state.
For containerized deployments of Ceph, try to start the OSD container with the OSD_ID:
Syntax
systemctl start ceph-osd@OSD_ID
If the command indicates that the OSD is already running, there might be a heartbeat or networking issue. If you cannot restart the OSD, then the drive might have failed.
Note: The drive associated with the OSD can be determined by Mapping a container OSD ID to a drive.
Check the failed OSD's mount point:
Note: For containerized deployments of Ceph, if the OSD is down, the container will be down and the OSD drive will be unmounted, so you cannot run df to check its mount point. Use another method to determine whether the OSD drive has failed. For example, run smartctl on the drive from the container node.
[root@osd ~]# df -h
If you cannot restart the OSD, you can check the mount point. If the mount point no longer appears, then you can try remounting the OSD drive and restarting the OSD. If you cannot restore the mount point, then you might have a failed OSD drive.
Using the smartctl utility can help determine if the drive is healthy:
Syntax
yum install smartmontools
smartctl -H /dev/BLOCK_DEVICE
Example
[root@osd ~]# smartctl -H /dev/sda
If the drive has failed, you need to replace it.
Stop the OSD process:
Syntax
systemctl stop ceph-osd@OSD_ID
For containerized deployments of Ceph, stop the OSD container:
Syntax
systemctl stop ceph-osd@OSD_ID
Mark the OSD out of the storage cluster:
Syntax
ceph osd out OSD_ID
Ensure that the data from the failed OSD is backfilling to other OSDs:
[root@osd ~]# ceph -w
Remove the OSD from the CRUSH map:
Syntax
ceph osd crush remove osd.OSD_ID
Note: This step is only needed if you are permanently removing the OSD and not redeploying it.
Remove the OSD's authentication keys:
Syntax
ceph auth del osd.OSD_ID
Verify that the keys for the OSD are not listed:
Example
[root@osd ~]# ceph auth list
Remove the OSD from the storage cluster:
Syntax
ceph osd rm osd.OSD_ID
Unmount the failed drive path:
Syntax
umount /var/lib/ceph/osd/CLUSTER_NAME-OSD_ID
Example
[root@osd ~]# umount /var/lib/ceph/osd/ceph-0
Note: For containerized deployments of Ceph, if the OSD is down, the container will be down and the OSD drive will be unmounted. In this case there is nothing to unmount, and this step can be skipped.
Replace the physical drive. Refer to the hardware vendor's documentation for the node. If the drive is hot-swappable, simply replace the failed drive with a new drive. If the drive is NOT hot-swappable and the node contains multiple OSDs, you MIGHT need to bring the node down to replace the physical drive. If you need to bring the node down temporarily, you might set the cluster to noout to prevent backfilling:
Example
[root@osd ~]# ceph osd set noout
Once you replace the drive and bring the node and its OSDs back online, remove the noout setting:
Example
[root@osd ~]# ceph osd unset noout
Allow the new drive to appear under the /dev/ directory and make a note of the drive path before proceeding further.
- Find the OSD drive and format the disk.
Recreate the OSD:
- Using Ceph Ansible.
- Using the command-line interface.
Check the CRUSH hierarchy to ensure it is accurate:
Example
[root@osd ~]# ceph osd tree
If you are not satisfied with the location of the OSD in the CRUSH hierarchy, you can move it with the move command:
Syntax
ceph osd crush move BUCKET_TO_MOVE BUCKET_TYPE=PARENT_BUCKET
- Verify the OSD is online.
2.5. Replacing an OSD drive while retaining the OSD ID
When replacing a failed OSD drive, you can keep the original OSD ID and CRUSH map entry.
The ceph-volume lvm command defaults to BlueStore for OSDs.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- A failed disk.
Procedure
Destroy the OSD:
Syntax
ceph osd destroy OSD_ID --yes-i-really-mean-it
Example
[root@osd ~]# ceph osd destroy 1 --yes-i-really-mean-it
Optionally, if the replacement disk was used previously, then you need to zap the disk:
Syntax
ceph-volume lvm zap DEVICE
Example
[root@osd ~]# ceph-volume lvm zap /dev/sdb
Note: You can find the DEVICE by comparing output from various commands, such as ceph osd tree, ceph osd metadata, and df (see the sketch after this procedure).
Create the new OSD with the existing OSD ID:
Syntax
ceph-volume lvm create --osd-id OSD_ID --data DEVICE
Example
[root@osd ~]# ceph-volume lvm create --osd-id 1 --data /dev/sdb
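As a sketch of the comparison described in the note above, assuming the OSD ID is 1, the following commands show which host and devices back the OSD; the grep patterns are only illustrative:
[root@mon ~]# ceph osd tree | grep 'osd.1 '
[root@mon ~]# ceph osd metadata 1 | grep -E '"hostname"|"devices"'
[root@osd ~]# df -h | grep ceph-1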
Additional Resources
- See the Adding a Ceph OSD using Ansible with the same disk topologies section in the Red Hat Ceph Storage Operations Guide for more details.
- See the Adding a Ceph OSD using Ansible with different disk topologies section in the Red Hat Ceph Storage Operations Guide for more details.
- See the Preparing Ceph OSDs using `ceph-volume` section in the Red Hat Ceph Storage Operations Guide for more details.
- See the Activating Ceph OSDs using `ceph-volume` section in the Red Hat Ceph Storage Operations Guide for more details.
- See the Adding a Ceph OSD using the command-line interface section in the Red Hat Ceph Storage Operations Guide for more details.
Chapter 3. Handling a node failure
As a storage administrator, you can experience a whole node failing within the storage cluster, and handling a node failure is similar to handling a disk failure. With a node failure, instead of Ceph recovering placement groups (PGs) for only one disk, all PGs on the disks within that node must be recovered. Ceph will detect that the OSDs are all down and automatically start the recovery process, known as self-healing.
There are three node failure scenarios. Here is the high-level workflow for each scenario when replacing a node:
Replacing the node, but using the root and Ceph OSD disks from the failed node.
- Disable backfilling.
- Replace the node, taking the disks from old node, and adding them to the new node.
- Enable backfilling.
Replacing the node, reinstalling the operating system, and using the Ceph OSD disks from the failed node.
- Disable backfilling.
- Create a backup of the Ceph configuration.
Replace the node and add the Ceph OSD disks from the failed node.
- Configure disks as JBOD.
- Install the operating system.
- Restore the Ceph configuration.
- Run ceph-ansible.
- Enable backfilling.
Replacing the node, reinstalling the operating system, and using all new Ceph OSD disks.
- Disable backfilling.
- Remove all OSDs on the failed node from the storage cluster.
- Create a backup of the Ceph configuration.
Replace the node and add the Ceph OSD disks from the failed node.
- Configure disks as JBOD.
- Install the operating system.
- Run ceph-ansible.
- Enable backfilling.
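The scenarios above repeatedly disable and re-enable backfilling. A minimal sketch of one common way to do this, using cluster-wide flags from a Monitor node, looks like the following; whether you use these flags or noout depends on your maintenance window:
[root@mon ~]# ceph osd set nobackfill
[root@mon ~]# ceph osd set norecover
After the node replacement is complete and the OSDs are back online, clear the flags:
[root@mon ~]# ceph osd unset nobackfill
[root@mon ~]# ceph osd unset norecover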
3.1. Prerequisites
- A running Red Hat Ceph Storage cluster.
- A failed node.
3.2. Considerations before adding or removing a node
One of the outstanding features of Ceph is the ability to add or remove Ceph OSD nodes at run time. This means that you can resize the storage cluster capacity or replace hardware without taking down the storage cluster.
The ability to serve Ceph clients while the storage cluster is in a degraded state also has operational benefits. For example, you can add, remove, or replace hardware during regular business hours, rather than working overtime or on weekends. However, adding and removing Ceph OSD nodes can have a significant impact on performance.
Before you add or remove Ceph OSD nodes, consider the effects on storage cluster performance:
- Whether you are expanding or reducing the storage cluster capacity, adding or removing Ceph OSD nodes induces backfilling as the storage cluster rebalances. During that rebalancing time period, Ceph uses additional resources, which can impact storage cluster performance.
- In a production Ceph storage cluster, a Ceph OSD node has a particular hardware configuration that facilitates a particular type of storage strategy.
- Since a Ceph OSD node is part of a CRUSH hierarchy, the performance impact of adding or removing a node typically affects the performance of pools that use the CRUSH ruleset.
Additional Resources
3.3. Performance considerations
The following factors typically affect a storage cluster’s performance when adding or removing Ceph OSD nodes:
- Ceph clients place load on the I/O interface to Ceph; that is, the clients place load on a pool. A pool maps to a CRUSH ruleset. The underlying CRUSH hierarchy allows Ceph to place data across failure domains. If the underlying Ceph OSD node involves a pool that is experiencing high client load, the client load could significantly affect recovery time and reduce performance. Because write operations require data replication for durability, write-intensive client loads in particular can increase the time for the storage cluster to recover.
- Generally, the capacity you are adding or removing affects the storage cluster’s time to recover. In addition, the storage density of the node you add or remove might also affect recovery times. For example, a node with 36 OSDs typically takes longer to recover than a node with 12 OSDs.
- When removing nodes, you MUST ensure that you have sufficient spare capacity so that you will not reach the full ratio or near full ratio. If the storage cluster reaches the full ratio, Ceph will suspend write operations to prevent data loss.
- A Ceph OSD node maps to at least one Ceph CRUSH hierarchy, and the hierarchy maps to at least one pool. Each pool that uses a CRUSH ruleset experiences a performance impact when Ceph OSD nodes are added or removed.
- Replication pools tend to use more network bandwidth to replicate deep copies of the data, whereas erasure-coded pools tend to use more CPU to calculate k+m coding chunks. The more copies that exist of the data, the longer it takes for the storage cluster to recover. For example, a larger pool or one that has a greater number of k+m chunks will take longer to recover than a replication pool with fewer copies of the same data.
- Drives, controllers, and network interface cards all have throughput characteristics that might impact the recovery time. Generally, nodes with higher throughput characteristics, such as 10 Gbps and SSDs, recover more quickly than nodes with lower throughput characteristics, such as 1 Gbps and SATA drives.
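As a sketch, you can check the configured ratios and the current utilization from a Monitor node before planning to add or remove capacity:
[root@mon ~]# ceph osd dump | grep ratio
[root@mon ~]# ceph df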
3.4. Recommendations for adding or removing nodes
Red Hat recommends adding or removing one OSD at a time within a node and allowing the storage cluster to recover before proceeding to the next OSD. This helps to minimize the impact on storage cluster performance. Note that if a node fails, you might need to change the entire node at once, rather than one OSD at a time.
To remove an OSD:
- Using Ansible.
- Using the command-line interface.
To add an OSD:
- Using Ansible.
- Using the command-line interface.
When adding or removing Ceph OSD nodes, consider that other ongoing processes also affect storage cluster performance. To reduce the impact on client I/O, Red Hat recommends the following:
Calculate capacity
Before removing a Ceph OSD node, ensure that the storage cluster can backfill the contents of all its OSDs without reaching the full ratio. Reaching the full ratio will cause the storage cluster to refuse write operations.
Temporarily disable scrubbing
Scrubbing is essential to ensuring the durability of the storage cluster’s data; however, it is resource intensive. Before adding or removing a Ceph OSD node, disable scrubbing and deep scrubbing and let the current scrubbing operations complete before proceeding.
ceph osd set noscrub
ceph osd set nodeep-scrub
Once you have added or removed a Ceph OSD node and the storage cluster has returned to an active+clean state, unset the noscrub and nodeep-scrub settings.
Limit backfill and recovery
If you have reasonable data durability, there is nothing wrong with operating in a degraded state. For example, you can operate the storage cluster with osd_pool_default_size = 3 and osd_pool_default_min_size = 2. You can tune the storage cluster for the fastest possible recovery time, but doing so significantly affects Ceph client I/O performance. To maintain the highest Ceph client I/O performance, limit the backfill and recovery operations and allow them to take longer.
osd_max_backfills = 1
osd_recovery_max_active = 1
osd_recovery_op_priority = 1
You can also consider setting the sleep and delay parameters, such as osd_recovery_sleep.
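As a sketch, the same limits can be applied at runtime from a Monitor node with injectargs; the osd-recovery-sleep value shown is only an illustrative assumption:
[root@mon ~]# ceph tell osd.* injectargs --osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1
[root@mon ~]# ceph tell osd.* injectargs --osd-recovery-sleep 0.1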
Increase the number of placement groups
Finally, if you are expanding the size of the storage cluster, you may need to increase the number of placement groups. If you determine that you need to expand the number of placement groups, Red Hat recommends making incremental increases in the number of placement groups. Increasing the number of placement groups by a significant amount will cause a considerable degradation in performance.
See the KnowledgeBase article How do I increase placement group (PG) count in a Ceph Cluster for additional details.
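As a sketch of an incremental increase, assuming a hypothetical pool named data and an illustrative target of 128 placement groups, the commands look like the following; the pool name and target value are assumptions for illustration only:
[root@mon ~]# ceph osd pool get data pg_num
[root@mon ~]# ceph osd pool set data pg_num 128
[root@mon ~]# ceph osd pool set data pgp_num 128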
3.5. Adding a Ceph OSD node
To expand the capacity of the Red Hat Ceph Storage cluster, add an OSD node.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- A provisioned node with a network connection.
- Installation of Red Hat Enterprise Linux 8.
- Review the Requirements for Installing Red Hat Ceph Storage chapter in the Red Hat Ceph Storage Installation Guide.
Procedure
- Verify that other nodes in the storage cluster can reach the new node by its short host name.
Temporarily disable scrubbing:
Example
[root@mon ~]# ceph osd set noscrub
[root@mon ~]# ceph osd set nodeep-scrub
Limit the backfill and recovery features:
Syntax
ceph tell DAEMON_TYPE.* injectargs --OPTION_NAME VALUE [--OPTION_NAME VALUE]
Example
[root@mon ~]# ceph tell osd.* injectargs --osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1
Add the new node to the CRUSH map:
Syntax
ceph osd crush add-bucket BUCKET_NAME BUCKET_TYPE
Example
[root@mon ~]# ceph osd crush add-bucket node2 host
Add an OSD for each disk on the node to the storage cluster.
- Using Ansible.
- Using the command-line interface (see the sketch after this procedure).
Important: When adding an OSD node to a Red Hat Ceph Storage cluster, Red Hat recommends adding one OSD at a time within the node and allowing the cluster to recover to an active+clean state before proceeding to the next OSD.
Enable scrubbing:
Syntax
ceph osd unset noscrub
ceph osd unset nodeep-scrub
Set the backfill and recovery features to default:
Syntax
ceph tell DAEMON_TYPE.* injectargs --OPTION_NAME VALUE [--OPTION_NAME VALUE]
Example
[root@mon ~]# ceph tell osd.* injectargs --osd-max-backfills 1 --osd-recovery-max-active 3 --osd-recovery-op-priority 3
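As a sketch of the command-line approach for a single OSD, assuming the new node is node2, the data device is /dev/sdb, and the Ceph configuration and bootstrap keyrings are already in place on the node:
[root@node2 ~]# ceph-volume lvm create --data /dev/sdb
[root@mon ~]# ceph osd tree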
Additional Resources
- See the Setting a Specific Configuration Setting at Runtime section in the Red Hat Ceph Storage Configuration Guide for more details.
- See the Adding a Bucket and Moving a Bucket sections in the Red Hat Ceph Storage Storage Strategies Guide for details on placing the node at an appropriate location in the CRUSH hierarchy.
3.6. Removing a Ceph OSD node
To reduce the capacity of a storage cluster, remove an OSD node.
Before removing a Ceph OSD node, ensure that the storage cluster can backfill the contents of all OSDs without reaching the full ratio. Reaching the full ratio will cause the storage cluster to refuse write operations.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Root-level access to all nodes in the storage cluster.
Procedure
Check the storage cluster’s capacity:
Syntax
ceph df
rados df
ceph osd df
Temporarily disable scrubbing:
Syntax
ceph osd set noscrub
ceph osd set nodeep-scrub
Limit the backfill and recovery features:
Syntax
ceph tell DAEMON_TYPE.* injectargs --OPTION_NAME VALUE [--OPTION_NAME VALUE]
Example
[root@mon ~]# ceph tell osd.* injectargs --osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1
Remove each OSD on the node from the storage cluster:
- Using Ansible.
- Using the command-line interface (see the sketch after this procedure).
Important: When removing an OSD node from the storage cluster, Red Hat recommends removing one OSD at a time within the node and allowing the cluster to recover to an active+clean state before proceeding to remove the next OSD.
After you remove an OSD, check to verify that the storage cluster is not getting to the near-full ratio:
Syntax
ceph -s
ceph df
- Repeat this step until all OSDs on the node are removed from the storage cluster.
Once all OSDs are removed, remove the host bucket from the CRUSH map:
Syntax
ceph osd crush rm BUCKET_NAME
Example
[root@mon ~]# ceph osd crush rm node2
Enable scrubbing:
Syntax
ceph osd unset noscrub
ceph osd unset nodeep-scrub
Set the backfill and recovery features to default:
Syntax
ceph tell DAEMON_TYPE.* injectargs --OPTION_NAME VALUE [--OPTION_NAME VALUE]
Example
[root@mon ~]# ceph tell osd.* injectargs --osd-max-backfills 1 --osd-recovery-max-active 3 --osd-recovery-op-priority 3
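As a sketch of removing a single OSD from the command-line interface before removing the host bucket, assuming an illustrative OSD ID of 7:
[root@mon ~]# ceph osd out 7
[root@osd ~]# systemctl stop ceph-osd@7
[root@mon ~]# ceph osd crush remove osd.7
[root@mon ~]# ceph auth del osd.7
[root@mon ~]# ceph osd rm osd.7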
Additional Resources
- See the Setting a specific configuration setting at runtime section in the Red Hat Ceph Storage Configuration Guide for more details.
3.7. Simulating a node failure
To simulate a hard node failure, power off the node and reinstall the operating system.
Prerequisites
- A healthy running Red Hat Ceph Storage cluster.
- Root-level access to all nodes on the storage cluster.
Procedure
Check the storage cluster’s capacity to understand the impact of removing the node:
Example
[root@ceph1 ~]# ceph df
[root@ceph1 ~]# rados df
[root@ceph1 ~]# ceph osd df
Optionally, disable recovery and backfilling:
Example
[root@ceph1 ~]# ceph osd set noout
[root@ceph1 ~]# ceph osd set noscrub
[root@ceph1 ~]# ceph osd set nodeep-scrub
- Shut down the node.
If you are changing the host name, remove the node from the CRUSH map:
Example
[root@ceph1 ~]# ceph osd crush rm ceph3
Check the status of the storage cluster:
Example
[root@ceph1 ~]# ceph -s
- Reinstall the operating system on the node.
Add an Ansible user and generate the SSH keys:
Example
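The original example is not preserved here. A minimal sketch, assuming the user name ansible and that passwordless sudo is configured separately as described in the Installation Guide, looks like this:
[root@ceph3 ~]# useradd ansible
[root@ceph3 ~]# passwd ansible
[ansible@ceph3 ~]$ ssh-keygen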
From the Ansible administration node, copy the SSH keys for the ansible user to the reinstalled node:
Example
[ansible@admin ~]$ ssh-copy-id ceph3
From the Ansible administration node, run the Ansible playbook again:
Example
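The original example is not preserved here. A sketch of re-running the playbook from the /usr/share/ceph-ansible directory, assuming a bare-metal deployment that uses site.yml (a containerized deployment would use site-container.yml), looks like this:
[ansible@admin ceph-ansible]$ ansible-playbook site.yml -i hosts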
Optionally, enable recovery and backfilling:
Example
[root@ceph3 ~]# ceph osd unset noout
[root@ceph3 ~]# ceph osd unset noscrub
[root@ceph3 ~]# ceph osd unset nodeep-scrub
Check Ceph's health:
Example
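The original example output is not preserved here. A sketch of the check is:
[root@ceph3 ~]# ceph -s
[root@ceph3 ~]# ceph health detail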
Additional Resources
- The Red Hat Ceph Storage Installation Guide.
- See the Configuring Ansible's inventory location section in the Red Hat Ceph Storage Installation Guide for more details on the Ansible inventory configuration.
Chapter 4. Handling a data center failure
As a storage administrator, you can take preventive measures to avoid a data center failure. These preventive measures include:
- Configuring the data center infrastructure.
- Setting up failure domains within the CRUSH map hierarchy.
- Designating failure nodes within the domains.
4.1. Prerequisites
- A healthy running Red Hat Ceph Storage cluster.
- Root-level access to all nodes in the storage cluster.
4.2. Avoiding a data center failure
Configuring the data center infrastructure
Each data center within a stretch cluster can have a different storage cluster configuration to reflect local capabilities and dependencies. Set up replication between the data centers to help preserve the data. If one data center fails, the other data centers in the storage cluster contain copies of the data.
Setting up failure domains within the CRUSH map hierarchy
Failure, or failover, domains are redundant copies of domains within the storage cluster. If an active domain fails, the failure domain becomes the active domain.
By default, the CRUSH map lists all nodes in a storage cluster within a flat hierarchy. However, for best results, create a logical hierarchical structure within the CRUSH map. The hierarchy designates the domains to which each node belongs and the relationships among those domains within the storage cluster, including the failure domains. Defining the failure domains for each domain within the hierarchy improves the reliability of the storage cluster.
When planning a storage cluster that contains multiple data centers, place the nodes within the CRUSH map hierarchy so that if one data center goes down, the rest of the storage cluster stays up and running.
Designating failure nodes within the domains
If you plan to use three-way replication for data within the storage cluster, consider the location of the nodes within the failure domain. If an outage occurs within a data center, it is possible that some data might reside in only one copy. When this scenario happens, there are two options:
- Leave the data in read-only status with the standard settings.
- Live with only one copy for the duration of the outage.
With the standard settings, and because of the randomness of data placement across the nodes, not all of the data will be affected. However, if some data exists in only one copy, the storage cluster reverts to read-only mode.
4.3. Handling a data center failure
Red Hat Ceph Storage can withstand catastrophic failures to the infrastructure, such as losing one of the data centers in a stretch cluster. For the standard object store use case, configuring all three data centers can be done independently with replication set up between them. In this scenario, the storage cluster configuration in each of the data centers might be different, reflecting the local capabilities and dependencies.
A logical structure of the placement hierarchy should be considered. A proper CRUSH map can be used, reflecting the hierarchical structure of the failure domains within the infrastructure. Using logical hierarchical definitions improves the reliability of the storage cluster, versus using the standard hierarchical definitions. Failure domains are defined in the CRUSH map. The default CRUSH map contains all nodes in a flat hierarchy. In a three data center environment, such as a stretch cluster, the placement of nodes should be managed in a way that one data center can go down, but the storage cluster stays up and running. Consider which failure domain a node resides in when using 3-way replication for the data.
In the example below, the resulting map is derived from the initial setup of the storage cluster with 6 OSD nodes. In this example, all nodes have only one disk and hence one OSD. All of the nodes are arranged under the default root, that is the standard root of the hierarchy tree. Because there is a weight assigned to two of the OSDs, these OSDs receive fewer chunks of data than the other OSDs. These nodes were introduced later with bigger disks than the initial OSD disks. This does not affect the data placement to withstand a failure of a group of nodes.
Example
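The original example listing is not preserved here; the tree described above can be displayed at any time with:
[root@mon ~]# ceph osd tree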
Using logical hierarchical definitions to group the nodes into the same data center achieves mature data placement. Possible definition types of root, datacenter, rack, row, and host allow the reflection of the failure domains for the three data center stretch cluster:
- Nodes ceph-node1 and ceph-node2 reside in data center 1 (DC1)
- Nodes ceph-node3 and ceph-node5 reside in data center 2 (DC2)
- Nodes ceph-node4 and ceph-node6 reside in data center 3 (DC3)
- All data centers belong to the same structure (allDC)
Since all OSDs in a host belong to the host definition, there is no change needed. All the other assignments can be adjusted during runtime of the storage cluster by:
Defining the bucket structure with the following commands:
ceph osd crush add-bucket allDC root
ceph osd crush add-bucket DC1 datacenter
ceph osd crush add-bucket DC2 datacenter
ceph osd crush add-bucket DC3 datacenter
Moving the nodes into the appropriate place within this structure by modifying the CRUSH map:
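The original example is not preserved here. A sketch of the moves for the layout described above, using the data center and node names from this example, looks like the following:
ceph osd crush move DC1 root=allDC
ceph osd crush move DC2 root=allDC
ceph osd crush move DC3 root=allDC
ceph osd crush move ceph-node1 datacenter=DC1
ceph osd crush move ceph-node2 datacenter=DC1
ceph osd crush move ceph-node3 datacenter=DC2
ceph osd crush move ceph-node5 datacenter=DC2
ceph osd crush move ceph-node4 datacenter=DC3
ceph osd crush move ceph-node6 datacenter=DC3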
Within this structure, any new hosts can be added, as well as new disks. By placing the OSDs at the right place in the hierarchy, the CRUSH algorithm places redundant pieces into different failure domains within the structure.
The above example results in the following:
Example
The listing above shows the resulting CRUSH map by displaying the OSD tree. It is now easy to see how the hosts belong to a data center and how all data centers belong to the same top-level structure, while clearly distinguishing between locations.
Placing the data in the proper locations according to the map works properly only within a healthy cluster. Misplacement might happen when some OSDs are not available. Those misplacements are corrected automatically once it is possible to do so.
Additional Resources
- See the CRUSH administration chapter in the Red Hat Ceph Storage Storage Strategies Guide for more information.
Chapter 5. Migrating a non-containerized Red Hat Ceph Storage cluster to a containerized environment
To manually migrate a non-containerized, bare-metal, Red Hat Ceph Storage cluster to a containerized environment, use the ceph-ansible switch-from-non-containerized-to-containerized-ceph-daemons.yml
playbook.
If the storage cluster has an RBD mirror daemon not deployed by ceph-ansible, you need to migrate the daemons prior to converting to a containerized cluster. For more details, see Migrating RBD mirroring daemons.
Prerequisites
- A running Red Hat Ceph Storage non-containerized, bare-metal, cluster.
- Access to the Ansible administration node.
- An ansible user account.
- Sudo access to the ansible user account.
Procedure
Edit the group_vars/all.yml file to include configuration for containers:
ceph_docker_image_tag: "latest"
ceph_docker_image: rhceph/rhceph-4-rhel8
containerized_deployment: true
ceph_docker_registry: registry.redhat.io
Important: For ceph_docker_image_tag, use latest if your current storage cluster is on the latest version, or use the appropriate image tag. See the What are the Red Hat Ceph Storage releases and corresponding Ceph package versions? Knowledgebase article for more information.
Navigate to the /usr/share/ceph-ansible directory:
[ansible@admin ~]$ cd /usr/share/ceph-ansible
On the Ansible administration node, run the Ansible migration playbook:
Syntax
ansible-playbook ./infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml -i INVENTORY_FILE
Example
[ansible@admin ceph-ansible]$ ansible-playbook ./infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml -i hosts
Verify that the cluster has switched to a containerized environment.
On the monitor node, list all running containers:
Red Hat Enterprise Linux 7
[root@mon ~]$ sudo docker ps
Red Hat Enterprise Linux 8
[root@mon ~]$ sudo podman ps
Additional Resources
- See the Installing a Red Hat Ceph Storage cluster chapter in the Red Hat Ceph Storage Installation Guide for information on installation of a bare-metal storage cluster.
- See the Creating an Ansible user with sudo access section in the Red Hat Ceph Storage Installation Guide for providing sudo access to the ansible user.
- See the Configuring two-way mirroring using the command-line interface section in the Red Hat Ceph Storage Block Device Guide for more details.