Chapter 12. Handling a node failure

As a storage administrator, you might have to handle the failure of an entire node within the storage cluster. Handling a node failure is similar to handling a disk failure, but instead of Ceph recovering placement groups (PGs) for only one disk, all PGs on the disks within that node must be recovered. Ceph detects that the OSDs are all down and automatically starts the recovery process, known as self-healing.

There are three node failure scenarios.

Replacing the node by using the root and Ceph OSD disks from the failed node.
Replacing the node by reinstalling the operating system and using the Ceph OSD disks from the failed node.
Replacing the node by reinstalling the operating system and using all new Ceph OSD disks.

For a high-level workflow for each node replacement scenario, see Workflow for replacing a node.

Prerequisites

A running Red Hat Ceph Storage cluster.
A failed node.

12.1. Considerations before adding or removing a node

One of the outstanding features of Ceph is the ability to add or remove Ceph OSD nodes at run time. This means that you can resize the storage cluster capacity or replace hardware without taking down the storage cluster.

The ability to serve Ceph clients while the storage cluster is in a degraded state also has operational benefits. For example, you can add, remove, or replace hardware during regular business hours, rather than working overtime or on weekends. However, adding and removing Ceph OSD nodes can have a significant impact on performance.

Before you add or remove Ceph OSD nodes, consider the effects on storage cluster performance:

Whether you are expanding or reducing the storage cluster capacity, adding or removing Ceph OSD nodes induces backfilling as the storage cluster rebalances. During that rebalancing time period, Ceph uses additional resources, which can impact storage cluster performance.
In a production Ceph storage cluster, a Ceph OSD node has a particular hardware configuration that facilitates a particular type of storage strategy.
Since a Ceph OSD node is part of a CRUSH hierarchy, the performance impact of adding or removing a node typically affects the performance of pools that use the CRUSH ruleset.

Important

In director-deployed Red Hat Ceph Storage environments, replacing Controller nodes can affect the Ceph Monitor service and lead to storage outages if Monitor IP addresses change. For more information, see Managing the Ceph Monitor service with Red Hat OpenStack Platform.

12.2. Workflow for replacing a node

There are three node failure scenarios. Use these high-level workflows for each scenario when replacing a node.

Prerequisites

A running Red Hat Ceph Storage cluster.
A failed node.

12.2.1. Replacing the node by using the root and Ceph OSD disks from the failed node

Use the root and Ceph OSD disks from the failed node to replace the node.

Procedure

Disable backfilling.

Syntax

ceph osd set noout
ceph osd set noscrub
ceph osd set nodeep-scrub

Example

[ceph: root@host01 /]# ceph osd set noout
[ceph: root@host01 /]# ceph osd set noscrub
[ceph: root@host01 /]# ceph osd set nodeep-scrub

Replace the node, taking the disks from the old node, and adding them to the new node.

Enable backfilling.

Syntax

ceph osd unset noout
ceph osd unset noscrub
ceph osd unset nodeep-scrub

Example

[ceph: root@host01 /]# ceph osd unset noout
[ceph: root@host01 /]# ceph osd unset noscrub
[ceph: root@host01 /]# ceph osd unset nodeep-scrub
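
The set and unset steps in this workflow can be wrapped in a pair of helper functions so that the same three flags are always applied and cleared together. The following is a minimal sketch, not part of the product: the function names and the `MAINTENANCE_FLAGS` variable are hypothetical, and only the `ceph osd set` and `ceph osd unset` commands come from the procedure above. It assumes a shell in which the `ceph` command is available, such as the cephadm shell used in the examples.

```shell
# Flags used in the procedure above (hypothetical variable name).
MAINTENANCE_FLAGS="noout noscrub nodeep-scrub"

# Set every maintenance flag before the node is replaced.
# Stops at the first flag that cannot be set.
set_maintenance_flags() {
    for flag in $MAINTENANCE_FLAGS; do
        ceph osd set "$flag" || return 1
    done
}

# Unset the same flags once the replacement node is back in the cluster.
unset_maintenance_flags() {
    for flag in $MAINTENANCE_FLAGS; do
        ceph osd unset "$flag" || return 1
    done
}
```

Run `set_maintenance_flags` before replacing the node and `unset_maintenance_flags` afterward. Keeping the flag list in one place reduces the chance of leaving a flag set after the maintenance window.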

12.2.2. Replacing the node by reinstalling the operating system and using the Ceph OSD disks from the failed node

Reinstall the operating system and use the Ceph OSD disks from the failed node to replace the node.

Procedure

Disable backfilling.

Syntax

ceph osd set noout
ceph osd set noscrub
ceph osd set nodeep-scrub

Example

[ceph: root@host01 /]# ceph osd set noout
[ceph: root@host01 /]# ceph osd set noscrub
[ceph: root@host01 /]# ceph osd set nodeep-scrub

Create a backup of the Ceph configuration.

Syntax

cp /etc/ceph/ceph.conf /PATH_TO_BACKUP_LOCATION/ceph.conf

Example

[ceph: root@host01 /]# cp /etc/ceph/ceph.conf /some/backup/location/ceph.conf

Replace the node and add the Ceph OSD disks from the failed node.
- Configure disks as JBOD.
  Note
  This should be done by the storage administrator.
- Install the operating system. For more information about operating system requirements, see Operating system requirements for Red Hat Ceph Storage. For more information about installing the operating system, see the Red Hat Enterprise Linux product documentation.
  Note
  This should be done by the system administrator.

Restore the Ceph configuration.

Syntax

cp /PATH_TO_BACKUP_LOCATION/ceph.conf /etc/ceph/ceph.conf

Example

[ceph: root@host01 /]# cp /some/backup/location/ceph.conf /etc/ceph/ceph.conf

Add the new node to the storage cluster using the Ceph Orchestrator commands. Ceph daemons are placed automatically on the respective node. For more information, see Adding a Ceph OSD node.

Enable backfilling.

Syntax

ceph osd unset noout
ceph osd unset noscrub
ceph osd unset nodeep-scrub

Example

[ceph: root@host01 /]# ceph osd unset noout
[ceph: root@host01 /]# ceph osd unset noscrub
[ceph: root@host01 /]# ceph osd unset nodeep-scrub
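
Because the backup made earlier in this workflow is the only copy of the configuration once the operating system is reinstalled, it can be worth confirming that the copy is intact before proceeding. The following is a minimal sketch under that assumption; the `backup_conf` function name is hypothetical, and the paths in the usage note are the placeholder paths from the example above.

```shell
# Copy a configuration file to a backup directory and confirm that the
# copy is byte-for-byte identical to the source.
# Usage: backup_conf SOURCE_FILE BACKUP_DIR
backup_conf() {
    src=$1
    dest_dir=$2
    [ -f "$src" ] || { echo "missing source: $src" >&2; return 1; }
    mkdir -p "$dest_dir" || return 1
    dest="$dest_dir/$(basename "$src")"
    # -p preserves mode and timestamps where permitted.
    cp -p "$src" "$dest" || return 1
    # cmp -s exits 0 only when both files are identical.
    cmp -s "$src" "$dest"
}
```

For example, `backup_conf /etc/ceph/ceph.conf /some/backup/location` returns a nonzero status if the source is missing or the copy differs. The backup location should be outside the node that is being reinstalled.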

12.2.3. Replacing the node by reinstalling the operating system and using all new Ceph OSD disks

Reinstall the operating system and use all new Ceph OSD disks to replace the node.

Procedure

Disable backfilling.

Syntax

ceph osd set noout
ceph osd set noscrub
ceph osd set nodeep-scrub

Example

[ceph: root@host01 /]# ceph osd set noout
[ceph: root@host01 /]# ceph osd set noscrub
[ceph: root@host01 /]# ceph osd set nodeep-scrub

Remove all OSDs on the failed node from the storage cluster. For more information, see Removing a Ceph OSD node.

Create a backup of the Ceph configuration.

Syntax

cp /etc/ceph/ceph.conf /PATH_TO_BACKUP_LOCATION/ceph.conf

Example

[ceph: root@host01 /]# cp /etc/ceph/ceph.conf /some/backup/location/ceph.conf

Replace the node and add the new Ceph OSD disks.
- Configure disks as JBOD.
  Note
  This should be done by the storage administrator.
- Install the operating system. For more information about operating system requirements, see Operating system requirements for Red Hat Ceph Storage. For more information about installing the operating system, see the Red Hat Enterprise Linux product documentation.
  Note
  This should be done by the system administrator.

Add the new node to the storage cluster using the Ceph Orchestrator commands. Ceph daemons are placed automatically on the respective node. For more information, see Adding a Ceph OSD node.

Enable backfilling.

Syntax

ceph osd unset noout
ceph osd unset noscrub
ceph osd unset nodeep-scrub

Example

[ceph: root@host01 /]# ceph osd unset noout
[ceph: root@host01 /]# ceph osd unset noscrub
[ceph: root@host01 /]# ceph osd unset nodeep-scrub

12.3. Performance considerations

The following factors typically affect a storage cluster’s performance when adding or removing Ceph OSD nodes:

Ceph clients place load on the I/O interface to Ceph; that is, the clients place load on a pool. A pool maps to a CRUSH ruleset. The underlying CRUSH hierarchy allows Ceph to place data across failure domains. If the underlying Ceph OSD node involves a pool that is experiencing high client load, the client load could significantly affect recovery time and reduce performance. Because write operations require data replication for durability, write-intensive client loads in particular can increase the time for the storage cluster to recover.
Generally, the capacity you are adding or removing affects the storage cluster’s time to recover. In addition, the storage density of the node you add or remove might also affect recovery times. For example, a node with 36 OSDs typically takes longer to recover than a node with 12 OSDs.
When removing nodes, you MUST ensure that you have sufficient spare capacity so that you will not reach full ratio or near full ratio. If the storage cluster reaches full ratio, Ceph will suspend write operations to prevent data loss.
A Ceph OSD node maps to at least one Ceph CRUSH hierarchy, and the hierarchy maps to at least one pool. Each pool that uses a CRUSH ruleset experiences a performance impact when Ceph OSD nodes are added or removed.
Replication pools tend to use more network bandwidth to replicate deep copies of the data, whereas erasure coded pools tend to use more CPU to calculate k+m coding chunks. The more copies that exist of the data, the longer it takes for the storage cluster to recover. For example, a larger pool or one that has a greater number of k+m chunks will take longer to recover than a replication pool with fewer copies of the same data.
Drives, controllers and network interface cards all have throughput characteristics that might impact the recovery time. Generally, nodes with higher throughput characteristics, such as 10 Gbps and SSDs, recover more quickly than nodes with lower throughput characteristics, such as 1 Gbps and SATA drives.

12.4. Recommendations for adding or removing nodes

Red Hat recommends adding or removing one OSD at a time within a node and allowing the storage cluster to recover before proceeding to the next OSD. This helps to minimize the impact on storage cluster performance. Note that if a node fails, you might need to change the entire node at once, rather than one OSD at a time.

To remove an OSD:

See Removing the OSD daemons using the Ceph Orchestrator.

To add an OSD:

When adding or removing Ceph OSD nodes, consider that other ongoing processes also affect storage cluster performance. To reduce the impact on client I/O, Red Hat recommends the following:

Calculate capacity

Before removing a Ceph OSD node, ensure that the storage cluster can backfill the contents of all its OSDs without reaching the full ratio. Reaching the full ratio will cause the storage cluster to refuse write operations.
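
The capacity check can be worked through as simple arithmetic: the data currently stored stays the same, while the raw capacity shrinks by the size of the node being removed. The following is a rough sketch of that estimate. The function names are hypothetical, the input figures would come from the output of `ceph df`, and the thresholds of 0.85 (near full) and 0.95 (full) used in the usage note are the usual Ceph defaults, which might differ in your cluster. The estimate is a cluster-wide average, so individual OSDs can be fuller than the result suggests.

```shell
# Print the projected utilization ratio after removing capacity.
# Usage: projected_ratio USED TOTAL REMOVED   (all three in the same unit)
projected_ratio() {
    awk -v used="$1" -v total="$2" -v removed="$3" 'BEGIN {
        remaining = total - removed
        if (remaining <= 0) exit 1      # nothing left to hold the data
        printf "%.4f\n", used / remaining
    }'
}

# Succeed only when RATIO is strictly below LIMIT.
# Usage: below_limit RATIO LIMIT
below_limit() {
    awk -v r="$1" -v l="$2" 'BEGIN { exit (((r + 0) < (l + 0)) ? 0 : 1) }'
}
```

For example, with 60 TiB used out of 100 TiB raw capacity, removing a 20 TiB node gives `projected_ratio 60 100 20`, which prints `0.7500`; `below_limit 0.7500 0.85` then succeeds. Removing 40 TiB instead gives `1.0000`, which is above any full ratio, so the removal should not proceed.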

Temporarily disable scrubbing

Scrubbing is essential to ensuring the durability of the storage cluster’s data; however, it is resource intensive. Before adding or removing a Ceph OSD node, disable scrubbing and deep-scrubbing and let the current scrubbing operations complete before proceeding.

ceph osd set noscrub
ceph osd set nodeep-scrub

Once you have added or removed a Ceph OSD node and the storage cluster has returned to an active+clean state, unset the noscrub and nodeep-scrub settings.

ceph osd unset noscrub
ceph osd unset nodeep-scrub
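
Before starting or finishing maintenance, it can help to confirm which flags are actually in effect. The following is a minimal sketch that reads the cluster-wide flags from `ceph osd dump`. The function name is hypothetical, and the sketch assumes the dump output contains a line of the form `flags noscrub,nodeep-scrub,...`.

```shell
# Succeed when FLAG appears in the cluster-wide "flags" line printed by
# `ceph osd dump`. The match is exact, so "scrub" does not match "noscrub".
# Usage: osd_flag_is_set FLAG
osd_flag_is_set() {
    ceph osd dump | awk -v want="$1" '
        $1 == "flags" {
            n = split($2, flags, ",")
            for (i = 1; i <= n; i++) if (flags[i] == want) found = 1
        }
        END { exit (found ? 0 : 1) }'
}
```

For example, `osd_flag_is_set noscrub && echo "scrubbing is disabled"` prints the message only while the flag is set.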

Limit backfill and recovery

If you have reasonable data durability, there is nothing wrong with operating in a degraded state. For example, you can operate the storage cluster with osd_pool_default_size = 3 and osd_pool_default_min_size = 2. You can tune the storage cluster for the fastest possible recovery time, but doing so significantly affects Ceph client I/O performance. To maintain the highest Ceph client I/O performance, limit the backfill and recovery operations and allow them to take longer.

osd_max_backfills = 1
osd_recovery_max_active = 1
osd_recovery_op_priority = 1

You can also consider setting the sleep and delay parameters, such as osd_recovery_sleep.
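
One way to apply these limits to all OSDs is through the centralized configuration database with `ceph config set`, and to remove the overrides afterward with `ceph config rm`. The following is a sketch under that assumption; the function names are hypothetical. On releases that use the mClock scheduler by default, changes to the backfill and recovery limits might be ignored unless overriding them is explicitly allowed, so check the documentation for your release.

```shell
# Apply conservative backfill and recovery limits to all OSDs.
limit_recovery() {
    ceph config set osd osd_max_backfills 1 &&
    ceph config set osd osd_recovery_max_active 1 &&
    ceph config set osd osd_recovery_op_priority 1
}

# Remove the overrides so that the OSDs return to their default values.
restore_recovery_defaults() {
    ceph config rm osd osd_max_backfills &&
    ceph config rm osd osd_recovery_max_active &&
    ceph config rm osd osd_recovery_op_priority
}
```

Call `limit_recovery` before adding or removing the node and `restore_recovery_defaults` once the storage cluster has returned to an active+clean state.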

Increase the number of placement groups

Finally, if you are expanding the size of the storage cluster, you may need to increase the number of placement groups. If you determine that you need to expand the number of placement groups, Red Hat recommends making incremental increases in the number of placement groups. Increasing the number of placement groups by a significant amount will cause a considerable degradation in performance.

12.5. Adding a Ceph OSD node

To expand the capacity of the Red Hat Ceph Storage cluster, you can add an OSD node.

Prerequisites

A running Red Hat Ceph Storage cluster.
A provisioned node with a network connection.

Procedure

Verify that other nodes in the storage cluster can reach the new node by its short host name.

Temporarily disable scrubbing:

Example

[ceph: root@host01 /]# ceph osd set noscrub
[ceph: root@host01 /]# ceph osd set nodeep-scrub

Limit the backfill and recovery features:

Syntax

ceph tell DAEMON_TYPE.* injectargs --OPTION_NAME VALUE [--OPTION_NAME VALUE]

Example

[ceph: root@host01 /]# ceph tell osd.* injectargs --osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1

Extract the cluster’s public SSH key to a file:

Syntax

ceph cephadm get-pub-key > ~/PATH

Example

[ceph: root@host01 /]# ceph cephadm get-pub-key > ~/ceph.pub

Copy the Ceph cluster’s public SSH key to the root user’s authorized_keys file on the new host:
Syntax
```
ssh-copy-id -f -i ~/PATH root@HOST_NAME_2
```
Example
```
[ceph: root@host01 /]# ssh-copy-id -f -i ~/ceph.pub root@host02
```

Add the new node to the CRUSH map:

Syntax

ceph orch host add NODE_NAME IP_ADDRESS

Example

[ceph: root@host01 /]# ceph orch host add host02 10.10.128.70

Add an OSD for each disk on the node to the storage cluster.

Important

When adding an OSD node to a Red Hat Ceph Storage cluster, Red Hat recommends adding one OSD daemon at a time and allowing the cluster to recover to an active+clean state before proceeding to the next OSD.
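
Waiting for the active+clean state between OSDs can be scripted by polling `ceph pg stat`. The following is a minimal sketch, not a supported tool. The function names are hypothetical, and the parsing assumes output of the form `201 pgs: 201 active+clean; ...`. PGs in combined states such as `active+clean+scrubbing` are deliberately treated as not yet clean.

```shell
# Succeed only when `ceph pg stat` reports every PG as exactly active+clean.
all_pgs_clean() {
    ceph pg stat | awk '
        {
            total = $1
            sub(/^[0-9]+ pgs: /, "")   # drop the leading "N pgs: "
            sub(/;.*/, "")             # drop the usage figures after ";"
            clean = ($0 == total " active+clean")
        }
        END { exit (clean ? 0 : 1) }'
}

# Poll until the cluster is clean, or give up after MAX_TRIES checks.
# Usage: wait_for_clean [INTERVAL_SECONDS] [MAX_TRIES]
wait_for_clean() {
    interval=${1:-30}
    tries=${2:-120}
    while [ "$tries" -gt 0 ]; do
        all_pgs_clean && return 0
        tries=$((tries - 1))
        sleep "$interval"
    done
    return 1
}
```

For example, run `wait_for_clean 60 240` after adding each OSD and continue with the next OSD only if it returns successfully.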

12.6. Removing a Ceph OSD node

To reduce the capacity of a storage cluster, remove an OSD node.

Warning

Before removing a Ceph OSD node, ensure that the storage cluster can backfill the contents of all OSDs without reaching the full ratio. Reaching the full ratio will cause the storage cluster to refuse write operations.

Prerequisites

A running Red Hat Ceph Storage cluster.
Root-level access to all nodes in the storage cluster.

Procedure

Check the storage cluster’s capacity:
Syntax
```
ceph df
rados df
ceph osd df
```

Temporarily disable scrubbing:

Syntax

ceph osd set noscrub
ceph osd set nodeep-scrub

Limit the backfill and recovery features:

Syntax

ceph tell DAEMON_TYPE.* injectargs --OPTION_NAME VALUE [--OPTION_NAME VALUE]

Example

[ceph: root@host01 /]# ceph tell osd.* injectargs --osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1

Remove each OSD on the node from the storage cluster:
- See Removing the OSD daemons using the Ceph Orchestrator.
  Important
  When removing an OSD node from the storage cluster, Red Hat recommends removing one OSD at a time within the node and allowing the cluster to recover to an active+clean state before proceeding to remove the next OSD.
  1. After you remove an OSD, verify that the storage cluster is not approaching the near-full ratio:
    Syntax
    
    ceph -s
    ceph df
  2. Repeat this step until all OSDs on the node are removed from the storage cluster.
Once all OSDs are removed, remove the host:
- See Removing hosts using the Ceph Orchestrator.
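
The near-full check between OSD removals can also be expressed as a small guard that looks for capacity-related health checks. The following is a minimal sketch. The function name is hypothetical, and it assumes that `ceph health detail` reports the `OSD_NEARFULL`, `OSD_BACKFILLFULL`, and `OSD_FULL` health check codes when the corresponding thresholds are crossed.

```shell
# Succeed when `ceph health detail` reports a capacity-related OSD warning.
capacity_warning_present() {
    ceph health detail | grep -Eq 'OSD_NEARFULL|OSD_BACKFILLFULL|OSD_FULL'
}
```

For example, `capacity_warning_present && echo "stop: free up or add capacity before removing the next OSD"` can be run after each removal. A clean result does not replace reviewing the `ceph df` output, because utilization can keep rising while backfilling is still in progress.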

12.7. Simulating a node failure

To simulate a hard node failure, power off the node and reinstall the operating system.

Prerequisites

A healthy running Red Hat Ceph Storage cluster.
Root-level access to all nodes on the storage cluster.

Procedure

Check the storage cluster’s capacity to understand the impact of removing the node:

Example

[ceph: root@host01 /]# ceph df
[ceph: root@host01 /]# rados df
[ceph: root@host01 /]# ceph osd df

Optionally, disable recovery and backfilling:

Example

[ceph: root@host01 /]# ceph osd set noout
[ceph: root@host01 /]# ceph osd set noscrub
[ceph: root@host01 /]# ceph osd set nodeep-scrub

Shut down the node.
If you are changing the host name, remove the node from the CRUSH map:
Example
```
[ceph: root@host01 /]# ceph osd crush rm host03
```
Check the status of the storage cluster:
Example
```
[ceph: root@host01 /]# ceph -s
```
Reinstall the operating system on the node.
Add the new node:
- See Adding hosts using the Ceph Orchestrator.

Optionally, enable recovery and backfilling:

Example

[ceph: root@host01 /]# ceph osd unset noout
[ceph: root@host01 /]# ceph osd unset noscrub
[ceph: root@host01 /]# ceph osd unset nodeep-scrub

Check Ceph’s health:
Example
```
[ceph: root@host01 /]# ceph -s
```
