Chapter 5. Troubleshooting Ceph OSDs

This chapter contains information on how to fix the most common errors related to Ceph OSDs.

Prerequisites

Verify your network connection. See Troubleshooting networking issues for details.
Verify that Monitors have a quorum by using the ceph health command. If the command returns a health status (HEALTH_OK, HEALTH_WARN, or HEALTH_ERR), the Monitors are able to form a quorum. If not, address any Monitor problems first. See Troubleshooting Ceph Monitors for details. For details about the ceph health command, see Understanding Ceph health.
Optionally, stop the rebalancing process to save time and resources. See Stopping and starting rebalancing for details.

5.1. Most common Ceph OSD errors

The following tables list the most common error messages that are returned by the ceph health detail command, or included in the Ceph logs. The tables provide links to corresponding sections that explain the errors and point to specific procedures to fix the problems.

Prerequisites

Root-level access to the Ceph OSD nodes.

5.1.1. Ceph OSD error messages

A table of common Ceph OSD error messages, and a potential fix.

Error message	See
`HEALTH_ERR`
`full osds`	Full OSDs
`HEALTH_WARN`
`backfillfull osds`	Backfillfull OSDs
`nearfull osds`	Nearfull OSDs
`osds are down`	Down OSDs, Flapping OSDs
`requests are blocked`	Slow requests or requests are blocked
`slow requests`	Slow requests or requests are blocked

5.1.2. Common Ceph OSD error messages in the Ceph logs

A table of common Ceph OSD error messages found in the Ceph logs, and a link to a potential fix.

Error message	Log file	See
`heartbeat_check: no reply from osd.X`	Main cluster log	Flapping OSDs
`wrongly marked me down`	Main cluster log	Flapping OSDs
`osds have slow requests`	Main cluster log	Slow requests or requests are blocked
`FAILED assert(0 == "hit suicide timeout")`	OSD log	Down OSDs

5.1.3. Full OSDs

Understand and troubleshoot full OSDs.

When one or more OSDs are full, the ceph health detail command returns an error message similar to the following example:

HEALTH_ERR 1 full osds
osd.3 is full at 95%

What this means

Ceph prevents clients from running I/O operations on full OSDs to avoid losing data. It returns the HEALTH_ERR full osds message when one or more OSDs reach the capacity set by the mon_osd_full_ratio parameter. By default, this parameter is set to 0.95, which means 95% of the OSD capacity.

To troubleshoot this problem

To diagnose and resolve full OSDs, complete the following steps:

Run ceph osd df to check how full each OSD is. When invoked without arguments, it reports on the entire cluster. You can also filter by device class or a single OSD:
```
ceph osd df
ceph osd df ssd
ceph osd df osd.1701
```
Review the %USE column to see how full each OSD is. The VAR column shows how each OSD compares to the average. The STDDEV value indicates how evenly OSDs are filled. If ceph osd df ssd or ceph osd df nvme reports a standard deviation greater than 2.0, the balancer might not be enabled or functioning correctly.
If ceph df shows that the device class of the full OSD is much less full on average than the full OSD, this is likely a balancer issue. For more information, see Using the Ceph Manager balancer module.
If OSDs within the device class are balanced, use the following approaches to reduce usage:
1. Check for benchmark detritus. If rados bench was run against this pool, orphaned data might remain. Check by running:
  rados -p mypoolname ls | grep -c bench
  If this command returns a large number, consult Red Hat Support for safe removal by using the rados rm command.
2. Request that users remove temporary, outdated, or non-essential data.
3. Run fstrim on Ceph Block Device clients that use conventional file systems on volumes in a pool that uses the full OSD. This discards unused blocks and frees capacity.
4. Add nodes or OSDs of the appropriate device class so the balancer can redistribute data and reduce fullness.
As an emergency measure, you can temporarily raise the full ratio:
```
ceph osd set-full-ratio 0.98
```
Important
Raising the full ratio is a temporary emergency measure. Restore the default setting as soon as possible to reduce the risk of cluster instability.
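After usage drops below the threshold, the ratio can be returned to its default. The following is a minimal sketch written as a hypothetical shell function (the name restore_full_ratio is illustrative and not part of Ceph), assuming the 0.95 default described above and that it runs inside the Cephadm shell. It also prints the ratios stored in the OSD map so that you can confirm the change:

```shell
# Hypothetical helper (not part of Ceph): restore the full ratio and show the result.
# Assumes the default value of 0.95; pass another value as the first argument if needed.
restore_full_ratio() {
  local ratio="${1:-0.95}"
  ceph osd set-full-ratio "$ratio"
  # The OSD map records full_ratio, backfillfull_ratio, and nearfull_ratio.
  ceph osd dump | grep ratio
}
```

For example, after running restore_full_ratio, the output should contain a full_ratio 0.95 line.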

5.1.4. Backfillfull OSDs

The ceph health detail command returns an error message similar to the following one:

health: HEALTH_WARN
3 backfillfull osd(s)
Low space hindering backfill (add storage if this doesn't resolve itself): 32 pgs backfill_toofull

What this means

When one or more OSDs have exceeded the backfillfull threshold, Ceph prevents data from rebalancing to those devices. This is an early warning that rebalancing might not complete and that the cluster is approaching full. The default backfillfull threshold is 90%.

To troubleshoot this problem

Check utilization by pool:

ceph df

If %RAW USED is above 70-75%, you can carry out one of the following actions:

Delete unnecessary data. This is a short-term solution to avoid production downtime.
Scale the cluster by adding a new OSD node. This is a long-term solution recommended by Red Hat.
Temporarily increase the backfillfull ratio so that the PGs stuck in backfill_toofull can continue recovering. The ratio applies to the whole cluster. Add new storage to the cluster as soon as possible or remove data to prevent filling more OSDs.
Syntax
```
ceph osd set-backfillfull-ratio VALUE
```
The range for VALUE is 0.0 to 1.0.
Example
```
[ceph: root@host01 /]# ceph osd set-backfillfull-ratio 0.92
```
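When new storage is available, you can check which placement groups are still blocked and then return the backfillfull ratio to its 0.90 default. A sketch with two hypothetical helper functions (names are illustrative, not part of Ceph), assuming that ceph pg ls accepts a PG state as a filter and that you run them inside the Cephadm shell:

```shell
# Hypothetical helper (not part of Ceph): list the PGs currently in backfill_toofull.
list_backfill_toofull_pgs() {
  ceph pg ls backfill_toofull
}

# Hypothetical helper: restore the backfillfull ratio (default 0.90) after recovery finishes.
restore_backfillfull_ratio() {
  ceph osd set-backfillfull-ratio "${1:-0.90}"
}
```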

5.1.5. Nearfull OSDs

The ceph health detail command returns an error message similar to the following one:

HEALTH_WARN 1 nearfull osds
osd.2 is near full at 85%

What this means

Ceph returns the nearfull osds message when one or more OSDs reach the capacity set by the mon_osd_nearfull_ratio parameter. By default, this parameter is set to 0.85, which means 85% of the OSD capacity.

Ceph distributes data based on the CRUSH hierarchy in the best possible way, but it cannot guarantee equal distribution. The main causes of the uneven data distribution and the nearfull osds messages are:

The OSDs are not balanced among the OSD nodes in the cluster. That is, some OSD nodes host significantly more OSDs than others, or the weight of some OSDs in the CRUSH map is not adequate to their capacity.
The placement group (PG) count is not appropriate for the number of OSDs, the use case, the target number of PGs per OSD, and OSD utilization.
The cluster uses inappropriate CRUSH tunables.
The back-end storage for OSDs is almost full.

To troubleshoot this problem

Verify that the PG count is sufficient and increase it if needed.
Verify that you use CRUSH tunables optimal to the cluster version and adjust them if not.
Change the weight of OSDs by utilization.
Determine how much space is left on the disks used by OSDs.
1. To view how much space OSDs use in general:
  [ceph: root@host01 /]# ceph osd df
2. To view how much space OSDs use on a particular node, run the following command on the node that contains nearfull OSDs:
  df
3. If needed, add a new OSD node.
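The checks in the list above can be run together. This is a hypothetical read-only helper (the name is illustrative, not part of Ceph), assuming the PG autoscaler is enabled so that ceph osd pool autoscale-status reports a recommended PG count for each pool:

```shell
# Hypothetical helper (not part of Ceph): read-only checks for the nearfull causes listed above.
check_nearfull_causes() {
  # Current and recommended PG counts per pool
  ceph osd pool autoscale-status
  # CRUSH tunables currently in effect
  ceph osd crush show-tunables
  # Dry run: report which OSD weights would change, without changing them
  ceph osd test-reweight-by-utilization
}
```

If the dry-run output looks reasonable, ceph osd reweight-by-utilization applies the same weight changes.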

5.1.6. Down OSDs

The ceph health detail command returns an error similar to the following one:

HEALTH_WARN 1/3 in osds are down

What this means

One of the ceph-osd processes is unavailable due to a possible service failure or problems with communication with other OSDs. As a consequence, the surviving ceph-osd daemons reported this failure to the Monitors.

If the ceph-osd daemon is not running, either the underlying OSD drive or file system is corrupted, or some other error, such as a missing keyring, is preventing the daemon from starting.

In most cases, when the ceph-osd daemon is running but still marked as down, the cause is a networking issue.

To troubleshoot this problem

Determine which OSD is down:

[ceph: root@host01 /]# ceph health detail
HEALTH_WARN 1/3 in osds are down
osd.0 is down since epoch 23, last address 192.168.106.220:6800/11080

Try to restart the ceph-osd daemon. Replace FSID with the FSID of the storage cluster and OSD_ID with the ID of the OSD that is down:
Syntax
```
systemctl restart ceph-FSID@osd.OSD_ID
```
Example
```
[root@host01 ~]# systemctl restart ceph-b404c440-9e4c-11ec-a28a-001a4a0001df@osd.0.service
```
1. If you are not able to start the ceph-osd daemon, follow the steps in The ceph-osd daemon cannot start.
2. If you are able to start the ceph-osd daemon but it is marked as down, follow the steps in The ceph-osd daemon is running but still marked as `down`.
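On the OSD host, the FSID part of the unit name can be looked up instead of typed by hand. A minimal sketch as a hypothetical helper (the name is illustrative, not part of Ceph), assuming the ceph CLI on the host can reach the cluster:

```shell
# Hypothetical helper (not part of Ceph): restart the systemd unit of one OSD
# on the host that runs it. Assumes the ceph CLI works on this host; otherwise
# obtain the FSID with `cephadm shell -- ceph fsid` and substitute it.
restart_osd_unit() {
  local osd_id="$1" fsid
  fsid="$(ceph fsid)"
  systemctl restart "ceph-${fsid}@osd.${osd_id}.service"
}
```

For example, run restart_osd_unit 0. From inside the Cephadm shell, ceph orch daemon restart osd.0 asks the orchestrator to restart the same daemon.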

The ceph-osd daemon cannot start

If you have a node containing a number of OSDs (generally, more than twelve), verify that the default maximum number of threads (PID count) is sufficient. See Increasing the PID count for details.
Verify that the OSD data and journal partitions are mounted properly. You can use the ceph-volume lvm list command to list all devices and volumes associated with the Ceph Storage Cluster, and then manually inspect whether they are mounted properly. See the mount(8) manual page for details.
If you got the ERROR: missing keyring, cannot use cephx for authentication error message, the OSD is missing its keyring.
If you got the ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-1 error message, the ceph-osd daemon cannot read the underlying file system. See the following steps for instructions on how to troubleshoot and fix this error.
1. Check the corresponding log file to determine the cause of the failure. By default, Ceph stores log files in the /var/log/ceph/CLUSTER_FSID/ directory after logging to files is enabled.
2. An EIO error message indicates a failure of the underlying disk. To fix this problem, replace the underlying OSD disk. See Replacing an OSD drive for details.
3. If the log includes any other FAILED assert errors, such as the following one, open a support ticket. See Contacting Red Hat Support for service for details.
  FAILED assert(0 == "hit suicide timeout")
Check the dmesg output for errors related to the underlying file system or disk:
```
dmesg
```
1. An error -5 message similar to the following one indicates corruption of the underlying XFS file system. For details on how to fix this problem, see the What is the meaning of "xfs_log_force: error -5 returned"? solution on the Red Hat Customer Portal.
  xfs_log_force: error -5 returned
2. If the dmesg output includes any SCSI error messages, see the SCSI Error Codes Solution Finder solution on the Red Hat Customer Portal to determine the best way to fix the problem.
3. Alternatively, if you are unable to fix the underlying file system, replace the OSD drive. See Replacing an OSD drive for details.
If the OSD failed with a segmentation fault, such as the following one, gather the required information and open a support ticket. See Contacting Red Hat Support for service for details.
```
Caught signal (Segmentation fault)
```
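To narrow the dmesg output to the messages discussed above, the kernel log can be filtered. A hypothetical helper (the name is illustrative, not part of Ceph); the match patterns are assumptions based on the examples in this section, so widen them if your hardware reports errors differently:

```shell
# Hypothetical helper (not part of Ceph): show only kernel messages that match
# the XFS, SCSI, and I/O errors discussed above, with human-readable timestamps.
scan_dmesg_for_disk_errors() {
  dmesg -T | grep -iE 'xfs_log_force|error -5|scsi error|i/o error'
}
```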

The ceph-osd daemon is running but still marked as down

Check the corresponding log file to determine the cause of the failure. By default, Ceph stores log files in the /var/log/ceph/CLUSTER_FSID/ directory after logging to files is enabled.
1. If the log includes error messages similar to the following ones, see Flapping OSDs.
  wrongly marked me down
  heartbeat_check: no reply from osd.2 since back
2. If you see any other errors, open a support ticket. See Contacting Red Hat Support for service for details.
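Both procedures above rely on log files under /var/log/ceph/CLUSTER_FSID/, which exist only after logging to files is enabled. A minimal sketch of enabling it from the Cephadm shell, written as a hypothetical helper (the name is illustrative, not part of Ceph):

```shell
# Hypothetical helper (not part of Ceph): enable logging to files so that daemon
# logs are written under /var/log/ceph/CLUSTER_FSID/ on each host.
# Run it inside the Cephadm shell.
enable_ceph_file_logging() {
  ceph config set global log_to_file true
  ceph config set global mon_cluster_log_to_file true
}
```

Until file logging is enabled, OSD logs go to the systemd journal; on the OSD host, cephadm logs --name osd.0 displays them.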

5.1.7. Flapping OSDs

The output of the ceph -w | grep osds command shows OSDs repeatedly marked as down and then up again within a short period of time:

ceph -w | grep osds
2022-05-05 06:27:20.810535 mon.0 [INF] osdmap e609: 9 osds: 8 up, 9 in
2022-05-05 06:27:24.120611 mon.0 [INF] osdmap e611: 9 osds: 7 up, 9 in
2022-05-05 06:27:25.975622 mon.0 [INF] HEALTH_WARN; 118 pgs stale; 2/9 in osds are down
2022-05-05 06:27:27.489790 mon.0 [INF] osdmap e614: 9 osds: 6 up, 9 in
2022-05-05 06:27:36.540000 mon.0 [INF] osdmap e616: 9 osds: 7 up, 9 in
2022-05-05 06:27:39.681913 mon.0 [INF] osdmap e618: 9 osds: 8 up, 9 in
2022-05-05 06:27:43.269401 mon.0 [INF] osdmap e620: 9 osds: 9 up, 9 in
2022-05-05 06:27:54.884426 mon.0 [INF] osdmap e622: 9 osds: 8 up, 9 in
2022-05-05 06:27:57.398706 mon.0 [INF] osdmap e624: 9 osds: 7 up, 9 in
2022-05-05 06:27:59.669841 mon.0 [INF] osdmap e625: 9 osds: 6 up, 9 in
2022-05-05 06:28:07.043677 mon.0 [INF] osdmap e628: 9 osds: 7 up, 9 in
2022-05-05 06:28:10.512331 mon.0 [INF] osdmap e630: 9 osds: 8 up, 9 in
2022-05-05 06:28:12.670923 mon.0 [INF] osdmap e631: 9 osds: 9 up, 9 in

In addition, the Ceph log contains error messages similar to the following ones:

2022-05-25 03:44:06.510583 osd.50 127.0.0.1:6801/149046 18992 : cluster [WRN] map e600547 wrongly marked me down

2022-05-25 19:00:08.906864 7fa2a0033700 -1 osd.254 609110 heartbeat_check: no reply from osd.2 since back 2021-07-25 19:00:07.444113 front 2021-07-25 18:59:48.311935 (cutoff 2021-07-25 18:59:48.906862)

What this means

The main causes of flapping OSDs are:

Certain storage cluster operations, such as scrubbing or recovery, take an abnormal amount of time, for example, if you perform these operations on objects with a large index or large placement groups. Usually, after these operations finish, the flapping OSDs problem is solved.
Problems with the underlying physical hardware. In this case, the ceph health detail command also returns the slow requests error message.
Problems with the network.

Ceph OSDs cannot manage situations where the private network for the storage cluster fails, or where there is significant latency on the public client-facing network.

Ceph OSDs use the private network for sending heartbeat packets to each other to indicate that they are up and in. If the private storage cluster network does not work properly, OSDs are unable to send and receive the heartbeat packets. As a consequence, they report each other as being down to the Ceph Monitors, while marking themselves as up.

The following parameters in the Ceph configuration file influence this behavior:

Parameter	Description	Default value
`osd_heartbeat_grace`	How long OSDs wait for the heartbeat packets to return before reporting an OSD as `down` to the Ceph Monitors.	20 seconds
`mon_osd_min_down_reporters`	How many OSDs must report another OSD as `down` before the Ceph Monitors mark the OSD as `down`.	2

This table shows that in the default configuration, the Ceph Monitors mark an OSD as down when at least two OSDs report it as down. In some cases, if a single host encounters network issues, the entire cluster can experience flapping OSDs, because the OSDs that reside on that host report other OSDs in the cluster as down.
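To confirm which values a cluster actually uses, query the centralized configuration. A sketch as a hypothetical helper (the name is illustrative, not part of Ceph), assuming the grace period option carries its upstream name, osd_heartbeat_grace:

```shell
# Hypothetical helper (not part of Ceph): print the heartbeat settings in effect.
show_heartbeat_settings() {
  echo "osd_heartbeat_grace: $(ceph config get osd osd_heartbeat_grace)"
  echo "mon_osd_min_down_reporters: $(ceph config get mon mon_osd_min_down_reporters)"
}
```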

Note

The flapping OSDs scenario does not include the situation when the OSD processes are started and then immediately killed.

To troubleshoot this problem

Check the output of the ceph health detail command again. If it includes the slow requests error message, see Slow requests or requests are blocked for details on how to troubleshoot this issue.

ceph health detail
HEALTH_WARN 30 requests are blocked > 32 sec; 3 osds have slow requests
30 ops are blocked > 268435 sec
1 ops are blocked > 268435 sec on osd.11
1 ops are blocked > 268435 sec on osd.18
28 ops are blocked > 268435 sec on osd.39
3 osds have slow requests

Determine which OSDs are marked as down and on what nodes they reside:
```
ceph osd tree | grep down
```
On the nodes containing the flapping OSDs, troubleshoot and fix any networking problems.
Alternatively, you can temporarily force Monitors to stop marking the OSDs as down and up by setting the noup and nodown flags:
```
ceph osd set noup
ceph osd set nodown
```
Important
Using the noup and nodown flags does not fix the root cause of the problem but only prevents OSDs from flapping. To open a support ticket, see the Contacting Red Hat Support for service section for details.
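Once the underlying problem is fixed, remove the flags again so that the Monitors resume updating OSD states. A minimal hypothetical helper (the name is illustrative, not part of Ceph):

```shell
# Hypothetical helper (not part of Ceph): clear the flags set above so that
# the Monitors resume marking OSDs down and up.
clear_flapping_flags() {
  ceph osd unset noup
  ceph osd unset nodown
}
```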

Important

Flapping OSDs can be caused by MTU misconfiguration on Ceph OSD nodes, at the network switch level, or both. To resolve the issue, set the MTU to a uniform size on all storage cluster nodes, including the core and access network switches, during planned downtime. Do not tune the osd_heartbeat_min_size parameter, because changing this setting can hide issues within the network, and it does not solve actual network inconsistency.
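To check that jumbo frames pass end to end, send pings that are not allowed to fragment. A sketch as a hypothetical helper (the name is illustrative, not part of Ceph), assuming IPv4 and the iputils version of ping found on Linux:

```shell
# Hypothetical helper (not part of Ceph): check that packets of the configured MTU
# reach a peer without fragmentation. The ICMP payload is the MTU minus 28 bytes
# (20-byte IPv4 header plus 8-byte ICMP header).
check_path_mtu() {
  local peer="$1" mtu="${2:-9000}"
  ping -c 3 -M do -s "$((mtu - 28))" "$peer"
}
```

Run it against each of the other storage nodes, for example check_path_mtu host02 9000. If these pings fail while a smaller payload succeeds, a device in the path uses a smaller MTU. The ip link show command displays the MTU configured on each local interface.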

5.1.8. Slow requests or requests are blocked

The ceph-osd daemon is slow to respond to a request and the ceph health detail command returns an error message similar to the following one:

HEALTH_WARN 30 requests are blocked > 32 sec; 3 osds have slow requests
30 ops are blocked > 268435 sec
1 ops are blocked > 268435 sec on osd.11
1 ops are blocked > 268435 sec on osd.18
28 ops are blocked > 268435 sec on osd.39
3 osds have slow requests

In addition, the Ceph logs include error messages similar to the following ones:

2022-05-24 13:18:10.024659 osd.1 127.0.0.1:6812/3032 9 : cluster [WRN] 6 slow requests, 6 included below; oldest blocked for > 61.758455 secs

2022-05-25 03:44:06.510583 osd.50 [WRN] slow request 30.005692 seconds old, received at {date-time}: osd_op(client.4240.0:8 benchmark_data_ceph-1_39426_object7 [write 0~4194304] 0.69848840) v4 currently waiting for subops from [610]

What this means

An OSD has slow requests when it cannot complete the I/O operations in its queue within the time defined by the osd_op_complaint_time parameter. By default, this parameter is set to 30 seconds.

The main causes of OSDs having slow requests are:

Problems with the underlying hardware, such as disk drives, hosts, racks, or network switches
Problems with the network. These problems are usually connected with flapping OSDs. See Flapping OSDs for details.
System load

The following table shows the types of slow requests. Use the dump_historic_ops administration socket command to determine the type of a slow request. For details about the administration socket, see the Using the Ceph Administration Socket section in the Administration Guide for Red Hat Ceph Storage 7.

Slow request type	Description
`waiting for rw locks`	The OSD is waiting to acquire a lock on a placement group for the operation.
`waiting for subops`	The OSD is waiting for replica OSDs to apply the operation to the journal.
`no flag points reached`	The OSD did not reach any major operation milestone.
`waiting for degraded object`	The OSDs have not replicated an object the specified number of times yet.
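A sketch of collecting this information for one OSD through its administration socket, as a hypothetical helper (the name is illustrative, not part of Ceph). It assumes you run it where the socket is reachable, for example in a shell opened with cephadm shell --name osd.ID on the node that hosts the OSD. The dump_ops_in_flight command shows operations that are still queued, and dump_historic_ops shows recently completed operations with their event timelines:

```shell
# Hypothetical helper (not part of Ceph): dump current and recent operations of
# one OSD through its administration socket.
dump_slow_ops() {
  local osd_id="$1"
  ceph daemon "osd.${osd_id}" dump_ops_in_flight
  ceph daemon "osd.${osd_id}" dump_historic_ops
}
```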

To troubleshoot this problem

Determine if the OSDs with slow or blocked requests share a common piece of hardware, for example, a disk drive, host, rack, or network switch.
If the OSDs share a disk:
1. Use the smartctl utility to check the health of the disk, or check the logs for any errors on the disk.
  Note
  The smartctl utility is included in the smartmontools package.
2. Use the iostat utility to get the I/O wait report (%iowait) on the OSD disk to determine if the disk is under heavy load.
  Note
  The iostat utility is included in the sysstat package.
If the OSDs share the node with another service:
1. Check the RAM and CPU utilization
2. Use the netstat utility to see the network statistics on the Network Interface Controllers (NICs) and troubleshoot any networking issues.
If the OSDs share a rack, check the network switch for the rack. For example, if you use jumbo frames, verify that the NIC in the path has jumbo frames set.
If you are unable to determine a common piece of hardware shared by OSDs with slow requests, or to troubleshoot and fix hardware and networking problems, open a support ticket. See Contacting Red Hat support for service for details.
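The disk checks above can be combined into one step. A hypothetical helper (the name is illustrative, not part of Ceph), assuming the smartmontools and sysstat packages are installed on the node:

```shell
# Hypothetical helper (not part of Ceph): check one disk shared by OSDs with slow requests.
check_shared_disk() {
  local dev="$1"                  # for example /dev/sdb
  smartctl -H "$dev"              # overall SMART health self-assessment
  iostat -x "${dev#/dev/}" 5 3    # extended statistics, including %iowait: 3 reports, 5 seconds apart
}
```

For example, run check_shared_disk /dev/sdb. The first iostat report covers the time since boot; the later reports reflect the current load.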

5.2. Stopping and starting rebalancing

When an OSD fails or you stop it, the CRUSH algorithm automatically starts the rebalancing process to redistribute data across the remaining OSDs.

Rebalancing can take time and resources; therefore, consider stopping rebalancing while troubleshooting or maintaining OSDs.

Note

Placement groups within the stopped OSDs become degraded during troubleshooting and maintenance.

Prerequisites

Root-level access to the Ceph Monitor node.

Procedure

Log in to the Cephadm shell:
Example
```
[root@host01 ~]# cephadm shell
```
Set the noout flag before stopping the OSD:
Example
```
[ceph: root@host01 /]# ceph osd set noout
```
When you finish troubleshooting or maintenance, unset the noout flag to start rebalancing:
Example
```
[ceph: root@host01 /]# ceph osd unset noout
```

5.3. Replacing an OSD drive

Ceph is designed for fault tolerance, which means that it can operate in a degraded state without losing data. Consequently, Ceph can operate even if a data storage drive fails. In the context of a failed drive, the degraded state means that the extra copies of the data stored on other OSDs backfill automatically to other OSDs in the cluster. When this occurs, replace the failed OSD drive and recreate the OSD manually.

When a drive fails, Ceph reports the OSD as down:

HEALTH_WARN 1/3 in osds are down
osd.0 is down since epoch 23, last address 192.168.106.220:6800/11080

Note

Ceph can also mark an OSD as down as a consequence of networking or permissions problems. See Down OSDs for details.

Modern servers typically deploy with hot-swappable drives, so you can pull a failed drive and replace it with a new one without bringing down the node. The whole procedure includes these steps:

Remove the OSD from the Ceph cluster. For details, see the Removing an OSD from the Ceph Cluster procedure.
Replace the drive. For details, see the Replacing the physical drive section.
Add the OSD to the cluster. For details, see the Adding an OSD to the Ceph Cluster procedure.

Prerequisites

A running Red Hat Ceph Storage cluster.
Root-level access to the Ceph Monitor node.
At least one OSD is down.

Removing an OSD from the Ceph Cluster

Log into the Cephadm shell:
Example
```
[root@host01 ~]# cephadm shell
```

Determine which OSD is down.

Example

[ceph: root@host01 /]# ceph osd tree | grep -i down
ID  CLASS  WEIGHT   TYPE NAME           STATUS  REWEIGHT  PRI-AFF
 0   hdd 0.00999        osd.0     down  1.00000          1.00000

Mark the OSD as out for the cluster to rebalance and copy its data to other OSDs.
Syntax
```
ceph osd out OSD_ID
```
Example
```
[ceph: root@host01 /]# ceph osd out osd.0
marked out osd.0.
```
Note
If an OSD is down and Ceph does not receive any heartbeat packets from it, Ceph marks it as out automatically after the interval set by the mon_osd_down_out_interval parameter, which is 600 seconds by default. When this happens, other OSDs with copies of the failed OSD's data begin backfilling to ensure that the required number of copies exists within the cluster. While the cluster is backfilling, it is in a degraded state.

Ensure that the failed OSD is backfilling.

Example

[ceph: root@host01 /]# ceph -w | grep backfill
2022-05-02 04:48:03.403872 mon.0 [INF] pgmap v10293282: 431 pgs: 1 active+undersized+degraded+remapped+backfilling, 28 active+undersized+degraded, 49 active+undersized+degraded+remapped+wait_backfill, 59 stale+active+clean, 294 active+clean; 72347 MB data, 101302 MB used, 1624 GB / 1722 GB avail; 227 kB/s rd, 1358 B/s wr, 12 op/s; 10626/35917 objects degraded (29.585%); 6757/35917 objects misplaced (18.813%); 63500 kB/s, 15 objects/s recovering
2022-05-02 04:48:04.414397 mon.0 [INF] pgmap v10293283: 431 pgs: 2 active+undersized+degraded+remapped+backfilling, 75 active+undersized+degraded+remapped+wait_backfill, 59 stale+active+clean, 295 active+clean; 72347 MB data, 101398 MB used, 1623 GB / 1722 GB avail; 969 kB/s rd, 6778 B/s wr, 32 op/s; 10626/35917 objects degraded (29.585%); 10580/35917 objects misplaced (29.457%); 125 MB/s, 31 objects/s recovering
2022-05-02 04:48:00.380063 osd.1 [INF] 0.6f starting backfill to osd.0 from (0'0,0'0] MAX to 2521'166639
2022-05-02 04:48:00.380139 osd.1 [INF] 0.48 starting backfill to osd.0 from (0'0,0'0] MAX to 2513'43079
2022-05-02 04:48:00.380260 osd.1 [INF] 0.d starting backfill to osd.0 from (0'0,0'0] MAX to 2513'136847
2022-05-02 04:48:00.380849 osd.1 [INF] 0.71 starting backfill to osd.0 from (0'0,0'0] MAX to 2331'28496
2022-05-02 04:48:00.381027 osd.1 [INF] 0.51 starting backfill to osd.0 from (0'0,0'0] MAX to 2513'87544

You should see the placement group states change from active+clean to active with some degraded objects, and finally back to active+clean when migration completes.
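Before you stop and remove the OSD, you can wait until Ceph confirms that no data depends on it. Upstream Ceph documentation uses a loop around ceph osd safe-to-destroy for this purpose; the following is a sketch of that loop as a hypothetical helper (the name and the 10-second default polling interval are illustrative, not part of Ceph):

```shell
# Hypothetical helper (not part of Ceph): block until Ceph reports that the OSD
# can be destroyed without reducing data durability, that is, until backfill
# has finished moving its data to other OSDs.
wait_until_safe_to_destroy() {
  local osd_id="$1" interval="${2:-10}"
  while ! ceph osd safe-to-destroy "osd.${osd_id}"; do
    sleep "$interval"
  done
}
```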

Stop the OSD:

Syntax

ceph orch daemon stop osd.OSD_ID

Example

[ceph: root@host01 /]# ceph orch daemon stop osd.0

Remove the OSD from the storage cluster:
Syntax
```
ceph orch osd rm OSD_ID --replace
```
Example
```
[ceph: root@host01 /]# ceph orch osd rm 0 --replace
```
The --replace option preserves the OSD ID, so that the replacement OSD can reuse it.
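To follow the removal, a hypothetical helper (the name is illustrative, not part of Ceph) can combine the orchestrator's removal queue with the CRUSH tree. With the --replace option, the OSD is marked destroyed rather than removed from the CRUSH map:

```shell
# Hypothetical helper (not part of Ceph): follow an OSD removal started with
# `ceph orch osd rm ... --replace`.
show_osd_removal_progress() {
  # Removal and replacement operations queued in the orchestrator
  ceph orch osd rm status
  # OSDs kept in the CRUSH map with the destroyed flag, waiting for a new drive
  ceph osd tree | grep -i destroyed
}
```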

Replacing the physical drive

See the documentation for the hardware node for details on replacing the physical drive.

If the drive is hot-swappable, replace the failed drive with a new one.
If the drive is not hot-swappable and the node contains multiple OSDs, you might have to shut down the whole node and replace the physical drive. Consider preventing the cluster from backfilling. See the Stopping and Starting Rebalancing chapter in the Red Hat Ceph Storage Troubleshooting Guide for details.
When the drive appears under the /dev/ directory, make a note of the drive path.
If you want to add the OSD manually, find the OSD drive and format the disk.
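If you script the drive replacement, you might wait for the new device node to appear before you note its path. The following is a minimal sketch. The helper name `wait_for_block_device` is hypothetical, and `/dev/sdb` is used only as an example path.

```shell
# Hypothetical helper: poll once per second until a block device node such
# as /dev/sdb exists, then print its path; fail after TIMEOUT seconds.
wait_for_block_device() {
    local dev="$1" timeout="${2:-60}" waited=0
    while [ ! -b "$dev" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out waiting for $dev" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "$dev"
}
```

For example, `wait_for_block_device /dev/sdb 120` prints the path as soon as the node exists, or fails after 120 seconds.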

Adding an OSD to the Ceph Cluster

Once the new drive is inserted, you can use the following options to deploy the OSDs:
- The OSDs are deployed automatically by the Ceph Orchestrator if the --unmanaged parameter is not set.
  Example
  [ceph: root@host01 /]# ceph orch apply osd --all-available-devices
  
- Deploy the OSDs on all the available devices with the unmanaged parameter set to true.
  Example
  [ceph: root@host01 /]# ceph orch apply osd --all-available-devices --unmanaged=true
  
- Deploy the OSDs on specific devices and hosts.
  Example
  [ceph: root@host01 /]# ceph orch daemon add osd host02:/dev/sdb
  
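When you deploy on a specific device, the argument has the form `HOST:DEVICE_PATH`, as in the last example above. The following sketch validates both parts before calling the orchestrator. The function name `add_osd_on_device` is hypothetical.

```shell
# Hypothetical helper: deploy an OSD on one device of one host after checking
# that a host name and an absolute /dev/ path were given.
add_osd_on_device() {
    local host="$1" dev="$2"
    if [ -z "$host" ]; then
        echo "usage: add_osd_on_device HOST /dev/DEVICE" >&2
        return 2
    fi
    case "$dev" in
        /dev/?*) ;;
        *) echo "usage: add_osd_on_device HOST /dev/DEVICE" >&2; return 2 ;;
    esac
    # Same form as the example: ceph orch daemon add osd host02:/dev/sdb
    ceph orch daemon add osd "${host}:${dev}"
}
```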
Ensure that the CRUSH hierarchy is accurate:
Example
```
[ceph: root@host01 /]# ceph osd tree
```
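To check a single OSD instead of reading the whole tree, you could filter the plain-text output of `ceph osd tree`. This sketch assumes that the STATUS column directly follows the NAME column (for example, `osd.0 up`). The column layout might differ between releases, and `osd_tree_status` is a hypothetical name.

```shell
# Hypothetical helper: read plain-text `ceph osd tree` output on stdin and
# print the field that follows the given OSD's name (its STATUS column,
# under the layout assumption above). Prints nothing if the OSD is absent.
osd_tree_status() {
    awk -v name="osd.$1" '{
        for (i = 1; i < NF; i++)
            if ($i == name) { print $(i + 1); exit }
    }'
}
```

For example, `ceph osd tree | osd_tree_status 0` prints the status of `osd.0`, such as `up` or `down`.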

5.4. Increasing the PID count

If you have a node containing more than 12 Ceph OSDs, the default maximum number of threads (PID count) can be insufficient, especially during recovery. As a consequence, some ceph-osd daemons can terminate and fail to start again. If this happens, increase the maximum possible number of threads allowed.

Procedure

To temporarily increase the number:

[root@mon ~]# sysctl -w kernel.pid_max=4194303

To permanently increase the number, update the /etc/sysctl.conf file as follows:

kernel.pid_max = 4194303
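If you repeat this step by hand, /etc/sysctl.conf can end up with duplicate `kernel.pid_max` lines. The following is a minimal sketch of an idempotent edit. The function name `ensure_pid_max_conf` is hypothetical, and the file path is a parameter so that you can try the function on a scratch copy first.

```shell
# Hypothetical helper: make a sysctl configuration file contain exactly one
# "kernel.pid_max = VALUE" line, replacing any existing setting.
ensure_pid_max_conf() {
    local conf="$1" value="${2:-4194303}" tmp
    touch "$conf" || return 1
    tmp=$(mktemp) || return 1
    # Keep every line except existing kernel.pid_max settings...
    grep -v -E '^[[:space:]]*kernel\.pid_max[[:space:]]*=' "$conf" > "$tmp"
    # ...then append the wanted value.
    printf 'kernel.pid_max = %s\n' "$value" >> "$tmp"
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

For example, run `ensure_pid_max_conf /etc/sysctl.conf 4194303` as root, load the file with `sysctl -p`, and confirm the running value with `cat /proc/sys/kernel/pid_max`.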

5.5. Deleting data from a full storage cluster

Ceph automatically prevents any I/O operations on OSDs that have reached the capacity specified by the mon_osd_full_ratio parameter, and returns the full osds error message.

This procedure shows how to delete unnecessary data to fix this error.

Note

The mon_osd_full_ratio parameter sets the value of the full_ratio parameter when a cluster is created. You cannot change the value of mon_osd_full_ratio afterward. To temporarily increase the full_ratio value, use the ceph osd set-full-ratio command instead.

Prerequisites

Root-level access to the Ceph Monitor node.

Procedure

Log in to the Cephadm shell:
Example
```
[root@host01 ~]# cephadm shell
```

Determine the current value of full_ratio. By default, it is set to 0.95:

[ceph: root@host01 /]# ceph osd dump | grep -i full
full_ratio 0.95

Temporarily increase the value of full_ratio to 0.97 by using the set-full-ratio command:
```
[ceph: root@host01 /]# ceph osd set-full-ratio 0.97
```
Important
Red Hat strongly recommends not setting the full ratio to a value higher than 0.97. A higher value makes the recovery process harder, and as a consequence, you might not be able to recover full OSDs at all.
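You can wrap the read and set steps in this procedure so that the 0.97 ceiling is enforced before anything changes. This sketch rests on the following assumptions: the helper names are hypothetical, `get_full_ratio` reads `ceph osd dump` output on stdin and matches the `full_ratio` field exactly (because `grep -i full` can also match other ratio lines, such as `nearfull_ratio`, if your release prints them), and the 0.97 limit comes from the recommendation above.

```shell
# Hypothetical helper: print the full_ratio value from `ceph osd dump`
# output read on stdin, ignoring backfillfull_ratio and nearfull_ratio.
get_full_ratio() {
    awk '$1 == "full_ratio" { print $2; exit }'
}

# Hypothetical guard: exit 0 if the ratio is a number no greater than 0.97,
# 1 if it is larger, and 2 if it is not a number.
full_ratio_within_limit() {
    awk -v r="$1" 'BEGIN {
        if (r !~ /^[0-9]*\.?[0-9]+$/) exit 2
        exit (r + 0 <= 0.97 ? 0 : 1)
    }'
}

# Hypothetical wrapper: change the full ratio only after the guard passes.
safe_set_full_ratio() {
    full_ratio_within_limit "$1" || { echo "refusing full ratio $1" >&2; return 1; }
    ceph osd set-full-ratio "$1"
}
```

For example, `ceph osd dump | get_full_ratio` prints the current value, and `safe_set_full_ratio 0.97` raises the value only if it passes the check.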

Verify that you successfully set the parameter to 0.97:

[ceph: root@host01 /]# ceph osd dump | grep -i full
full_ratio 0.97

Monitor the cluster state:
```
[ceph: root@host01 /]# ceph -w
```
As soon as the cluster changes its state from full to nearfull, delete any unnecessary data.

Set the value of full_ratio back to 0.95:

[ceph: root@host01 /]# ceph osd set-full-ratio 0.95

Verify that you successfully set the parameter to 0.95:

[ceph: root@host01 /]# ceph osd dump | grep -i full
full_ratio 0.95

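Because the ratio must return to 0.95 after cleanup, a script can use a shell `trap` so that the reset runs even if the cleanup command fails. This is a sketch, not a supported tool. The name `with_raised_full_ratio` is hypothetical, and the cleanup command is whatever you use to delete unnecessary data.

```shell
# Hypothetical wrapper: raise the full ratio, run a cleanup command, and
# restore the normal ratio when the subshell exits for any reason.
with_raised_full_ratio() {
    local raised="$1" normal="$2"
    shift 2
    (
        # Turn INT/TERM into a normal exit so the EXIT trap also runs then.
        trap 'exit 130' INT TERM
        # Always put the normal ratio back on exit.
        trap 'ceph osd set-full-ratio "$normal"' EXIT
        ceph osd set-full-ratio "$raised" || exit 1
        # Run the operator-supplied cleanup command and keep its exit status.
        "$@"
    )
}
```

For example, `with_raised_full_ratio 0.97 0.95 CLEANUP_COMMAND` raises the ratio, runs the command, and then restores 0.95. Afterward, verify the value with `ceph osd dump | grep -i full` as described above.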