Chapter 5. Troubleshooting Ceph OSDs
This chapter contains information on how to fix the most common errors related to Ceph OSDs.
5.1. Prerequisites
- Verify your network connection. See Troubleshooting networking issues for details.
- Verify that the Monitors have a quorum by using the ceph health command. If the command returns a health status (HEALTH_OK, HEALTH_WARN, or HEALTH_ERR), the Monitors are able to form a quorum. If not, address any Monitor problems first. See Troubleshooting Ceph Monitors for details. For details about ceph health, see Understanding Ceph health.
- Optionally, stop the rebalancing process to save time and resources. See Stopping and starting rebalancing for details.
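For example, a quick pre-check from a Monitor node might look like the following; the HEALTH_OK line is only illustrative, the output depends on the actual cluster state, and the quorum_status command is an optional extra check:
[root@mon ~]# ceph health
HEALTH_OK
[root@mon ~]# ceph quorum_status --format json-pretty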
5.2. Most common Ceph OSD errors
The following tables list the most common error messages that are returned by the ceph health detail
command, or included in the Ceph logs. The tables provide links to corresponding sections that explain the errors and point to specific procedures to fix the problems.
5.2.1. Prerequisites
- Root-level access to the Ceph OSD nodes.
5.2.2. Ceph OSD error messages
A table of common Ceph OSD error messages, and a potential fix.
Error message | See |
---|---|
HEALTH_ERR full osds | Full OSDs |
HEALTH_WARN backfillfull osds | Backfillfull OSDs |
HEALTH_WARN nearfull osds | Nearfull OSDs |
HEALTH_WARN osds are down | Down OSDs, Flapping OSDs |
HEALTH_WARN slow requests | Slow requests or requests are blocked |
5.2.3. Common Ceph OSD error messages in the Ceph logs
A table of common Ceph OSD error messages found in the Ceph logs, and a link to a potential fix.
Error message | Log file | See |
---|---|---|
osds down | Main cluster log | Down OSDs |
wrongly marked me down | Main cluster log | Flapping OSDs |
slow requests are blocked | Main cluster log | Slow requests or requests are blocked |
heartbeat_check: no reply from osd.X | OSD log | Flapping OSDs |
5.2.4. Full OSDs
The ceph health detail
command returns an error message similar to the following one:
HEALTH_ERR 1 full osds osd.3 is full at 95%
What This Means
Ceph prevents clients from performing I/O operations on full OSD nodes to avoid losing data. It returns the HEALTH_ERR full osds
message when the cluster reaches the capacity set by the mon_osd_full_ratio
parameter. By default, this parameter is set to 0.95, which means 95% of the cluster capacity.
To Troubleshoot This Problem
Determine what percentage of raw storage (%RAW USED
) is used:
# ceph df
If %RAW USED
is above 70-75%, you can:
- Delete unnecessary data. This is a short-term solution to avoid production downtime.
- Scale the cluster by adding a new OSD node. This is a long-term solution recommended by Red Hat.
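To see which individual OSDs are driving the usage, you can also review per-OSD utilization; this is only a supplementary check, and the same command appears again in the Nearfull OSDs section:
# ceph osd df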
Additional Resources
- Nearfull OSDs in the Red Hat Ceph Storage Troubleshooting Guide.
- See Deleting data from a full storage cluster for details.
5.2.5. Backfillfull OSDs
The ceph health detail
command returns an error message similar to the following one:
health: HEALTH_WARN 3 backfillfull osd(s) Low space hindering backfill (add storage if this doesn't resolve itself): 32 pgs backfill_toofull
What This Means
When one or more OSDs have exceeded the backfillfull threshold, Ceph prevents data from rebalancing to this device. This is an early warning that rebalancing might not complete and that the cluster is approaching full. The default for the backfillfull threshold is 90%.
To Troubleshoot This Problem
Check utilization by pool:
ceph df
If %RAW USED
is above 70-75%, you can carry out one of the following actions:
- Delete unnecessary data. This is a short-term solution to avoid production downtime.
- Scale the cluster by adding a new OSD node. This is a long-term solution recommended by Red Hat.
Increase the
backfillfull
ratio for the OSDs that contain the PGs stuck in backfill_toofull
to allow the recovery process to continue. Add new storage to the cluster as soon as possible or remove data to prevent filling more OSDs.
Syntax
ceph osd set-backfillfull-ratio VALUE
The range for VALUE is 0.0 to 1.0.
Example
[ceph: root@host01 /]# ceph osd set-backfillfull-ratio 0.92
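To confirm that the new ratio took effect, you can dump the cluster ratios; the values shown here are only illustrative:
[ceph: root@host01 /]# ceph osd dump | grep -i ratio
full_ratio 0.95
backfillfull_ratio 0.92
nearfull_ratio 0.85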
Additional Resources
- Nearfull OSDs in the Red Hat Ceph Storage Troubleshooting Guide.
- See Deleting data from a full storage cluster for details.
5.2.6. Nearfull OSDs
The ceph health detail
command returns an error message similar to the following one:
HEALTH_WARN 1 nearfull osds osd.2 is near full at 85%
What This Means
Ceph returns the nearfull osds
message when the cluster reaches the capacity set by the mon_osd_nearfull_ratio
parameter. By default, this parameter is set to 0.85, which means 85% of the cluster capacity.
Ceph distributes data based on the CRUSH hierarchy in the best possible way but it cannot guarantee equal distribution. The main causes of the uneven data distribution and the nearfull osds
messages are:
- The OSDs are not balanced among the OSD nodes in the cluster. That is, some OSD nodes host significantly more OSDs than others, or the weight of some OSDs in the CRUSH map is not adequate to their capacity.
- The placement group (PG) count is not appropriate for the number of OSDs, use case, target PGs per OSD, and OSD utilization.
- The cluster uses inappropriate CRUSH tunables.
- The back-end storage for OSDs is almost full.
To Troubleshoot This Problem
- Verify that the PG count is sufficient and increase it if needed.
- Verify that you use CRUSH tunables optimal for the cluster version and adjust them if not.
- Change the weight of OSDs by utilization.
Enable the Ceph Manager balancer module, which optimizes the placement of placement groups (PGs) across OSDs in order to achieve a balanced distribution.
Example
[root@mon ~]# ceph mgr module enable balancer
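Optionally, confirm that the module is active and review what the balancer is doing; the exact output format varies by release:
[root@mon ~]# ceph balancer status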
Determine how much space is left on the disks used by OSDs.
To view how much space OSDs use in general:
[root@mon ~]# ceph osd df
To view how much space OSDs use on particular nodes, use the following command from the node containing nearfull OSDs:
$ df
- If needed, add a new OSD node.
Additional Resources
- Full OSDs
- See the Using the Ceph Manager balancer module section in the Red Hat Ceph Storage Operations Guide.
- See the Set an OSD’s Weight by Utilization section in the Storage Strategies guide for Red Hat Ceph Storage 4.
- For details, see the CRUSH Tunables section in the Storage Strategies guide for Red Hat Ceph Storage 4 and the How can I test the impact CRUSH map tunable modifications will have on my PG distribution across OSDs in Red Hat Ceph Storage? solution on the Red Hat Customer Portal.
- See Increasing the placement group for details.
5.2.7. Down OSDs
The ceph health
command returns an error similar to the following one:
HEALTH_WARN 1/3 in osds are down
What This Means
One of the ceph-osd
processes is unavailable due to a possible service failure or problems with communication with other OSDs. As a consequence, the surviving ceph-osd
daemons reported this failure to the Monitors.
If the ceph-osd
daemon is not running, the underlying OSD drive or file system is either corrupted, or some other error, such as a missing keyring, is preventing the daemon from starting.
In most cases, networking issues cause the situation where the ceph-osd
daemon is running but still marked as down.
To Troubleshoot This Problem
Determine which OSD is
down
:[root@mon ~]# ceph health detail HEALTH_WARN 1/3 in osds are down osd.0 is down since epoch 23, last address 192.168.106.220:6800/11080
Try to restart the
ceph-osd
daemon:[root@mon ~]# systemctl restart ceph-osd@OSD_NUMBER
Replace
OSD_NUMBER
with the ID of the OSD that isdown
, for example:[root@mon ~]# systemctl restart ceph-osd@0
- If you are not able to start ceph-osd, follow the steps in The ceph-osd daemon cannot start.
- If you are able to start the ceph-osd daemon but it is marked as down, follow the steps in The ceph-osd daemon is running but still marked as `down`.
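Before working through the two cases below, it can also help to confirm the daemon state and capture its recent log output on the OSD node; osd.0 mirrors the example above, and the one-hour window is only a suggestion:
[root@osd ~]# systemctl status ceph-osd@0
[root@osd ~]# journalctl -u ceph-osd@0 --since "1 hour ago"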
The ceph-osd
daemon cannot start
- If you have a node containing a number of OSDs (generally, more than twelve), verify that the default maximum number of threads (PID count) is sufficient. See Increasing the PID count for details.
-
Verify that the OSD data and journal partitions are mounted properly. You can use the
ceph-volume lvm list
command to list all devices and volumes associated with the Ceph Storage Cluster and then manually inspect if they are mounted properly. See the mount(8)
manual page for details. -
If you got the
ERROR: missing keyring, cannot use cephx for authentication
error message, the OSD is missing a keyring. If you got the
ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-1
error message, the ceph-osd
daemon cannot read the underlying file system. See the following steps for instructions on how to troubleshoot and fix this error.
Note: If this error message is returned during boot time of the OSD host, open a support ticket as this might indicate a known issue tracked in Red Hat Bugzilla 1439210.
Check the corresponding log file to determine the cause of the failure. By default, Ceph stores log files in the /var/log/ceph/ directory for bare-metal deployments.
Note: For container-based deployments, Ceph generates logs to journald. You can enable logging to files in /var/log/ceph by setting the log_to_file parameter to true under [global] in the Ceph configuration file. See Understanding ceph logs for more details.
An EIO error message indicates a failure of the underlying disk. To fix this problem, replace the underlying OSD disk. See Replacing an OSD drive for details.
If the log includes any other
FAILED assert
errors, such as the following one, open a support ticket. See Contacting Red Hat Support for service for details.FAILED assert(0 == "hit suicide timeout")
Check the
dmesg
output for errors with the underlying file system or disk:
$ dmesg
- If the dmesg output includes any SCSI error messages, see the SCSI Error Codes Solution Finder solution on the Red Hat Customer Portal to determine the best way to fix the problem.
- Alternatively, if you are unable to fix the underlying file system, replace the OSD drive. See Replacing an OSD drive for details.
If the OSD failed with a segmentation fault, such as the following one, gather the required information and open a support ticket. See Contacting Red Hat Support for service for details.
Caught signal (Segmentation fault)
The ceph-osd daemon is running but still marked as down
Check the corresponding log file to determine the cause of the failure. By default, Ceph stores log files in the /var/log/ceph/ directory for bare-metal deployments.
Note: For container-based deployments, Ceph generates logs to journald. You can enable logging to files in /var/log/ceph by setting the log_to_file parameter to true under [global] in the Ceph configuration file. See Understanding ceph logs for more details.
If the log includes error messages similar to the following ones, see Flapping OSDs.
wrongly marked me down heartbeat_check: no reply from osd.2 since back
- If you see any other errors, open a support ticket. See Contacting Red Hat Support for service for details.
Additional Resources
- Flapping OSDs
- Stale placement groups
- See the Starting, stopping, restarting the Ceph daemon by instances section in the Red Hat Ceph Storage Administration Guide.
- See the Managing Ceph keyrings section in the Red Hat Ceph Storage Administration Guide.
5.2.8. Flapping OSDs
The ceph -w | grep osds
command shows OSDs repeatedly as down
and then up
again within a short period of time:
# ceph -w | grep osds 2021-04-05 06:27:20.810535 mon.0 [INF] osdmap e609: 9 osds: 8 up, 9 in 2021-04-05 06:27:24.120611 mon.0 [INF] osdmap e611: 9 osds: 7 up, 9 in 2021-04-05 06:27:25.975622 mon.0 [INF] HEALTH_WARN; 118 pgs stale; 2/9 in osds are down 2021-04-05 06:27:27.489790 mon.0 [INF] osdmap e614: 9 osds: 6 up, 9 in 2021-04-05 06:27:36.540000 mon.0 [INF] osdmap e616: 9 osds: 7 up, 9 in 2021-04-05 06:27:39.681913 mon.0 [INF] osdmap e618: 9 osds: 8 up, 9 in 2021-04-05 06:27:43.269401 mon.0 [INF] osdmap e620: 9 osds: 9 up, 9 in 2021-04-05 06:27:54.884426 mon.0 [INF] osdmap e622: 9 osds: 8 up, 9 in 2021-04-05 06:27:57.398706 mon.0 [INF] osdmap e624: 9 osds: 7 up, 9 in 2021-04-05 06:27:59.669841 mon.0 [INF] osdmap e625: 9 osds: 6 up, 9 in 2021-04-05 06:28:07.043677 mon.0 [INF] osdmap e628: 9 osds: 7 up, 9 in 2021-04-05 06:28:10.512331 mon.0 [INF] osdmap e630: 9 osds: 8 up, 9 in 2021-04-05 06:28:12.670923 mon.0 [INF] osdmap e631: 9 osds: 9 up, 9 in
In addition, the Ceph log contains error messages similar to the following ones:
2021-07-25 03:44:06.510583 osd.50 127.0.0.1:6801/149046 18992 : cluster [WRN] map e600547 wrongly marked me down
2021-07-25 19:00:08.906864 7fa2a0033700 -1 osd.254 609110 heartbeat_check: no reply from osd.2 since back 2021-07-25 19:00:07.444113 front 2021-07-25 18:59:48.311935 (cutoff 2021-07-25 18:59:48.906862)
What This Means
The main causes of flapping OSDs are:
- Certain storage cluster operations, such as scrubbing or recovery, take an abnormal amount of time, for example if you perform these operations on objects with a large index or large placement groups. Usually, after these operations finish, the flapping OSDs problem is solved.
- Problems with the underlying physical hardware. In this case, the ceph health detail command also returns the slow requests error message.
- Problems with the network.
Ceph OSDs cannot manage situations where the private network for the storage cluster fails, or there is significant latency on the public client-facing network.
Ceph OSDs use the private network for sending heartbeat packets to each other to indicate that they are up
and in
. If the private storage cluster network does not work properly, OSDs are unable to send and receive the heartbeat packets. As a consequence, they report each other as being down
to the Ceph Monitors, while marking themselves as up
.
The following parameters in the Ceph configuration file influence this behavior:
Parameter | Description | Default value |
---|---|---|
osd_heartbeat_grace | How long OSDs wait for the heartbeat packets to return before reporting an OSD as down to the Ceph Monitors. | 20 seconds |
mon_osd_min_down_reporters | How many OSDs must report another OSD as down before the Ceph Monitors mark the OSD as down. | 2 |
This table shows that, in the default configuration, the Ceph Monitors mark an OSD as down
if only one OSD made three distinct reports about the first OSD being down. In some cases, if one single host encounters network issues, the entire cluster can experience flapping OSDs. This is because the OSDs that reside on the host will report other OSDs in the cluster as down.
The flapping OSDs scenario does not include the situation when the OSD processes are started and then immediately killed.
To Troubleshoot This Problem
Check the output of the
ceph health detail
command again. If it includes theslow requests
error message, see Slow requests or requests are blocked for details on how to troubleshoot this issue.
# ceph health detail HEALTH_WARN 30 requests are blocked > 32 sec; 3 osds have slow requests 30 ops are blocked > 268435 sec 1 ops are blocked > 268435 sec on osd.11 1 ops are blocked > 268435 sec on osd.18 28 ops are blocked > 268435 sec on osd.39 3 osds have slow requests
Determine which OSDs are marked as
down
and on what nodes they reside:# ceph osd tree | grep down
- On the nodes containing the flapping OSDs, troubleshoot and fix any networking problems. For details, see Troubleshooting networking issues.
Alternatively, you can temporarily force Monitors to stop marking the OSDs as
down
andup
by setting thenoup
andnodown
flags:# ceph osd set noup # ceph osd set nodown
Important: Using the
noup
andnodown
flags does not fix the root cause of the problem but only prevents OSDs from flapping. To open a support ticket, see the Contacting Red Hat Support for service section for details.
Flapping OSDs can be caused by MTU misconfiguration on Ceph OSD nodes, at the network switch level, or both. To resolve the issue, set the MTU to a uniform size on all storage cluster nodes, including on the core and access network switches, with a planned downtime. Do not tune osd heartbeat min size
because changing this setting can hide issues within the network, and it will not solve actual network inconsistency.
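As a quick check for MTU mismatches, compare the MTU reported on the cluster-facing interface of each OSD node; eth0 is only a placeholder for your actual interface name:
[root@osd ~]# ip link show eth0 | grep mtu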
Additional Resources
- See the Verifying the Network Configuration for Red Hat Ceph Storage section in the Red Hat Ceph Storage Installation Guide for details.
- See the Ceph heartbeat section in the Red Hat Ceph Storage Architecture Guide for details.
- See the Slow requests or requests are blocked section in the Red Hat Ceph Storage Troubleshooting Guide.
- See Red Hat’s Knowledgebase solution How to reduce scrub impact in a Red Hat Ceph Storage cluster? for tuning scrubbing process.
5.2.9. Slow requests or requests are blocked
The ceph-osd
daemon is slow to respond to a request and the ceph health detail
command returns an error message similar to the following one:
HEALTH_WARN 30 requests are blocked > 32 sec; 3 osds have slow requests 30 ops are blocked > 268435 sec 1 ops are blocked > 268435 sec on osd.11 1 ops are blocked > 268435 sec on osd.18 28 ops are blocked > 268435 sec on osd.39 3 osds have slow requests
In addition, the Ceph logs include an error message similar to the following ones:
2015-08-24 13:18:10.024659 osd.1 127.0.0.1:6812/3032 9 : cluster [WRN] 6 slow requests, 6 included below; oldest blocked for > 61.758455 secs
2016-07-25 03:44:06.510583 osd.50 [WRN] slow request 30.005692 seconds old, received at {date-time}: osd_op(client.4240.0:8 benchmark_data_ceph-1_39426_object7 [write 0~4194304] 0.69848840) v4 currently waiting for subops from [610]
What This Means
An OSD has slow requests when it cannot service the I/O operations per second (IOPS) in its queue within the time defined by the osd_op_complaint_time
parameter. By default, this parameter is set to 30 seconds.
The main causes of OSDs having slow requests are:
- Problems with the underlying hardware, such as disk drives, hosts, racks, or network switches
- Problems with the network. These problems are usually connected with flapping OSDs. See Flapping OSDs for details.
- System load
The following table shows the types of slow requests. Use the dump_historic_ops
administration socket command to determine the type of a slow request. For details about the administration socket, see the Using the Ceph Administration Socket section in the Administration Guide for Red Hat Ceph Storage 4.
Slow request type | Description |
---|---|
waiting for rw locks | The OSD is waiting to acquire a lock on a placement group for the operation. |
waiting for subops | The OSD is waiting for replica OSDs to apply the operation to the journal. |
no flag points reached | The OSD did not reach any major operation milestone. |
waiting for degraded object | The OSDs have not replicated an object the specified number of times yet. |
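For example, to inspect recent slow operations on a specific OSD and see which of these phases they stalled in, you can query the administration socket from the node that hosts the OSD; osd.11 is only an example ID:
[root@osd ~]# ceph daemon osd.11 dump_historic_ops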
To Troubleshoot This Problem
- Determine if the OSDs with slow or blocked requests share a common piece of hardware, for example a disk drive, host, rack, or network switch.
If the OSDs share a disk:
Use the smartmontools utility to check the health of the disk or the logs to determine any errors on the disk.
Note: The smartmontools utility is included in the smartmontools package.
Use the iostat utility to get the I/O wait report (%iowait) on the OSD disk to determine if the disk is under heavy load.
Note: The iostat utility is included in the sysstat package.
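A minimal check of a shared disk might look like the following; /dev/sdb is only a placeholder for the device that backs the affected OSDs:
[root@osd ~]# smartctl -a /dev/sdb
[root@osd ~]# iostat -x 1 5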
If the OSDs share the node with another service:
- Check the RAM and CPU utilization.
-
Use the
netstat
utility to see the network statistics on the Network Interface Controllers (NICs) and troubleshoot any networking issues. See also Troubleshooting networking issues for further information.
- If the OSDs share a rack, check the network switch for the rack. For example, if you use jumbo frames, verify that the NIC in the path has jumbo frames set.
- If you are unable to determine a common piece of hardware shared by OSDs with slow requests, or to troubleshoot and fix hardware and networking problems, open a support ticket. See Contacting Red Hat support for service for details.
Additional Resources
- See the Using the Ceph Administration Socket section in the Red Hat Ceph Storage Administration Guide for details.
5.3. Stopping and starting rebalancing
When an OSD fails or you stop it, the CRUSH algorithm automatically starts the rebalancing process to redistribute data across the remaining OSDs.
Rebalancing can take time and resources; therefore, consider stopping rebalancing while troubleshooting or maintaining OSDs.
Placement groups within the stopped OSDs become degraded
during troubleshooting and maintenance.
Prerequisites
- Root-level access to the Ceph Monitor node.
Procedure
Set the
noout
flag before stopping the OSD:[root@mon ~]# ceph osd set noout
When you finish troubleshooting or maintenance, unset the
noout
flag to start rebalancing:[root@mon ~]# ceph osd unset noout
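At any point, you can confirm whether the flag is currently set by checking the OSD map flags; when noout is set, it is also reported as a health warning:
[root@mon ~]# ceph osd dump | grep flags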
Additional Resources
- The Rebalancing and Recovery section in the Red Hat Ceph Storage Architecture Guide.
5.4. Mounting the OSD data partition
If the OSD data partition is not mounted correctly, the ceph-osd
daemon cannot start. If you discover that the partition is not mounted as expected, follow the steps in this section to mount it.
This section is specific to bare-metal deployments only.
Prerequisites
-
Access to the
ceph-osd
daemon. - Root-level access to the Ceph Monitor node.
Procedure
Mount the partition:
[root@ceph-mon]# mount -o noatime PARTITION /var/lib/ceph/osd/CLUSTER_NAME-OSD_NUMBER
Replace
PARTITION
with the path to the partition on the OSD drive dedicated to OSD data. Specify the cluster name and the OSD number.Example
[root@ceph-mon]# mount -o noatime /dev/sdd1 /var/lib/ceph/osd/ceph-0
Try to start the failed
ceph-osd
daemon:[root@ceph-mon]# systemctl start ceph-osd@OSD_NUMBER
Replace
OSD_NUMBER
with the ID of the OSD.Example
[root@ceph-mon]# systemctl start ceph-osd@0
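Once the daemon starts, you can verify that the OSD is reported as up again, for example by reviewing the CRUSH tree:
[root@ceph-mon]# ceph osd tree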
Additional Resources
- See the Down OSDs in the Red Hat Ceph Storage Troubleshooting Guide for more details.
5.5. Replacing an OSD drive
Ceph is designed for fault tolerance, which means that it can operate in a degraded
state without losing data. Consequently, Ceph can operate even if a data storage drive fails. In the context of a failed drive, the degraded
state means that the extra copies of the data stored on other OSDs will backfill automatically to other OSDs in the cluster. When this occurs, replace the failed OSD drive and recreate the OSD manually.
When a drive fails, Ceph reports the OSD as down
:
HEALTH_WARN 1/3 in osds are down osd.0 is down since epoch 23, last address 192.168.106.220:6800/11080
Ceph can mark an OSD as down
also as a consequence of networking or permissions problems. See Down OSDs for details.
Modern servers typically deploy with hot-swappable drives, so you can pull a failed drive and replace it with a new one without bringing down the node. The whole procedure includes these steps:
- Remove the OSD from the Ceph cluster. For details, see the Removing an OSD from the Ceph Cluster procedure.
- Replace the drive. For details, see the Replacing the physical drive section.
- Add the OSD to the cluster. For details, see the Adding an OSD to the Ceph Cluster procedure.
Prerequisites
- Root-level access to the Ceph Monitor node.
Determine which OSD is
down
:[root@mon ~]# ceph osd tree | grep -i down ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY 0 0.00999 osd.0 down 1.00000 1.00000
Ensure that the OSD process is stopped. Use the following command from the OSD node:
[root@mon ~]# systemctl status ceph-osd@OSD_NUMBER
Replace
OSD_NUMBER
with the ID of the OSD marked asdown
, for example:[root@mon ~]# systemctl status ceph-osd@0 ... Active: inactive (dead)
If the ceph-osd daemon is running, see Down OSDs for more details about troubleshooting OSDs that are marked as down but whose corresponding ceph-osd daemon is running.
Procedure: Removing an OSD from the Ceph Cluster
Mark the OSD as
out
:[root@mon ~]# ceph osd out osd.OSD_NUMBER
Replace
OSD_NUMBER
with the ID of the OSD that is marked asdown
, for example:[root@mon ~]# ceph osd out osd.0 marked out osd.0.
Note: If the OSD is down, Ceph marks it as out automatically after 600 seconds when it does not receive any heartbeat packet from the OSD. When this happens, other OSDs with copies of the failed OSD data begin backfilling to ensure that the required number of copies exists within the cluster. While the cluster is backfilling, the cluster will be in a degraded state.
Ensure that the failed OSD is backfilling. The output will include information similar to the following one:
[root@mon ~]# ceph -w | grep backfill 2017-06-02 04:48:03.403872 mon.0 [INF] pgmap v10293282: 431 pgs: 1 active+undersized+degraded+remapped+backfilling, 28 active+undersized+degraded, 49 active+undersized+degraded+remapped+wait_backfill, 59 stale+active+clean, 294 active+clean; 72347 MB data, 101302 MB used, 1624 GB / 1722 GB avail; 227 kB/s rd, 1358 B/s wr, 12 op/s; 10626/35917 objects degraded (29.585%); 6757/35917 objects misplaced (18.813%); 63500 kB/s, 15 objects/s recovering 2017-06-02 04:48:04.414397 mon.0 [INF] pgmap v10293283: 431 pgs: 2 active+undersized+degraded+remapped+backfilling, 75 active+undersized+degraded+remapped+wait_backfill, 59 stale+active+clean, 295 active+clean; 72347 MB data, 101398 MB used, 1623 GB / 1722 GB avail; 969 kB/s rd, 6778 B/s wr, 32 op/s; 10626/35917 objects degraded (29.585%); 10580/35917 objects misplaced (29.457%); 125 MB/s, 31 objects/s recovering 2017-06-02 04:48:00.380063 osd.1 [INF] 0.6f starting backfill to osd.0 from (0'0,0'0] MAX to 2521'166639 2017-06-02 04:48:00.380139 osd.1 [INF] 0.48 starting backfill to osd.0 from (0'0,0'0] MAX to 2513'43079 2017-06-02 04:48:00.380260 osd.1 [INF] 0.d starting backfill to osd.0 from (0'0,0'0] MAX to 2513'136847 2017-06-02 04:48:00.380849 osd.1 [INF] 0.71 starting backfill to osd.0 from (0'0,0'0] MAX to 2331'28496 2017-06-02 04:48:00.381027 osd.1 [INF] 0.51 starting backfill to osd.0 from (0'0,0'0] MAX to 2513'87544
Remove the OSD from the CRUSH map:
[root@mon ~]# ceph osd crush remove osd.OSD_NUMBER
Replace
OSD_NUMBER
with the ID of the OSD that is marked asdown
, for example:[root@mon ~]# ceph osd crush remove osd.0 removed item id 0 name 'osd.0' from crush map
Remove authentication keys related to the OSD:
[root@mon ~]# ceph auth del osd.OSD_NUMBER
Replace
OSD_NUMBER
with the ID of the OSD that is marked asdown
, for example:[root@mon ~]# ceph auth del osd.0 updated
Remove the OSD from the Ceph Storage Cluster:
[root@mon ~]# ceph osd rm osd.OSD_NUMBER
Replace
OSD_NUMBER
with the ID of the OSD that is marked asdown
, for example:[root@mon ~]# ceph osd rm osd.0 removed osd.0
If you have removed the OSD successfully, it is not present in the output of the following command:
[root@mon ~]# ceph osd tree
For bare-metal deployments, unmount the failed drive:
[root@mon ~]# umount /var/lib/ceph/osd/CLUSTER_NAME-OSD_NUMBER
Specify the name of the cluster and the ID of the OSD, for example:
[root@mon ~]# umount /var/lib/ceph/osd/ceph-0/
If you have unmounted the drive successfully, it is not present in the output of the following command:
[root@mon ~]# df -h
Procedure: Replacing the physical drive
See the documentation for the hardware node for details on replacing the physical drive.
- If the drive is hot-swappable, replace the failed drive with a new one.
- If the drive is not hot-swappable and the node contains multiple OSDs, you might have to shut down the whole node and replace the physical drive. Consider preventing the cluster from backfilling. See the Stopping and Starting Rebalancing chapter in the Red Hat Ceph Storage Troubleshooting Guide for details.
- When the drive appears under the /dev/ directory, make a note of the drive path.
- If you want to add the OSD manually, find the OSD drive and format the disk.
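If you prepare the replacement disk yourself, one common way to clear any leftover partition data before adding the new OSD is the ceph-volume zap subcommand; /dev/sdd is only a placeholder for the new drive:
[root@osd ~]# ceph-volume lvm zap /dev/sdd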
Procedure: Adding an OSD to the Ceph Cluster
Add the OSD again.
If you used Ansible to deploy the cluster, run the
ceph-ansible
playbook again from the Ceph administration server:Bare-metal deployments:
Syntax
ansible-playbook site.yml -i hosts --limit NEW_OSD_NODE_NAME
Example
[user@admin ceph-ansible]$ ansible-playbook site.yml -i hosts --limit node03
Container deployments:
Syntax
ansible-playbook site-container.yml -i hosts --limit NEW_OSD_NODE_NAME
Example
[user@admin ceph-ansible]$ ansible-playbook site-container.yml -i hosts --limit node03
- If you added the OSD manually, see the Adding a Ceph OSD with the Command-line Interface section in the Red Hat Ceph Storage 4 Operations Guide.
Ensure that the CRUSH hierarchy is accurate:
[root@mon ~]# ceph osd tree
If you are not satisfied with the location of the OSD in the CRUSH hierarchy, move the OSD to a desired location:
[root@mon ~]# ceph osd crush move BUCKET_TO_MOVE BUCKET_TYPE=PARENT_BUCKET
For example, to move the bucket located at
ssd:row1
to the root bucket:[root@mon ~]# ceph osd crush move ssd:row1 root=ssd:root
Additional Resources
- See the Down OSDs section in the Red Hat Ceph Storage Troubleshooting Guide.
- See the Managing the storage cluster size chapter in the Red Hat Ceph Storage Operations Guide.
- See the Red Hat Ceph Storage Installation Guide.
5.6. Increasing the PID count
If you have a node containing more than 12 Ceph OSDs, the default maximum number of threads (PID count) can be insufficient, especially during recovery. As a consequence, some ceph-osd
daemons can terminate and fail to start again. If this happens, increase the maximum possible number of threads allowed.
Procedure
To temporarily increase the number:
[root@mon ~]# sysctl -w kernel.pid_max=4194303
To permanently increase the number, update the /etc/sysctl.conf
file as follows:
kernel.pid_max = 4194303
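After editing the file, reload the settings and confirm the new value as a quick sanity check:
[root@mon ~]# sysctl -p
[root@mon ~]# sysctl kernel.pid_max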
5.7. Deleting data from a full storage cluster
Ceph automatically prevents any I/O operations on OSDs that reached the capacity specified by the mon_osd_full_ratio
parameter and returns the full osds
error message.
This procedure shows how to delete unnecessary data to fix this error.
The mon_osd_full_ratio
parameter sets the value of the full_ratio
parameter when creating a cluster. You cannot change the value of mon_osd_full_ratio
afterwards. To temporarily increase the full_ratio
value, increase the set-full-ratio
instead.
Prerequisites
- Root-level access to the Ceph Monitor node.
Procedure
Determine the current value of
full_ratio
, by default it is set to0.95
:[root@mon ~]# ceph osd dump | grep -i full full_ratio 0.95
Temporarily increase the value of
set-full-ratio
to0.97
:[root@mon ~]# ceph osd set-full-ratio 0.97
Important: Red Hat strongly recommends not setting the
set-full-ratio
to a value higher than 0.97. Setting this parameter to a higher value makes the recovery process harder. As a consequence, you might not be able to recover full OSDs at all.
Verify that you successfully set the parameter to
0.97
:[root@mon ~]# ceph osd dump | grep -i full full_ratio 0.97
Monitor the cluster state:
[root@mon ~]# ceph -w
As soon as the cluster changes its state from
full
tonearfull
, delete any unnecessary data.Set the value of
full_ratio
back to0.95
:[root@mon ~]# ceph osd set-full-ratio 0.95
Verify that you successfully set the parameter to
0.95
:[root@mon ~]# ceph osd dump | grep -i full full_ratio 0.95
Additional Resources
- Full OSDs section in the Red Hat Ceph Storage Troubleshooting Guide.
- Nearfull OSDs section in the Red Hat Ceph Storage Troubleshooting Guide.
5.8. Redeploying OSDs after upgrading the storage cluster
This section describes how to redeploy OSDS after upgrading from Red Hat Ceph Storage 3 to Red Hat Ceph Storage 4 with non-collocated daemons for OSDs with block.db
on dedicated devices, without upgrading the operating system.
This procedure applies to both bare-metal and container deployments, unless specified.
After the upgrade, the playbook for redeploying OSDs can fail with an error message:
GPT headers found, they must be removed on: /dev/vdb
You can redeploy the OSDs by creating a partition in the block.db
device and running the Ansible playbook.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Root-level access to the Ansible Administration node.
- Ansible user account created.
Procedure
Create the partition on the
block.db
device. This sgdisk
command uses the next available partition number automatically:
Syntax
sgdisk --new=0:0:JOURNAL_SIZE -- NEW_DEVICE_PATH
Example
[root@admin ~]# sgdisk --new=0:0:+2G -- /dev/vdb
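To confirm that the partition was created with the expected size before moving on, you can print the partition table of the block.db device; the device path is only the example used above:
[root@admin ~]# sgdisk --print /dev/vdb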
Create the
host_vars
directory:[root@admin ~]# mkdir /usr/share/ceph-ansible/host_vars
Navigate to the
host_vars
directory:[root@admin ~]# cd /usr/share/ceph-ansible/host_vars
Create a file for each host in the storage cluster:
Syntax
touch NEW_OSD_HOST_NAME
Example
[root@admin host_vars]# touch osd5
In the hosts file, define the data device:
Syntax
lvm_volumes: - data: DATA_DEVICE_PATH journal: NEW_DEVICE_PARTITION_PATH - data: RUNNING_OSD_DATA_DEVICE_PATH journal: PARTITION_PATH - data: RUNNING_OSD_DATA_DEVICE_PATH journal: PARTITION_PATH
Example
lvm_volumes: - data: /dev/vdd journal: /dev/vdb2 - data: /dev/sdb journal: /dev/sdc1 - data: /dev/sdb journal: /dev/sdc2
Switch to the Ansible user and verify that Ansible can reach all the Ceph nodes:
[admin@admin ~]$ ansible all -m ping
Change directory to the Ansible configuration directory:
[admin@admin ~]$ cd /usr/share/ceph-ansible
Run the following Ansible playbook with
--limit
option:Bare-metal deployments:
[admin@admin ceph-ansible]$ ansible-playbook site.yml --limit osds -i hosts
Container deployments:
[admin@admin ceph-ansible]$ ansible-playbook site-container.yml --limit osds -i hosts
Additional Resources
- See the Handling a disk failure section in the Red Hat Ceph Storage Operations Guide for more details on deploying OSDs.