Chapter 6. Management of OSDs using the Ceph Orchestrator
As a storage administrator, you can use the Ceph Orchestrator to manage OSDs of a Red Hat Ceph Storage cluster.
6.1. Ceph OSDs
When a Red Hat Ceph Storage cluster is up and running, you can add OSDs to the storage cluster at runtime.
A Ceph OSD generally consists of one ceph-osd daemon for one storage drive and its associated journal within a node. If a node has multiple storage drives, then map one ceph-osd daemon for each drive.
Red Hat recommends checking the capacity of a cluster regularly to see if it is reaching the upper end of its storage capacity. As a storage cluster reaches its near full ratio, add one or more OSDs to expand the storage cluster’s capacity.
When you want to reduce the size of a Red Hat Ceph Storage cluster or replace the hardware, you can also remove an OSD at runtime. If the node has multiple storage drives, you might need to remove the ceph-osd daemon for only one of those drives. Before you remove an OSD, check the capacity of the storage cluster to see if you are reaching the upper end of its capacity, and ensure that the storage cluster is not at its near full ratio.
Do not let a storage cluster reach the full ratio before adding an OSD. OSD failures that occur after the storage cluster reaches the near full ratio can cause the storage cluster to exceed the full ratio. Ceph blocks write access to protect the data until you resolve the storage capacity issues. Do not remove OSDs without first considering the impact on the full ratio.
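The ratios discussed above can be checked before you add or remove OSDs. The following is a minimal sketch, assuming it runs where the ceph CLI is available (for example, inside the Cephadm shell) and that ceph osd dump prints full_ratio, backfillfull_ratio, and nearfull_ratio lines; show_capacity_ratios is a hypothetical helper name.

```shell
# Sketch: print the cluster's capacity thresholds and per-OSD utilization.
# Assumes the ceph CLI is available, for example inside the Cephadm shell.
show_capacity_ratios() {
  # full_ratio, backfillfull_ratio, and nearfull_ratio from the OSD map
  ceph osd dump | grep -E '^(full|backfillfull|nearfull)_ratio' || return 1
  # Per-OSD usage; compare the %USE column against nearfull_ratio
  ceph osd df
}
```

Run show_capacity_ratios before changing the OSD count. If any OSD's %USE approaches nearfull_ratio (0.85 by default), add capacity before you remove OSDs.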
6.2. Ceph OSD node configuration
Configure Ceph OSDs and their supporting hardware in line with the storage strategy for the pools that will use the OSDs. Ceph prefers uniform hardware across pools for a consistent performance profile. For best performance, consider a CRUSH hierarchy with drives of the same type or size.
If you add drives of dissimilar size, adjust their weights accordingly. When you add the OSD to the CRUSH map, consider the weight for the new OSD. Hard drive capacity grows approximately 40% per year, so newer OSD nodes might have larger hard drives than older nodes in the storage cluster, which means they might have a greater weight.
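Because the weight should track capacity, it helps to see how a raw drive size maps to a CRUSH weight. By default, Ceph sets a new OSD's CRUSH weight to its capacity in TiB. The sketch below, with the hypothetical helper name crush_weight_for_bytes, does that conversion with awk; if you need to override the default, apply the result with ceph osd crush reweight osd.ID WEIGHT.

```shell
# Sketch: convert a drive's raw capacity in bytes to a CRUSH weight in TiB,
# which matches the weight Ceph assigns to a new OSD by default.
crush_weight_for_bytes() {
  # 1 TiB = 1099511627776 bytes; awk is used because shell arithmetic is integer-only
  awk -v bytes="$1" 'BEGIN { printf "%.5f\n", bytes / 1099511627776 }'
}

crush_weight_for_bytes 4000000000000   # a drive sold as 4 TB, prints 3.63798
crush_weight_for_bytes 8000000000000   # twice the capacity, twice the weight
```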
Before doing a new installation, review the Requirements for Installing Red Hat Ceph Storage chapter in the Installation Guide.
6.3. Automatically tuning OSD memory
The OSD daemons adjust their memory consumption based on the osd_memory_target configuration option, which sets the OSD memory based on the available RAM in the system.
If Red Hat Ceph Storage is deployed on dedicated nodes that do not share memory with other services, cephadm automatically adjusts the per-OSD consumption based on the total amount of RAM and the number of deployed OSDs.
By default, the osd_memory_target_autotune parameter is set to true in the Red Hat Ceph Storage cluster.
Syntax
ceph config set osd osd_memory_target_autotune true
Cephadm starts with a fraction of the total RAM in the system, set by mgr/cephadm/autotune_memory_target_ratio (0.7 by default), subtracts any memory consumed by non-autotuned daemons, such as non-OSD daemons and OSDs for which osd_memory_target_autotune is false, and then divides the remainder by the number of remaining OSDs.
The osd_memory_target parameter is calculated as follows:
Syntax
osd_memory_target = (TOTAL_RAM_OF_THE_OSD * 1048576 * autotune_memory_target_ratio - SPACE_ALLOCATED_FOR_OTHER_DAEMONS) / NUMBER_OF_OSDS_IN_THE_OSD_NODE
TOTAL_RAM_OF_THE_OSD is the total RAM of the OSD node in MiB, and both SPACE_ALLOCATED_FOR_OTHER_DAEMONS and the result are in bytes. SPACE_ALLOCATED_FOR_OTHER_DAEMONS may optionally include the following daemon space allocations:
- Alertmanager: 1 GB
- Grafana: 1 GB
- Ceph Manager: 4 GB
- Ceph Monitor: 2 GB
- Node-exporter: 1 GB
- Prometheus: 1 GB
For example, if a node has 24 OSDs and 251 GB of RAM, and no space is allocated for other daemons, then osd_memory_target is 7860684936 bytes.
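The arithmetic above can be checked with a short shell sketch. It mirrors the formula in this section rather than cephadm's own code, passes the ratio as an integer percentage (70 for 0.7) because shell arithmetic is integer-only, and truncates to whole bytes; the function name osd_memory_target_bytes is hypothetical.

```shell
# Sketch of the osd_memory_target formula above; not cephadm's implementation.
# Assumption: the ratio is an integer percentage (70 means 0.7) because POSIX
# shell arithmetic is integer-only, so the result is truncated to whole bytes.
osd_memory_target_bytes() {
  total_mib=$1          # total RAM of the OSD node, in MiB
  ratio_pct=$2          # autotune_memory_target_ratio as a percentage
  num_osds=$3           # number of autotuned OSDs on the node
  other_bytes=${4:-0}   # SPACE_ALLOCATED_FOR_OTHER_DAEMONS, in bytes
  total_bytes=$(( total_mib * 1048576 ))
  echo $(( (total_bytes * ratio_pct / 100 - other_bytes) / num_osds ))
}

# 251 GiB of RAM (257024 MiB), default ratio 0.7, 24 OSDs, nothing else on the node
osd_memory_target_bytes 257024 70 24   # prints 7860684936
```

Passing a fourth argument, such as 2147483648 for a 2 GiB allocation to a colocated Ceph Monitor, subtracts that space before the division and lowers the per-OSD target accordingly.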
The final targets are stored as options in the configuration database. You can view the limits and the current memory consumed by each daemon in the MEM LIM and MEM USE columns of the ceph orch ps output.
The default osd_memory_target_autotune setting of true is unsuitable for hyperconverged infrastructures where compute and Ceph storage services are colocated. In a hyperconverged infrastructure, you can set autotune_memory_target_ratio to 0.2 to reduce the memory consumption of Ceph.
Example
[ceph: root@host01 /]# ceph config set mgr mgr/cephadm/autotune_memory_target_ratio 0.2
You can manually set a specific memory target for an OSD in the storage cluster.
Example
[ceph: root@host01 /]# ceph config set osd.123 osd_memory_target 7860684936
You can manually set a specific memory target for an OSD host in the storage cluster.
Syntax
ceph config set osd/host:HOSTNAME osd_memory_target TARGET_BYTES
Example
[ceph: root@host01 /]# ceph config set osd/host:host01 osd_memory_target 1000000000
Enabling osd_memory_target_autotune overwrites existing manual OSD memory target settings. To prevent daemon memory from being tuned even when the osd_memory_target_autotune option or other similar options are enabled, set the _no_autotune_memory label on the host.
Syntax
ceph orch host label add HOSTNAME _no_autotune_memory
You can exclude an OSD from memory autotuning by disabling the autotune option and setting a specific memory target.
Example
[ceph: root@host01 /]# ceph config set osd.123 osd_memory_target_autotune false
[ceph: root@host01 /]# ceph config set osd.123 osd_memory_target 16G
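The manual settings above can be wrapped in a small helper so that autotuning is always disabled before a fixed target is set. This is a minimal sketch that only issues the documented ceph config set commands; pin_osd_memory is a hypothetical name, and the target can be a byte count or a size such as 16G, as in the example above.

```shell
# Sketch: pin one or more OSDs to a fixed memory target.
# Usage: pin_osd_memory TARGET OSD_ID [OSD_ID ...]
pin_osd_memory() {
  target=$1; shift
  for id in "$@"; do
    # Disable autotuning first so that cephadm does not overwrite the manual value
    ceph config set "osd.$id" osd_memory_target_autotune false || return 1
    ceph config set "osd.$id" osd_memory_target "$target" || return 1
  done
}

# Inside the Cephadm shell, for example:
# pin_osd_memory 16G 123 124
```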
6.4. Listing devices for Ceph OSD deployment
You can check the list of available devices before deploying OSDs using the Ceph Orchestrator. The ceph orch device ls command prints a list of devices that Cephadm can discover. A storage device is considered available if all of the following conditions are met:
- The device must have no partitions.
- The device must not have any LVM state.
- The device must not be mounted.
- The device must not contain a file system.
- The device must not contain a Ceph BlueStore OSD.
- The device must be larger than 5 GB.
Ceph will not provision an OSD on a device that is not available.
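To see in advance whether a device is likely to meet these conditions, you can run a rough local check as root on the host that holds it. This is a sketch under stated assumptions: it uses util-linux lsblk, treats "larger than 5 GB" as larger than 5 GiB, and approximates the LVM and BlueStore checks through child device types and on-disk signatures such as LVM2_member or ceph_bluestore. Cephadm itself relies on ceph-volume inventory, so ceph orch device ls remains the authoritative answer; precheck_osd_device is a hypothetical name.

```shell
# Rough local approximation of the availability rules above; run as root on the OSD host.
precheck_osd_device() {
  dev=$1
  [ -b "$dev" ] || { echo "$dev: not a block device"; return 1; }
  # Child partitions or LVM logical volumes mean the device is in use
  if lsblk -nro TYPE "$dev" | grep -qxE 'part|lvm'; then
    echo "$dev: has partitions or LVM volumes"; return 1
  fi
  # A mount point on the device or any of its children
  if [ -n "$(lsblk -nro MOUNTPOINT "$dev" | tr -d '[:space:]')" ]; then
    echo "$dev: mounted"; return 1
  fi
  # Any signature: a file system, an LVM PV (LVM2_member), or BlueStore (ceph_bluestore)
  fstype=$(lsblk -dnro FSTYPE "$dev")
  if [ -n "$fstype" ]; then
    echo "$dev: contains $fstype"; return 1
  fi
  size=$(lsblk -bdnro SIZE "$dev")
  if [ "$size" -le $(( 5 * 1073741824 )) ]; then
    echo "$dev: not larger than 5 GiB"; return 1
  fi
  echo "$dev: looks available"
}

# Example: precheck_osd_device /dev/sdb
```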
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Hosts are added to the cluster.
- All manager and monitor daemons are deployed.
Procedure
Log in to the Cephadm shell:
Example
[root@host01 ~]# cephadm shell
List the available devices to deploy OSDs:
Syntax
ceph orch device ls [--hostname=HOSTNAME_1 HOSTNAME_2] [--wide] [--refresh]
Example
[ceph: root@host01 /]# ceph orch device ls --wide --refresh
Using the --wide option provides all details relating to the device, including any reasons that the device might not be eligible for use as an OSD. This option does not support NVMe devices.
Optional: To enable the Health, Ident, and Fault fields in the output of ceph orch device ls, run the following commands:
Note
These fields are supported by the libstoragemgmt library, which currently supports SCSI, SAS, and SATA devices.
As the root user outside the Cephadm shell, check your hardware’s compatibility with the libstoragemgmt library to avoid unplanned interruption to services:
Example
[root@host01 ~]# cephadm shell lsmcli ldl
In the output, you see the Health Status as Good with the respective SCSI VPD 0x83 ID.
Note
If you do not get this information, enabling the fields might cause erratic behavior of devices.
Log back in to the Cephadm shell and enable libstoragemgmt support:
Example
[root@host01 ~]# cephadm shell
[ceph: root@host01 /]# ceph config set mgr mgr/cephadm/device_enhanced_scan true
Once this is enabled, the output of ceph orch device ls shows the Health field as Good.
Verification
List the devices:
Example
[ceph: root@host01 /]# ceph orch device ls
6.5. Zapping devices for Ceph OSD deployment
You need to check the list of available devices before deploying OSDs. If there is no space available on the devices, you can clear the data on the devices by zapping them.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Hosts are added to the cluster.
- All manager and monitor daemons are deployed.
Procedure
Log in to the Cephadm shell:
Example
[root@host01 ~]# cephadm shell
List the available devices to deploy OSDs:
Syntax
ceph orch device ls [--hostname=HOSTNAME_1 HOSTNAME_2] [--wide] [--refresh]
Example
[ceph: root@host01 /]# ceph orch device ls --wide --refresh
Clear the data of a device:
Syntax
ceph orch device zap HOSTNAME FILE_PATH --force
Example
[ceph: root@host01 /]# ceph orch device zap host02 /dev/sdb --force
Verification
Verify the space is available on the device:
Example
[ceph: root@host01 /]# ceph orch device ls
The field under Available shows Yes for the zapped device.
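When several devices on the same host need to be cleared, the zap command above can be run in a loop followed by a refreshed inventory. This is a minimal sketch; zap_devices is a hypothetical helper that only wraps the documented ceph orch device zap and ceph orch device ls commands. Zapping destroys all data on each device, so verify the paths first.

```shell
# Sketch: zap several devices on one host, then refresh the device inventory.
# Usage: zap_devices HOSTNAME DEVICE_PATH [DEVICE_PATH ...]
zap_devices() {
  host=$1; shift
  for dev in "$@"; do
    # Stop at the first failure so that the remaining devices are left untouched
    ceph orch device zap "$host" "$dev" --force || return 1
  done
  # Re-scan so that the Available column reflects the zapped devices
  ceph orch device ls --hostname="$host" --refresh
}

# Inside the Cephadm shell, for example:
# zap_devices host02 /dev/sdb /dev/sdc
```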
6.6. Deploying Ceph OSDs on all available devices
You can deploy OSDs on all the available devices. Cephadm allows the Ceph Orchestrator to discover and deploy the OSDs on any available and unused storage device.
To deploy OSDs on all available devices, run the command without the unmanaged parameter, and then re-run the command with the parameter to prevent the creation of future OSDs.
The deployment of OSDs with --all-available-devices is generally used for smaller clusters. For larger clusters, use the OSD specification file.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Hosts are added to the cluster.
- All manager and monitor daemons are deployed.
Procedure
Log in to the Cephadm shell:
Example
[root@host01 ~]# cephadm shell
List the available devices to deploy OSDs:
Syntax
ceph orch device ls [--hostname=HOSTNAME_1 HOSTNAME_2] [--wide] [--refresh]
Example
[ceph: root@host01 /]# ceph orch device ls --wide --refresh
Deploy OSDs on all available devices:
Example
[ceph: root@host01 /]# ceph orch apply osd --all-available-devices
The effect of ceph orch apply is persistent, which means that the Orchestrator automatically finds the device, adds it to the cluster, and creates new OSDs. This occurs under the following conditions:
- New disks or drives are added to the system.
- Existing disks or drives are zapped.
- An OSD is removed and the devices are zapped.
You can disable automatic creation of OSDs on all the available devices by using the --unmanaged parameter.
Example
[ceph: root@host01 /]# ceph orch apply osd --all-available-devices --unmanaged=true
Setting the --unmanaged parameter to true disables the creation of OSDs, and nothing changes even if you apply a new OSD service.
Note
The ceph orch daemon add command creates new OSDs, but does not add an OSD service.
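After the initial deployment, the second step of this section stops cephadm from consuming devices that become available later. The sketch below applies --unmanaged=true and then exports the resulting service specification so that you can confirm it contains unmanaged: true. It assumes the service created by --all-available-devices is listed by ceph orch ls as osd.all-available-devices; freeze_all_available_devices is a hypothetical name.

```shell
# Sketch: mark the all-available-devices OSD service as unmanaged and show its spec.
# Assumption: the service name is osd.all-available-devices, as listed by ceph orch ls.
freeze_all_available_devices() {
  ceph orch apply osd --all-available-devices --unmanaged=true || return 1
  # The exported specification should now include "unmanaged: true"
  ceph orch ls --service_name=osd.all-available-devices --export
}
```

Run it only after the OSDs from the first ceph orch apply osd --all-available-devices command appear in ceph orch ls and ceph osd tree, so that the initial deployment is not cut short.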
Verification
List the service:
Example
[ceph: root@host01 /]# ceph orch ls
View the details of the node and devices:
Example
[ceph: root@host01 /]# ceph osd tree
6.7. Deploying Ceph OSDs on specific devices and hosts
You can deploy Ceph OSDs on specific devices and hosts using the Ceph Orchestrator.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Hosts are added to the cluster.
- All manager and monitor daemons are deployed.
Procedure
Log in to the Cephadm shell:
Example
[root@host01 ~]# cephadm shell
List the available devices to deploy OSDs:
Syntax
ceph orch device ls [--hostname=HOSTNAME_1 HOSTNAME_2] [--wide] [--refresh]
Example
[ceph: root@host01 /]# ceph orch device ls --wide --refresh
Deploy OSDs on specific devices and hosts:
Syntax
ceph orch daemon add osd HOSTNAME:DEVICE_PATH
Example
[ceph: root@host01 /]# ceph orch daemon add osd host02:/dev/sdb
To deploy OSDs on a raw physical device, without an LVM layer, use the --method raw option.
Syntax
ceph orch daemon add osd --method raw HOSTNAME:DEVICE_PATH
Example
[ceph: root@host01 /]# ceph orch daemon add osd --method raw host02:/dev/sdb
Note
If you have separate DB or WAL devices, the ratio of block to DB or WAL devices MUST be 1:1.
Verification
List the service:
Example
[ceph: root@host01 /]# ceph orch ls osd
View the details of the node and devices:
Example
[ceph: root@host01 /]# ceph osd tree
List the hosts, daemons, and processes:
Syntax
ceph orch ps --service_name=SERVICE_NAME
Example
[ceph: root@host01 /]# ceph orch ps --service_name=osd
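If you need OSDs on several specific devices, the ceph orch daemon add osd command from this procedure can be repeated for each HOSTNAME:DEVICE_PATH pair. This is a minimal sketch; add_osds and the METHOD environment variable are hypothetical conveniences, and METHOD=raw selects the documented --method raw option.

```shell
# Sketch: add OSDs on several HOSTNAME:DEVICE_PATH targets.
# Set METHOD=raw to create the OSDs without an LVM layer.
add_osds() {
  for target in "$@"; do
    if [ "${METHOD:-}" = raw ]; then
      ceph orch daemon add osd --method raw "$target" || return 1
    else
      ceph orch daemon add osd "$target" || return 1
    fi
  done
}

# Inside the Cephadm shell, for example:
# add_osds host02:/dev/sdb host03:/dev/sdb
# METHOD=raw add_osds host02:/dev/sdc
```

Because ceph orch daemon add does not create an OSD service, OSDs added this way are not recreated automatically; use a service specification when you want that behavior.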
6.8. Advanced service specifications and filters for deploying OSDs
A service specification of type OSD is a way to describe a cluster layout using the properties of disks. It gives you an abstract way to tell Ceph which disks should turn into an OSD with the required configuration, without knowing the specifics of device names and paths. You define the specification for the devices and hosts in a YAML file or a JSON file.
General settings for OSD specifications
- service_type: 'osd': This is mandatory to create OSDs.
- service_id: Use the service name or identification you prefer. A set of OSDs is created using the specification file. This name is used to manage all the OSDs together and represent an Orchestrator service.
- placement: This is used to define the hosts on which the OSDs need to be deployed. You can use the following options:
  - host_pattern: '*' - A host name pattern used to select hosts.
  - label: 'osd_host' - A label used in the hosts where OSDs need to be deployed.
  - hosts: 'host01', 'host02' - An explicit list of host names where OSDs need to be deployed.
- selection of devices: The devices where OSDs are created. This allows you to place the components of an OSD on different devices. You can create only BlueStore OSDs, which have three components:
  - OSD data: contains all the OSD data
  - WAL: BlueStore internal journal or write-ahead log
  - DB: BlueStore internal metadata
- data_devices: Define the devices on which to deploy OSDs. In this case, OSDs are created in a collocated schema. You can use filters to select devices and folders.
- wal_devices: Define the devices used for the OSD WAL. You can use filters to select devices and folders.
- db_devices: Define the devices used for the OSD DB. You can use filters to select devices and folders.
- encrypted: An optional parameter to encrypt information on the OSD, which can be set to either True or False.
- unmanaged: An optional parameter, set to False by default. You can set it to True if you do not want the Orchestrator to manage the OSD service.
- block_wal_size: User-defined value, in bytes.
- block_db_size: User-defined value, in bytes.
- osds_per_device: User-defined value for deploying more than one OSD per device.
- method: An optional parameter to specify whether an OSD is created with an LVM layer. Set to raw if you want to create OSDs on raw physical devices that do not include an LVM layer. If you have separate DB or WAL devices, the ratio of block to DB or WAL devices MUST be 1:1.
Filters for specifying devices
Filters are used in conjunction with the data_devices, wal_devices, and db_devices parameters.
Name of the filter | Description | Syntax | Example |
Model | Target specific disks. You can get details of the model by running | model: DISK_MODEL_NAME | model: MC-55-44-XZ |
Vendor | Target specific disks | vendor: DISK_VENDOR_NAME | vendor: Vendor Cs |
Size Specification | Includes disks of an exact size | size: EXACT | size: '10G' |
Size Specification | Includes disks whose size is within the range | size: LOW:HIGH | size: '10G:40G' |
Size Specification | Includes disks less than or equal to the given size | size: :HIGH | size: ':10G' |
Size Specification | Includes disks equal to or greater than the given size | size: LOW: | size: '40G:' |
Rotational | Rotational attribute of the disk. 1 matches all disks that are rotational and 0 matches all the disks that are non-rotational. If rotational=0, the OSD is configured with SSD or NVMe. If rotational=1, the OSD is configured with HDD. | rotational: 0 or 1 | rotational: 0 |
All | Considers all the available disks | all: true | all: true |
Limiter | When you have specified valid filters but want to limit the number of matching disks, you can use the limit directive. Use it only as a last resort. | limit: NUMBER | limit: 2 |
To create an OSD with non-collocated components in the same host, you have to specify the different types of devices used, and the devices must be on the same host.
The devices used for deploying OSDs must be supported by libstoragemgmt.
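The filters above are combined inside an OSD service specification. The following sketch writes a hypothetical osd_spec.yaml that uses rotational disks of at least 100G as data devices and at most two non-rotational disks per host as DB devices, on hosts labeled osd_host. The service_id, label, and size values are illustrative assumptions, and the spec: nesting follows the upstream Ceph service specification format.

```shell
# Sketch: write an OSD service specification that uses device filters.
# The service_id, label, and size range are illustrative values only.
cat > osd_spec.yaml <<'EOF'
service_type: osd
service_id: osd_hdd_with_ssd_db
placement:
  label: osd_host
spec:
  data_devices:
    rotational: 1
    size: '100G:'
  db_devices:
    rotational: 0
    limit: 2
EOF
```

Preview the result with ceph orch apply -i osd_spec.yaml --dry-run before applying it, because a filter that matches more disks than expected creates more OSDs than expected.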
6.9. Deploying Ceph OSDs using advanced service specifications
The service specification of type OSD is a way to describe a cluster layout using the properties of disks. It gives the user an abstract way to tell Ceph which disks should turn into an OSD with the required configuration without knowing the specifics of device names and paths.
You can deploy OSDs on each device and each host by defining a YAML file or a JSON file.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Hosts are added to the cluster.
- All manager and monitor daemons are deployed.
Procedure
On the monitor node, create the osd_spec.yaml file:
Example
[root@host01 ~]# touch osd_spec.yaml
Edit the osd_spec.yaml file to include the following details:
Syntax
Simple scenarios: In these cases, all the nodes have the same setup.
Example
Example
Simple scenario: In this case, all the nodes have the same setup with OSD devices created in raw mode, without an LVM layer.
Example
Advanced scenario: This creates the desired layout by using all HDDs as data_devices with two SSDs assigned as dedicated DB or WAL devices. The remaining SSDs are data_devices that have the NVMe vendors assigned as dedicated DB or WAL devices.
Example
Advanced scenario with non-uniform nodes: This applies different OSD specs to different hosts depending on the host_pattern key.
Example
Advanced scenario with dedicated WAL and DB devices:
Example
Advanced scenario with multiple OSDs per device:
Example
For pre-created volumes, edit the osd_spec.yaml file to include the following details:
Syntax
Example
For OSDs by ID, edit the osd_spec.yaml file to include the following details:
Note
This configuration is applicable for Red Hat Ceph Storage 5.3z1 and later releases. For earlier releases, use pre-created LVM.
Syntax
Example
For OSDs by path, edit the osd_spec.yaml file to include the following details:
Note
This configuration is applicable for Red Hat Ceph Storage 5.3z1 and later releases. For earlier releases, use pre-created LVM.
Syntax
Example
Mount the YAML file under a directory in the container:
Example
[root@host01 ~]# cephadm shell --mount osd_spec.yaml:/var/lib/ceph/osd/osd_spec.yaml
Navigate to the directory:
Example
[ceph: root@host01 /]# cd /var/lib/ceph/osd/
Before deploying OSDs, do a dry run:
Note
This step gives a preview of the deployment, without deploying the daemons.
Example
[ceph: root@host01 osd]# ceph orch apply -i osd_spec.yaml --dry-run
Deploy OSDs using the service specification:
Syntax
ceph orch apply -i FILE_NAME.yml
Example
[ceph: root@host01 osd]# ceph orch apply -i osd_spec.yaml
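The dry run and the deployment in the procedure above can be chained so that a specification is applied only after its preview has been reviewed. This is a minimal sketch; apply_osd_spec is a hypothetical helper, meant to run inside the Cephadm shell from the directory that holds the mounted file (/var/lib/ceph/osd in this procedure).

```shell
# Sketch: preview an OSD service specification, then apply it after confirmation.
# Usage: apply_osd_spec FILE_NAME
apply_osd_spec() {
  spec=$1
  [ -r "$spec" ] || { echo "cannot read $spec" >&2; return 1; }
  # --dry-run shows the OSDs that cephadm would create, without deploying them
  ceph orch apply -i "$spec" --dry-run || return 1
  # Prompts go to stderr so that stdout only carries ceph output
  printf 'Apply %s? [y/N] ' "$spec" >&2
  read -r answer
  [ "$answer" = y ] || { echo "aborted" >&2; return 1; }
  ceph orch apply -i "$spec"
}

# For example: apply_osd_spec osd_spec.yaml
```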
Verification
List the service:
Example
[ceph: root@host01 /]# ceph orch ls osd
View the details of the node and devices:
Example
[ceph: root@host01 /]# ceph osd tree
6.10. Removing the OSD daemons using the Ceph Orchestrator
You can remove the OSD from a cluster by using Cephadm.
Removing an OSD from a cluster involves two steps:
- Evacuating all placement groups (PGs) from the OSD.
- Removing the PG-free OSD from the cluster.
The --zap option removes the volume groups, logical volumes, and the LVM metadata.
After removing OSDs, if the drives the OSDs were deployed on once again become available, cephadm might automatically try to deploy more OSDs on these drives if they match an existing drivegroup specification. If you deployed the OSDs you are removing with a spec and do not want any new OSDs deployed on the drives after removal, modify the drivegroup specification before removal. If you used the --all-available-devices option while deploying OSDs, set unmanaged: true to stop it from picking up new drives at all. For other deployments, modify the specification. See Deploying Ceph OSDs using advanced service specifications for more details.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Hosts are added to the cluster.
- Ceph Monitor, Ceph Manager and Ceph OSD daemons are deployed on the storage cluster.
Procedure
Log in to the Cephadm shell:
Example
[root@host01 ~]# cephadm shell
Check the device and the node from which the OSD has to be removed:
Example
[ceph: root@host01 /]# ceph osd tree
Remove the OSD:
Syntax
ceph orch osd rm OSD_ID [--replace] [--force] [--zap]
Example
[ceph: root@host01 /]# ceph orch osd rm 0 --zap
Note
If you remove the OSD from the storage cluster without an option such as --replace, the device is removed from the storage cluster completely. If you want to use the same device for deploying OSDs, you must first zap the device before adding it to the storage cluster.
Optional: To remove multiple OSDs from a specific node, run the following command:
Syntax
ceph orch osd rm OSD_ID OSD_ID --zap
Example
[ceph: root@host01 /]# ceph orch osd rm 2 5 --zap
Check the status of the OSD removal:
Example
[ceph: root@host01 /]# ceph orch osd rm status
OSD  HOST    STATE                    PGS  REPLACE  FORCE  ZAP   DRAIN STARTED AT
9    host01  done, waiting for purge  0    False    False  True  2023-06-06 17:50:50.525690
10   host03  done, waiting for purge  0    False    False  True  2023-06-06 17:49:38.731533
11   host02  done, waiting for purge  0    False    False  True  2023-06-06 17:48:36.641105
When no PGs are left on the OSD, it is decommissioned and removed from the cluster.
Verification
Verify the details of the devices and the nodes from which the Ceph OSDs are removed:
Example
[ceph: root@host01 /]# ceph osd tree
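Because draining PGs can take a while, it can help to wait until the removed OSDs are gone before you act on their devices. This is a minimal sketch under stated assumptions: ceph osd ls prints one existing OSD ID per line, and a removal without --replace purges the ID once draining finishes. The helper name remove_osds_and_wait and the MAX_TRIES and POLL_SECONDS settings are hypothetical.

```shell
# Sketch: remove OSDs with --zap and poll until their IDs leave the OSD map.
# Usage: remove_osds_and_wait OSD_ID [OSD_ID ...]
remove_osds_and_wait() {
  ceph orch osd rm "$@" --zap || return 1
  tries=0
  while :; do
    remaining=0
    for id in "$@"; do
      # An ID that is still listed has not been purged yet
      if ceph osd ls | grep -qx "$id"; then remaining=1; fi
    done
    if [ "$remaining" -eq 0 ]; then echo "removed: $*"; return 0; fi
    tries=$(( tries + 1 ))
    if [ "$tries" -ge "${MAX_TRIES:-120}" ]; then echo "timed out" >&2; return 1; fi
    # Show progress (STATE and PGS columns) between polls
    ceph orch osd rm status
    sleep "${POLL_SECONDS:-30}"
  done
}

# For example: remove_osds_and_wait 2 5
```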
6.11. Replacing the OSDs using the Ceph Orchestrator
When disks fail, you can replace the physical storage device and reuse the same OSD ID to avoid having to reconfigure the CRUSH map.
You can replace the OSDs from the cluster using the --replace option.
If you want to replace a single OSD, see Deploying Ceph OSDs on specific devices and hosts. If you want to deploy OSDs on all available devices, see Deploying Ceph OSDs on all available devices.
This option of the ceph orch osd rm command preserves the OSD ID. The OSD is not permanently removed from the CRUSH hierarchy, but is assigned the destroyed flag. This flag is used to determine which OSD IDs can be reused in the next OSD deployment.
Similar to the rm command, replacing an OSD from a cluster involves two steps:
- Evacuating all placement groups (PGs) from the OSD.
- Removing the PG-free OSD from the cluster.
If you use an OSD specification for deployment, your newly added disks are assigned the OSD IDs of their replaced counterparts.
After removing OSDs, if the drives the OSDs were deployed on once again become available, cephadm might automatically try to deploy more OSDs on these drives if they match an existing drivegroup specification. If you deployed the OSDs you are removing with a spec and do not want any new OSDs deployed on the drives after removal, modify the drivegroup specification before removal. If you used the --all-available-devices option while deploying OSDs, set unmanaged: true to stop it from picking up new drives at all. For other deployments, modify the specification. See Deploying Ceph OSDs using advanced service specifications for more details.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Hosts are added to the cluster.
- Monitor, Manager, and OSD daemons are deployed on the storage cluster.
- A new OSD that replaces the removed OSD must be created on the same host from which the OSD was removed.
Procedure
Log in to the Cephadm shell:
Example
[root@host01 ~]# cephadm shell
Dump and save a mapping of your OSD configurations for future reference:
Example
Check the device and the node from which the OSD has to be replaced:
Example
[ceph: root@host01 /]# ceph osd tree
Replace the OSD:
Important
If the storage cluster has HEALTH_WARN or other errors associated with it, check and try to fix any errors before replacing the OSD to avoid data loss.
Syntax
ceph orch osd rm OSD_ID --replace [--force]
The --force option can be used when there are ongoing operations on the storage cluster.
Example
[ceph: root@host01 /]# ceph orch osd rm 0 --replace
Check the status of the OSD replacement:
Example
[ceph: root@host01 /]# ceph orch osd rm status
Pause the orchestrator to prevent it from applying any existing OSD specification:
Example
[ceph: root@node /]# ceph orch pause
[ceph: root@node /]# ceph orch status
Backend: cephadm
Available: Yes
Paused: Yes
Zap the OSD devices that have been removed:
Example
[ceph: root@node /]# ceph orch device zap node.example.com /dev/sdi --force
zap successful for /dev/sdi on node.example.com
[ceph: root@node /]# ceph orch device zap node.example.com /dev/sdf --force
zap successful for /dev/sdf on node.example.com
Resume the orchestrator from pause mode:
Example
[ceph: root@node /]# ceph orch resume
Check the status of the OSD replacement:
Example
Verification
Verify the details of the devices and the nodes from which the Ceph OSDs are replaced:
Example
[ceph: root@host01 /]# ceph osd tree
You should see an OSD with the same ID as the one you replaced running on the same host.
Verify that the db_device for the newly deployed OSDs is the replaced db_device:
Example
[ceph: root@host01 /]# ceph osd metadata 0 | grep bluefs_db_devices
"bluefs_db_devices": "nvme0n1",
[ceph: root@host01 /]# ceph osd metadata 1 | grep bluefs_db_devices
"bluefs_db_devices": "nvme0n1",
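The replacement steps above can be scripted. The following is a minimal bash sketch, not a supported tool: the function names run, begin_osd_replace, and finish_osd_replace are hypothetical, and it assumes you run it inside the Cephadm shell. Setting DRY_RUN=1 prints each command instead of running it. Because the removal is asynchronous, run finish_osd_replace only after ceph orch osd rm status shows that the removal has finished.

```shell
#!/usr/bin/env bash
# Minimal sketch of the OSD replacement steps; run it inside the Cephadm shell.
# run, begin_osd_replace, and finish_osd_replace are hypothetical helper names.

# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# begin_osd_replace OSD_ID
# Schedules the removal while keeping the OSD ID, then shows the removal queue.
begin_osd_replace() {
  local osd_id=$1
  run ceph orch osd rm "$osd_id" --replace || return 1
  run ceph orch osd rm status
}

# finish_osd_replace HOST DEVICE...
# Pauses the orchestrator, zaps each removed device, then resumes.
finish_osd_replace() {
  local host=$1 dev
  shift
  run ceph orch pause || return 1
  # Expect "Paused: Yes" in the output.
  run ceph orch status || return 1
  for dev in "$@"; do
    # On failure, return while still paused so no spec is applied to the disk.
    run ceph orch device zap "$host" "$dev" --force || return 1
  done
  run ceph orch resume
}
```

For example, DRY_RUN=1 finish_osd_replace node.example.com /dev/sdi /dev/sdf prints the pause, status, zap, and resume commands from the procedure. If a zap fails, the function returns without resuming, which leaves the orchestrator paused so that it does not deploy onto a partially wiped device; resume it manually once the device is fixed.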
6.12. Replacing the OSDs with pre-created LVM
After purging the OSD with the ceph-volume lvm zap command, if the directory is not present, you can replace the OSDs by using an OSD service specification file with the pre-created LVM.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- A failed OSD.
Procedure
Log into the Cephadm shell:
Example
[root@host01 ~]# cephadm shell
Remove the OSD:
Syntax
ceph orch osd rm OSD_ID [--replace]
Example
[ceph: root@host01 /]# ceph orch osd rm 8 --replace
Scheduled OSD(s) for removal
Verify the OSD is destroyed:
Example
Zap and remove the OSD using the ceph-volume command:
Syntax
ceph-volume lvm zap --osd-id OSD_ID
Example
Check the OSD topology:
Example
[ceph: root@host01 /]# ceph-volume lvm list
Recreate the OSD with a specification file corresponding to that specific OSD topology:
Example
Apply the updated specification file:
Example
[ceph: root@host01 /]# ceph orch apply -i osd.yml
Scheduled osd.osd_service update...
Verify the OSD is back:
Example
[ceph: root@host01 /]# ceph -s
[ceph: root@host01 /]# ceph osd tree
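The specification file for this procedure depends on the OSD topology reported by ceph-volume lvm list. The following bash sketch writes a minimal specification with a single data device. The helper name write_lvm_osd_spec, the host, and the logical volume path are hypothetical placeholders, and the sketch assumes that your release accepts a pre-created logical volume path under data_devices paths; check the advanced service specifications section before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal OSD service specification that points
# cephadm at a pre-created logical volume (path taken from ceph-volume lvm list).
# Usage: write_lvm_osd_spec FILE HOST DATA_LV_PATH
write_lvm_osd_spec() {
  local file=$1 host=$2 data_lv=$3
  # service_id matches the osd.osd_service name in the apply example.
  cat > "$file" <<EOF
service_type: osd
service_id: osd_service
placement:
  hosts:
  - ${host}
data_devices:
  paths:
  - ${data_lv}
EOF
}
```

For example, write_lvm_osd_spec osd.yml host03 /dev/ceph-vg/osd-lv-8 writes osd.yml for the apply step. Add db_devices or wal_devices entries if the original OSD had them.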
6.13. Replacing the OSDs in a non-colocated scenario
When an OSD fails in a non-colocated scenario, you can replace the WAL/DB devices. The procedure is the same for DB and WAL devices: edit the paths under db_devices for DB devices and the paths under wal_devices for WAL devices.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Daemons are non-colocated.
- A failed OSD.
Procedure
Identify the devices in the cluster:
Example
Log into the Cephadm shell:
Example
[root@host01 ~]# cephadm shell
Identify the OSDs and their DB device:
Example
In the osds.yml file, set the unmanaged parameter to true; otherwise, cephadm redeploys the OSDs:
Example
Apply the updated specification file:
Example
[ceph: root@host01 /]# ceph orch apply -i osds.yml
Scheduled osd.non-colocated update...
Check the status:
Example
Remove the OSDs. Use the --zap option to zap the backing devices and the --replace option to retain the OSD IDs:
Example
[ceph: root@host01 /]# ceph orch osd rm 2 5 --zap --replace
Scheduled OSD(s) for removal
Check the status:
Example
Edit the osds.yml specification file to change the unmanaged parameter to false and, if the path to the DB device changed after the device was physically replaced, update that path:
Example
In the example above, /dev/sdh is replaced with /dev/sde.
Important
If you use the same host specification file to replace the faulty DB device on a single OSD node, modify the host_pattern option to specify only that OSD node; otherwise, the deployment fails because the new DB device cannot be found on the other hosts.
Reapply the specification file with the --dry-run option to confirm that the OSDs will be deployed with the new DB device:
Example
Apply the specification file:
Example
[ceph: root@host01 /]# ceph orch apply -i osds.yml
Scheduled osd.non-colocated update...
Check that the OSDs are redeployed:
Example
Verification
From the OSD host where the OSDs are redeployed, verify that they are on the new DB device:
Example
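The two versions of osds.yml in this procedure differ only in the unmanaged flag and the DB device path. The following bash sketch generates either version. The function name write_noncolocated_spec, the host pattern, and the device paths are hypothetical, and the layout assumes the standard OSD service specification fields (service_type, service_id, unmanaged, placement, data_devices, db_devices).

```shell
#!/usr/bin/env bash
# Hypothetical helper: write osds.yml for a non-colocated OSD service.
# Usage: write_noncolocated_spec FILE HOST_PATTERN UNMANAGED DB_DEVICE DATA_DEVICE...
write_noncolocated_spec() {
  local file=$1 host_pattern=$2 unmanaged=$3 db_dev=$4 dev
  shift 4
  {
    echo "service_type: osd"
    # service_id matches the osd.non-colocated name in the apply example.
    echo "service_id: non-colocated"
    # true stops cephadm from deploying OSDs from this spec; false allows it again.
    echo "unmanaged: ${unmanaged}"
    echo "placement:"
    # Quoted so that patterns such as '*' remain valid YAML.
    echo "  host_pattern: '${host_pattern}'"
    echo "data_devices:"
    echo "  paths:"
    for dev in "$@"; do
      echo "  - ${dev}"
    done
    echo "db_devices:"
    echo "  paths:"
    echo "  - ${db_dev}"
  } > "$file"
}
```

For example, write the file with true and /dev/sdh before removing the OSDs, rewrite it with false and /dev/sde after the disk swap, and check it with ceph orch apply -i osds.yml --dry-run before applying it. Passing a single host name as the host pattern follows the Important note about restricting the specification to the affected OSD node.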
6.14. Stopping the removal of the OSDs using the Ceph Orchestrator
You can stop the removal only of OSDs that are queued for removal. This resets the OSD to its initial state and takes it off the removal queue.
If the OSD is in the process of removal, then you cannot stop the process.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Hosts are added to the cluster.
- Monitor, Manager and OSD daemons are deployed on the cluster.
- The OSD removal process is initiated.
Procedure
Log into the Cephadm shell:
Example
[root@host01 ~]# cephadm shell
Check the device and the node from which the OSD removal was initiated:
Example
[ceph: root@host01 /]# ceph osd tree
Stop the removal of the queued OSD:
Syntax
ceph orch osd rm stop OSD_ID
Example
[ceph: root@host01 /]# ceph orch osd rm stop 0
Check the status of the OSD removal:
Example
[ceph: root@host01 /]# ceph orch osd rm status
Verification
Verify the details of the devices and the nodes from which the Ceph OSDs were queued for removal:
Example
[ceph: root@host01 /]# ceph osd tree
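When several OSDs are queued for removal, you can stop each one in turn. The following is a minimal bash sketch, assuming it runs inside the Cephadm shell; run and stop_osd_removal are hypothetical helper names, and DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; run inside the Cephadm shell.

# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# stop_osd_removal OSD_ID...
# Takes each queued OSD off the removal queue, then shows the queue.
stop_osd_removal() {
  local id
  for id in "$@"; do
    run ceph orch osd rm stop "$id" || return 1
  done
  run ceph orch osd rm status
}
```

The loop stops at the first command that returns a non-zero exit status, so the remaining OSDs stay queued until you investigate.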
6.15. Activating the OSDs using the Ceph Orchestrator
You can activate the OSDs in the cluster in cases where the operating system of the host was reinstalled.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Hosts are added to the cluster.
- Monitor, Manager and OSD daemons are deployed on the storage cluster.
Procedure
Log into the Cephadm shell:
Example
[root@host01 ~]# cephadm shell
After the operating system of the host is reinstalled, activate the OSDs:
Syntax
ceph cephadm osd activate HOSTNAME
Example
[ceph: root@host01 /]# ceph cephadm osd activate host03
Verification
List the services:
Example
[ceph: root@host01 /]# ceph orch ls
List the hosts, daemons, and processes:
Syntax
ceph orch ps --service_name=SERVICE_NAME
Example
[ceph: root@host01 /]# ceph orch ps --service_name=osd
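If the operating system was reinstalled on several hosts, the activation step can be repeated for each host. The following is a minimal bash sketch, assuming it runs inside the Cephadm shell; run and activate_osds_on_hosts are hypothetical helper names, and DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; run inside the Cephadm shell.

# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# activate_osds_on_hosts HOSTNAME...
# Activates the existing OSDs on each reinstalled host, then lists the OSD daemons.
activate_osds_on_hosts() {
  local host
  for host in "$@"; do
    run ceph cephadm osd activate "$host" || return 1
  done
  run ceph orch ps --service_name=osd
}
```

For example, DRY_RUN=1 activate_osds_on_hosts host03 host04 prints the two activation commands followed by the ceph orch ps --service_name=osd check.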
6.16. Observing the data migration
When you add an OSD to or remove an OSD from the CRUSH map, Ceph begins rebalancing the data by migrating placement groups to the new or existing OSDs. You can observe the data migration by using the ceph -w command.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- An OSD was recently added or removed.
Procedure
To observe the data migration:
Example
[ceph: root@host01 /]# ceph -w
- Watch as the placement group states change from active+clean to active, some degraded objects, and finally active+clean when migration completes.
- To exit the utility, press Ctrl + C.
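Instead of watching ceph -w interactively, you can poll the cluster until rebalancing settles. The following bash sketch assumes that reaching HEALTH_OK is an adequate signal that the migration has finished, which does not hold if the cluster has unrelated warnings. The function wait_for_health_ok and the CEPH override variable (used to substitute a stub command for testing) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll `ceph health` until it reports HEALTH_OK.
# Usage: wait_for_health_ok [MAX_CHECKS] [INTERVAL_SECONDS]
# Set CEPH to another command name (for example a test stub) to override `ceph`.
wait_for_health_ok() {
  local max=${1:-120} interval=${2:-30} i status=""
  for (( i = 1; i <= max; i++ )); do
    status=$("${CEPH:-ceph}" health)
    if [[ "$status" == HEALTH_OK* ]]; then
      echo "cluster is HEALTH_OK after $i check(s)"
      return 0
    fi
    # Do not sleep after the final check.
    if (( i < max )); then
      sleep "$interval"
    fi
  done
  echo "still not HEALTH_OK after $max checks: $status" >&2
  return 1
}
```

For example, wait_for_health_ok 240 15 checks every 15 seconds for up to an hour and returns a non-zero status if the cluster is still not healthy.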
6.17. Recalculating the placement groups
Placement groups (PGs) define the spread of any pool data across the available OSDs. A placement group is built on the redundancy scheme that the pool uses. For 3-way replication, each placement group maps to three different OSDs. For erasure-coded pools, the number of OSDs used is defined by the number of chunks.
When you define a pool, the number of placement groups determines how finely the data is spread across all available OSDs. The higher the number, the more evenly the capacity load can be balanced. However, because placement groups must also be handled when data is reconstructed, choose the number carefully up front. A tool is available to support this calculation.
During the lifetime of a storage cluster, a pool might grow beyond the initially anticipated limits. As the number of drives grows, recalculating the number of placement groups is recommended. The number of placement groups per OSD should be around 100. As you add OSDs to the storage cluster, the number of PGs per OSD decreases over time. For example, starting with 120 drives in the storage cluster and setting the pg_num of the pool to 4000 results in 100 PGs per OSD with a replication factor of three (4000 × 3 ÷ 120). If the storage cluster later grows to ten times as many OSDs, the number of PGs per OSD drops to only 10. Because a small number of PGs per OSD tends to distribute capacity unevenly, consider adjusting the number of PGs per pool.
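The arithmetic in this example can be checked with a short bash sketch. pgs_per_osd computes pg_num × replication size ÷ number of OSDs; suggest_pg_num applies the commonly cited heuristic of about 100 PGs per OSD, (OSDs × 100) ÷ size, rounded up to a power of two. Both function names are hypothetical, and the heuristic is an approximation, not necessarily the calculator tool's exact method.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for PG arithmetic (integer math only).

# pgs_per_osd PG_NUM REPLICA_SIZE OSD_COUNT
# Each PG is stored on REPLICA_SIZE OSDs, so copies per OSD = pg_num * size / OSDs.
pgs_per_osd() {
  echo $(( $1 * $2 / $3 ))
}

# suggest_pg_num OSD_COUNT REPLICA_SIZE [TARGET_PGS_PER_OSD]
# Heuristic: OSDs * target / size, rounded up to the next power of two.
suggest_pg_num() {
  local osds=$1 size=$2 target=${3:-100}
  local raw=$(( osds * target / size )) pg=1
  while (( pg < raw )); do
    (( pg *= 2 ))
  done
  echo "$pg"
}
```

With 120 OSDs and three replicas, suggest_pg_num 120 3 prints 4096; after growing to 1200 OSDs, pgs_per_osd 4000 3 1200 prints 10, which is the situation in which the text recommends adjusting the number of PGs.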
You can adjust the number of placement groups online. Recalculating involves not only the PG numbers but also data relocation, which can be a lengthy process. However, data availability is maintained throughout.
Avoid very high numbers of PGs per OSD, because reconstruction of all PGs on a failed OSD starts at once. Timely reconstruction then requires a high number of IOPS, which might not be available. The result is deep I/O queues and high latency, which can render the storage cluster unusable or lead to long healing times.