Chapter 5. Generalized stretch cluster configuration for three availability zones
As a storage administrator, you can configure a generalized stretch cluster configuration for three availability zones with Ceph OSDs.
Ceph can withstand the loss of Ceph OSDs because its network and cluster are designed to be equally reliable, with failures randomly distributed across the CRUSH map. If a number of OSDs are shut down, the remaining OSDs and monitors continue to operate.
Using a single cluster limits data availability to a single location with a single point of failure. However, in some situations, higher availability might be required. Using three availability zones allows the cluster to withstand power loss and even a full data center loss in the event of a natural disaster.
With a generalized stretch cluster configuration for three availability zones, three data centers are supported, with each site holding two copies of the data. This helps ensure that even during a data center outage, the data remains accessible and writeable from another site. With this configuration, the pool replication size is 6 and the pool min_size is 3.
The standard Ceph configuration survives many network or data center failures without ever compromising data consistency. If you restore enough Ceph servers following a failure, the cluster recovers. Ceph maintains availability if you lose a data center, provided the cluster can still form a quorum of monitors and all the data remains available with enough copies to satisfy the pool's min_size, or with CRUSH rules that re-replicate until the pool's size is met.
5.1. Generalized stretch cluster deployment limitations
When using generalized stretch clusters, the following limitations should be considered.
- Generalized stretch cluster configuration for three availability zones does not support I/O operations during a netsplit scenario between two or more zones. While the cluster remains accessible for basic Ceph commands, I/O usage remains unavailable until the netsplit is resolved. This is different from stretch mode, where the tiebreaker monitor can isolate one zone of the cluster and continue I/O operations in degraded mode during a netsplit. For more information about stretch mode, see Stretch mode for a storage cluster.
In a three availability zone configuration, Red Hat Ceph Storage is designed to tolerate multiple host failures. However, if more than 25% of the OSDs in the cluster go down, Ceph might stop marking OSDs as out. This behavior is controlled by the mon_osd_min_in_ratio parameter. By default, mon_osd_min_in_ratio is set to 0.75, meaning that at least 75% of the OSDs in the cluster must remain in (active) before any additional OSDs can be marked out. This setting prevents too many OSDs from being marked out, because that might lead to significant data movement. The data movement can cause high client I/O impact and long recovery times when the OSDs are returned to service. If Red Hat Ceph Storage stops marking OSDs as out, some placement groups (PGs) might fail to rebalance to surviving OSDs, potentially leading to inactive PGs.
Important: While adjusting the mon_osd_min_in_ratio value can allow more OSDs to be marked out and trigger rebalancing, this should be done with caution. For more information about the mon_osd_min_in_ratio parameter, see Ceph Monitor and OSD configuration options.
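For reference, you can inspect the current ratio and, if needed, lower it with the ceph config commands shown below. The 0.5 value is only an illustrative assumption; choose a value that matches the data movement and recovery load that your cluster can tolerate.
Example
[ceph: root@host01 /]# ceph config get mon mon_osd_min_in_ratio
0.750000
[ceph: root@host01 /]# ceph config set mon mon_osd_min_in_ratio 0.5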
5.2. Generalized stretch cluster deployment requirements
This information details important hardware, software, and network requirements that are needed for deploying a generalized stretch cluster configuration for three availability zones.
5.2.1. Hardware requirements
Ensure that the following minimum hardware requirements are met before deploying a generalized stretch cluster configuration for three availability zones. The following table lists the physical server locations and Ceph component layout for an example three availability zone deployment.
| Host name | Datacenter | Ceph services |
|---|---|---|
| host01 | DC1 | OSD+MON+MGR |
| host02 | DC1 | OSD+MON+MGR+RGW |
| host03 | DC1 | OSD+MON+MDS |
| host04 | DC2 | OSD+MON+MGR |
| host05 | DC2 | OSD+MON+MGR+RGW |
| host06 | DC2 | OSD+MON+MDS |
| host07 | DC3 | OSD+MON+MGR |
| host08 | DC3 | OSD+MON+MGR+RGW |
| host09 | DC3 | OSD+MON+MDS |
5.2.2. Network configuration requirements
Ensure that the following network configuration requirements are met before deploying a generalized stretch cluster configuration for three availability zones.
- Have two separate networks, one public network and one cluster network.
- Have three different data centers that support VLANs and subnets for the Ceph cluster and public networks for all data centers.
Note: You can use different subnets for each of the data centers.
- The latencies between data centers running the Red Hat Ceph Storage Object Storage Devices (OSDs) cannot exceed 10 ms RTT.
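As a rough check of the round-trip time requirement, you can measure latency between hosts in different data centers, for example with ping. The host names below are taken from the example layout in this chapter.
Example
[root@host01 ~]# ping -c 10 host04
[root@host01 ~]# ping -c 10 host07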
For more information about network considerations, see Network considerations for Red Hat Ceph Storage in the Red Hat Ceph Storage Hardware Guide.
5.2.3. Cluster setup requirements
Ensure that the hostname is configured by using the bare or short hostname on all hosts.
Syntax
hostnamectl set-hostname SHORT_NAME
When run on any node, the hostname command must return only the short hostname. If the FQDN is returned, the cluster configuration will not be successful.
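For example, on a host that is intended to be named host01 (an illustrative name), the expected result looks like the following:
Example
[root@host01 ~]# hostnamectl set-hostname host01
[root@host01 ~]# hostname
host01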
5.3. Bootstrapping the Ceph cluster with a specification file
Deploy the generalized stretch cluster by setting the CRUSH location for the daemons in the cluster with a service specification configuration file. Use the configuration file to add the hosts to the proper CRUSH locations during deployment.
For more information about Ceph bootstrapping and different cephadm bootstrap command options, see Bootstrapping a new storage cluster in the Red Hat Ceph Storage Installation Guide.
Run cephadm bootstrap on the node that you want to be the initial Monitor node in the cluster. The IP_ADDRESS option should be the IP address of the node you are using to run cephadm bootstrap.
- If the storage cluster includes multiple networks and interfaces, be sure to choose a network that is accessible by any node that uses the storage cluster.
- To deploy a storage cluster by using IPv6 addresses, use the IPv6 address format for the --mon-ip IP_ADDRESS option. For example: cephadm bootstrap --mon-ip 2620:52:0:880:225:90ff:fefc:2536 --registry-json /etc/mylogin.json.
- To route the internal cluster traffic over the public network, omit the --cluster-network SUBNET option.
Within this procedure, the network Classless Inter-Domain Routing (CIDR) is referred to as subnet.
Prerequisites
Be sure that you have root-level access to the nodes.
Procedure
Create the service configuration YAML file. The YAML file adds the nodes to the Red Hat Ceph Storage cluster and also sets specific labels for where the services run. The following example depends on the specific OSD and Ceph Object Gateway (RGW) configuration that is needed.
Syntax
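The following is a minimal sketch of such a service specification, assuming the example host names, data center labels (DC1, DC2, and DC3), and placement labels used in this chapter; the IP addresses are illustrative. Repeat the host entries for the remaining hosts, and adjust the hosts, labels, and the OSD and Ceph Object Gateway sections to match your environment.
service_type: host
hostname: host01
addr: 10.0.208.11
location:
  root: default
  datacenter: DC1
labels:
  - osd
  - mon
  - mgr
---
service_type: host
hostname: host04
addr: 10.0.212.14
location:
  datacenter: DC2
labels:
  - osd
  - mon
  - mgr
---
service_type: host
hostname: host07
addr: 10.0.64.17
location:
  datacenter: DC3
labels:
  - osd
  - mon
  - mgr
---
service_type: mon
service_name: mon
placement:
  label: mon
spec:
  crush_locations:
    host01:
      - datacenter=DC1
    host04:
      - datacenter=DC2
    host07:
      - datacenter=DC3
---
service_type: mgr
service_name: mgr
placement:
  label: mgr
---
service_type: osd
service_id: all-available-devices
placement:
  label: osd
spec:
  data_devices:
    all: true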
For more information about changing the custom specification for OSDs and the Ceph Object Gateway, see the following deployment instructions:
- Deploying Ceph OSDs using advanced service specifications in the Red Hat Ceph Storage Operations Guide.
- Deploying the Ceph Object Gateway using the service specification in the Red Hat Ceph Storage Object Gateway Guide.
Bootstrap the storage cluster with the --apply-spec option.
Syntax
cephadm bootstrap --apply-spec CONFIGURATION_FILE_NAME --mon-ip MONITOR_IP_ADDRESS --ssh-private-key PRIVATE_KEY --ssh-public-key PUBLIC_KEY --registry-url REGISTRY_URL --registry-username USER_NAME --registry-password PASSWORD
Example
[root@host01 ~]# cephadm bootstrap --apply-spec initial-config.yaml --mon-ip 10.10.128.68 --ssh-private-key /home/ceph/.ssh/id_rsa --ssh-public-key /home/ceph/.ssh/id_rsa.pub --registry-url registry.redhat.io --registry-username myuser1 --registry-password mypassword1
Important: You can use different command options with the cephadm bootstrap command, but always include the --apply-spec option to use the service configuration file and configure the host locations.
Log in to the cephadm shell.
Syntax
cephadm shell
Example
[root@host01 ~]# cephadm shell
Configure the public network with the subnet. For more information about configuring multiple public networks to the cluster, see Configuring multiple public networks to the cluster in the Red Hat Ceph Storage Configuration Guide.
Syntax
ceph config set global public_network "SUBNET_1,SUBNET_2, ..."
Example
[ceph: root@host01 /]# ceph config set global public_network "10.0.208.0/22,10.0.212.0/22,10.0.64.0/22,10.0.56.0/22"
Optional: Configure a cluster network. For more information about configuring multiple cluster networks to the cluster, see Configuring a private network in the Red Hat Ceph Storage Configuration Guide.
Syntax
ceph config set global cluster_network "SUBNET_1,SUBNET_2, ..."
Example
[ceph: root@host01 /]# ceph config set global cluster_network "10.0.208.0/22,10.0.212.0/22,10.0.64.0/22,10.0.56.0/22"
Optional: Verify the network configurations.
Syntax
ceph config dump | grep network
Example
[ceph: root@host01 /]# ceph config dump | grep network
Restart the daemons. Ceph daemons bind dynamically, so you do not have to restart the entire cluster at once if you change the network configuration for a specific daemon.
Syntax
ceph orch restart mon
Optional: To restart the cluster on the admin node as a root user, run the systemctl restart command.
Note: To get the FSID of the cluster, use the ceph fsid command.
Syntax
systemctl restart ceph-FSID_OF_CLUSTER.target
Example
[root@host01 ~]# systemctl restart ceph-1ca9f6a8-d036-11ec-8263-fa163ee967ad.target
Verification
Verify the specification file details and that the bootstrap was installed successfully.
Verify that all hosts were placed in the expected data centers, as specified in step 1 of the procedure.
Syntax
ceph osd tree
Check that there are three data centers under root and that the hosts are placed in each of the expected data centers.
Note: The hosts with OSDs will only be present after bootstrap if OSDs are deployed during bootstrap with the specification file.
Example
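The exact output depends on your deployment. A truncated, illustrative example with the data center buckets used in this chapter looks similar to the following; IDs and weights will differ.
[ceph: root@host01 /]# ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME                     STATUS  REWEIGHT  PRI-AFF
 -1         0.58557  root default
-15         0.19519      datacenter DC1
 -3         0.09760          host host01
  0    hdd  0.04880              osd.0                 up   1.00000  1.00000
...
-16         0.19519      datacenter DC2
 -5         0.09760          host host04
...
-17         0.19519      datacenter DC3
 -7         0.09760          host host07
...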
From the cephadm shell, verify that the mon daemons are deployed with CRUSH locations, as specified in step 1 of the procedure.
Syntax
ceph mon dump
Check that all mon daemons are in the output and that the correct CRUSH locations are added.
Example
[root@host01 ~]# ceph mon dump
epoch 19
fsid b556497a-693a-11ef-b9d1-fa163e841fd7
last_changed 2024-09-03T12:47:08.419495+0000
created 2024-09-02T14:50:51.490781+0000
min_mon_release 19 (squid)
election_strategy: 3
0: [v2:10.0.67.43:3300/0,v1:10.0.67.43:6789/0] mon.host01-installer; crush_location {datacenter=DC1}
1: [v2:10.0.67.20:3300/0,v1:10.0.67.20:6789/0] mon.host02; crush_location {datacenter=DC1}
2: [v2:10.0.64.242:3300/0,v1:10.0.64.242:6789/0] mon.host03; crush_location {datacenter=DC1}
3: [v2:10.0.66.17:3300/0,v1:10.0.66.17:6789/0] mon.host06; crush_location {datacenter=DC2}
4: [v2:10.0.66.228:3300/0,v1:10.0.66.228:6789/0] mon.host09; crush_location {datacenter=DC3}
5: [v2:10.0.65.125:3300/0,v1:10.0.65.125:6789/0] mon.host05; crush_location {datacenter=DC2}
6: [v2:10.0.66.252:3300/0,v1:10.0.66.252:6789/0] mon.host07; crush_location {datacenter=DC3}
7: [v2:10.0.64.145:3300/0,v1:10.0.64.145:6789/0] mon.host08; crush_location {datacenter=DC3}
8: [v2:10.0.64.125:3300/0,v1:10.0.64.125:6789/0] mon.host04; crush_location {datacenter=DC2}
dumped monmap epoch 19
Verify that the service spec and all location attributes are added correctly.
Check the service name for mon daemons on the cluster, by using the ceph orch ls command.
Confirm the mon daemon services, by using the ceph orch ls mon --export command.
Verify that the bootstrap was installed successfully, by running the ceph -s command from the cephadm shell. For more information, see Verifying the cluster installation.
5.4. Enabling three availability zones on the pool
Use this information to enable and integrate three availability zones within a generalized stretch cluster configuration.
Prerequisites
Before you begin, make sure that you have the following prerequisites in place:
- Root-level access to the nodes.
- The CRUSH location is set to the hosts.
Procedure
Get the most recent CRUSH map and decompile the map into a text file.
Syntax
ceph osd getcrushmap > COMPILED_CRUSHMAP_FILENAME
crushtool -d COMPILED_CRUSHMAP_FILENAME -o DECOMPILED_CRUSHMAP_FILENAME
Example
[ceph: root@host01 /]# ceph osd getcrushmap > crush.map.bin
[ceph: root@host01 /]# crushtool -d crush.map.bin -o crush.map.txt
Add the new CRUSH rule into the decompiled CRUSH map file from the previous step. In this example, the rule name is
3az_rule.
Syntax
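A minimal sketch of such a rule, assuming the default CRUSH root and the datacenter buckets used in this chapter; the rule id must not collide with an existing rule in your CRUSH map.
rule 3az_rule {
        id 1
        type replicated
        step take default
        step choose firstn 3 type datacenter
        step chooseleaf firstn 2 type host
        step emit
}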
With this rule, the placement groups will be replicated with two copies in each of the three data centers.
Inject the CRUSH map to make the rule available to the cluster.
Syntax
crushtool -c DECOMPILED_CRUSHMAP_FILENAME -o COMPILED_CRUSHMAP_FILENAME
ceph osd setcrushmap -i COMPILED_CRUSHMAP_FILENAME
Example
[ceph: root@host01 /]# crushtool -c crush.map.txt -o crush2.map.bin
[ceph: root@host01 /]# ceph osd setcrushmap -i crush2.map.bin
You can verify that the rule was injected successfully, by using the following steps.
List the rules on the cluster.
Syntax
ceph osd crush rule ls
Example
[ceph: root@host01 /]# ceph osd crush rule ls
replicated_rule
ec86_pool
3az_rule
Dump the CRUSH rule.
Syntax
ceph osd crush rule dump CRUSH_RULE
Example
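The following illustrative output assumes the 3az_rule sketched earlier in this section; the rule_id and step details reflect how the rule was defined in your CRUSH map.
[ceph: root@host01 /]# ceph osd crush rule dump 3az_rule
{
    "rule_id": 1,
    "rule_name": "3az_rule",
    "type": 1,
    "steps": [
        {
            "op": "take",
            "item": -1,
            "item_name": "default"
        },
        {
            "op": "choose_firstn",
            "num": 3,
            "type": "datacenter"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 2,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
}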
Set the MON election strategy to connectivity.
Syntax
ceph mon set election_strategy connectivity
When updated successfully, the election_strategy is updated to 3. The default election_strategy is 1.
Optional: Verify the election strategy that was set in the previous step.
Syntax
ceph mon dump
Check that all mon daemons are in the output and that the correct CRUSH locations are added.
Example
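For example, you can confirm the value directly from the monitor map; the output line shown is illustrative of a cluster where the connectivity strategy is active.
[ceph: root@host01 /]# ceph mon dump | grep election_strategy
election_strategy: 3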
Set the pool to associate with three availability zone stretch clusters. For more information about available pool values, see Pool values in the Red Hat Ceph Storage Storage Strategies Guide.
Syntax
ceph osd pool stretch set POOL_NAME PEERING_CRUSH_BUCKET_COUNT PEERING_CRUSH_BUCKET_TARGET PEERING_CRUSH_BUCKET_BARRIER CRUSH_RULE SIZE MIN_SIZE [--yes-i-really-mean-it]
Replace the variables as follows:
- POOL_NAME
- The name of the pool. It must be an existing pool; this command does not create a new pool.
- PEERING_CRUSH_BUCKET_COUNT
- The value is used along with peering_crush_bucket_barrier to determine whether the set of OSDs in the chosen acting set can peer with each other, based on the number of distinct buckets there are in the acting set.
- PEERING_CRUSH_BUCKET_TARGET
- This value is used along with peering_crush_bucket_barrier and size to calculate the value bucket_max, which limits the number of OSDs in the same bucket from being chosen to be in the acting set of a PG.
- PEERING_CRUSH_BUCKET_BARRIER
- The type of bucket a pool is stretched across. For example, rack, row, or datacenter.
- CRUSH_RULE
- The crush rule to use for the stretch pool. The type of pool must match the type of crush_rule (replicated or erasure).
- SIZE
- The number of replicas for objects in the stretch pool.
- MIN_SIZE
- The minimum number of replicas required for I/O in the stretch pool.
Important: The --yes-i-really-mean-it flag is required when setting the PEERING_CRUSH_BUCKET_COUNT and PEERING_CRUSH_BUCKET_TARGET values to more than the number of buckets in the CRUSH map. Use the optional flag to confirm that you want to bypass the safety checks and set the values for a stretch pool.
Example
[ceph: root@host01 /]# ceph osd pool stretch set pool01 2 3 datacenter 3az_rule 6 3
Note: To revert a pool to a nonstretched cluster, use the
ceph osd pool stretch unset POOL_NAME command. Using this command does not unset the crush_rule, size, and min_size values. If needed, reset these values manually.
A success message is emitted when the pool stretch values are set correctly.
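If you later revert a pool to a nonstretched configuration, the sequence looks similar to the following sketch; the rule name and the size and min_size values are only illustrative and must match your own replication policy.
[ceph: root@host01 /]# ceph osd pool stretch unset pool01
[ceph: root@host01 /]# ceph osd pool set pool01 crush_rule replicated_rule
[ceph: root@host01 /]# ceph osd pool set pool01 size 3
[ceph: root@host01 /]# ceph osd pool set pool01 min_size 2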
Optional: Verify the pools associated with the stretch clusters, by using the ceph osd pool stretch show command.
5.5. Adding OSD hosts with three availability zones
You can add Ceph OSDs with three availability zones on a generalized stretch cluster. The procedure is similar to the addition of the OSD hosts on a cluster where a generalized stretch cluster is not enabled. For more information, see Adding OSDs in the Red Hat Ceph Storage Installing Guide.
Prerequisites
Before you begin, make sure that you have the following prerequisites in place:
- A running Red Hat Ceph Storage cluster.
- Three availability zones enabled on a cluster. For more information, see Enabling three availability zones on the pool.
- Root-level access to the nodes.
Procedure
From the node that contains the admin keyring, install the storage cluster’s public SSH key in the root user’s
authorized_keys file on the new host.
Syntax
ssh-copy-id -f -i /etc/ceph/ceph.pub user@NEWHOST
Example
[ceph: root@host10 /]# ssh-copy-id -f -i /etc/ceph/ceph.pub root@host11
[ceph: root@host10 /]# ssh-copy-id -f -i /etc/ceph/ceph.pub root@host12
Optional: Verify the status of the storage cluster and that each new host has been added by using the
ceph orch host ls command. See that the new host has been added and that the Status of each host is blank in the output.
List the available devices to deploy OSDs.
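For example, you can list the devices with the ceph orch device ls command; devices whose AVAILABLE column shows Yes can be used for OSDs.
Example
[ceph: root@host10 /]# ceph orch device ls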
Deploy in one of the following ways:
Create an OSD from a specific device on a specific host.
Syntax
ceph orch daemon add osd HOST:DEVICE_PATH
Example
[ceph: root@host10 /]# ceph orch daemon add osd host11:/dev/sdb
Deploy OSDs on any available and unused devices.
Important: This command creates collocated WAL and DB devices. If you want to create non-collocated devices, do not use this command.
Syntax
ceph orch apply osd --all-available-devices
Move the OSD hosts under the CRUSH bucket.
Syntax
ceph osd crush move HOST datacenter=DATACENTER
Example
[ceph: root@host10 /]# ceph osd crush move host10 datacenter=DC1
[ceph: root@host10 /]# ceph osd crush move host11 datacenter=DC2
[ceph: root@host10 /]# ceph osd crush move host12 datacenter=DC3
Note: Ensure that you add the same topology nodes on all sites. Issues might arise if hosts are added only on one site.
Verification
Verify that all hosts are moved to the assigned data centers, by using the ceph osd tree command.
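For example, the new hosts from this procedure should appear under their assigned data center buckets; the IDs and weights below are illustrative.
[ceph: root@host10 /]# ceph osd tree | grep -E 'datacenter|host1[012]'
-15         0.29279      datacenter DC1
 -9         0.09760          host host10
-16         0.29279      datacenter DC2
-11         0.09760          host host11
-17         0.29279      datacenter DC3
-13         0.09760          host host12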