Chapter 4. Stretch clusters for Ceph storage
As a storage administrator, you can configure a two-site stretched cluster by enabling stretch mode in Ceph.
Red Hat Ceph Storage systems offer the option to expand the failure domain beyond the OSD level to a datacenter or cloud zone level.
The following diagram depicts a simplified representation of a Ceph cluster operating in stretch mode, where the tiebreaker host is provisioned in data center (DC) 3.
Figure 4.1. Stretch clusters for Ceph storage
A stretch cluster operates over a Wide Area Network (WAN), unlike a typical Ceph cluster, which operates over a Local Area Network (LAN). For illustration purposes, a data center is chosen as the failure domain, though this could also represent a cloud availability zone. Data Center 1 (DC1) and Data Center 2 (DC2) contain OSDs and Monitors within their respective domains, while Data Center 3 (DC3) contains only a single monitor. The latency between DC1 and DC2 should not exceed 10 ms RTT, as higher latency can significantly impact Ceph performance in terms of replication, recovery, and related operations. However, DC3—a non-data site typically hosted on a virtual machine—can tolerate higher latency compared to the two data sites. A stretch cluster, like the one in the diagram, can withstand a complete data center failure or a network partition between data centers as long as at least two sites remain connected.
There are no additional steps required to power down a stretch cluster. For more information, see Powering down and rebooting Red Hat Ceph Storage cluster.
4.1. Stretch mode for a storage cluster
To improve availability in stretch clusters (geographically distributed deployments), you must enter stretch mode. When stretch mode is enabled, Ceph OSDs only take placement groups (PGs) active when the PGs peer across data centers, or across whichever other CRUSH bucket type you specified, assuming both sites are active. Pools increase in size from the default three to four, with two copies on each site.
In stretch mode, Ceph OSDs are only allowed to connect to monitors within the same data center. New monitors are not allowed to join the cluster without a specified location.
If all the OSDs and monitors from a data center become inaccessible at once, the surviving data center will enter a degraded stretch mode. This issues a warning, reduces the min_size to 1, and allows the cluster to reach an active state with the data from the remaining site.
Stretch mode is designed to handle netsplit scenarios between two data centers and the loss of one data center. Stretch mode handles the netsplit scenario by choosing the surviving data center with a better connection to the tiebreaker monitor. Stretch mode handles the loss of one data center by reducing the min_size of all pools to 1, allowing the cluster to continue operating with the remaining data center. When the lost data center comes back, the cluster will recover the lost data and return to normal operation.
In a stretch cluster, when a site goes down and the cluster enters a degraded state, the min_size of the pool may be temporarily reduced (for example, to 1) to allow the placement groups (PGs) to become active and continue serving I/O. However, the size of the pool remains unchanged. The peering_crush_bucket_count stretch mode flag ensures that PGs do not become active unless they are backed by OSDs in a minimum number of distinct CRUSH buckets (for example, different data centers). This mechanism prevents the system from creating redundant copies solely within the surviving site, ensuring that data is only fully replicated once the downed site recovers.
When the missing data center becomes accessible again, the cluster enters recovery stretch mode. This changes the warning and allows peering, but still requires only OSDs from the data center that remained up the whole time.
When all PGs are in a known state and are neither degraded nor incomplete, the cluster returns to regular stretch mode, ends the warning, and restores min_size to its starting value of 2. The cluster again requires both sites to peer, not only the site that stayed up the whole time, so you can fail over to the other site if necessary.
Stretch mode limitations
- It is not possible to exit from stretch mode once it is entered.
- You cannot use erasure-coded pools with clusters in stretch mode. You can neither enter the stretch mode with erasure-coded pools, nor create an erasure-coded pool when the stretch mode is active.
- Device classes are not supported in stretch mode. For example, a CRUSH rule that specifies a device class, such as class hdd, is not supported; see the sketch after this list.
- To achieve the same weight on both sites, the Ceph OSDs deployed in the two sites should be of equal size, that is, the storage capacity in the first site is equivalent to the storage capacity in the second site.
- While it is not enforced, you should run two Ceph monitors on each site and a tiebreaker, for a total of five. This is because OSDs can only connect to monitors in their own site when in stretch mode.
- You have to create your own CRUSH rule that provides two copies on each site, for a total of four copies across both sites.
- You cannot enable stretch mode if you have existing pools with non-default size or min_size.
- Because the cluster runs with min_size 1 when degraded, you should only use stretch mode with all-flash OSDs. This minimizes the time needed to recover once connectivity is restored, and minimizes the potential for data loss.
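For illustration, the following sketch (an assumption, not taken verbatim from this guide) shows a CRUSH rule that specifies the hdd device class and is therefore not supported in stretch mode. The rule name and id are placeholders; DC1 and DC2 are the data center buckets used elsewhere in this chapter.

rule stretch_rule_hdd {
        id 2
        type replicated
        # the 'class hdd' qualifier is what stretch mode does not support
        step take DC1 class hdd
        step chooseleaf firstn 2 type host
        step emit
        step take DC2 class hdd
        step chooseleaf firstn 2 type host
        step emit
}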
Stretch peering rule
In Ceph stretch cluster mode, a critical safeguard is enforced through the stretch peering rule, which ensures that a Placement Group (PG) cannot become active if all acting replicas reside within a single failure domain, such as a single data center or cloud availability zone.
This behavior is essential for protecting data integrity during site failures. If a PG were allowed to go active with all replicas confined to one site, write operations could be falsely acknowledged without true redundancy. In the event of a site outage, this would result in complete data loss for those PGs. By enforcing zone diversity in the acting set, Ceph stretch clusters maintain high availability while minimizing the risk of data inconsistency or loss.
4.2. Deployment requirements
This information details important hardware, software, and network requirements that are needed for deploying a generalized stretch cluster configuration for three availability zones.
Software requirements
Red Hat Ceph Storage 8.1
Hardware requirements
Ensure that the following minimum requirements are met before deploying a stretch cluster configuration.
| Hardware criteria | Minimum and recommended |
|---|---|
| Processor | |
| RAM | |
| Network | A single 1 Gb/s (bonded 10+ Gb/s recommended). |
| Hardware criteria | Minimum and recommended |
|---|---|
| Processor | 2 cores minimum |
| Storage drives | 100 GB per daemon. SSD is recommended. |
| Network | A single 1 Gb/s (10+ Gb/s recommended) |
| Hardware criteria | Minimum and recommended |
|---|---|
| Processor | 2 cores minimum |
| RAM | 2 GB per daemon (more for production) |
| Disk space | 1 GB per daemon |
| Network | A single 1 Gb/s (10+ Gb/s recommended) |
Daemon placement
The following table lists the daemon placement details across various hosts and data centers.
| Hostname | Data center | Services |
|---|---|---|
| host01 | DC1 | OSD+MON+MGR |
| host02 | DC1 | OSD+MON+MGR |
| host03 | DC1 | OSD+MDS+RGW |
| host04 | DC2 | OSD+MON+MGR |
| host05 | DC2 | OSD+MON+MGR |
| host06 | DC2 | OSD+MDS+RGW |
| host07 | DC3 (Tiebreaker) | MON |
Network configuration requirements
Meet the following network configuration requirements before deploying a stretch cluster configuration.
You can use different subnets for each of the data centers.
- Have two separate networks, one public network and one cluster network.
- The latencies between data centers that run the Ceph Object Storage Devices (OSDs) cannot exceed 10 ms RTT.
The following is an example of a basic network configuration:
DC1
Ceph public/private network: 10.0.40.0/24
DC2
Ceph public/private network: 10.0.40.0/24
Tiebreaker
Ceph public/private network: 10.0.40.0/24
Cluster setup requirements
Ensure that the hostname is configured by using the bare or short hostname in all hosts.
Syntax
hostnamectl set-hostname SHORT_NAME
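For example, assuming the short host names from the daemon placement table in this chapter, run the following on the first host and repeat with the appropriate name on each node:

[root@host01 ~]# hostnamectl set-hostname host01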
When run on any node, the hostname command should return only the short hostname. If the FQDN is returned, the cluster configuration will not be successful.
4.3. Setting the CRUSH location for the daemons
Before you enter stretch mode, you must prepare the cluster by setting the CRUSH locations for the daemons in the Red Hat Ceph Storage cluster. There are two ways to do this:
- Bootstrap the cluster through a service configuration file, where the locations are added to the hosts as part of deployment.
- Set the locations manually with the ceph osd crush add-bucket and ceph osd crush move commands after the cluster is deployed.
Method 1: Bootstrapping the cluster
Prerequisites
- Root-level access to the nodes.
Procedure
If you are bootstrapping your new storage cluster, you can create the service configuration .yaml file that adds the nodes to the Red Hat Ceph Storage cluster and also sets specific labels for where the services should run:
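The exact contents of the file depend on your environment. The following is a minimal sketch that uses the host names from the daemon placement table in this chapter and hypothetical IP addresses; additional host entries follow the same pattern. The tiebreaker host is intentionally given no data center location, because the tiebreaker data center must not be added to the CRUSH map.

service_type: host
hostname: host01
addr: 10.0.40.11        # hypothetical address
labels:
  - osd
  - mon
  - mgr
location:
  root: default
  datacenter: DC1
---
service_type: host
hostname: host04
addr: 10.0.40.14        # hypothetical address
labels:
  - osd
  - mon
  - mgr
location:
  root: default
  datacenter: DC2
---
service_type: host
hostname: host07
addr: 10.0.40.17        # hypothetical address, tiebreaker host
labels:
  - mon
---
service_type: mon
placement:
  label: mon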
Bootstrap the storage cluster with the --apply-spec option:
Syntax
cephadm bootstrap --apply-spec CONFIGURATION_FILE_NAME --mon-ip MONITOR_IP_ADDRESS --ssh-private-key PRIVATE_KEY --ssh-public-key PUBLIC_KEY --registry-url REGISTRY_URL --registry-username USER_NAME --registry-password PASSWORD
Example
[root@host01 ~]# cephadm bootstrap --apply-spec initial-config.yaml --mon-ip 10.10.128.68 --ssh-private-key /home/ceph/.ssh/id_rsa --ssh-public-key /home/ceph/.ssh/id_rsa.pub --registry-url registry.redhat.io --registry-username myuser1 --registry-password mypassword1
Important: You can use different command options with the cephadm bootstrap command. However, always include the --apply-spec option to use the service configuration file and configure the host locations.
Method 2: Setting the locations after the deployment
Prerequisites
- Root-level access to the nodes.
Procedure
Add two buckets to which you plan to set the location of your non-tiebreaker monitors to the CRUSH map, specifying the bucket type as datacenter:
Syntax
ceph osd crush add-bucket BUCKET_NAME BUCKET_TYPE
Example
[ceph: root@host01 /]# ceph osd crush add-bucket DC1 datacenter
[ceph: root@host01 /]# ceph osd crush add-bucket DC2 datacenter
Move the buckets under root=default:
Syntax
ceph osd crush move BUCKET_NAME root=default
Example
[ceph: root@host01 /]# ceph osd crush move DC1 root=default
[ceph: root@host01 /]# ceph osd crush move DC2 root=default
Move the OSD hosts according to the required CRUSH placement:
Syntax
ceph osd crush move HOST datacenter=DATACENTER
Example
[ceph: root@host01 /]# ceph osd crush move host01 datacenter=DC1
4.3.1. Entering the stretch mode
The new stretch mode is designed to handle two sites. There is a lower risk of component availability outages with 2-site clusters.
Prerequisites
- Root-level access to the nodes.
- The CRUSH location is set to the hosts.
Procedure
Set the location of each monitor, matching your CRUSH map:
Syntax
ceph mon set_location HOST datacenter=DATACENTER
Example
[ceph: root@host01 /]# ceph mon set_location host01 datacenter=DC1
[ceph: root@host01 /]# ceph mon set_location host02 datacenter=DC1
[ceph: root@host01 /]# ceph mon set_location host04 datacenter=DC2
[ceph: root@host01 /]# ceph mon set_location host05 datacenter=DC2
[ceph: root@host01 /]# ceph mon set_location host07 datacenter=DC3
Generate a CRUSH rule which places two copies on each data center:
Syntax
ceph osd getcrushmap > COMPILED_CRUSHMAP_FILENAME
crushtool -d COMPILED_CRUSHMAP_FILENAME -o DECOMPILED_CRUSHMAP_FILENAME
Example
[ceph: root@host01 /]# ceph osd getcrushmap > crush.map.bin
[ceph: root@host01 /]# crushtool -d crush.map.bin -o crush.map.txt
Edit the decompiled CRUSH map file to add a new rule:
Example
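The rule body is not reproduced here. The following is a minimal sketch, an assumption rather than the exact rule from this guide, that places two copies on hosts in each data center and takes DC1 first so that primary OSDs, and therefore reads, favor DC1. The rule name matches the name passed to enable_stretch_mode later in this procedure; the id is a placeholder.

rule stretch_rule {
        id 1
        type replicated
        # two copies on separate hosts in DC1; primaries are selected here first
        step take DC1
        step chooseleaf firstn 2 type host
        step emit
        # two copies on separate hosts in DC2
        step take DC2
        step chooseleaf firstn 2 type host
        step emit
}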
Note: This rule gives the cluster read affinity towards data center DC1. Therefore, all reads and writes happen through Ceph OSDs placed in DC1. If this is not desirable, and reads and writes are to be distributed evenly across the zones, use a CRUSH rule like the following:
Example
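A sketch of the evenly distributed variant, again with a placeholder id, in which CRUSH selects the data centers itself:

rule stretch_rule {
        id 1
        type replicated
        step take default
        # select each data center under the default root
        step choose firstn 0 type datacenter
        # place two copies on separate hosts in each selected data center
        step chooseleaf firstn 2 type host
        step emit
}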
In this rule, the data center is selected randomly and automatically.
See CRUSH rules for more information on the firstn and indep options.
Inject the CRUSH map to make the rule available to the cluster:
Syntax
crushtool -c DECOMPILED_CRUSHMAP_FILENAME -o COMPILED_CRUSHMAP_FILENAME
ceph osd setcrushmap -i COMPILED_CRUSHMAP_FILENAME
Example
[ceph: root@host01 /]# crushtool -c crush.map.txt -o crush2.map.bin
[ceph: root@host01 /]# ceph osd setcrushmap -i crush2.map.bin
If you do not run the monitors in connectivity mode, set the election strategy to connectivity:
Example
[ceph: root@host01 /]# ceph mon set election_strategy connectivity
Enter stretch mode by setting the location of the tiebreaker monitor to split across the data centers:
Syntax
ceph mon set_location HOST datacenter=DATACENTER
ceph mon enable_stretch_mode HOST stretch_rule datacenter
Example
[ceph: root@host01 /]# ceph mon set_location host07 datacenter=DC3
[ceph: root@host01 /]# ceph mon enable_stretch_mode host07 stretch_rule datacenter
In this example, the monitor mon.host07 is the tiebreaker.
Important: The location of the tiebreaker monitor should differ from the data centers to which you previously set the non-tiebreaker monitors. In the example above, it is data center DC3.
Important: Do not add this data center to the CRUSH map, as it results in the following error when you try to enter stretch mode:
Error EINVAL: there are 3 datacenters in the cluster but stretch mode currently only works with 2!
Note: If you are writing your own tooling for deploying Ceph, you can use a new --set-crush-location option when booting monitors, instead of running the ceph mon set_location command. This option accepts only a single bucket=location pair, for example ceph-mon --set-crush-location 'datacenter=DC1', which must match the bucket type you specified when running the enable_stretch_mode command.
Verify that the stretch mode is enabled successfully:
Example
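The verification output is not reproduced here. One way to check, shown as an assumed sketch, is to filter the ceph osd dump output for the stretch mode fields that are described in Verifying stretch mode:

[ceph: root@host01 /]# ceph osd dump | grep stretch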
The stretch_mode_enabled field should be set to true. You can also see the number of stretch buckets, stretch mode buckets, and whether the stretch mode is degraded or recovering.
Verify that the monitors are in appropriate locations:
Example
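The monitor locations, the tiebreaker monitor, and the election strategy are all part of the ceph mon dump output, which the later verification section also uses:

[ceph: root@host01 /]# ceph mon dump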
You can also see which monitor is the tiebreaker, and the monitor election strategy.
4.3.2. Configuring a CRUSH map for stretch mode
Use this information to configure a CRUSH map for stretch mode.
Prerequisites
Before you begin, make sure that you have the following prerequisites in place:
- Root-level access to the nodes.
- The CRUSH location is set to the hosts.
Procedure
To create a CRUSH rule that makes use of this OSD CRUSH topology, install the ceph-base RPM package, which provides the crushtool command.
Syntax
dnf -y install ceph-base
Get the compiled CRUSH map from the cluster.
Syntax
ceph osd getcrushmap > /etc/ceph/crushmap.bin
Decompile the CRUSH map and convert it to a text file to edit it.
Syntax
crushtool -d /etc/ceph/crushmap.bin -o /etc/ceph/crushmap.txt
Add the following rule at the end of the /etc/ceph/crushmap.txt file. This rule distributes reads and writes evenly across the data centers.
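The rule body is not reproduced here. The following is a minimal sketch of such a rule, named stretch_rule to match the verification steps later in this chapter, with an assumed id; it has the same shape as the evenly distributed rule sketched in Entering the stretch mode:

rule stretch_rule {
        id 1
        type replicated
        step take default
        step choose firstn 0 type datacenter
        step chooseleaf firstn 2 type host
        step emit
}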
Optionally, give the cluster a read/write affinity towards data center 1 by instead using a rule that takes DC1 first, like the read-affinity rule shown in Entering the stretch mode.
Compile the new CRUSH map from /etc/ceph/crushmap.txt and convert it to a binary file /etc/ceph/crushmap2.bin.
Syntax
crushtool -c /path/to/crushmap.txt -o /path/to/crushmap2.bin
Example
[ceph: root@host01 /]# crushtool -c /etc/ceph/crushmap.txt -o /etc/ceph/crushmap2.bin
Inject the newly created CRUSH map back into the cluster.
Syntax
ceph osd setcrushmap -i /path/to/compiled_crushmap
Example
[ceph: root@host01 /]# ceph osd setcrushmap -i /path/to/compiled_crushmap
17
Note: The number 17 is a counter and increases (18, 19, and so on) depending on the changes that are made to the CRUSH map.
Verifying
Verify that the newly created stretch_rule is available for use.
Syntax
ceph osd crush rule ls
Example
[ceph: root@host01 /]# ceph osd crush rule ls
replicated_rule
stretch_rule
4.3.2.1. Entering stretch mode
Stretch mode is designed to handle two sites. There is a lesser risk of component availability outages with 2-site clusters.
Prerequisites
Before you begin, make sure that you have the following prerequisites in place:
- Root-level access to the nodes.
- The CRUSH location is set to the hosts.
- The CRUSH map configured to include stretch rule.
- No erasure coded pools in the cluster.
- Weights of the two sites are the same.
Procedure
Check the current election strategy being used by the monitors.
Syntax
ceph mon dump | grep election_strategy
Note: The Ceph cluster election_strategy is set to 1 by default.
Example
[ceph: root@host01 /]# ceph mon dump | grep election_strategy
dumped monmap epoch 9
election_strategy: 1
Change the election strategy to connectivity.
Syntax
ceph mon set election_strategy connectivity
For more information about configuring the election strategy, see Configuring monitor election strategy.
Use the ceph mon dump command to verify that the election strategy was updated to 3.
Example
[ceph: root@host01 /]# ceph mon dump | grep election_strategy
dumped monmap epoch 22
election_strategy: 3
Set the location of the tiebreaker monitor so that it is split across the data centers.
Syntax
ceph mon set_location TIEBREAKER_HOST datacenter=DC3
Example
[ceph: root@host01 /]# ceph mon set_location host07 datacenter=DC3
Verify that the tiebreaker monitor is set as expected.
Syntax
ceph mon dump
Enter stretch mode.
Syntax
ceph mon enable_stretch_mode TIEBREAKER_HOST STRETCH_RULE STRETCH_BUCKET
In the following example:
- The tiebreaker node is set as host07.
- The stretch rule is stretch_rule, as created in Configuring a CRUSH map for stretch mode.
- The stretch bucket is set as datacenter.
[ceph: root@host01 /]# ceph mon enable_stretch_mode host07 stretch_rule datacenter
Verifying
Verify that stretch mode was implemented correctly by continuing to Verifying stretch mode.
4.3.2.2. Verifying stretch mode
Use this information to verify that stretch mode was created correctly with the implemented CRUSH rules.
Procedure
Verify that all pools are using the CRUSH rule that was created in the Ceph cluster. In these examples, the CRUSH rule is set as stretch_rule, per the settings that were created in Configuring a CRUSH map for stretch mode.
Syntax
for pool in $(rados lspools);do echo -n "Pool: ${pool}; ";ceph osd pool get ${pool} crush_rule;done
Verify that stretch mode is enabled. Ensure that stretch_mode_enabled is set to true.
Syntax
ceph osd dump
The output includes the following information:
- stretch_mode_enabled: Set to true if stretch mode is enabled.
- stretch_bucket_count: The number of data centers with OSDs.
- degraded_stretch_mode: Output of 0 if not degraded. If the stretch mode is degraded, this outputs the number of up sites.
- recovering_stretch_mode: Output of 0 if not recovering. If the stretch mode is recovering, the output is 1.
- stretch_mode_bucket: A unique value set for each CRUSH bucket type. This value is usually set to 8, for data center.
Verify that stretch mode is using the mon map by using the ceph mon dump command. Ensure the following:
- stretch_mode_enabled is set to 1
- The correct mon host is set as tiebreaker_mon
- The correct mon host is set as disallowed_leaders
Syntax
ceph mon dump
What to do next
- Deploy, configure, and administer a Ceph Object Gateway. For more information, see Ceph Object Gateway.
- Manage, create, configure, and use Ceph Block Devices. For more information, see Ceph block devices.
- Create, mount, and work with the Ceph File System (CephFS). For more information, see Ceph File Systems.
4.4. Using and maintaining stretch mode
Use and maintain stretch mode by adding OSD hosts, managing data center monitor service hosts, and replacing the tiebreaker with a monitor in quorum or with a new monitor.
4.4.1. Adding OSD hosts in stretch mode
You can add Ceph OSDs in stretch mode. The procedure is similar to adding OSD hosts on a cluster where stretch mode is not enabled.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Stretch mode is enabled on the cluster.
- Root-level access to the nodes.
Procedure
List the available devices to deploy OSDs:
Syntax
ceph orch device ls [--hostname=HOST_1 HOST_2] [--wide] [--refresh]
Example
[ceph: root@host01 /]# ceph orch device ls
Deploy the OSDs on specific hosts or on all the available devices:
Create an OSD from a specific device on a specific host:
Syntax
ceph orch daemon add osd HOST:DEVICE_PATH
Example
[ceph: root@host01 /]# ceph orch daemon add osd host03:/dev/sdb
Deploy OSDs on any available and unused devices:
Important: This command creates collocated WAL and DB devices. If you want to create non-collocated devices, do not use this command.
Example
[ceph: root@host01 /]# ceph orch apply osd --all-available-devices
Move the OSD hosts under the CRUSH bucket:
Syntax
ceph osd crush move HOST datacenter=DATACENTER
Example
[ceph: root@host01 /]# ceph osd crush move host03 datacenter=DC1
[ceph: root@host01 /]# ceph osd crush move host06 datacenter=DC2
Note: Ensure that you add the same topology nodes on both sites. Issues might arise if hosts are added only on one site.
4.4.2. Managing data center monitor service hosts in stretch mode
Use this information to add and remove data center monitor service (mon) hosts in stretch mode. Managing data centers can be done by using the specification file or directly on the Ceph cluster.
Prerequisites
Before you begin, make sure that you have the following prerequisites in place:
- A running Red Hat Ceph Storage cluster
- Stretch mode is enabled on the cluster
- Root-level access to the nodes.
4.4.2.1. Managing a mon service with a service specification file
These steps detail how to add a mon service. To remove a service, update the service specification file in the same way, removing the relevant information.
Procedure
Export the specification file for mon and save the output to mon-spec.yaml.
Syntax
ceph orch ls mon --export > mon-spec.yaml
After the file is exported, the YAML file can be edited.
Add the new host details. In the following example, host08 is being added to the cluster in the DC2 data center bucket.
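The exported file contents vary by cluster. The following is a minimal sketch of what the edited mon-spec.yaml might look like, assuming the host names used in this chapter, with host08 added; the crush_locations block is shown as an assumption of the cephadm mon specification format for stretch clusters.

service_type: mon
service_name: mon
placement:
  hosts:
    - host01
    - host02
    - host04
    - host05
    - host07
    - host08
spec:
  crush_locations:
    host01:
      - datacenter=DC1
    host02:
      - datacenter=DC1
    host04:
      - datacenter=DC2
    host05:
      - datacenter=DC2
    host07:
      - datacenter=DC3
    host08:
      - datacenter=DC2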
Apply the specification file.
Syntax
ceph orch apply -i mon-spec.yaml
Example
[ceph: root@host01 /]# ceph orch apply -i mon-spec.yaml
Added host 'host08' with addr '10.1.172.225'
Scheduled mon update...
Verifying
Use the ceph mon dump command to verify that the mon service was deployed and that the appropriate CRUSH location was added to the monitor.
Use the ceph orch host ls command to verify that the host was added to the cluster.
4.4.2.2. Managing a mon service with the command-line interface
These steps detail how to add a mon service. To remove the service, use the same CLI steps, removing the relevant information.
Procedure
Set the monitor service to unmanaged.
Syntax
ceph orch set-unmanaged mon
Optional: Use the ceph orch ls command to verify that the service was set as expected.
Example
[ceph: root@host01 /]# ceph orch ls
NAME  PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
mon          8/8      10m ago    19s  <unmanaged>
Add a new host with the mon label.
Syntax
ceph orch host add HOST_NAME IP_ADDRESS_OF_HOST [--label=LABEL_NAME_1,LABEL_NAME_2]
Example
[ceph: root@host01 /]# ceph orch host add host08 10.1.172.205 --labels=mon
Add a monitor service with CRUSH locations.
Note: At this point, the mon is not running and is not managed by Cephadm.
Syntax
ceph mon add HOST IP_ADDRESS datacenter=DC2
Example
[ceph: root@host01 /]# ceph mon add host08 10.1.172.205 datacenter=DC2
Deploy the monitor daemon using Cephadm.
Syntax
ceph orch daemon add mon host08
Example
[ceph: root@host01 /]# ceph orch daemon add mon host08
Deployed mon.host08 on host 'host08'
Enable Cephadm management for the monitor service.
Syntax
ceph orch set-managed mon
Start the newly added mon daemon.
Syntax
ceph orch daemon start mon.HOST_NAME
Verifying
Verify that the service, monitor, and host are added and running.
Use the ceph orch ls command to verify that the service is running.
Example
[ceph: root@host01 /]# ceph orch ls
NAME  PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
mon          8/8      7m ago     4d   label:mon
Use the ceph mon dump command to verify that the mon service was deployed and that the appropriate CRUSH location was added to the monitor.
Use the ceph orch host ls command to verify that the host was added to the cluster.
4.4.3. Replacing the tiebreaker with a monitor in quorum
If your tiebreaker monitor fails, you can replace it with an existing monitor in quorum and remove the failed monitor from the cluster.
Prerequisites
- A running Red Hat Ceph Storage cluster
- Stretch mode is enabled on a cluster
Procedure
Disable automated monitor deployment:
Example
[ceph: root@host01 /]# ceph orch apply mon --unmanaged
Scheduled mon update…
View the monitors in quorum:
Example
[ceph: root@host01 /]# ceph -s
mon: 5 daemons, quorum host01, host02, host04, host05 (age 30s), out of quorum: host07
Set the monitor in quorum as a new tiebreaker:
Syntax
ceph mon set_new_tiebreaker NEW_HOST
Example
[ceph: root@host01 /]# ceph mon set_new_tiebreaker host02
Important: You get an error message if the monitor is in the same location as existing non-tiebreaker monitors:
Example
[ceph: root@host01 /]# ceph mon set_new_tiebreaker host02
Error EINVAL: mon.host02 has location DC1, which matches mons host02 on the datacenter dividing bucket for stretch mode.
If that happens, change the location of the monitor:
Syntax
ceph mon set_location HOST datacenter=DATACENTER
Example
[ceph: root@host01 /]# ceph mon set_location host02 datacenter=DC3
Remove the failed tiebreaker monitor:
Syntax
ceph orch daemon rm FAILED_TIEBREAKER_MONITOR --force
Example
[ceph: root@host01 /]# ceph orch daemon rm mon.host07 --force
Removed mon.host07 from host 'host07'
Once the monitor is removed from the host, redeploy the monitor:
Syntax
ceph mon add HOST IP_ADDRESS datacenter=DATACENTER
ceph orch daemon add mon HOST
Example
[ceph: root@host01 /]# ceph mon add host07 213.222.226.50 datacenter=DC1
[ceph: root@host01 /]# ceph orch daemon add mon host07
Ensure there are five monitors in quorum:
Example
[ceph: root@host01 /]# ceph -s
mon: 5 daemons, quorum host01, host02, host04, host05, host07 (age 15s)
Verify that everything is configured properly:
Redeploy the monitors:
Syntax
ceph orch apply mon --placement="HOST_1, HOST_2, HOST_3, HOST_4, HOST_5"
Example
[ceph: root@host01 /]# ceph orch apply mon --placement="host01, host02, host04, host05, host07"
Scheduled mon update...
4.4.4. Replacing the tiebreaker with a new monitor
If your tiebreaker monitor fails, you can replace it with a new monitor and remove the failed monitor from the cluster.
Prerequisites
Before you begin, make sure that you have the following prerequisites in place:
- A running Red Hat Ceph Storage cluster
- Stretch mode is enabled on the cluster
Procedure
Add a new monitor to the cluster:
Manually add the crush_location to the new monitor:
Syntax
ceph mon add NEW_HOST IP_ADDRESS datacenter=DATACENTER
Example
[ceph: root@host01 /]# ceph mon add host06 213.222.226.50 datacenter=DC3
adding mon.host06 at [v2:213.222.226.50:3300/0,v1:213.222.226.50:6789/0]
Note: The new monitor has to be in a different location than the existing non-tiebreaker monitors.
Disable automated monitor deployment:
Example
[ceph: root@host01 /]# ceph orch apply mon --unmanaged
Scheduled mon update…
Deploy the new monitor:
Syntax
ceph orch daemon add mon NEW_HOST
Example
[ceph: root@host01 /]# ceph orch daemon add mon host06
Ensure there are six monitors, of which five are in quorum:
Example
[ceph: root@host01 /]# ceph -s
mon: 6 daemons, quorum host01, host02, host04, host05, host06 (age 30s), out of quorum: host07
Set the new monitor as a new tiebreaker:
Syntax
ceph mon set_new_tiebreaker NEW_HOST
Example
[ceph: root@host01 /]# ceph mon set_new_tiebreaker host06
Remove the failed tiebreaker monitor:
Syntax
ceph orch daemon rm FAILED_TIEBREAKER_MONITOR --force
Example
[ceph: root@host01 /]# ceph orch daemon rm mon.host07 --force
Removed mon.host07 from host 'host07'
Verify that everything is configured properly:
Redeploy the monitors:
Syntax
ceph orch apply mon --placement="HOST_1, HOST_2, HOST_3, HOST_4, HOST_5"
Example
[ceph: root@host01 /]# ceph orch apply mon --placement="host01, host02, host04, host05, host06"
Scheduled mon update…
4.5. Read affinity in stretch clusters
Read Affinity reduces cross-zone traffic by keeping the data access within the respective data centers.
For stretched clusters deployed in multi-zone environments, the read affinity topology implementation provides a mechanism to help keep traffic within the data center it originated from. Ceph Object Gateway volumes have the ability to read data from an OSD in proximity to the client, according to OSD locations defined in the CRUSH map and topology labels on nodes.
For example, a stretch cluster contains a Ceph Object Gateway primary OSD and replicated OSDs spread across two data centers A and B. If a GET action is performed on an Object in data center A, the READ operation is performed on the data of the OSDs closest to the client in data center A.
4.5.1. Performing localized reads
You can perform a localized read on a replicated pool in a stretch cluster. When a localized read request is made on a replicated pool, Ceph selects the local OSDs closest to the client based on the client location specified in crush_location.
Prerequisites
- A stretch cluster with two data centers and Ceph Object Gateway configured on both.
- A user created with a bucket having primary and replicated OSDs.
Procedure
To perform a localized read, set rados_replica_read_policy to localize in the Ceph Object Gateway client configuration by using the ceph config set command.
[ceph: root@host01 /]# ceph config set client.rgw.rgw.1 rados_replica_read_policy localize
Verification: Perform the following steps to verify the localized read from an OSD set.
Run the ceph osd tree command to view the OSDs and the data centers.
Run the ceph orch ps command to identify the Ceph Object Gateway daemons in the data centers.
Example
[ceph: root@host01 /]# ceph orch ps | grep rg
rgw.rgw.1.ceph-ci-fbv67y-ammmck-node4.dmsmex ceph-ci-fbv67y-ammmck-node4 *:80 running (4h) 10m ago 22h 93.3M - 19.1.0-55.el9cp 0ee0a0ad94c7 34f27723ccd2
rgw.rgw.1.ceph-ci-fbv67y-ammmck-node7.pocecp ceph-ci-fbv67y-ammmck-node7 *:80 running (4h) 10m ago 22h 96.4M - 19.1.0-55.el9cp 0ee0a0ad94c7 40e4f2a6d4c4
Verify that a localized read has happened by running the vim command on the Ceph Object Gateway logs.
Example
[ceph: root@host01 /]# vim /var/log/ceph/<fsid>/<ceph-client-rgw>.log
2024-08-26T08:07:45.471+0000 7fc623e63640 1 ====== starting new request req=0x7fc5b93694a0 =====
2024-08-26T08:07:45.471+0000 7fc623e63640 1 -- 10.0.67.142:0/279982082 --> [v2:10.0.66.23:6816/73244434,v1:10.0.66.23:6817/73244434] -- osd_op(unknown.0.0:9081 11.55 11:ab26b168:::3acf4091-c54c-43b5-a495-c505fe545d25.27842.1_f1:head [getxattrs,stat] snapc 0=[] ondisk+read+localize_reads+known_if_redirected+supports_pool_eio e3533) -- 0x55f781bd2000 con 0x55f77f0e8c00
You can see in the logs that a localized read has taken place.
Important: To be able to view the debug logs, you must first enable debug_ms 1 in the configuration by running the ceph config set command.
[ceph: root@host01 /]# ceph config set client.rgw.rgw.1.ceph-ci-gune2w-mysx73-node4.dgvrmx debug_ms 1/1
[ceph: root@host01 /]# ceph config set client.rgw.rgw.1.ceph-ci-gune2w-mysx73-node7.rfkqqq debug_ms 1/1
4.5.2. Performing balanced reads
You can perform a balanced read on a pool so that read requests are distributed evenly across the OSDs in the data centers. When a balanced READ is issued on a pool, the read operations are distributed evenly across all the OSDs that are spread across the data centers.
Prerequisites
- A stretch cluster with two data centers and Ceph Object Gateway configured on both.
- A user created with a bucket and OSDs - primary and replicated OSDs.
Procedure
To perform a balanced read, set rados_replica_read_policy to balance in the Ceph Object Gateway client configuration by using the ceph config set command.
[ceph: root@host01 /]# ceph config set client.rgw.rgw.1 rados_replica_read_policy balance
Verification: Perform the following steps to verify the balanced read from an OSD set.
Run the ceph osd tree command to view the OSDs and the data centers.
Run the ceph orch ps command to identify the Ceph Object Gateway daemons in the data centers.
Example
[ceph: root@host01 /]# ceph orch ps | grep rg
rgw.rgw.1.ceph-ci-fbv67y-ammmck-node4.dmsmex ceph-ci-fbv67y-ammmck-node4 *:80 running (4h) 10m ago 22h 93.3M - 19.1.0-55.el9cp 0ee0a0ad94c7 34f27723ccd2
rgw.rgw.1.ceph-ci-fbv67y-ammmck-node7.pocecp ceph-ci-fbv67y-ammmck-node7 *:80 running (4h) 10m ago 22h 96.4M - 19.1.0-55.el9cp 0ee0a0ad94c7 40e4f2a6d4c4
Verify that a balanced read has happened by running the vim command on the Ceph Object Gateway logs.
Example
[ceph: root@host01 /]# vim /var/log/ceph/<fsid>/<ceph-client-rgw>.log
2024-08-27T09:32:25.510+0000 7f2a7a284640 1 ====== starting new request req=0x7f2a31fcf4a0 =====
2024-08-27T09:32:25.510+0000 7f2a7a284640 1 -- 10.0.67.142:0/3116867178 --> [v2:10.0.64.146:6816/2838383288,v1:10.0.64.146:6817/2838383288] -- osd_op(unknown.0.0:268731 11.55 11:ab26b168:::3acf4091-c54c-43b5-a495-c505fe545d25.27842.1_f1:head [getxattrs,stat] snapc 0=[] ondisk+read+balance_reads+known_if_redirected+supports_pool_eio e3554) -- 0x55cd1b88dc00 con 0x55cd18dd6000
You can see in the logs that a balanced read has taken place.
Important: To be able to view the debug logs, you must first enable debug_ms 1 in the configuration by running the ceph config set command.
[ceph: root@host01 /]# ceph config set client.rgw.rgw.1.ceph-ci-gune2w-mysx73-node4.dgvrmx debug_ms 1/1
[ceph: root@host01 /]# ceph config set client.rgw.rgw.1.ceph-ci-gune2w-mysx73-node7.rfkqqq debug_ms 1/1
4.5.3. Performing default reads
You can perform a default read on a pool to retrieve data from the primary OSDs. When a default READ is issued on a pool, the I/O operations are retrieved directly from the primary OSD of each placement group.
Prerequisites
- A stretch cluster with two data centers and Ceph Object Gateway configured on both.
- A user created with a bucket and OSDs - primary and replicated OSDs.
Procedure
To perform a default read, set rados_replica_read_policy to default in the Ceph Object Gateway client configuration by using the ceph config set command.
Example
[ceph: root@host01 /]# ceph config set client.rgw.rgw.1 rados_replica_read_policy default
The I/O operations from the closest OSD in a data center are retrieved when a GET operation is performed.
Verification: Perform the following steps to verify the default read from an OSD set.
Run the ceph osd tree command to view the OSDs and the data centers.
Run the ceph orch ps command to identify the Ceph Object Gateway daemons in the data centers.
Example
ceph orch ps | grep rg
rgw.rgw.1.ceph-ci-fbv67y-ammmck-node4.dmsmex ceph-ci-fbv67y-ammmck-node4 *:80 running (4h) 10m ago 22h 93.3M - 19.1.0-55.el9cp 0ee0a0ad94c7 34f27723ccd2
rgw.rgw.1.ceph-ci-fbv67y-ammmck-node7.pocecp ceph-ci-fbv67y-ammmck-node7 *:80 running (4h) 10m ago 22h 96.4M - 19.1.0-55.el9cp 0ee0a0ad94c7 40e4f2a6d4c4
Verify that a default read has happened by running the vim command on the Ceph Object Gateway logs.
Example
[ceph: root@host01 /]# vim /var/log/ceph/<fsid>/<ceph-client-rgw>.log
2024-08-28T10:26:05.155+0000 7fe6b03dd640 1 ====== starting new request req=0x7fe6879674a0 =====
2024-08-28T10:26:05.156+0000 7fe6b03dd640 1 -- 10.0.64.251:0/2235882725 --> [v2:10.0.65.171:6800/4255735352,v1:10.0.65.171:6801/4255735352] -- osd_op(unknown.0.0:1123 11.6d 11:b69767fc:::699c2d80-5683-43c5-bdcd-e8912107c176.24827.3_f1:head [getxattrs,stat] snapc 0=[] ondisk+read+known_if_redirected+supports_pool_eio e4513) -- 0x5639da653800 con 0x5639d804d800
You can see in the logs that a default read has taken place.
Important: To be able to view the debug logs, you must first enable debug_ms 1 in the configuration by running the ceph config set command.
[ceph: root@host01 /]# ceph config set client.rgw.rgw.1.ceph-ci-gune2w-mysx73-node4.dgvrmx debug_ms 1/1
[ceph: root@host01 /]# ceph config set client.rgw.rgw.1.ceph-ci-gune2w-mysx73-node7.rfkqqq debug_ms 1/1