Chapter 4. Stretch clusters for Ceph storage
As a storage administrator, you can configure a two-site stretched cluster by enabling stretch mode in Ceph.
Red Hat Ceph Storage systems offer the option to expand the failure domain beyond the OSD level to a datacenter or cloud zone level.
The following diagram depicts a simplified representation of a Ceph cluster operating in stretch mode, where the tiebreaker host is provisioned in data center (DC) 3.
Figure 4.1. Stretch clusters for Ceph storage
A stretch cluster operates over a Wide Area Network (WAN), unlike a typical Ceph cluster, which operates over a Local Area Network (LAN). For illustration purposes, a data center is chosen as the failure domain, though this could also represent a cloud availability zone. Data Center 1 (DC1) and Data Center 2 (DC2) contain OSDs and Monitors within their respective domains, while Data Center 3 (DC3) contains only a single monitor. The latency between DC1 and DC2 should not exceed 10 ms RTT, as higher latency can significantly impact Ceph performance in terms of replication, recovery, and related operations. However, DC3—a non-data site typically hosted on a virtual machine—can tolerate higher latency compared to the two data sites. A stretch cluster, like the one in the diagram, can withstand a complete data center failure or a network partition between data centers as long as at least two sites remain connected.
There are no additional steps for powering down a stretch cluster. For more information, see Powering down and rebooting a Red Hat Ceph Storage cluster.
4.1. Stretch mode for a storage cluster
To improve availability in stretch clusters (geographically distributed deployments), you must enable stretch mode. When stretch mode is enabled, Ceph OSDs only take placement groups (PGs) active when the PGs peer across data centers, or whichever other CRUSH bucket type you specified, assuming both sites are active. Pool size increases from the default three to four, with two copies on each site.
In stretch mode, Ceph OSDs are only allowed to connect to monitors within the same data center. New monitors are not allowed to join the cluster without a specified location.
If all the OSDs and monitors from a data center become inaccessible at once, the surviving data center will enter a degraded stretch mode. This issues a warning, reduces the min_size to 1, and allows the cluster to reach an active state with the data from the remaining site.
Stretch mode is designed to handle netsplit scenarios between two data centers and the loss of one data center. Stretch mode handles the netsplit scenario by choosing the surviving data center with a better connection to the tiebreaker monitor. Stretch mode handles the loss of one data center by reducing the min_size of all pools to 1, allowing the cluster to continue operating with the remaining data center. When the lost data center comes back, the cluster will recover the lost data and return to normal operation.
In a stretch cluster, when a site goes down and the cluster enters a degraded state, the min_size of the pool may be temporarily reduced (for example, to 1) to allow the placement groups (PGs) to become active and continue serving I/O. However, the size of the pool remains unchanged. The peering_crush_bucket_count stretch mode flag ensures that PGs do not become active unless they are backed by OSDs in a minimum number of distinct CRUSH buckets (for example, different data centers). This mechanism prevents the system from creating redundant copies solely within the surviving site, ensuring that data is only fully replicated once the downed site recovers.
When the missing data center becomes accessible again, the cluster enters recovery stretch mode. This changes the warning and allows peering, but still requires only the OSDs from the data center that was up the whole time.
When all PGs are in a known state and are not degraded or incomplete, the cluster goes back to regular stretch mode, ends the warning, and restores min_size to its starting value of 2. The cluster again requires both sites to peer, not only the site that stayed up the whole time, so that you can fail over to the other site, if necessary.
Stretch mode limitations
- Exiting stretch mode requires moving all pools to a non-stretch CRUSH rule. For more information, see Exiting stretch mode.
- You cannot use erasure-coded pools with clusters in stretch mode. You can neither enter stretch mode with erasure-coded pools present, nor create an erasure-coded pool while stretch mode is active.
- Device classes are not supported in stretch mode. In the following example, the class hdd is not supported.

  Example

  rule stretch_replicated_rule {
      id 2
      type replicated
      class hdd
      step take default
      step choose firstn 0 type datacenter
      step chooseleaf firstn 2 type host
      step emit
  }

- To achieve the same weights on both sites, the Ceph OSDs deployed in the two sites should be of equal size; that is, the storage capacity in the first site is equivalent to the storage capacity in the second site.
- While it is not enforced, you should run two Ceph monitors on each site and a tiebreaker, for a total of five. This is because OSDs can only connect to monitors in their own site when in stretch mode.
- You must create your own CRUSH rule that provides two copies on each site, for a total of four copies across both sites.
- You cannot enable stretch mode if you have existing pools with non-default size or min_size.
- Because the cluster runs with min_size 1 when degraded, you should only use stretch mode with all-flash OSDs. This minimizes the time needed to recover once connectivity is restored, and minimizes the potential for data loss.
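The equal-weight expectation above can be spot-checked from `ceph osd tree` output. The following is a minimal sketch that compares the CRUSH weights of the two datacenter buckets; the sample lines and weight values are illustrative assumptions, and a real check would pipe in live `ceph osd tree` output instead:

```shell
# Compare CRUSH weights of the two datacenter buckets. The sample lines
# mimic the ID/WEIGHT/TYPE/NAME columns of `ceph osd tree` bucket rows
# (illustrative only; not live cluster output).
tree='-3 0.17505 datacenter DC1
-7 0.17505 datacenter DC2'
printf '%s\n' "$tree" | awk '
  $3 == "datacenter" { w[$4] = $2 }
  END {
    if (w["DC1"] == w["DC2"]) print "weights match: " w["DC1"]
    else print "weights differ: DC1=" w["DC1"] " DC2=" w["DC2"]
  }'
```

If the weights differ, rebalance by adding or resizing OSDs on the lighter site before enabling stretch mode.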
Stretch peering rule
In Ceph stretch cluster mode, a critical safeguard is enforced through the stretch peering rule, which ensures that a Placement Group (PG) cannot become active if all acting replicas reside within a single failure domain, such as a single data center or cloud availability zone.
This behavior is essential for protecting data integrity during site failures. If a PG were allowed to go active with all replicas confined to one site, write operations could be falsely acknowledged without true redundancy. In the event of a site outage, this would result in complete data loss for those PGs. By enforcing zone diversity in the acting set, Ceph stretch clusters maintain high availability while minimizing the risk of data inconsistency or loss.
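The effect of the stretch peering rule can be sketched as a count of distinct failure domains in a PG's acting set. The following illustration is a simplified model, not a Ceph interface; the OSD-to-datacenter mapping and the required bucket count are hypothetical values standing in for what CRUSH and peering_crush_bucket_count enforce internally:

```shell
# A PG may go active only if its acting replicas span at least the required
# number of distinct CRUSH buckets (here, datacenters). Hypothetical mapping.
acting_buckets='DC1 DC1 DC2 DC2'   # datacenter of each acting replica
required=2                         # e.g. peering_crush_bucket_count
distinct=$(printf '%s\n' $acting_buckets | sort -u | wc -l)
if [ "$distinct" -ge "$required" ]; then
  echo "PG can become active ($distinct distinct buckets)"
else
  echo "PG blocked: replicas confined to $distinct bucket(s)"
fi
```

With all four replicas in DC1, the count would be 1 and the PG would stay inactive, which is exactly the false-acknowledgement scenario the rule prevents.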
4.2. Deployment requirements
This section details the hardware, software, and network requirements for deploying a generalized stretch cluster configuration across three availability zones.
Software requirements
Red Hat Ceph Storage 8.1
Hardware requirements
Ensure that the following minimum requirements are met before deploying a stretch cluster configuration.
| Hardware criteria | Minimum and recommended |
|---|---|
| Processor | |
| RAM | |
| Network | A single 1 Gb/s (bonded 10+ Gb/s recommended). |
| Hardware criteria | Minimum and recommended |
|---|---|
| Processor | 2 cores minimum |
| Storage drives | 100 GB per daemon. SSD is recommended. |
| Network | A single 1 Gb/s (10+ Gb/s recommended) |
| Hardware criteria | Minimum and recommended |
|---|---|
| Processor | 2 cores minimum |
| RAM | 2 GB per daemon (more for production) |
| Disk space | 1 GB per daemon |
| Network | A single 1 Gb/s (10+ Gb/s recommended) |
Daemon placement
The following table lists the daemon placement details across various hosts and data centers.
| Hostname | Data center | Services |
|---|---|---|
| host01 | DC1 | OSD+MON+MGR |
| host02 | DC1 | OSD+MON+MGR |
| host03 | DC1 | OSD+MDS+RGW |
| host04 | DC2 | OSD+MON+MGR |
| host05 | DC2 | OSD+MON+MGR |
| host06 | DC2 | OSD+MDS+RGW |
| host07 | DC3 (Tiebreaker) | MON |
Network configuration requirements
Ensure that the following network configuration requirements are met before deploying a stretch cluster configuration.
You can use different subnets for each of the data centers.
- Have two separate networks, one public network and one cluster network.
- The latencies between data centers that run the Ceph Object Storage Devices (OSDs) cannot exceed 10 ms RTT.
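The 10 ms RTT limit can be sanity-checked by parsing the average RTT from a ping summary. The following sketch operates on a sample summary line for testability; in practice you would run something like `ping -c 20 <peer-host>` from a DC1 node to a DC2 node and parse its final line (the figures shown are illustrative):

```shell
# Parse the average RTT (ms) from a `ping` summary line and compare it
# against the 10 ms stretch-cluster limit. Sample text, not live output.
summary='rtt min/avg/max/mdev = 0.412/0.617/0.853/0.141 ms'
avg=$(printf '%s\n' "$summary" | awk -F'/' '{print $5}')
if awk -v a="$avg" 'BEGIN { exit !(a < 10) }'; then
  echo "OK: average RTT ${avg} ms is under 10 ms"
else
  echo "WARNING: average RTT ${avg} ms exceeds 10 ms"
fi
```

Measure during peak traffic as well as idle periods, because sustained latency above the limit degrades replication and recovery.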
The following is an example of a basic network configuration:
DC1
Ceph public/private network: 10.0.40.0/24
DC2
Ceph public/private network: 10.0.40.0/24
Tiebreaker
Ceph public/private network: 10.0.40.0/24
Cluster setup requirements
Ensure that the hostname is configured by using the bare or short hostname on all hosts.
Syntax
hostnamectl set-hostname SHORT_NAME
When run on any node, the hostname command must return only the short hostname. If the FQDN is returned, the cluster configuration will not succeed.
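The short-hostname requirement can be checked with a simple pattern test before bootstrapping. A minimal sketch (the host names shown are examples, not hosts from your cluster):

```shell
# Flag FQDNs: a short host name contains no dots. On a failing node, rerun
# `hostnamectl set-hostname` with the short form.
check_hostname() {
  case "$1" in
    *.*) echo "$1: FQDN detected - reconfigure with the short name" ;;
    *)   echo "$1: OK (short name)" ;;
  esac
}
check_hostname host01                  # short name, passes
check_hostname host01.example.com     # FQDN, flagged
```

On a live node you would call `check_hostname "$(hostname)"` instead of the literal examples.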
4.3. Setting the CRUSH location for the daemons
Before you enter stretch mode, you need to prepare the cluster by setting the CRUSH location for the daemons in the Red Hat Ceph Storage cluster. There are two ways to do this:
- Bootstrap the cluster through a service configuration file, where the locations are added to the hosts as part of deployment.
- Set the locations manually through the ceph osd crush add-bucket and ceph osd crush move commands after the cluster is deployed.
Method 1: Bootstrapping the cluster
Prerequisites
- Root-level access to the nodes.
Procedure
If you are bootstrapping your new storage cluster, you can create the service configuration .yaml file that adds the nodes to the Red Hat Ceph Storage cluster and also sets specific labels for where the services should run:

Example

service_type: host
addr: host01
hostname: host01
location:
  root: default
  datacenter: DC1
labels:
  - osd
  - mon
  - mgr
---
service_type: host
addr: host02
hostname: host02
location:
  datacenter: DC1
labels:
  - osd
  - mon
---
service_type: host
addr: host03
hostname: host03
location:
  datacenter: DC1
labels:
  - osd
  - mds
  - rgw
---
service_type: host
addr: host04
hostname: host04
location:
  root: default
  datacenter: DC2
labels:
  - osd
  - mon
  - mgr
---
service_type: host
addr: host05
hostname: host05
location:
  datacenter: DC2
labels:
  - osd
  - mon
---
service_type: host
addr: host06
hostname: host06
location:
  datacenter: DC2
labels:
  - osd
  - mds
  - rgw
---
service_type: host
addr: host07
hostname: host07
labels:
  - mon
---
service_type: mon
placement:
  label: "mon"
---
service_type: mds
service_id: cephfs
placement:
  label: "mds"
---
service_type: mgr
service_name: mgr
placement:
  label: "mgr"
---
service_type: osd
service_id: all-available-devices
service_name: osd.all-available-devices
placement:
  label: "osd"
spec:
  data_devices:
    all: true
---
service_type: rgw
service_id: objectgw
service_name: rgw.objectgw
placement:
  count: 2
  label: "rgw"
spec:
  rgw_frontend_port: 8080

Bootstrap the storage cluster with the --apply-spec option:

Syntax
cephadm bootstrap --apply-spec CONFIGURATION_FILE_NAME --mon-ip MONITOR_IP_ADDRESS --ssh-private-key PRIVATE_KEY --ssh-public-key PUBLIC_KEY --registry-url REGISTRY_URL --registry-username USER_NAME --registry-password PASSWORD

Example

[root@host01 ~]# cephadm bootstrap --apply-spec initial-config.yaml --mon-ip 10.10.128.68 --ssh-private-key /home/ceph/.ssh/id_rsa --ssh-public-key /home/ceph/.ssh/id_rsa.pub --registry-url registry.redhat.io --registry-username myuser1 --registry-password mypassword1

Important

You can use different command options with the cephadm bootstrap command. However, always include the --apply-spec option to use the service configuration file and configure the host locations.
Method 2: Setting the locations after the deployment
Prerequisites
- Root-level access to the nodes.
Procedure
Add the two buckets to which you plan to set the location of your non-tiebreaker monitors to the CRUSH map, specifying the bucket type as datacenter:

Syntax

ceph osd crush add-bucket BUCKET_NAME BUCKET_TYPE

Example

[ceph: root@host01 /]# ceph osd crush add-bucket DC1 datacenter
[ceph: root@host01 /]# ceph osd crush add-bucket DC2 datacenter

Move the buckets under root=default:

Syntax

ceph osd crush move BUCKET_NAME root=default

Example

[ceph: root@host01 /]# ceph osd crush move DC1 root=default
[ceph: root@host01 /]# ceph osd crush move DC2 root=default

Move the OSD hosts according to the required CRUSH placement:

Syntax

ceph osd crush move HOST datacenter=DATACENTER

Example

[ceph: root@host01 /]# ceph osd crush move host01 datacenter=DC1
4.3.1. Setting the CRUSH location during bootstrap
For hosts with multiple public networks, specify the public networks in CIDR format in the configuration file. Set the monitor CRUSH location in the specification file. Provide this information during the bootstrap procedure.
For more information about Ceph bootstrapping and different cephadm bootstrap command options, see Bootstrapping a new storage cluster.
Prerequisites
Before you begin, be sure that you have root-level access to the nodes.
Procedure
Create a cluster-spec.yaml file. The specification file adds the nodes to the Red Hat Ceph Storage cluster and also sets specific labels for where the services run.

Example

service_type: host
addr: <host03 address>
hostname: host03
location:
  datacenter: DC1
labels:
  - osd
  - mds
  - rgw
---
service_type: host
addr: <host04 address>
hostname: host04
location:
  root: default
  datacenter: DC2
labels:
  - osd
  - mon
  - mgr
---
service_type: host
addr: host05
hostname: host05
location:
  datacenter: DC2
labels:
  - osd
  - mon
  - mgr
---
service_type: host
addr: <host06 address>
hostname: host06
location:
  datacenter: DC2
labels:
  - osd
  - mds
  - rgw
---
service_type: host
addr: <host07 address>
hostname: host07
labels:
  - mon
---
service_type: mon
spec:
  crush_locations:
    host01:
      - datacenter=DC1
    host02:
      - datacenter=DC1
    host04:
      - datacenter=DC2
    host05:
      - datacenter=DC2
placement:
  label: mon
---
service_type: mgr
service_name: mgr
placement:
  label: "mgr"
---
service_type: osd
service_id: all-available-devices
service_name: osd.all-available-devices
placement:
  label: "osd"
spec:
  data_devices:
    all: true
---
service_type: rgw
service_id: objectgw
service_name: rgw.objectgw
placement:
  count: 2
  label: "rgw"
spec:
  rgw_frontend_port: 8080

Run the cephadm bootstrap command as the root user on the node that will serve as the initial Monitor node in the cluster. The MONITOR_IP_ADDRESS value is the IP address of the node on which you run the cephadm bootstrap command. Run one of the following commands, based on your configuration needs.

For hosts that are present on the same network:

Syntax

cephadm bootstrap --apply-spec CONFIGURATION_FILE_NAME --mon-ip MONITOR_IP_ADDRESS --ssh-private-key PRIVATE_KEY --ssh-public-key PUBLIC_KEY --registry-url REGISTRY_URL --registry-username USER_NAME --registry-password PASSWORD

For hosts that are present on different networks:

Syntax

cephadm bootstrap --apply-spec CONFIGURATION_FILE_NAME --mon-ip MONITOR_IP_ADDRESS --ssh-private-key PRIVATE_KEY --ssh-public-key PUBLIC_KEY --registry-url REGISTRY_URL --registry-username USER_NAME --registry-password PASSWORD --config ceph.conf
ceph.conffile defines the public network settings for the cluster:[global] public_network = 10.1.172.0/23,10.0.64.0/22After the bootstrap process completes, the following output is emitted.
Or, if you are only running a single cluster on this host: sudo /usr/sbin/cephadm shell Please consider enabling telemetry to help improve Ceph: ceph telemetry on For more information see: https://docs.ceph.com/en/latest/mgr/telemetry/ Bootstrap complete.
Verification
Verify the network configuration.

Syntax

ceph config dump | grep network

In the following example, the output confirms that there are multiple networks and that 10.1.172.0/23 and 10.0.64.0/22 are properly configured.

Example

[ceph: root@host01 /]# ceph config dump | grep network
global advanced public_network 10.1.172.0/23,10.0.64.0/22 *

Verify that the status of the storage cluster deployment is in the HEALTH_OK state.

Example

[ceph: root@host01 /]# ceph -s
  cluster:
    id:     ff19789c-f5c7-11ef-8e1c-fa163e4e1f7e
    health: HEALTH_OK

  services:
    mon: 5 daemons, quorum host01,host05,host02,host07-installer,host04 (age 10m)
    mgr: host05.aswlzq(active, since 43m), standbys: host02.ctajlt, host01.napqyw, host04.wdglem
    mds: 1/1 daemons up, 1 standby
    osd: 24 osds: 24 up (since 31m), 24 in (since 32m)
    rgw: 4 daemons active (2 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   7 pools, 1019 pgs
    objects: 216 objects, 456 KiB
    usage:   2.7 GiB used, 357 GiB / 360 GiB avail

Verify that all the nodes that were added in the cluster-spec.yaml file are added to the cluster.

Syntax

ceph orch host ls

Example

[ceph: root@host01 /]# ceph orch host ls
HOST    ADDR         LABELS       STATUS
host01  10.0.56.37   mgr,mon,osd
host02  10.0.59.35   mgr,mon,osd
host03  10.0.58.106  osd,mds,rgw
host04  10.0.56.13   osd,mon,mgr
host05  10.0.59.188  mgr,mon,osd
host06  10.0.56.223  rgw,mds,osd
host07  10.0.56.189  _admin,mon
7 hosts in cluster

Use the ceph osd tree command to verify the following:

- CRUSH locations for the OSD hosts
- That each host has one OSD configured
- That each host OSD is in the up state
- That each node is in the correct data center bucket

Example

[ceph: root@host01 /]# ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME            STATUS  REWEIGHT  PRI-AFF
 -1         0.35010  root default
 -3         0.17505      datacenter DC1
 -2         0.05835          host host01
  5    hdd  0.01459              osd.5        up   1.00000  1.00000
 11    hdd  0.01459              osd.11       up   1.00000  1.00000
 17    hdd  0.01459              osd.17       up   1.00000  1.00000
 23    hdd  0.01459              osd.23       up   1.00000  1.00000
 -4         0.05835          host host02
  1    hdd  0.01459              osd.1        up   1.00000  1.00000
  6    hdd  0.01459              osd.6        up   1.00000  1.00000
 12    hdd  0.01459              osd.12       up   1.00000  1.00000
 18    hdd  0.01459              osd.18       up   1.00000  1.00000
 -5         0.05835          host host03
  3    hdd  0.01459              osd.3        up   1.00000  1.00000
 10    hdd  0.01459              osd.10       up   1.00000  1.00000
 16    hdd  0.01459              osd.16       up   1.00000  1.00000
 22    hdd  0.01459              osd.22       up   1.00000  1.00000
 -7         0.17505      datacenter DC2
 -6         0.05835          host host04
  2    hdd  0.01459              osd.2        up   1.00000  1.00000
  8    hdd  0.01459              osd.8        up   1.00000  1.00000
 14    hdd  0.01459              osd.14       up   1.00000  1.00000
 20    hdd  0.01459              osd.20       up   1.00000  1.00000
 -8         0.05835          host host05
  0    hdd  0.01459              osd.0        up   1.00000  1.00000
  7    hdd  0.01459              osd.7        up   1.00000  1.00000
 13    hdd  0.01459              osd.13       up   1.00000  1.00000
 19    hdd  0.01459              osd.19       up   1.00000  1.00000
 -9         0.05835          host host06
  4    hdd  0.01459              osd.4        up   1.00000  1.00000
  9    hdd  0.01459              osd.9        up   1.00000  1.00000
 15    hdd  0.01459              osd.15       up   1.00000  1.00000
 21    hdd  0.01459              osd.21       up   1.00000  1.00000

Verify the CRUSH locations for the MON hosts. Check the mon map to ensure that each MON host has a crush_location specified.

Syntax

ceph mon dump

The output displays details about the MON map, including the crush_location for each host.

Example

[ceph: root@host01 /]# ceph mon dump
epoch 5
fsid 4158287e-169e-11f0-b1ad-fa163e98b991
last_changed 2025-04-11T06:32:20.332479+0000
created 2025-04-11T06:29:24.974553+0000
min_mon_release 19 (squid)
election_strategy: 1
0: [v2:10.0.57.33:3300/0,v1:10.0.57.33:6789/0] mon.host07
1: [v2:10.0.58.200:3300/0,v1:10.0.58.200:6789/0] mon.host05; crush_location {datacenter=DC2}
2: [v2:10.0.58.47:3300/0,v1:10.0.58.47:6789/0] mon.host02; crush_location {datacenter=DC1}
3: [v2:10.0.58.104:3300/0,v1:10.0.58.104:6789/0] mon.host04; crush_location {datacenter=DC2}
4: [v2:10.0.58.38:3300/0,v1:10.0.58.38:6789/0] mon.host01; crush_location {datacenter=DC1}
Alternatively, set the locations manually through the ceph osd crush add-bucket and ceph osd crush move commands after the cluster is deployed.
Prerequisites
Before you begin, be sure that you have root-level access to the nodes.
Procedure
Add the two buckets to which you plan to set the location of your non-tiebreaker monitors to the CRUSH map. Specify the bucket type as datacenter.

Syntax

ceph osd crush add-bucket BUCKET_NAME datacenter

Example

[ceph: root@host01 /]# ceph osd crush add-bucket DC1 datacenter
[ceph: root@host01 /]# ceph osd crush add-bucket DC2 datacenter

Move each of the buckets to root=default.

Syntax

ceph osd crush move BUCKET_NAME root=default

Example

[ceph: root@host01 /]# ceph osd crush move DC1 root=default
[ceph: root@host01 /]# ceph osd crush move DC2 root=default

Move the OSD hosts according to the required CRUSH placement.

Syntax

ceph osd crush move HOST datacenter=DATACENTER

Example

[ceph: root@host01 /]# ceph osd crush move host01 datacenter=DC1

Verify the CRUSH locations for the OSD hosts.

Example

[ceph: root@host01 /]# ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME            STATUS  REWEIGHT  PRI-AFF
 -1         0.35010  root default
 -3         0.17505      datacenter DC1
 -2         0.05835          host host01
  5    hdd  0.01459              osd.5        up   1.00000  1.00000
 11    hdd  0.01459              osd.11       up   1.00000  1.00000
 17    hdd  0.01459              osd.17       up   1.00000  1.00000
 23    hdd  0.01459              osd.23       up   1.00000  1.00000
 -4         0.05835          host host02
  1    hdd  0.01459              osd.1        up   1.00000  1.00000
  6    hdd  0.01459              osd.6        up   1.00000  1.00000
 12    hdd  0.01459              osd.12       up   1.00000  1.00000
 18    hdd  0.01459              osd.18       up   1.00000  1.00000
 -5         0.05835          host host03
  3    hdd  0.01459              osd.3        up   1.00000  1.00000
 10    hdd  0.01459              osd.10       up   1.00000  1.00000
 16    hdd  0.01459              osd.16       up   1.00000  1.00000
 22    hdd  0.01459              osd.22       up   1.00000  1.00000
 -7         0.17505      datacenter DC2
 -6         0.05835          host host04
  2    hdd  0.01459              osd.2        up   1.00000  1.00000
  8    hdd  0.01459              osd.8        up   1.00000  1.00000
 14    hdd  0.01459              osd.14       up   1.00000  1.00000
 20    hdd  0.01459              osd.20       up   1.00000  1.00000
 -8         0.05835          host host05
  0    hdd  0.01459              osd.0        up   1.00000  1.00000
  7    hdd  0.01459              osd.7        up   1.00000  1.00000
 13    hdd  0.01459              osd.13       up   1.00000  1.00000
 19    hdd  0.01459              osd.19       up   1.00000  1.00000
 -9         0.05835          host host06
  4    hdd  0.01459              osd.4        up   1.00000  1.00000
  9    hdd  0.01459              osd.9        up   1.00000  1.00000
 15    hdd  0.01459              osd.15       up   1.00000  1.00000
 21    hdd  0.01459              osd.21       up   1.00000  1.00000

Set the location of each monitor, matching your CRUSH map.

Syntax

ceph mon set_location HOST datacenter=DATACENTER

Example

[ceph: root@host01 /]# ceph mon set_location host01 datacenter=DC1
[ceph: root@host01 /]# ceph mon set_location host02 datacenter=DC1
[ceph: root@host01 /]# ceph mon set_location host04 datacenter=DC2
[ceph: root@host01 /]# ceph mon set_location host05 datacenter=DC2

Verify the CRUSH locations for the MON hosts. Check the mon map to ensure that each MON host has a crush_location specified.

Syntax

ceph mon dump

The output displays details about the MON map, including the crush_location for each host.

Example

[ceph: root@host01 /]# ceph mon dump
epoch 5
fsid 4158287e-169e-11f0-b1ad-fa163e98b991
last_changed 2025-04-11T06:32:20.332479+0000
created 2025-04-11T06:29:24.974553+0000
min_mon_release 19 (squid)
election_strategy: 1
0: [v2:10.0.57.33:3300/0,v1:10.0.57.33:6789/0] mon.host07
1: [v2:10.0.58.200:3300/0,v1:10.0.58.200:6789/0] mon.host05; crush_location {datacenter=DC2}
2: [v2:10.0.58.47:3300/0,v1:10.0.58.47:6789/0] mon.host02; crush_location {datacenter=DC1}
3: [v2:10.0.58.104:3300/0,v1:10.0.58.104:6789/0] mon.host04; crush_location {datacenter=DC2}
4: [v2:10.0.58.38:3300/0,v1:10.0.58.38:6789/0] mon.host01; crush_location {datacenter=DC1}
4.4. Configuring a CRUSH map for stretch mode
Use this information to configure a CRUSH map for stretch mode.
Prerequisites
Before you begin, make sure that you have the following prerequisites in place:
- Root-level access to the nodes.
- The CRUSH location is set to the hosts.
Procedure
Install the ceph-base RPM package in order to use the crushtool command, which you need to create a CRUSH rule that makes use of this OSD CRUSH topology.

Syntax

dnf -y install ceph-base

Get the compiled CRUSH map from the cluster.

Syntax

ceph osd getcrushmap > /etc/ceph/crushmap.bin

Decompile the CRUSH map and convert it to a text file so that you can edit it.

Syntax

crushtool -d /etc/ceph/crushmap.bin -o /etc/ceph/crushmap.txt

Add the following rule at the end of the /etc/ceph/crushmap.txt file. This rule distributes reads and writes evenly across the data centers.

Syntax

rule stretch_rule {
    id 1
    type replicated
    step take default
    step choose firstn 0 type datacenter
    step chooseleaf firstn 2 type host
    step emit
}

Optionally, give the cluster a read/write affinity towards data center 1.

Syntax

rule stretch_rule {
    id 1
    type replicated
    step take DC1
    step chooseleaf firstn 2 type host
    step emit
    step take DC2
    step chooseleaf firstn 2 type host
    step emit
}

The declared CRUSH rule contains the following information:

rule name
  Description: A unique name for identifying the rule.
  Value: stretch_rule

id
  Description: A unique whole number for identifying the rule.
  Value: 1

type
  Description: Describes whether the rule is for a replicated or erasure-coded pool.
  Value: replicated

step take default
  Description: Takes the root bucket called default, and begins iterating down the tree.

step take DC1
  Description: Takes the bucket called DC1, and begins iterating down the tree.

step choose firstn 0 type datacenter
  Description: Selects the datacenter buckets, and goes into their subtrees.

step chooseleaf firstn 2 type host
  Description: Selects the number of buckets of the given type. In this case, it is two different hosts located in the datacenter it entered at the previous level.

step emit
  Description: Outputs the current value and empties the stack. Typically used at the end of a rule, but may also be used to pick from different trees in the same rule.
Compile the new CRUSH map from /etc/ceph/crushmap.txt and convert it to a binary file /etc/ceph/crushmap2.bin.

Syntax

crushtool -c /path/to/crushmap.txt -o /path/to/crushmap2.bin

Example

[ceph: root@host01 /]# crushtool -c /etc/ceph/crushmap.txt -o /etc/ceph/crushmap2.bin

Inject the newly created CRUSH map back into the cluster.

Syntax

ceph osd setcrushmap -i /path/to/compiled_crushmap

Example

[ceph: root@host01 /]# ceph osd setcrushmap -i /path/to/compiled_crushmap
17

Note

The number 17 is a counter and increases (18, 19, and so on) depending on the changes that are made to the CRUSH map.
Verifying
Verify that the newly created stretch_rule is available for use.
Syntax
ceph osd crush rule ls
Example
[ceph: root@host01 /]# ceph osd crush rule ls
replicated_rule
stretch_rule
4.4.1. Changing stretch mode
Change the stretch mode state by entering or exiting stretch mode as needed to support your cluster's availability and data-placement requirements.
4.4.1.1. Entering stretch mode
Stretch mode is designed to handle two sites. With two-site clusters, there is a lesser risk of component availability outages.
Prerequisites
Before you begin, make sure that you have the following prerequisites in place:
- Root-level access to the nodes.
- The CRUSH location is set to the hosts.
- The CRUSH map is configured to include the stretch rule.
- No erasure-coded pools exist in the cluster.
- The weights of the two sites are the same.
Procedure
Check the current election strategy being used by the monitors.

Syntax

ceph mon dump | grep election_strategy

Note

The Ceph cluster election_strategy is set to 1, by default.

Example

[ceph: root@host01 /]# ceph mon dump | grep election_strategy
dumped monmap epoch 9
election_strategy: 1

Change the election strategy to connectivity.

Syntax

ceph mon set election_strategy connectivity

For more information about configuring the election strategy, see Configuring monitor election strategy.

Use the ceph mon dump command to verify that the election strategy was updated to 3.

Example

[ceph: root@host01 /]# ceph mon dump | grep election_strategy
dumped monmap epoch 22
election_strategy: 3

Set the location of the tiebreaker monitor in a third data center, separate from the two data sites.
Syntax

ceph mon set_location TIEBREAKER_HOST datacenter=DC3

Example

[ceph: root@host01 /]# ceph mon set_location host07 datacenter=DC3

Verify that the tiebreaker monitor is set as expected.

Syntax

ceph mon dump

Example

[ceph: root@host01 /]# ceph mon dump
epoch 8
fsid 4158287e-169e-11f0-b1ad-fa163e98b991
last_changed 2025-04-11T07:14:48.652801+0000
created 2025-04-11T06:29:24.974553+0000
min_mon_release 19 (squid)
election_strategy: 3
0: [v2:10.0.57.33:3300/0,v1:10.0.57.33:6789/0] mon.host07; crush_location {datacenter=DC3}
1: [v2:10.0.58.200:3300/0,v1:10.0.58.200:6789/0] mon.host05; crush_location {datacenter=DC2}
2: [v2:10.0.58.47:3300/0,v1:10.0.58.47:6789/0] mon.host02; crush_location {datacenter=DC1}
3: [v2:10.0.58.104:3300/0,v1:10.0.58.104:6789/0] mon.host04; crush_location {datacenter=DC2}
4: [v2:10.0.58.38:3300/0,v1:10.0.58.38:6789/0] mon.host01; crush_location {datacenter=DC1}
dumped monmap epoch 8

Enter stretch mode.
Syntax
ceph mon enable_stretch_mode TIEBREAKER_HOST STRETCH_RULE STRETCH_BUCKET

In the following example:

- The tiebreaker node is set as host07.
- The stretch rule is stretch_rule, as created in Configuring a CRUSH map for stretch mode.
- The stretch bucket is set as datacenter.

Example
[ceph: root@host01 /]# ceph mon enable_stretch_mode host07 stretch_rule datacenter
Verifying
Verify that stretch mode was implemented correctly by continuing to Verifying stretch mode.
4.4.1.2. Exiting stretch mode
Disable stretch mode by moving pools to a specified CRUSH rule or to the default replicated rule.
Procedure
Disable stretch mode. You can specify a CRUSH rule to move all pools to. If you do not specify a rule, Ceph moves the pools to the default replicated CRUSH rule.
Syntax
ceph mon disable_stretch_mode CRUSH_RULE --yes-i-really-mean-it
4.4.2. Verifying stretch mode
Use this information to verify that stretch mode was created correctly with the implemented CRUSH rules.
Procedure
Verify that all pools are using the CRUSH rule that was created in the Ceph cluster. In these examples, the CRUSH rule is set as stretch_rule, per the settings that were created in Configuring a CRUSH map for stretch mode.

Syntax

for pool in $(rados lspools);do echo -n "Pool: ${pool}; ";ceph osd pool get ${pool} crush_rule;done

Example

[ceph: root@host01 /]# for pool in $(rados lspools);do echo -n "Pool: ${pool}; ";ceph osd pool get ${pool} crush_rule;done
Pool: device_health_metrics; crush_rule: stretch_rule
Pool: cephfs.cephfs.meta; crush_rule: stretch_rule
Pool: cephfs.cephfs.data; crush_rule: stretch_rule
Pool: .rgw.root; crush_rule: stretch_rule
Pool: default.rgw.log; crush_rule: stretch_rule
Pool: default.rgw.control; crush_rule: stretch_rule
Pool: default.rgw.meta; crush_rule: stretch_rule
Pool: rbdpool; crush_rule: stretch_rule

Verify that stretch mode is enabled. Ensure that stretch_mode_enabled is set to true.

Syntax

ceph osd dump

The output includes the following information:

stretch_mode_enabled
  Set to true if stretch mode is enabled.

stretch_bucket_count
  The number of data centers with OSDs.

degraded_stretch_mode
  Output of 0 if not degraded. If the stretch mode is degraded, this outputs the number of up sites.

recovering_stretch_mode
  Output of 0 if not recovering. If the stretch mode is recovering, the output is 1.

stretch_mode_bucket
  A unique value set for each CRUSH bucket type. This value is usually set to 8, for data center.

Example

"stretch_mode": {
    "stretch_mode_enabled": true,
    "stretch_bucket_count": 2,
    "degraded_stretch_mode": 0,
    "recovering_stretch_mode": 1,
    "stretch_mode_bucket": 8
}
Verify that stretch mode is reflected in the mon map by using the ceph mon dump command.

Ensure the following:

- stretch_mode_enabled is set to 1
- The correct mon host is set as tiebreaker_mon
- The correct mon host is set as disallowed_leaders

Syntax

ceph mon dump

Example

[ceph: root@host01 /]# ceph mon dump
epoch 16
fsid ff19789c-f5c7-11ef-8e1c-fa163e4e1f7e
last_changed 2025-02-28T12:12:51.089706+0000
created 2025-02-28T11:34:59.325503+0000
min_mon_release 19 (squid)
election_strategy: 3
stretch_mode_enabled 1
tiebreaker_mon host07
disallowed_leaders host07
0: [v2:10.0.56.37:3300/0,v1:10.0.56.37:6789/0] mon.host01; crush_location {datacenter=DC1}
1: [v2:10.0.59.188:3300/0,v1:10.0.59.188:6789/0] mon.host05; crush_location {datacenter=DC2}
2: [v2:10.0.59.35:3300/0,v1:10.0.59.35:6789/0] mon.host02; crush_location {datacenter=DC1}
3: [v2:10.0.56.189:3300/0,v1:10.0.56.189:6789/0] mon.host07; crush_location {datacenter=DC3}
4: [v2:10.0.56.13:3300/0,v1:10.0.56.13:6789/0] mon.host04; crush_location {datacenter=DC2}
dumped monmap epoch 16
What to do next
- Deploy, configure, and administer a Ceph Object Gateway. For more information, see Ceph Object Gateway.
- Manage, create, configure, and use Ceph Block Devices. For more information, see Ceph block devices.
- Create, mount, and work with the Ceph File System (CephFS). For more information, see Ceph File Systems.
4.5. Using and maintaining stretch mode
Use and maintain stretch mode by adding OSD hosts, managing data center monitor service (mon) hosts, and replacing the tiebreaker monitor, either with a monitor in quorum or with a new monitor.
4.5.1. Adding OSD hosts in stretch mode
You can add Ceph OSDs in stretch mode. The procedure is similar to adding OSD hosts on a cluster where stretch mode is not enabled.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Stretch mode is enabled on the cluster.
- Root-level access to the nodes.
Procedure
List the available devices to deploy OSDs:
Syntax
ceph orch device ls [--hostname=HOST_1 HOST_2] [--wide] [--refresh]
Example
[ceph: root@host01 /]# ceph orch device ls
Deploy the OSDs on specific hosts or on all the available devices:
Create an OSD from a specific device on a specific host:
Syntax
ceph orch daemon add osd HOST:DEVICE_PATH
Example
[ceph: root@host01 /]# ceph orch daemon add osd host03:/dev/sdb
Deploy OSDs on any available and unused devices:
Important
This command creates collocated WAL and DB devices. If you want to create non-collocated devices, do not use this command.
Example
[ceph: root@host01 /]# ceph orch apply osd --all-available-devices
Move the OSD hosts under the CRUSH bucket:
Syntax
ceph osd crush move HOST datacenter=DATACENTER
Example
[ceph: root@host01 /]# ceph osd crush move host03 datacenter=DC1
[ceph: root@host01 /]# ceph osd crush move host06 datacenter=DC2
Note
Ensure you add the same topology nodes on both sites. Issues might arise if hosts are added only on one site.
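Because both sites must end up with matching topology, it can help to derive the crush move commands from a host-to-data-center map and review them before running anything. A small sketch (the hostnames and the map are illustrative); it only prints the commands:

```shell
# Sketch: print the `ceph osd crush move` commands from a host/datacenter
# map so an asymmetric layout is easy to spot before applying it.
# Hostnames and the mapping are illustrative.
while read -r host dc; do
  printf 'ceph osd crush move %s datacenter=%s\n' "$host" "$dc"
done <<'EOF'
host03 DC1
host06 DC2
EOF
```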
4.5.2. Managing data center monitor service hosts in stretch mode
Use this information to add and remove data center monitor service (mon) hosts in stretch mode. You can manage these hosts by using the service specification file or directly on the Ceph cluster.
Prerequisites
Before you begin, make sure that you have the following prerequisites in place:
- A running Red Hat Ceph Storage cluster
- Stretch mode is enabled on the cluster
- Root-level access to the nodes.
These steps detail how to add a mon service by using the service specification file. To remove a service, follow the same steps, updating the specification file to remove the relevant host information.
Procedure
Export the specification file for mon and save the output to mon-spec.yaml.
Syntax
ceph orch ls mon --export > mon-spec.yaml
After the file is exported, the YAML file can be edited.
Add the new host details. In the following example, host08 is being added to the cluster into the DC2 data center bucket.
Syntax
service_type: host
addr: 10.1.172.225
hostname: host08
labels:
- mon
---
service_type: mon
service_name: mon
placement:
  label: mon
spec:
  crush_locations:
    host01:
    - datacenter=DC1
    host02:
    - datacenter=DC1
    host03:
    - datacenter=DC1
    host04:
    - datacenter=DC2
    host05:
    - datacenter=DC2
    host06:
    - datacenter=DC2
    host08:
    - datacenter=DC2
Apply the specification file.
Syntax
ceph orch apply -i mon-spec.yaml
Example
[ceph: root@host01 /]# ceph orch apply -i mon-spec.yaml
Added host 'host08' with addr '10.1.172.225'
Scheduled mon update...
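Before applying an edited specification, it can be worth confirming that every host listed under crush_locations has a data center assigned, since a host without a location will not peer correctly across sites. A sketch against an embedded, abbreviated spec fragment (the filename mon-spec-sample.yaml and the fragment are illustrative):

```shell
# Sketch: check that each host entry under crush_locations carries a
# datacenter=... CRUSH location. The fragment is abbreviated and the
# filename is illustrative; run the greps against your real spec file.
cat > mon-spec-sample.yaml <<'EOF'
spec:
  crush_locations:
    host01:
    - datacenter=DC1
    host04:
    - datacenter=DC2
    host08:
    - datacenter=DC2
EOF
hosts=$(grep -c '^    host' mon-spec-sample.yaml)
locs=$(grep -c 'datacenter=' mon-spec-sample.yaml)
echo "hosts=$hosts locations=$locs"   # the two counts should match
```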
Verifying
Use the ceph mon dump command to verify that the mon service was deployed and that the appropriate CRUSH location was added to the monitor.
Example
[ceph: root@host01 /]# ceph mon dump
epoch 16
fsid ff19789c-f5c7-11ef-8e1c-fa163e4e1f7e
last_changed 2025-02-28T12:12:51.089706+0000
created 2025-02-28T11:34:59.325503+0000
min_mon_release 19 (squid)
election_strategy: 3
stretch_mode_enabled 1
tiebreaker_mon host07
disallowed_leaders host07
0: [v2:10.0.56.37:3300/0,v1:10.0.56.37:6789/0] mon.host01; crush_location {datacenter=DC1}
1: [v2:10.0.59.188:3300/0,v1:10.0.59.188:6789/0] mon.host05; crush_location {datacenter=DC2}
2: [v2:10.0.59.35:3300/0,v1:10.0.59.35:6789/0] mon.host02; crush_location {datacenter=DC1}
3: [v2:10.0.56.189:3300/0,v1:10.0.56.189:6789/0] mon.host07; crush_location {datacenter=DC3}
4: [v2:10.0.56.13:3300/0,v1:10.0.56.13:6789/0] mon.host04; crush_location {datacenter=DC2}
dumped monmap epoch 16
Use the ceph orch host ls command to verify that the host was added to the cluster.
Example
[ceph: root@host01 /]# ceph orch host ls
HOST    ADDR         LABELS       STATUS
host01  10.0.56.37   mgr,mon,osd
host02  10.0.59.35   mgr,mon,osd
host03  10.0.58.106  osd,mds,rgw
host04  10.0.56.13   osd,mon,mgr
host05  10.0.59.188  mgr,mon,osd
host06  10.0.56.223  rgw,mds,osd
host07  10.0.56.189  _admin,mon
7 hosts in cluster
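The host listing can also be filtered to show only the hosts carrying the mon label, which is a quick way to confirm the label landed where intended. A sketch over a saved, header-stripped `ceph orch host ls` listing (the sample rows are abbreviated from the output above; the filename hosts.txt is illustrative):

```shell
# Sketch: list hosts whose LABELS column contains "mon", from a saved
# `ceph orch host ls` listing. Sample rows are abbreviated/illustrative.
cat > hosts.txt <<'EOF'
host01 10.0.56.37 mgr,mon,osd
host02 10.0.59.35 mgr,mon,osd
host03 10.0.58.106 osd,mds,rgw
host07 10.0.56.189 _admin,mon
EOF
# Match "mon" as a whole label, not as a substring of another label
awk '$3 ~ /(^|,)mon(,|$)/ {print $1}' hosts.txt
```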
These steps detail how to add a mon service directly on the cluster with the CLI. To remove a service, follow the corresponding removal commands in the same order.
Procedure
Set the monitor service to unmanaged.
Syntax
ceph orch set-unmanaged mon
Optional: Use the ceph orch ls command to verify that the service was set, as expected.
Example
[ceph: root@host01 /]# ceph orch ls
NAME  PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
mon          8/8      10m ago    19s  <unmanaged>
Add a new host with the mon label.
Syntax
ceph orch host add HOST_NAME IP_ADDRESS_OF_HOST [--labels=LABEL_NAME_1,LABEL_NAME_2]
Example
[ceph: root@host01 /]# ceph orch host add host08 10.1.172.205 --labels=mon
Add a monitor service with CRUSH locations.
Note
At this point, the mon is not running and is not managed by Cephadm.
Syntax
ceph mon add NODE:IP_ADDRESS datacenter=DATACENTER
Example
[ceph: root@host01 /]# ceph mon add host08:10.1.172.205 datacenter=DC2
Deploy the monitor daemon using Cephadm.
Syntax
ceph orch daemon add mon HOST_NAME
Example
[ceph: root@host01 /]# ceph orch daemon add mon host08
Deployed mon.host08 on host 'host08'
Enable Cephadm management for the monitor service.
Syntax
ceph orch set-managed mon
Start the newly added mon daemon.
Syntax
ceph orch daemon start MON_DAEMON_NAME
Verifying
Verify that the service, monitor, and host are added and running.
Use the ceph orch ls command to verify that the service is running.
Example
[ceph: root@host01 /]# ceph orch ls
NAME  PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
mon          8/8      7m ago     4d   label:mon
Use the ceph mon dump command to verify that the mon service was deployed and that the appropriate CRUSH location was added to the monitor.
Example
[ceph: root@host01 /]# ceph mon dump
epoch 16
fsid ff19789c-f5c7-11ef-8e1c-fa163e4e1f7e
last_changed 2025-02-28T12:12:51.089706+0000
created 2025-02-28T11:34:59.325503+0000
min_mon_release 19 (squid)
election_strategy: 3
stretch_mode_enabled 1
tiebreaker_mon host07
disallowed_leaders host07
0: [v2:10.0.56.37:3300/0,v1:10.0.56.37:6789/0] mon.host01; crush_location {datacenter=DC1}
1: [v2:10.0.59.188:3300/0,v1:10.0.59.188:6789/0] mon.host05; crush_location {datacenter=DC2}
2: [v2:10.0.59.35:3300/0,v1:10.0.59.35:6789/0] mon.host02; crush_location {datacenter=DC1}
3: [v2:10.0.56.189:3300/0,v1:10.0.56.189:6789/0] mon.host07; crush_location {datacenter=DC3}
4: [v2:10.0.56.13:3300/0,v1:10.0.56.13:6789/0] mon.host04; crush_location {datacenter=DC2}
dumped monmap epoch 16
Use the ceph orch host ls command to verify that the host was added to the cluster.
Example
[ceph: root@host01 /]# ceph orch host ls
HOST    ADDR         LABELS       STATUS
host01  10.0.56.37   mgr,mon,osd
host02  10.0.59.35   mgr,mon,osd
host03  10.0.58.106  osd,mds,rgw
host04  10.0.56.13   osd,mon,mgr
host05  10.0.59.188  mgr,mon,osd
host06  10.0.56.223  rgw,mds,osd
host07  10.0.56.189  _admin,mon
7 hosts in cluster
4.5.3. Replacing the tiebreaker with a monitor in quorum
If your tiebreaker monitor fails, you can replace it with an existing monitor in quorum and remove the failed monitor from the cluster.
Prerequisites
- A running Red Hat Ceph Storage cluster
- Stretch mode is enabled on a cluster
Procedure
Disable automated monitor deployment:
Example
[ceph: root@host01 /]# ceph orch apply mon --unmanaged
Scheduled mon update...
View the monitors in quorum:
Example
[ceph: root@host01 /]# ceph -s
mon: 5 daemons, quorum host01, host02, host04, host05 (age 30s), out of quorum: host07
Set the monitor in quorum as a new tiebreaker:
Syntax
ceph mon set_new_tiebreaker NEW_HOST
Example
[ceph: root@host01 /]# ceph mon set_new_tiebreaker host02
Important
You get an error message if the monitor is in the same location as existing non-tiebreaker monitors:
Example
[ceph: root@host01 /]# ceph mon set_new_tiebreaker host02
Error EINVAL: mon.host02 has location DC1, which matches mons host02 on the datacenter dividing bucket for stretch mode.
If that happens, change the location of the monitor:
Syntax
ceph mon set_location HOST datacenter=DATACENTER
Example
[ceph: root@host01 /]# ceph mon set_location host02 datacenter=DC3
Remove the failed tiebreaker monitor:
Syntax
ceph orch daemon rm FAILED_TIEBREAKER_MONITOR --force
Example
[ceph: root@host01 /]# ceph orch daemon rm mon.host07 --force
Removed mon.host07 from host 'host07'
Once the monitor is removed from the host, redeploy the monitor:
Syntax
ceph mon add HOST IP_ADDRESS datacenter=DATACENTER
ceph orch daemon add mon HOST
Example
[ceph: root@host01 /]# ceph mon add host07 213.222.226.50 datacenter=DC1
[ceph: root@host01 /]# ceph orch daemon add mon host07
Ensure there are five monitors in quorum:
Example
[ceph: root@host01 /]# ceph -s
mon: 5 daemons, quorum host01, host02, host04, host05, host07 (age 15s)
Verify that everything is configured properly:
Example
[ceph: root@host01 /]# ceph mon dump
epoch 19
fsid 1234ab78-1234-11ed-b1b1-de456ef0a89d
last_changed 2023-01-17T04:12:05.709475+0000
created 2023-01-16T05:47:25.631684+0000
min_mon_release 16 (pacific)
election_strategy: 3
stretch_mode_enabled 1
tiebreaker_mon host02
disallowed_leaders host02
0: [v2:132.224.169.63:3300/0,v1:132.224.169.63:6789/0] mon.host02; crush_location {datacenter=DC3}
1: [v2:220.141.179.34:3300/0,v1:220.141.179.34:6789/0] mon.host04; crush_location {datacenter=DC2}
2: [v2:40.90.220.224:3300/0,v1:40.90.220.224:6789/0] mon.host01; crush_location {datacenter=DC1}
3: [v2:60.140.141.144:3300/0,v1:60.140.141.144:6789/0] mon.host07; crush_location {datacenter=DC1}
4: [v2:186.184.61.92:3300/0,v1:186.184.61.92:6789/0] mon.host03; crush_location {datacenter=DC2}
dumped monmap epoch 19
Redeploy the monitors:
Syntax
ceph orch apply mon --placement="HOST_1, HOST_2, HOST_3, HOST_4, HOST_5"
Example
[ceph: root@host01 /]# ceph orch apply mon --placement="host01, host02, host04, host05, host07" Scheduled mon update...
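The replacement steps above can be condensed into a reviewable command list before anything is executed. A sketch that only prints the sequence (the hostnames and data center are illustrative, matching the example above); setting the location before promoting the new tiebreaker avoids the EINVAL error shown earlier:

```shell
# Sketch: print the tiebreaker-replacement sequence for review.
# NEW is the in-quorum monitor taking over, FAILED is the old tiebreaker;
# all values are illustrative.
NEW=host02 FAILED=host07 DC=DC3
printf '%s\n' \
  "ceph mon set_location $NEW datacenter=$DC" \
  "ceph mon set_new_tiebreaker $NEW" \
  "ceph orch daemon rm mon.$FAILED --force"
```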
4.5.4. Replacing the tiebreaker with a new monitor
If your tiebreaker monitor fails, you can replace it with a new monitor and remove the failed monitor from the cluster.
Prerequisites
Before you begin, make sure that you have the following prerequisites in place:
- A running Red Hat Ceph Storage cluster
- Stretch mode is enabled on the cluster
Procedure
Add a new monitor to the cluster:
Manually add the crush_location to the new monitor:
Syntax
ceph mon add NEW_HOST IP_ADDRESS datacenter=DATACENTER
Example
[ceph: root@host01 /]# ceph mon add host06 213.222.226.50 datacenter=DC3
adding mon.host06 at [v2:213.222.226.50:3300/0,v1:213.222.226.50:6789/0]
Note
The new monitor has to be in a different location than existing non-tiebreaker monitors.
Disable automated monitor deployment:
Example
[ceph: root@host01 /]# ceph orch apply mon --unmanaged
Scheduled mon update...
Deploy the new monitor:
Syntax
ceph orch daemon add mon NEW_HOST
Example
[ceph: root@host01 /]# ceph orch daemon add mon host06
Ensure there are six monitors, of which five are in quorum:
Example
[ceph: root@host01 /]# ceph -s
mon: 6 daemons, quorum host01, host02, host04, host05, host06 (age 30s), out of quorum: host07
Set the new monitor as a new tiebreaker:
Syntax
ceph mon set_new_tiebreaker NEW_HOST
Example
[ceph: root@host01 /]# ceph mon set_new_tiebreaker host06
Remove the failed tiebreaker monitor:
Syntax
ceph orch daemon rm FAILED_TIEBREAKER_MONITOR --force
Example
[ceph: root@host01 /]# ceph orch daemon rm mon.host07 --force
Removed mon.host07 from host 'host07'
Verify that everything is configured properly:
Example
[ceph: root@host01 /]# ceph mon dump
epoch 19
fsid 1234ab78-1234-11ed-b1b1-de456ef0a89d
last_changed 2023-01-17T04:12:05.709475+0000
created 2023-01-16T05:47:25.631684+0000
min_mon_release 16 (pacific)
election_strategy: 3
stretch_mode_enabled 1
tiebreaker_mon host06
disallowed_leaders host06
0: [v2:213.222.226.50:3300/0,v1:213.222.226.50:6789/0] mon.host06; crush_location {datacenter=DC3}
1: [v2:220.141.179.34:3300/0,v1:220.141.179.34:6789/0] mon.host04; crush_location {datacenter=DC2}
2: [v2:40.90.220.224:3300/0,v1:40.90.220.224:6789/0] mon.host01; crush_location {datacenter=DC1}
3: [v2:60.140.141.144:3300/0,v1:60.140.141.144:6789/0] mon.host02; crush_location {datacenter=DC1}
4: [v2:186.184.61.92:3300/0,v1:186.184.61.92:6789/0] mon.host05; crush_location {datacenter=DC2}
dumped monmap epoch 19
Redeploy the monitors:
Syntax
ceph orch apply mon --placement="HOST_1, HOST_2, HOST_3, HOST_4, HOST_5"
Example
[ceph: root@host01 /]# ceph orch apply mon --placement="host01, host02, host04, host05, host06" Scheduled mon update…
4.6. Read affinity in stretch clusters
Read Affinity reduces cross-zone traffic by keeping the data access within the respective data centers.
For stretched clusters deployed in multi-zone environments, the read affinity topology implementation provides a mechanism to help keep traffic within the data center where it originated. The Ceph Object Gateway can read data from an OSD in proximity to the client, according to the OSD locations defined in the CRUSH map and the topology labels on nodes.
For example, a stretch cluster contains a Ceph Object Gateway primary OSD and replicated OSDs spread across two data centers, A and B. If a GET action is performed on an object in data center A, the READ operation is performed on the data of the OSDs closest to the client in data center A.
4.6.1. Performing localized reads
You can perform a localized read on a replicated pool in a stretch cluster. When a localized read request is made on a replicated pool, Ceph selects the local OSDs closest to the client based on the client location specified in crush_location.
Prerequisites
- A stretch cluster with two data centers and Ceph Object Gateway configured on both.
- A user created with a bucket having primary and replicated OSDs.
Procedure
To perform a localized read, set rados_replica_read_policy to localize in the Ceph Object Gateway client configuration by using the ceph config set command.
Example
[ceph: root@host01 /]# ceph config set client.rgw.rgw.1 rados_replica_read_policy localize
Verification: Perform the following steps to verify a localized read from an OSD set.
Run the ceph osd tree command to view the OSDs and the data centers.
Example
[ceph: root@host01 /]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME                                STATUS  REWEIGHT  PRI-AFF
-1         0.58557  root default
-3         0.29279      datacenter DC1
-2         0.09760          host ceph-ci-fbv67y-ammmck-node2
 2    hdd  0.02440              osd.2                            up   1.00000  1.00000
11    hdd  0.02440              osd.11                           up   1.00000  1.00000
17    hdd  0.02440              osd.17                           up   1.00000  1.00000
22    hdd  0.02440              osd.22                           up   1.00000  1.00000
-4         0.09760          host ceph-ci-fbv67y-ammmck-node3
 0    hdd  0.02440              osd.0                            up   1.00000  1.00000
 6    hdd  0.02440              osd.6                            up   1.00000  1.00000
12    hdd  0.02440              osd.12                           up   1.00000  1.00000
18    hdd  0.02440              osd.18                           up   1.00000  1.00000
-5         0.09760          host ceph-ci-fbv67y-ammmck-node4
 5    hdd  0.02440              osd.5                            up   1.00000  1.00000
10    hdd  0.02440              osd.10                           up   1.00000  1.00000
16    hdd  0.02440              osd.16                           up   1.00000  1.00000
23    hdd  0.02440              osd.23                           up   1.00000  1.00000
-7         0.29279      datacenter DC2
-6         0.09760          host ceph-ci-fbv67y-ammmck-node5
 3    hdd  0.02440              osd.3                            up   1.00000  1.00000
 8    hdd  0.02440              osd.8                            up   1.00000  1.00000
14    hdd  0.02440              osd.14                           up   1.00000  1.00000
20    hdd  0.02440              osd.20                           up   1.00000  1.00000
-8         0.09760          host ceph-ci-fbv67y-ammmck-node6
 4    hdd  0.02440              osd.4                            up   1.00000  1.00000
 9    hdd  0.02440              osd.9                            up   1.00000  1.00000
15    hdd  0.02440              osd.15                           up   1.00000  1.00000
21    hdd  0.02440              osd.21                           up   1.00000  1.00000
-9         0.09760          host ceph-ci-fbv67y-ammmck-node7
 1    hdd  0.02440              osd.1                            up   1.00000  1.00000
 7    hdd  0.02440              osd.7                            up   1.00000  1.00000
13    hdd  0.02440              osd.13                           up   1.00000  1.00000
19    hdd  0.02440              osd.19                           up   1.00000  1.00000
Run the ceph orch command to identify the Ceph Object Gateway daemons in the data centers.
Example
[ceph: root@host01 /]# ceph orch ps | grep rg
rgw.rgw.1.ceph-ci-fbv67y-ammmck-node4.dmsmex  ceph-ci-fbv67y-ammmck-node4  *:80  running (4h)  10m ago  22h  93.3M  -  19.1.0-55.el9cp  0ee0a0ad94c7  34f27723ccd2
rgw.rgw.1.ceph-ci-fbv67y-ammmck-node7.pocecp  ceph-ci-fbv67y-ammmck-node7  *:80  running (4h)  10m ago  22h  96.4M  -  19.1.0-55.el9cp  0ee0a0ad94c7  40e4f2a6d4c4
Verify that a localized read has happened by viewing the Ceph Object Gateway logs.
Example
[ceph: root@host01 /]# vim /var/log/ceph/<fsid>/<ceph-client-rgw>.log
2024-08-26T08:07:45.471+0000 7fc623e63640 1 ====== starting new request req=0x7fc5b93694a0 =====
2024-08-26T08:07:45.471+0000 7fc623e63640 1 -- 10.0.67.142:0/279982082 --> [v2:10.0.66.23:6816/73244434,v1:10.0.66.23:6817/73244434] -- osd_op(unknown.0.0:9081 11.55 11:ab26b168:::3acf4091-c54c-43b5-a495-c505fe545d25.27842.1_f1:head [getxattrs,stat] snapc 0=[] ondisk+read+localize_reads+known_if_redirected+supports_pool_eio e3533) -- 0x55f781bd2000 con 0x55f77f0e8c00
You can see in the logs that a localized read has taken place.
Important
To be able to view the debug logs, you must first enable debug_ms 1 in the configuration by running the ceph config set command.
[ceph: root@host01 /]# ceph config set client.rgw.rgw.1.ceph-ci-gune2w-mysx73-node4.dgvrmx debug_ms 1/1
[ceph: root@host01 /]# ceph config set client.rgw.rgw.1.ceph-ci-gune2w-mysx73-node7.rfkqqq debug_ms 1/1
4.6.2. Performing balanced reads
You can perform a balanced read on a pool to spread read operations evenly across the data centers. When a balanced READ is issued on a pool, the read operations are distributed evenly across all OSDs in both data centers.
Prerequisites
- A stretch cluster with two data centers and Ceph Object Gateway configured on both.
- A user created with a bucket having primary and replicated OSDs.
Procedure
To perform a balanced read, set rados_replica_read_policy to balance in the Ceph Object Gateway client configuration by using the ceph config set command.
Example
[ceph: root@host01 /]# ceph config set client.rgw.rgw.1 rados_replica_read_policy balance
Verification: Perform the following steps to verify a balanced read from an OSD set.
Run the ceph osd tree command to view the OSDs and the data centers.
Example
[ceph: root@host01 /]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME                                STATUS  REWEIGHT  PRI-AFF
-1         0.58557  root default
-3         0.29279      datacenter DC1
-2         0.09760          host ceph-ci-fbv67y-ammmck-node2
 2    hdd  0.02440              osd.2                            up   1.00000  1.00000
11    hdd  0.02440              osd.11                           up   1.00000  1.00000
17    hdd  0.02440              osd.17                           up   1.00000  1.00000
22    hdd  0.02440              osd.22                           up   1.00000  1.00000
-4         0.09760          host ceph-ci-fbv67y-ammmck-node3
 0    hdd  0.02440              osd.0                            up   1.00000  1.00000
 6    hdd  0.02440              osd.6                            up   1.00000  1.00000
12    hdd  0.02440              osd.12                           up   1.00000  1.00000
18    hdd  0.02440              osd.18                           up   1.00000  1.00000
-5         0.09760          host ceph-ci-fbv67y-ammmck-node4
 5    hdd  0.02440              osd.5                            up   1.00000  1.00000
10    hdd  0.02440              osd.10                           up   1.00000  1.00000
16    hdd  0.02440              osd.16                           up   1.00000  1.00000
23    hdd  0.02440              osd.23                           up   1.00000  1.00000
-7         0.29279      datacenter DC2
-6         0.09760          host ceph-ci-fbv67y-ammmck-node5
 3    hdd  0.02440              osd.3                            up   1.00000  1.00000
 8    hdd  0.02440              osd.8                            up   1.00000  1.00000
14    hdd  0.02440              osd.14                           up   1.00000  1.00000
20    hdd  0.02440              osd.20                           up   1.00000  1.00000
-8         0.09760          host ceph-ci-fbv67y-ammmck-node6
 4    hdd  0.02440              osd.4                            up   1.00000  1.00000
 9    hdd  0.02440              osd.9                            up   1.00000  1.00000
15    hdd  0.02440              osd.15                           up   1.00000  1.00000
21    hdd  0.02440              osd.21                           up   1.00000  1.00000
-9         0.09760          host ceph-ci-fbv67y-ammmck-node7
 1    hdd  0.02440              osd.1                            up   1.00000  1.00000
 7    hdd  0.02440              osd.7                            up   1.00000  1.00000
13    hdd  0.02440              osd.13                           up   1.00000  1.00000
19    hdd  0.02440              osd.19                           up   1.00000  1.00000
Run the ceph orch command to identify the Ceph Object Gateway daemons in the data centers.
Example
[ceph: root@host01 /]# ceph orch ps | grep rg
rgw.rgw.1.ceph-ci-fbv67y-ammmck-node4.dmsmex  ceph-ci-fbv67y-ammmck-node4  *:80  running (4h)  10m ago  22h  93.3M  -  19.1.0-55.el9cp  0ee0a0ad94c7  34f27723ccd2
rgw.rgw.1.ceph-ci-fbv67y-ammmck-node7.pocecp  ceph-ci-fbv67y-ammmck-node7  *:80  running (4h)  10m ago  22h  96.4M  -  19.1.0-55.el9cp  0ee0a0ad94c7  40e4f2a6d4c4
Verify that a balanced read has happened by viewing the Ceph Object Gateway logs.
Example
[ceph: root@host01 /]# vim /var/log/ceph/<fsid>/<ceph-client-rgw>.log
2024-08-27T09:32:25.510+0000 7f2a7a284640 1 ====== starting new request req=0x7f2a31fcf4a0 =====
2024-08-27T09:32:25.510+0000 7f2a7a284640 1 -- 10.0.67.142:0/3116867178 --> [v2:10.0.64.146:6816/2838383288,v1:10.0.64.146:6817/2838383288] -- osd_op(unknown.0.0:268731 11.55 11:ab26b168:::3acf4091-c54c-43b5-a495-c505fe545d25.27842.1_f1:head [getxattrs,stat] snapc 0=[] ondisk+read+balance_reads+known_if_redirected+supports_pool_eio e3554) -- 0x55cd1b88dc00 con 0x55cd18dd6000
You can see in the logs that a balanced read has taken place.
Important
To be able to view the debug logs, you must first enable debug_ms 1 in the configuration by running the ceph config set command.
[ceph: root@host01 /]# ceph config set client.rgw.rgw.1.ceph-ci-gune2w-mysx73-node4.dgvrmx debug_ms 1/1
[ceph: root@host01 /]# ceph config set client.rgw.rgw.1.ceph-ci-gune2w-mysx73-node7.rfkqqq debug_ms 1/1
4.6.3. Performing default reads
You can perform a default read on a pool to restore the standard read behavior. When a default READ is issued on a pool, each read operation is served by the primary OSD of the placement group, regardless of which data center it is in.
Prerequisites
- A stretch cluster with two data centers and Ceph Object Gateway configured on both.
- A user created with a bucket having primary and replicated OSDs.
Procedure
To perform a default read, set rados_replica_read_policy to default in the Ceph Object Gateway client configuration by using the ceph config set command.
Example
[ceph: root@host01 /]# ceph config set client.rgw.rgw.1 rados_replica_read_policy default
When a GET operation is performed, the read is served by the primary OSD.
Verification: Perform the following steps to verify a default read from an OSD set.
Run the ceph osd tree command to view the OSDs and the data centers.
Example
[ceph: root@host01 /]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME                                STATUS  REWEIGHT  PRI-AFF
-1         0.58557  root default
-3         0.29279      datacenter DC1
-2         0.09760          host ceph-ci-fbv67y-ammmck-node2
 2    hdd  0.02440              osd.2                            up   1.00000  1.00000
11    hdd  0.02440              osd.11                           up   1.00000  1.00000
17    hdd  0.02440              osd.17                           up   1.00000  1.00000
22    hdd  0.02440              osd.22                           up   1.00000  1.00000
-4         0.09760          host ceph-ci-fbv67y-ammmck-node3
 0    hdd  0.02440              osd.0                            up   1.00000  1.00000
 6    hdd  0.02440              osd.6                            up   1.00000  1.00000
12    hdd  0.02440              osd.12                           up   1.00000  1.00000
18    hdd  0.02440              osd.18                           up   1.00000  1.00000
-5         0.09760          host ceph-ci-fbv67y-ammmck-node4
 5    hdd  0.02440              osd.5                            up   1.00000  1.00000
10    hdd  0.02440              osd.10                           up   1.00000  1.00000
16    hdd  0.02440              osd.16                           up   1.00000  1.00000
23    hdd  0.02440              osd.23                           up   1.00000  1.00000
-7         0.29279      datacenter DC2
-6         0.09760          host ceph-ci-fbv67y-ammmck-node5
 3    hdd  0.02440              osd.3                            up   1.00000  1.00000
 8    hdd  0.02440              osd.8                            up   1.00000  1.00000
14    hdd  0.02440              osd.14                           up   1.00000  1.00000
20    hdd  0.02440              osd.20                           up   1.00000  1.00000
-8         0.09760          host ceph-ci-fbv67y-ammmck-node6
 4    hdd  0.02440              osd.4                            up   1.00000  1.00000
 9    hdd  0.02440              osd.9                            up   1.00000  1.00000
15    hdd  0.02440              osd.15                           up   1.00000  1.00000
21    hdd  0.02440              osd.21                           up   1.00000  1.00000
-9         0.09760          host ceph-ci-fbv67y-ammmck-node7
 1    hdd  0.02440              osd.1                            up   1.00000  1.00000
 7    hdd  0.02440              osd.7                            up   1.00000  1.00000
13    hdd  0.02440              osd.13                           up   1.00000  1.00000
19    hdd  0.02440              osd.19                           up   1.00000  1.00000
Run the ceph orch command to identify the Ceph Object Gateway daemons in the data centers.
Example
[ceph: root@host01 /]# ceph orch ps | grep rg
rgw.rgw.1.ceph-ci-fbv67y-ammmck-node4.dmsmex  ceph-ci-fbv67y-ammmck-node4  *:80  running (4h)  10m ago  22h  93.3M  -  19.1.0-55.el9cp  0ee0a0ad94c7  34f27723ccd2
rgw.rgw.1.ceph-ci-fbv67y-ammmck-node7.pocecp  ceph-ci-fbv67y-ammmck-node7  *:80  running (4h)  10m ago  22h  96.4M  -  19.1.0-55.el9cp  0ee0a0ad94c7  40e4f2a6d4c4
Verify that a default read has happened by viewing the Ceph Object Gateway logs.
Example
[ceph: root@host01 /]# vim /var/log/ceph/<fsid>/<ceph-client-rgw>.log
2024-08-28T10:26:05.155+0000 7fe6b03dd640 1 ====== starting new request req=0x7fe6879674a0 =====
2024-08-28T10:26:05.156+0000 7fe6b03dd640 1 -- 10.0.64.251:0/2235882725 --> [v2:10.0.65.171:6800/4255735352,v1:10.0.65.171:6801/4255735352] -- osd_op(unknown.0.0:1123 11.6d 11:b69767fc:::699c2d80-5683-43c5-bdcd-e8912107c176.24827.3_f1:head [getxattrs,stat] snapc 0=[] ondisk+read+known_if_redirected+supports_pool_eio e4513) -- 0x5639da653800 con 0x5639d804d800
You can see in the logs that a default read has taken place.
Important
To be able to view the debug logs, you must first enable debug_ms 1 in the configuration by running the ceph config set command.
[ceph: root@host01 /]# ceph config set client.rgw.rgw.1.ceph-ci-gune2w-mysx73-node4.dgvrmx debug_ms 1/1
[ceph: root@host01 /]# ceph config set client.rgw.rgw.1.ceph-ci-gune2w-mysx73-node7.rfkqqq debug_ms 1/1