3.5. Deploying Red Hat Ceph Storage
3.5.1. Node pre-deployment steps
Before installing the Red Hat Ceph Storage cluster, complete the following steps to meet all the requirements.
Register all the nodes to the Red Hat Network or Red Hat Satellite and subscribe to a valid pool:
subscription-manager register
subscription-manager subscribe --pool=8a8XXXXXX9e0
Enable access for all the nodes in the Ceph cluster for the following repositories:
- rhel9-for-x86_64-baseos-rpms
- rhel9-for-x86_64-appstream-rpms
subscription-manager repos --disable="*" --enable="rhel9-for-x86_64-baseos-rpms" --enable="rhel9-for-x86_64-appstream-rpms"
Update the operating system RPMs to the latest version and reboot if needed:
dnf update -y
reboot
Select a node from the cluster to be your bootstrap node. ceph1 is our bootstrap node in this example going forward.
Only on the bootstrap node ceph1, enable the ansible-2.9-for-rhel-9-x86_64-rpms and rhceph-6-tools-for-rhel-9-x86_64-rpms repositories:
subscription-manager repos --enable="ansible-2.9-for-rhel-9-x86_64-rpms" --enable="rhceph-6-tools-for-rhel-9-x86_64-rpms"
Configure the hostname using the bare/short hostname in all the hosts.
hostnamectl set-hostname <short_name>
Verify the hostname configuration for deploying Red Hat Ceph Storage with cephadm.
$ hostname
Example output:
ceph1
Modify the /etc/hosts file and add the fqdn entry to the 127.0.0.1 IP by setting the DOMAIN variable with our DNS domain name.
DOMAIN="example.domain.com"
cat <<EOF >/etc/hosts
127.0.0.1 $(hostname).${DOMAIN} $(hostname) localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 $(hostname).${DOMAIN} $(hostname) localhost6 localhost6.localdomain6
EOF
Check the long hostname with the fqdn using the hostname -f option.
$ hostname -f
Example output:
ceph1.example.domain.com
Note: To know more about why these changes are required, see Fully Qualified Domain Names vs Bare Host Names.
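The /etc/hosts step above overwrites the file in one go, so it can help to preview the two lines first. The following is a minimal sketch, not part of the official procedure; render_hosts is a hypothetical helper name, and it only prints to standard output.

```shell
#!/usr/bin/env bash
# Sketch: print the /etc/hosts content for a short hostname and a DNS domain
# instead of writing the file directly. render_hosts is a hypothetical helper.
render_hosts() {
    local short_name="$1" domain="$2"
    cat <<EOF
127.0.0.1 ${short_name}.${domain} ${short_name} localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 ${short_name}.${domain} ${short_name} localhost6 localhost6.localdomain6
EOF
}

# Preview the result for the example node; nothing is written to disk.
render_hosts ceph1 example.domain.com
```

After reviewing the output, the same function could be redirected to the real file on each node, for example `render_hosts "$(hostname)" "$DOMAIN" > /etc/hosts`.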
Run the following steps on the bootstrap node. In our example, the bootstrap node is ceph1.
Install the cephadm-ansible RPM package:
$ sudo dnf install -y cephadm-ansible
Important: To run the ansible playbooks, you must have ssh passwordless access to all the nodes that are configured to the Red Hat Ceph Storage cluster. Ensure that the configured user (for example, deployment-user) has root privileges to invoke the sudo command without needing a password.
To use a custom key, configure the selected user (for example, deployment-user) ssh config file to specify the id/key that will be used for connecting to the nodes via ssh:
cat <<EOF > ~/.ssh/config
Host ceph*
  User deployment-user
  IdentityFile ~/.ssh/ceph.pem
EOF
Build the ansible inventory:
cat <<EOF > /usr/share/cephadm-ansible/inventory
ceph1
ceph2
ceph3
ceph4
ceph5
ceph6
ceph7
[admin]
ceph1
ceph4
EOF
Note: Here, the hosts (ceph1 and ceph4) belonging to two different data centers are configured as part of the [admin] group in the inventory file and are tagged as _admin by cephadm. Each of these admin nodes receives the admin Ceph keyring during the bootstrap process so that when one data center is down, we can check using the other available admin node.
Verify that ansible can access all nodes using the ping module before running the pre-flight playbook.
$ ansible -i /usr/share/cephadm-ansible/inventory -m ping all -b
Example output:
ceph6 | SUCCESS => { "ansible_facts": { "discovered_interpreter_python": "/usr/libexec/platform-python" }, "changed": false, "ping": "pong" }
ceph4 | SUCCESS => { "ansible_facts": { "discovered_interpreter_python": "/usr/libexec/platform-python" }, "changed": false, "ping": "pong" }
ceph3 | SUCCESS => { "ansible_facts": { "discovered_interpreter_python": "/usr/libexec/platform-python" }, "changed": false, "ping": "pong" }
ceph2 | SUCCESS => { "ansible_facts": { "discovered_interpreter_python": "/usr/libexec/platform-python" }, "changed": false, "ping": "pong" }
ceph5 | SUCCESS => { "ansible_facts": { "discovered_interpreter_python": "/usr/libexec/platform-python" }, "changed": false, "ping": "pong" }
ceph1 | SUCCESS => { "ansible_facts": { "discovered_interpreter_python": "/usr/libexec/platform-python" }, "changed": false, "ping": "pong" }
ceph7 | SUCCESS => { "ansible_facts": { "discovered_interpreter_python": "/usr/libexec/platform-python" }, "changed": false, "ping": "pong" }
Navigate to the /usr/share/cephadm-ansible directory. Run ansible-playbook with relative file paths.
$ ansible-playbook -i /usr/share/cephadm-ansible/inventory /usr/share/cephadm-ansible/cephadm-preflight.yml --extra-vars "ceph_origin=rhcs"
The preflight Ansible playbook configures the RHCS dnf repository and prepares the storage cluster for bootstrapping. It also installs podman, lvm2, chronyd, and cephadm. The default location for cephadm-ansible and cephadm-preflight.yml is /usr/share/cephadm-ansible. For additional information, see Running the preflight playbook.
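The playbooks depend on passwordless SSH and passwordless sudo for the deployment user on every node. The following is a rough sketch of how that requirement could be checked per host; check_access and DRY_RUN are hypothetical names, and the host list mirrors the example inventory. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: check passwordless SSH and passwordless sudo on each inventory host.
# check_access and DRY_RUN are hypothetical names used for illustration only.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually run them

check_access() {
    local user="$1"; shift
    local host
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of prompting for a password;
        # sudo -n fails instead of prompting when a password would be needed.
        local cmd=(ssh -o BatchMode=yes "${user}@${host}" sudo -n true)
        if [ "$DRY_RUN" = "1" ]; then
            echo "${cmd[*]}"
        elif "${cmd[@]}" </dev/null; then
            echo "${host}: OK"
        else
            echo "${host}: FAILED"
        fi
    done
}

check_access deployment-user ceph1 ceph2 ceph3 ceph4 ceph5 ceph6 ceph7
```

Running it with `DRY_RUN=0` executes the checks for real; any host reported as FAILED would need its key or sudoers configuration reviewed before the preflight playbook is run.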
The cephadm utility installs and starts a single Ceph Monitor daemon and a Ceph Manager daemon for a new Red Hat Ceph Storage cluster on the local node where the cephadm bootstrap command is run.
In this guide, we bootstrap the cluster and deploy all the needed Red Hat Ceph Storage services in one step using a cluster specification YAML file.
If you find issues during the deployment, it may be easier to troubleshoot the errors by dividing the deployment into two steps:
- Bootstrap
- Service deployment
For additional information on the bootstrapping process, see Bootstrapping a new storage cluster.
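If the one-step deployment is hard to troubleshoot, the two-step alternative could look roughly like the sketch below: bootstrap without --apply-spec, then apply the specification with ceph orch apply -i. The run, bootstrap_only, and deploy_services names and the DRY_RUN switch are hypothetical; the sketch also assumes that the cluster SSH public key has already been distributed to the other hosts, which this guide does not cover for the split case.

```shell
#!/usr/bin/env bash
# Sketch of a two-step deployment: bootstrap first, deploy services afterwards.
# run, bootstrap_only, deploy_services and DRY_RUN are hypothetical names.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually run them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1: bootstrap only; note that --apply-spec is left out on purpose.
bootstrap_only() {
    local mon_ip="$1"
    run cephadm bootstrap --ssh-user=deployment-user --mon-ip "$mon_ip" --registry-json /root/registry.json
}

# Step 2: apply the host and service specification once the bootstrap succeeded.
# Assumption: the other hosts already accept the cluster SSH key.
deploy_services() {
    local spec_file="$1"
    run ceph orch apply -i "$spec_file"
}

bootstrap_only 10.0.40.78
deploy_services /root/cluster-spec.yaml
```

Splitting the work this way means a failure can be attributed either to the bootstrap itself or to a specific entry in the specification file.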
Procedure
Create a JSON file to authenticate against the container registry as follows:
$ cat <<EOF > /root/registry.json
{
 "url":"registry.redhat.io",
 "username":"User",
 "password":"Pass"
}
EOF
Create a cluster-spec.yaml that adds the nodes to the Red Hat Ceph Storage cluster and also sets specific labels for where the services should run following table 3.1.
cat <<EOF > /root/cluster-spec.yaml
service_type: host
addr: 10.0.40.78  ## <XXX.XXX.XXX.XXX>
hostname: ceph1   ## <ceph-hostname-1>
location:
  root: default
  datacenter: DC1
labels:
  - osd
  - mon
  - mgr
---
service_type: host
addr: 10.0.40.35
hostname: ceph2
location:
  datacenter: DC1
labels:
  - osd
  - mon
---
service_type: host
addr: 10.0.40.24
hostname: ceph3
location:
  datacenter: DC1
labels:
  - osd
  - mds
  - rgw
---
service_type: host
addr: 10.0.40.185
hostname: ceph4
location:
  root: default
  datacenter: DC2
labels:
  - osd
  - mon
  - mgr
---
service_type: host
addr: 10.0.40.88
hostname: ceph5
location:
  datacenter: DC2
labels:
  - osd
  - mon
---
service_type: host
addr: 10.0.40.66
hostname: ceph6
location:
  datacenter: DC2
labels:
  - osd
  - mds
  - rgw
---
service_type: host
addr: 10.0.40.221
hostname: ceph7
labels:
  - mon
---
service_type: mon
placement:
  label: "mon"
---
service_type: mds
service_id: cephfs
placement:
  label: "mds"
---
service_type: mgr
service_name: mgr
placement:
  label: "mgr"
---
service_type: osd
service_id: all-available-devices
service_name: osd.all-available-devices
placement:
  label: "osd"
spec:
  data_devices:
    all: true
---
service_type: rgw
service_id: objectgw
service_name: rgw.objectgw
placement:
  count: 2
  label: "rgw"
spec:
  rgw_frontend_port: 8080
EOF
Retrieve the IP for the NIC with the Red Hat Ceph Storage public network configured from the bootstrap node. After substituting
10.0.40.0 with the subnet that you have defined in your Ceph public network, execute the following command.
$ ip a | grep 10.0.40
Example output:
10.0.40.78
Run the cephadm bootstrap command as the root user on the node that will be the initial Monitor node in the cluster. The IP_ADDRESS option is the IP address of the node that you are using to run the cephadm bootstrap command.
Note: If you have configured a different user instead of root for passwordless SSH access, then use the --ssh-user= flag with the cephadm bootstrap command.
If you are using non default/id_rsa ssh key names, then use the --ssh-private-key and --ssh-public-key options with the cephadm command.
$ cephadm bootstrap --ssh-user=deployment-user --mon-ip 10.0.40.78 --apply-spec /root/cluster-spec.yaml --registry-json /root/registry.json
Important: If the local node uses fully-qualified domain names (FQDN), then add the --allow-fqdn-hostname option to cephadm bootstrap on the command line.
Once the bootstrap finishes, you will see the following output from the previous cephadm bootstrap command:
You can access the Ceph CLI with:
  sudo /usr/sbin/cephadm shell --fsid dd77f050-9afe-11ec-a56c-029f8148ea14 -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring
Consider enabling telemetry to help improve Ceph:
  ceph telemetry on
For more information see:
  https://docs.ceph.com/docs/pacific/mgr/telemetry/
Verify the status of Red Hat Ceph Storage cluster deployment using the Ceph CLI client from ceph1:
$ ceph -s
Example output:
cluster:
  id:     3a801754-e01f-11ec-b7ab-005056838602
  health: HEALTH_OK
services:
  mon: 5 daemons, quorum ceph1,ceph2,ceph4,ceph5,ceph7 (age 4m)
  mgr: ceph1.khuuot(active, since 5m), standbys: ceph4.zotfsp
  osd: 12 osds: 12 up (since 3m), 12 in (since 4m)
  rgw: 2 daemons active (2 hosts, 1 zones)
data:
  pools:   5 pools, 107 pgs
  objects: 191 objects, 5.3 KiB
  usage:   105 MiB used, 600 GiB / 600 GiB avail
           105 active+clean
Note: It may take several minutes for all the services to start.
It is normal to get a global recovery event while you do not have any OSDs configured.
You can use ceph orch ps and ceph orch ls to further check the status of the services.
Verify if all the nodes are part of the cephadm cluster.
$ ceph orch host ls
Example output:
HOST   ADDR         LABELS              STATUS
ceph1  10.0.40.78   _admin osd mon mgr
ceph2  10.0.40.35   osd mon
ceph3  10.0.40.24   osd mds rgw
ceph4  10.0.40.185  osd mon mgr
ceph5  10.0.40.88   osd mon
ceph6  10.0.40.66   osd mds rgw
ceph7  10.0.40.221  mon
Note: You can run Ceph commands directly from the host because ceph1 was configured in the cephadm-ansible inventory as part of the [admin] group. The Ceph admin keys were copied to the host during the cephadm bootstrap process.
Check the current placement of the Ceph monitor services on the datacenters.
$ ceph orch ps | grep mon | awk '{print $1 " " $2}'
Example output:
mon.ceph1 ceph1
mon.ceph2 ceph2
mon.ceph4 ceph4
mon.ceph5 ceph5
mon.ceph7 ceph7
Check the current placement of the Ceph manager services on the datacenters.
$ ceph orch ps | grep mgr | awk '{print $1 " " $2}'
Example output:
mgr.ceph2.ycgwyz ceph2
mgr.ceph5.kremtt ceph5
Check the Ceph OSD CRUSH map layout to ensure that each host has one OSD configured and its status is UP. Also, double-check that each node is under the right datacenter bucket as specified in table 3.1.
$ ceph osd tree
Example output:
ID   CLASS  WEIGHT   TYPE NAME            STATUS  REWEIGHT  PRI-AFF
 -1         0.87900  root default
-16         0.43950      datacenter DC1
-11         0.14650          host ceph1
  2    ssd  0.14650              osd.2        up   1.00000  1.00000
 -3         0.14650          host ceph2
  3    ssd  0.14650              osd.3        up   1.00000  1.00000
-13         0.14650          host ceph3
  4    ssd  0.14650              osd.4        up   1.00000  1.00000
-17         0.43950      datacenter DC2
 -5         0.14650          host ceph4
  0    ssd  0.14650              osd.0        up   1.00000  1.00000
 -9         0.14650          host ceph5
  1    ssd  0.14650              osd.1        up   1.00000  1.00000
 -7         0.14650          host ceph6
  5    ssd  0.14650              osd.5        up   1.00000  1.00000
Create and enable a new RBD block pool.
$ ceph osd pool create rbdpool 32 32
$ ceph osd pool application enable rbdpool rbd
Note: The number 32 at the end of the command is the number of PGs assigned to this pool. The number of PGs can vary depending on several factors, such as the number of OSDs in the cluster and the expected percentage of the pool that will be used. You can use the following calculator to determine the number of PGs needed: Ceph Placement Groups (PGs) per Pool Calculator.
Verify that the RBD pool has been created.
$ ceph osd lspools | grep rbdpool
Example output:
3 rbdpool
Verify that the MDS services are active and that one service is located in each datacenter.
$ ceph orch ps | grep mds
Example output:
mds.cephfs.ceph3.cjpbqo ceph3 running (17m) 117s ago 17m 16.1M - 16.2.9
mds.cephfs.ceph6.lqmgqt ceph6 running (17m) 117s ago 17m 16.1M - 16.2.9
Create the CephFS volume.
$ ceph fs volume create cephfs
Note: The ceph fs volume create command also creates the needed data and meta CephFS pools. For more information, see Configuring and Mounting Ceph File Systems.
Check the Ceph status to verify how the MDS daemons have been deployed. Ensure that the state is active, where ceph6 is the primary MDS for this filesystem and ceph3 is the secondary MDS.
$ ceph fs status
Example output:
cephfs - 0 clients
======
RANK  STATE   MDS                  ACTIVITY    DNS  INOS  DIRS  CAPS
 0    active  cephfs.ceph6.ggjywj  Reqs: 0 /s  10   13    12    0
POOL                TYPE      USED   AVAIL
cephfs.cephfs.meta  metadata  96.0k  284G
cephfs.cephfs.data  data      0      284G
STANDBY MDS
cephfs.ceph3.ogcqkl
Verify that RGW services are active.
$ ceph orch ps | grep rgw
Example output:
rgw.objectgw.ceph3.kkmxgb ceph3 *:8080 running (7m) 3m ago 7m 52.7M - 16.2.9
rgw.objectgw.ceph6.xmnpah ceph6 *:8080 running (7m) 3m ago 7m 53.3M - 16.2.9
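The placement checks in this procedure all filter ceph orch ps output with grep, which matches the pattern anywhere on the line. A slightly stricter variant, sketched below, matches only the daemon-name prefix in the first column. daemon_hosts is a hypothetical helper name; the sample rows are copied from the example output in this section.

```shell
#!/usr/bin/env bash
# Sketch: read `ceph orch ps` output on stdin and print "<daemon> <host>" for
# daemons whose name starts with "<type>.". daemon_hosts is a hypothetical helper.
daemon_hosts() {
    local type="$1"
    awk -v prefix="${type}." 'index($1, prefix) == 1 { print $1, $2 }'
}

# Sample rows copied from the example output in this section.
daemon_hosts mds <<'EOF'
mds.cephfs.ceph3.cjpbqo ceph3 running (17m) 117s ago 17m 16.1M - 16.2.9
mds.cephfs.ceph6.lqmgqt ceph6 running (17m) 117s ago 17m 16.1M - 16.2.9
rgw.objectgw.ceph3.kkmxgb ceph3 *:8080 running (7m) 3m ago 7m 52.7M - 16.2.9
EOF
```

On a live cluster the same function could be fed directly, for example `ceph orch ps | daemon_hosts mon`, and the host column compared against the datacenter assignment in table 3.1.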
3.5.3. Configuring Red Hat Ceph Storage stretch mode
Once the Red Hat Ceph Storage cluster is fully deployed using cephadm, use the following procedure to configure the stretch cluster mode. The new stretch mode is designed to handle the 2-site case.
Procedure
Check the current election strategy being used by the monitors with the ceph mon dump command. By default, the election strategy in a Ceph cluster is set to classic.
ceph mon dump | grep election_strategy
Example output:
dumped monmap epoch 9
election_strategy: 1
Change the monitor election strategy to connectivity.
ceph mon set election_strategy connectivity
Run the previous ceph mon dump command again to verify the election_strategy value.
$ ceph mon dump | grep election_strategy
Example output:
dumped monmap epoch 10
election_strategy: 3
To know more about the different election strategies, see Configuring monitor election strategy.
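The numeric election_strategy value in the monitor map is easy to misread. The sketch below translates it back to a name, using 1 for classic and 3 for connectivity as shown in the example output, plus 2 for disallow, which is the remaining Ceph election strategy. The two function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: translate the numeric election_strategy from `ceph mon dump` to its name.
# parse_election_strategy and election_strategy_name are hypothetical helpers.

# Extract the number from the "election_strategy: N" line read on stdin.
parse_election_strategy() {
    awk '$1 == "election_strategy:" { print $2 }'
}

election_strategy_name() {
    case "$1" in
        1) echo "classic" ;;
        2) echo "disallow" ;;
        3) echo "connectivity" ;;
        *) echo "unknown ($1)" ;;
    esac
}

# Sample input copied from the example output in this procedure.
value="$(printf 'dumped monmap epoch 10\nelection_strategy: 3\n' | parse_election_strategy)"
election_strategy_name "$value"
```

On a live cluster, the sample input could be replaced with `ceph mon dump 2>/dev/null | parse_election_strategy`.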
Set the location for all our Ceph monitors:
ceph mon set_location ceph1 datacenter=DC1
ceph mon set_location ceph2 datacenter=DC1
ceph mon set_location ceph4 datacenter=DC2
ceph mon set_location ceph5 datacenter=DC2
ceph mon set_location ceph7 datacenter=DC3
Verify that each monitor has its appropriate location.
$ ceph mon dump
Example output:
epoch 17
fsid dd77f050-9afe-11ec-a56c-029f8148ea14
last_changed 2022-03-04T07:17:26.913330+0000
created 2022-03-03T14:33:22.957190+0000
min_mon_release 16 (pacific)
election_strategy: 3
0: [v2:10.0.143.78:3300/0,v1:10.0.143.78:6789/0] mon.ceph1; crush_location {datacenter=DC1}
1: [v2:10.0.155.185:3300/0,v1:10.0.155.185:6789/0] mon.ceph4; crush_location {datacenter=DC2}
2: [v2:10.0.139.88:3300/0,v1:10.0.139.88:6789/0] mon.ceph5; crush_location {datacenter=DC2}
3: [v2:10.0.150.221:3300/0,v1:10.0.150.221:6789/0] mon.ceph7; crush_location {datacenter=DC3}
4: [v2:10.0.155.35:3300/0,v1:10.0.155.35:6789/0] mon.ceph2; crush_location {datacenter=DC1}
To create a CRUSH rule that makes use of this OSD CRUSH topology, first install the ceph-base RPM package, which provides the crushtool command:
$ dnf -y install ceph-base
To know more about CRUSH rulesets, see Ceph CRUSH ruleset.
Get the compiled CRUSH map from the cluster:
$ ceph osd getcrushmap > /etc/ceph/crushmap.bin
Decompile the CRUSH map and convert it to a text file in order to be able to edit it:
$ crushtool -d /etc/ceph/crushmap.bin -o /etc/ceph/crushmap.txt
Add the following rule to the CRUSH map by editing the text file /etc/ceph/crushmap.txt at the end of the file.
$ vim /etc/ceph/crushmap.txt
rule stretch_rule {
        id 1
        type replicated
        min_size 1
        max_size 10
        step take default
        step choose firstn 0 type datacenter
        step chooseleaf firstn 2 type host
        step emit
}
# end crush map
This example is applicable for active applications in both OpenShift Container Platform clusters.
Note: The rule id has to be unique. In the example, there is only one other CRUSH rule, with id 0, so we use id 1. If your deployment has more rules created, then use the next free id.
The CRUSH rule declared contains the following information:
Rule name
- Description: A unique name for identifying the rule.
- Value: stretch_rule
id
- Description: A unique whole number for identifying the rule.
- Value: 1
type
- Description: Describes whether the rule is for a replicated or an erasure-coded pool.
- Value: replicated
min_size
- Description: If a pool makes fewer replicas than this number, CRUSH will not select this rule.
- Value: 1
max_size
- Description: If a pool makes more replicas than this number, CRUSH will not select this rule.
- Value: 10
step take default
- Description: Takes the root bucket called default, and begins iterating down the tree.
step choose firstn 0 type datacenter
- Description: Selects the datacenter bucket, and goes into its subtrees.
step chooseleaf firstn 2 type host
- Description: Selects the number of buckets of the given type. In this case, it is two different hosts located in the datacenter it entered at the previous level.
step emit
- Description: Outputs the current value and empties the stack. Typically used at the end of a rule, but may also be used to pick from different trees in the same rule.
Compile the new CRUSH map from the file /etc/ceph/crushmap.txt and convert it to a binary file called /etc/ceph/crushmap2.bin:
$ crushtool -c /etc/ceph/crushmap.txt -o /etc/ceph/crushmap2.bin
Inject the new CRUSH map we created back into the cluster:
$ ceph osd setcrushmap -i /etc/ceph/crushmap2.bin
Example output:
17
Note: The number 17 is a counter and it will increase (18, 19, and so on) depending on the changes you make to the CRUSH map.
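The compiled map can also be exercised offline with crushtool's test mode, which prints the OSDs a rule would select without touching the cluster. The sketch below only builds and prints that command by default. run, test_stretch_rule, and DRY_RUN are hypothetical names, and the replica count of 4 (two per datacenter) is an assumption about the pool size, not a value stated in this procedure.

```shell
#!/usr/bin/env bash
# Sketch: dry-run the stretch rule against the compiled CRUSH map with crushtool.
# run, test_stretch_rule and DRY_RUN are hypothetical names for illustration.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command, 0 = actually run it

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

test_stretch_rule() {
    local map="$1" rule_id="$2" replicas="$3"
    # --show-mappings lists the OSDs chosen for each simulated input value.
    run crushtool -i "$map" --test --rule "$rule_id" --num-rep "$replicas" --show-mappings
}

# Assumption: 4 replicas, that is, two hosts in each of the two datacenters.
test_stretch_rule /etc/ceph/crushmap2.bin 1 4
```

With `DRY_RUN=0`, each mapping line should list OSDs from hosts in both DC1 and DC2 if the rule behaves as described in the rule breakdown.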
Verify that the stretch rule you created is now available for use.
ceph osd crush rule ls
Example output:
replicated_rule
stretch_rule
Enable the stretch cluster mode.
$ ceph mon enable_stretch_mode ceph7 stretch_rule datacenter
In this example, ceph7 is the arbiter node, stretch_rule is the CRUSH rule we created in the previous step, and datacenter is the dividing bucket.
Verify all our pools are using the stretch_rule CRUSH rule we have created in our Ceph cluster:
$ for pool in $(rados lspools); do echo -n "Pool: ${pool}; "; ceph osd pool get ${pool} crush_rule; done
Example output:
Pool: device_health_metrics; crush_rule: stretch_rule
Pool: cephfs.cephfs.meta; crush_rule: stretch_rule
Pool: cephfs.cephfs.data; crush_rule: stretch_rule
Pool: .rgw.root; crush_rule: stretch_rule
Pool: default.rgw.log; crush_rule: stretch_rule
Pool: default.rgw.control; crush_rule: stretch_rule
Pool: default.rgw.meta; crush_rule: stretch_rule
Pool: rbdpool; crush_rule: stretch_rule
This indicates that a working Red Hat Ceph Storage stretched cluster with arbiter mode is now available.
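With many pools, scanning the loop output by eye is error-prone. The sketch below reads that same "Pool: <name>; crush_rule: <rule>" output on stdin and prints only the pools that do not use the expected rule. pools_not_using_rule is a hypothetical helper name, and the pool named demo in the sample input is made up to show a mismatch.

```shell
#!/usr/bin/env bash
# Sketch: list pools whose crush_rule differs from the expected one.
# Input format: "Pool: <name>; crush_rule: <rule>" (one pool per line).
# pools_not_using_rule is a hypothetical helper used for illustration.
pools_not_using_rule() {
    local rule="$1"
    awk -v want="$rule" -F'; crush_rule: ' '
        /^Pool: / {
            name = $1
            sub(/^Pool: /, "", name)
            if ($2 != want) print name
        }'
}

# Sample input: two rows from the example output plus one made-up pool (demo)
# that still uses replicated_rule.
pools_not_using_rule stretch_rule <<'EOF'
Pool: cephfs.cephfs.meta; crush_rule: stretch_rule
Pool: rbdpool; crush_rule: stretch_rule
Pool: demo; crush_rule: replicated_rule
EOF
```

Piping the verification loop from this procedure into `pools_not_using_rule stretch_rule` should print nothing when every pool has been switched to the stretch rule.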