Chapter 15. Scaling the Ceph Storage cluster
You can scale the size of your Ceph Storage cluster by adding or removing storage nodes.
15.1. Scaling up the Ceph Storage cluster
As capacity and performance requirements change, you can scale up your Ceph Storage cluster to meet increased demands. Before doing so, ensure that you have enough nodes for the updated deployment. Then you can register and tag the new nodes in your Red Hat OpenStack Platform (RHOSP) environment.
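For example, one way to confirm that the new nodes are registered and available before you begin is to list the bare metal nodes on the undercloud (a minimal check, assuming the stackrc credentials are sourced; the exact output depends on your environment):

source ~/stackrc
openstack baremetal node list

The new nodes typically appear with a provisioning state of available before you provision them.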
To register new Ceph Storage nodes with director, complete this procedure.
Procedure
- Log in to the undercloud node as the stack user.
- Modify the ~/overcloud-baremetal-deploy.yaml file to add CephStorage nodes to the deployment.
The following example file represents an original deployment with three CephStorage nodes.
- name: CephStorage
  count: 3
  instances:
  - hostname: ceph-0
    name: ceph-0
  - hostname: ceph-1
    name: ceph-1
  - hostname: ceph-2
    name: ceph-2
The following example modifies this file to add three additional nodes.
- name: CephStorage
  count: 6
  instances:
  - hostname: ceph-0
    name: ceph-0
  - hostname: ceph-1
    name: ceph-1
  - hostname: ceph-2
    name: ceph-2
  - hostname: ceph-3
    name: ceph-3
  - hostname: ceph-4
    name: ceph-4
  - hostname: ceph-5
    name: ceph-5
- Use the openstack overcloud node provision command with the updated ~/overcloud-baremetal-deploy.yaml file.
openstack overcloud node provision \
  --stack overcloud \
  --network-config \
  --output ~/overcloud-baremetal-deployed.yaml \
  ~/overcloud-baremetal-deploy.yaml
Note: This command provisions the configured nodes and outputs an updated copy of ~/overcloud-baremetal-deployed.yaml. The new version updates the CephStorage count, and the DeployedServerPortMap and HostnameMap also contain the new storage nodes.
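Optionally, verify that the new nodes were provisioned before you run the overcloud deployment. For example, the metalsmith tool lists the provisioned bare metal instances (assuming the stackrc credentials are sourced):

metalsmith list

The new CephStorage nodes should appear alongside the existing overcloud nodes.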
- Use the openstack overcloud deploy command with the updated ~/overcloud-baremetal-deployed.yaml file.
openstack overcloud deploy --templates \
  -e /usr/share/openstack-tripleo-heat-templates/environments/cephadm/cephadm.yaml \
  -e deployed_ceph.yaml \
  -e overcloud-baremetal-deployed.yaml
Result
The following actions occur when the openstack overcloud deploy command runs:
- The storage networks and firewall rules are configured on the new CephStorage nodes.
- The ceph-admin user is created on the new CephStorage nodes.
- The ceph-admin user's public SSH key is distributed to the new CephStorage nodes so that cephadm can use SSH to add the extra nodes.
- If a new CephMon or CephMgr node is added, the ceph-admin private SSH key is also distributed to that node.
- An updated Ceph specification is generated and installed on the bootstrap node. This updated specification is typically available in /home/ceph-admin/specs/ceph_spec.yaml on the bootstrap node.
- The cephadm bootstrap process is skipped because cephadm ls indicates that the Ceph containers are already running.
- The updated Ceph specification is applied, and cephadm schedules the new nodes to join the Ceph cluster.
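After the deployment completes, you can optionally confirm that the new nodes joined the Ceph cluster. For example, from a Controller node (the host and user names shown here are illustrative), open a Ceph shell and list the cluster hosts and overall status:

[tripleo-admin@oc0-controller-0 ~]$ sudo cephadm shell
[ceph: root@oc0-controller-0 /]# ceph orch host ls
[ceph: root@oc0-controller-0 /]# ceph -s

The new CephStorage nodes should appear in the host list, and the OSD count in the ceph -s output should increase as the new OSDs are created.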
15.2. Scaling down and replacing Ceph Storage nodes
In some cases, you might need to scale down your Ceph Storage cluster or replace a Ceph Storage node. In either situation, you must disable and rebalance the Ceph Storage nodes that you want to remove from the overcloud to prevent data loss.
Do not proceed with this procedure if the Ceph Storage cluster does not have the capacity to lose OSDs.
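One way to gauge whether the cluster has the capacity to lose the OSDs is to review current utilization and health before you begin. For example, from a Controller node (an optional check; the output depends on your cluster):

[tripleo-admin@oc0-controller-0 ~]$ sudo cephadm shell -- ceph df
[tripleo-admin@oc0-controller-0 ~]$ sudo cephadm shell -- ceph -s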
Procedure
- Log in to the overcloud Controller node as the tripleo-admin user.
- Use the sudo cephadm shell command to start a Ceph shell.
- Use the ceph osd tree command to identify the OSDs to be removed by server.
In the following example, the OSDs that belong to the ceph-2 host are identified.
[ceph: root@oc0-controller-0 /]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME         STATUS  REWEIGHT  PRI-AFF
-1         0.58557  root default
-7         0.19519      host ceph-2
 5    hdd  0.04880          osd.5         up   1.00000  1.00000
 7    hdd  0.04880          osd.7         up   1.00000  1.00000
 9    hdd  0.04880          osd.9         up   1.00000  1.00000
11    hdd  0.04880          osd.11        up   1.00000  1.00000
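Optionally, before scheduling the removal, check that stopping these OSDs will not leave any placement groups unable to serve I/O. A minimal check, using the OSD IDs from the example above:

[ceph: root@oc0-controller-0 /]# ceph osd ok-to-stop 5 7 9 11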
- Export the Ceph cluster specification to a YAML file.
[ceph: root@oc0-controller-0 /]# ceph orch ls --export > spec.yml
- Edit the exported specification file so that the applicable hosts are removed from the service_type: osd hosts list and removed from any other placement: hosts values.
- Save the edited file.
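For illustration only, an edited service_type: osd entry in spec.yml might look like the following after ceph-2 is removed; the service_id and device selection shown here are assumptions and will differ in your exported file:

service_type: osd
service_id: default_drive_group   # example value; keep the value from your exported file
placement:
  hosts:
  - ceph-0
  - ceph-1                        # ceph-2 has been removed from this list
spec:
  data_devices:
    all: true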
- Apply the modified Ceph specification file.
[ceph: root@oc0-controller-0 /]# ceph orch apply -i spec.yml
Important: If you do not export and edit the Ceph specification file before removing the OSDs, the Ceph Manager attempts to recreate the OSDs.
- Use the ceph orch osd rm --zap <osd_list> command to remove the OSDs.
[ceph: root@oc0-controller-0 /]# ceph orch osd rm --zap 5 7 9 11
Scheduled OSD(s) for removal
[ceph: root@oc0-controller-0 /]# ceph orch osd rm status
OSD_ID  HOST    STATE     PG_COUNT  REPLACE  FORCE  DRAIN_STARTED_AT
7       ceph-2  draining  27        False    False  2021-04-23 21:35:51.215361
9       ceph-2  draining  8         False    False  2021-04-23 21:35:49.111500
11      ceph-2  draining  14        False    False  2021-04-23 21:35:50.243762
- Use the ceph orch osd rm status command to check the status of the OSD removal.
[ceph: root@oc0-controller-0 /]# ceph orch osd rm status
OSD_ID  HOST    STATE     PG_COUNT  REPLACE  FORCE  DRAIN_STARTED_AT
7       ceph-2  draining  34        False    False  2021-04-23 21:35:51.215361
11      ceph-2  draining  14        False    False  2021-04-23 21:35:50.243762
Warning: Do not proceed with the next step until this command returns no results.
- Use the ceph orch host drain <HOST> command to drain any remaining daemons from the host.
[ceph: root@oc0-controller-0 /]# ceph orch host drain ceph-2
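Optionally, confirm that no daemons remain on the host before you remove it. For example:

[ceph: root@oc0-controller-0 /]# ceph orch ps ceph-2

When the drain is complete, this command should no longer list any daemons for ceph-2.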
- Use the ceph orch host rm <HOST> command to remove the host from the cluster.
[ceph: root@oc0-controller-0 /]# ceph orch host rm ceph-2
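Optionally, verify the removal by listing the hosts that the orchestrator still manages; ceph-2 should no longer appear:

[ceph: root@oc0-controller-0 /]# ceph orch host ls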
- End the Ceph shell session.
- Log out of the tripleo-admin account.
- Log in to the undercloud node as the stack user.
- Modify the ~/overcloud-baremetal-deploy.yaml file in the following ways:
  - Decrease the count attribute in the roles that you want to scale down.
  - Add an instances entry for each node that is being unprovisioned. Each entry must contain the following:
    - The name of the bare metal node.
    - The hostname assigned to that node.
    - A provisioned: false value.
The following example removes the node overcloud-compute-0.
- name: Compute
  count: 1
  instances:
  - hostname: overcloud-compute-0
    name: node10
    # Removed from deployment due to disk failure
    provisioned: false
  - hostname: overcloud-compute-1
    name: node11
- Use the openstack overcloud node delete command to remove the node.
openstack overcloud node delete \
  --stack overcloud \
  --baremetal-deployment ~/overcloud-baremetal-deploy.yaml
Note: The command displays the list of nodes to delete and prompts for confirmation before the nodes are deleted.
If scaling down the Ceph cluster is temporary and the removed nodes will be restored later, the subsequent scale-up action can increment the count and set provisioned: true on nodes that were previously set to provisioned: false. If the node will never be reused, it can remain set to provisioned: false indefinitely, and the scale-up action can specify a new instances entry instead.
The following example file shows both cases.
- name: Compute
  count: 2
  instances:
  - hostname: overcloud-compute-0
    name: node10
    # Removed from deployment due to disk failure
    provisioned: false
  - hostname: overcloud-compute-1
    name: node11
  - hostname: overcloud-compute-2
    name: node12