Chapter 15. Scaling the Ceph Storage cluster


You can scale the size of your Ceph Storage cluster by adding or removing storage nodes.

15.1. Scaling up the Ceph Storage cluster

As capacity and performance requirements change, you can scale up your Ceph Storage cluster to meet increased demands. Before doing so, ensure that you have enough nodes for the updated deployment. Then you can register and tag the new nodes in your Red Hat OpenStack Platform (RHOSP) environment.

To register new Ceph Storage nodes with director, complete this procedure.

Procedure

  1. Log in to the undercloud node as the stack user.
  2. Modify the ~/overcloud-baremetal-deploy.yaml file to add the new CephStorage nodes to the deployment.

    The following example file represents an original deployment with three CephStorage nodes.

    - name: CephStorage
      count: 3
      instances:
        - hostname: ceph-0
          name: ceph-0
        - hostname: ceph-1
          name: ceph-1
        - hostname: ceph-2
          name: ceph-2

    The following example modifies this file to add three additional nodes.

    - name: CephStorage
      count: 6
      instances:
        - hostname: ceph-0
          name: ceph-0
        - hostname: ceph-1
          name: ceph-1
        - hostname: ceph-2
          name: ceph-2
        - hostname: ceph-3
          name: ceph-3
        - hostname: ceph-4
          name: ceph-4
        - hostname: ceph-5
          name: ceph-5
  3. Use the openstack overcloud node provision command with the updated ~/overcloud-baremetal-deploy.yaml file.

    openstack overcloud node provision \
      --stack overcloud \
      --network-config \
      --output ~/overcloud-baremetal-deployed.yaml \
      ~/overcloud-baremetal-deploy.yaml
    Note

    This command provisions the configured nodes and outputs an updated copy of ~/overcloud-baremetal-deployed.yaml. The new version updates the CephStorage node count, and the DeployedServerPortMap and HostnameMap parameters include the new storage nodes.
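
    The following excerpt is an illustrative sketch of the kind of entries the updated file gains for one new node. The exact parameter layout, hostnames, and IP addresses depend on your environment; ceph-3 and the address shown here are assumptions for illustration only.

    parameter_defaults:
      CephStorageCount: 6
      HostnameMap:
        overcloud-cephstorage-3: ceph-3
      DeployedServerPortMap:
        ceph-3-ctlplane:
          fixed_ips:
          - ip_address: 192.168.24.23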

  4. Use the openstack overcloud deploy command with the updated ~/overcloud-baremetal-deployed.yaml file.

    openstack overcloud deploy --templates \
      -e /usr/share/openstack-tripleo-heat-templates/environments/cephadm/cephadm.yaml \
      -e deployed_ceph.yaml \
      -e overcloud-baremetal-deployed.yaml

Result

The following actions occur when the openstack overcloud deploy command runs:

  • The storage networks and firewall rules are configured on the new CephStorage nodes.
  • The ceph-admin user is created on the new CephStorage nodes.
  • The ceph-admin user public SSH key is distributed to the new CephStorage nodes so that cephadm can use SSH to add extra nodes.
  • If a new CephMon or CephMgr node is added, the ceph-admin private SSH key is also distributed to that node.
  • An updated Ceph specification is generated and installed on the bootstrap node. This updated specification will typically be available in /home/ceph-admin/specs/ceph_spec.yaml on the bootstrap node.
  • The cephadm bootstrap process is skipped because cephadm ls indicates the Ceph containers are already running.
  • The updated Ceph specification is applied and cephadm schedules the new nodes to join the Ceph cluster.
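
To confirm that the new nodes joined the Ceph cluster, you can log in to an overcloud Controller node as the tripleo-admin user, start a Ceph shell, and list the cluster hosts and OSDs. This is a minimal verification sketch; the exact output depends on your deployment.

sudo cephadm shell
[ceph: root@oc0-controller-0 /]# ceph orch host ls
[ceph: root@oc0-controller-0 /]# ceph osd tree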

15.2. Scaling down and replacing Ceph Storage nodes

In some cases, you might need to scale down your Ceph Storage cluster or replace a Ceph Storage node. In either situation, you must disable and rebalance the Ceph Storage nodes that you want to remove from the overcloud to prevent data loss.

Warning

Do not proceed with this procedure if the Ceph Storage cluster does not have the capacity to lose OSDs.

Procedure

  1. Log in to the overcloud Controller node as the tripleo-admin user.
  2. Use the sudo cephadm shell command to start a Ceph shell.
  3. Use the ceph osd tree command to identify OSDs to be removed by server.

    The following example identifies the OSDs that are hosted on ceph-2.

    [ceph: root@oc0-controller-0 /]# ceph osd tree
    ID  CLASS  WEIGHT   TYPE NAME            STATUS  REWEIGHT  PRI-AFF
    -1         0.58557  root default
    -7         0.19519  host ceph-2
     5    hdd  0.04880       osd.5           up      1.00000  1.00000
     7    hdd  0.04880       osd.7           up      1.00000  1.00000
     9    hdd  0.04880       osd.9           up      1.00000  1.00000
    11    hdd  0.04880       osd.11          up      1.00000  1.00000
  4. Export the Ceph cluster specification to a YAML file.

    [ceph: root@oc0-controller-0 /]# ceph orch ls --export > spec.yml
  5. Edit the exported specification file so that the applicable hosts are removed from the placement: hosts list of the service_type: osd service.
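
    The following sketch shows the kind of edit to make, assuming a simple OSD service that uses all available data devices on the listed hosts. Your exported specification will differ; the service_id and device selection shown here are illustrative only. To remove the ceph-2 host, delete its entry from the hosts list:

    service_type: osd
    service_id: default_drive_group
    placement:
      hosts:
      - ceph-0
      - ceph-1
      # ceph-2 removed from this list
    data_devices:
      all: true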
  6. Save the edited file.
  7. Apply the modified Ceph specification file.

    [ceph: root@oc0-controller-0 /]# ceph orch apply -i spec.yml
    Important

    If you do not export and edit the Ceph specification file before removing the OSDs, the Ceph Manager will attempt to recreate the OSDs.

  8. Use the command ceph orch osd rm --zap <osd_list> to remove the OSDs.

    [ceph: root@oc0-controller-0 /]# ceph orch osd rm --zap 5 7 9 11
    Scheduled OSD(s) for removal
    [ceph: root@oc0-controller-0 /]# ceph orch osd rm status
    OSD_ID HOST   STATE    PG_COUNT REPLACE  FORCE  DRAIN_STARTED_AT
    7      ceph-2 draining 27       False    False  2021-04-23 21:35:51.215361
    9      ceph-2 draining 8        False    False  2021-04-23 21:35:49.111500
    11     ceph-2 draining 14       False    False  2021-04-23 21:35:50.243762
  9. Use the command ceph orch osd rm status to check the status of the OSD removal.

    [ceph: root@oc0-controller-0 /]# ceph orch osd rm status
    OSD_ID HOST   STATE    PG_COUNT REPLACE FORCE DRAIN_STARTED_AT
    7      ceph-2 draining 34       False   False 2021-04-23 21:35:51.215361
    11     ceph-2 draining 14       False   False 2021-04-23 21:35:50.243762
    Warning

    Do not proceed with the next step until this command returns no results.
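
    For example, you can poll the status until it no longer reports the host that is being drained. This loop is an optional sketch that assumes ceph-2 is the host being removed and that grep and sleep are available in the Ceph shell container:

    [ceph: root@oc0-controller-0 /]# while ceph orch osd rm status | grep -q ceph-2; do sleep 30; done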

  10. Use the command ceph orch host drain <HOST> to drain any remaining daemons.

    [ceph: root@oc0-controller-0 /]# ceph orch host drain ceph-2
  11. Use the command ceph orch host rm <HOST> to remove the host.

    [ceph: root@oc0-controller-0 /]# ceph orch host rm ceph-2
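
    Before you end the Ceph shell session, you can optionally confirm that ceph-2 no longer appears in the host list and that the cluster reports a healthy status:

    [ceph: root@oc0-controller-0 /]# ceph orch host ls
    [ceph: root@oc0-controller-0 /]# ceph -s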
  12. End the Ceph shell session.
  13. Log out of the tripleo-admin account.
  14. Log in to the undercloud node as the stack user.
  15. Modify the ~/overcloud-baremetal-deploy.yaml file in the following ways:

    • Decrease the count attribute for the roles that you want to scale down.
    • Add an instances entry for each node being unprovisioned. Each entry must contain the following:

      • The name of the baremetal node.
      • The hostname assigned to that node.
      • A provisioned: false value.

        The following example removes the node overcloud-compute-0 (node10) and decreases the Compute role count to 1.

        - name: Compute
          count: 1
          instances:
          - hostname: overcloud-compute-0
            name: node10
            # Removed from deployment due to disk failure
            provisioned: false
          - hostname: overcloud-compute-1
            name: node11
  16. Use the openstack overcloud node delete command to remove the node.

    openstack overcloud node delete \
    --stack overcloud \
    --baremetal-deployment ~/overcloud-baremetal-deploy.yaml
    Note

    The command displays a list of the nodes to delete and prompts you for confirmation before it deletes them.

Note

If the scale down of the Ceph cluster is temporary and the removed nodes will be restored later, the scale up action can increment the count and set provisioned: true on nodes that were previously set to provisioned: false. If the node will never be reused, it can remain set to provisioned: false indefinitely, and the scale up action can specify a new instances entry instead.

The following sample file provides an example of each type of instances entry.

- name: Compute
  count: 2
  instances:
  - hostname: overcloud-compute-0
    name: node10
    # Removed from deployment due to disk failure
    provisioned: false
  - hostname: overcloud-compute-1
    name: node11
  - hostname: overcloud-compute-2
    name: node12