Chapter 10. Scaling overcloud nodes


If you want to add or remove nodes after the creation of the overcloud, you must update the overcloud.

Note

Ensure that your bare metal nodes are not in maintenance mode before you begin scaling out or removing an overcloud node.
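For example, you can check the maintenance state of your nodes with the Bare Metal Provisioning service CLI and clear it where necessary. The <node_uuid> value is a placeholder for the UUID of a node that is in maintenance mode:

$ source ~/stackrc
$ openstack baremetal node list -c UUID -c Name -c Maintenance
$ openstack baremetal node maintenance unset <node_uuid>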

Use the following table to determine support for scaling each node type:

Table 10.1. Scale support for each node type

Node type              Scale up?  Scale down?  Notes
Controller             N          N            You can replace Controller nodes using the procedures in Chapter 11, Replacing Controller nodes.
Compute                Y          Y
Ceph Storage nodes     Y          N            You must have at least 1 Ceph Storage node from the initial overcloud creation.
Object Storage nodes   Y          Y

Important

Ensure that you have at least 10 GB free space before you scale the overcloud. This free space accommodates image conversion and caching during the node provisioning process.
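For example, you can check the available space on the undercloud before you begin. Which filesystem holds the images and cache depends on your disk layout, so treat the paths in this check as illustrative:

$ df -h /var /home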

The process for scaling pre-provisioned nodes is similar to the standard scaling procedures. However, the process to add new pre-provisioned nodes differs because pre-provisioned nodes do not use the standard registration and management process from the Bare Metal Provisioning service (ironic) and the Compute service (nova).

10.1. Adding nodes to the overcloud

You can add more nodes to your overcloud.

Note

A fresh installation of Red Hat OpenStack Platform (RHOSP) does not include certain updates, such as security errata and bug fixes. As a result, if you are scaling up a connected environment that uses the Red Hat Customer Portal or Red Hat Satellite Server, RPM updates are not applied to new nodes. To apply the latest updates to the overcloud nodes, you must do one of the following:

Procedure

  1. Create a new JSON file called newnodes.json that contains details of the new nodes that you want to register:

    {
      "nodes":[
        {
            "mac":[
                "dd:dd:dd:dd:dd:dd"
            ],
            "cpu":"4",
            "memory":"6144",
            "disk":"40",
            "arch":"x86_64",
            "pm_type":"ipmi",
            "pm_user":"admin",
            "pm_password":"p@55w0rd!",
            "pm_addr":"192.02.24.207"
        },
        {
            "mac":[
                "ee:ee:ee:ee:ee:ee"
            ],
            "cpu":"4",
            "memory":"6144",
            "disk":"40",
            "arch":"x86_64",
            "pm_type":"ipmi",
            "pm_user":"admin",
            "pm_password":"p@55w0rd!",
            "pm_addr":"192.02.24.208"
        }
      ]
    }
  2. Log in to the undercloud host as the stack user.
  3. Source the stackrc undercloud credentials file:

    $ source ~/stackrc
  4. Register the new nodes:

    $ openstack overcloud node import newnodes.json
  5. Launch the introspection process for each new node:

    $ openstack overcloud node introspect \
     --provide <node_1> [<node_2>] [<node_n>]
    • Use the --provide option to reset all the specified nodes to an available state after introspection.
    • Replace <node_1>, <node_2>, and all nodes up to <node_n> with the UUID of each node that you want to introspect.
  6. Configure the image properties for each new node:

    $ openstack overcloud node configure <node>
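To confirm that registration and introspection completed as expected, you can list the new nodes before you scale out. Nodes that are ready to provision report a provisioning state of available:

$ openstack baremetal node list -c Name -c "Power State" -c "Provisioning State"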

10.2. Scaling up bare-metal nodes

To increase the count of bare-metal nodes in an existing overcloud, increment the node count in the overcloud-baremetal-deploy.yaml file and redeploy the overcloud.

Prerequisites

Procedure

  1. Log in to the undercloud host as the stack user.
  2. Source the stackrc undercloud credentials file:

    $ source ~/stackrc
  3. Open the overcloud-baremetal-deploy.yaml node definition file that you use to provision your bare-metal nodes.
  4. Increment the count parameter for the roles that you want to scale up. For example, the following configuration increases the Object Storage node count to 4:

    - name: Controller
      count: 3
    - name: Compute
      count: 10
    - name: ObjectStorage
      count: 4
  5. Optional: Configure predictive node placement for the new nodes. For example, use the following configuration to provision a new Object Storage node on node03:

    - name: ObjectStorage
      count: 4
      instances:
      - hostname: overcloud-objectstorage-0
        name: node00
      - hostname: overcloud-objectstorage-1
        name: node01
      - hostname: overcloud-objectstorage-2
        name: node02
      - hostname: overcloud-objectstorage-3
        name: node03
  6. Optional: Define any other attributes that you want to assign to your new nodes. For more information about the properties you can use to configure node attributes in your node definition file, see Bare-metal node provisioning attributes.
  7. If you use the Object Storage service (swift) and the whole disk overcloud image, overcloud-hardened-uefi-full, configure the size of the /srv partition based on the size of your disk and your storage requirements for /var and /srv. For more information, see Configuring whole disk partitions for the Object Storage service.
  8. Provision the overcloud nodes:

    $ openstack overcloud node provision \
      --stack <stack> \
      --network-config \
      --output <deployment_file> \
      /home/stack/templates/overcloud-baremetal-deploy.yaml
    • Replace <stack> with the name of the stack for which the bare-metal nodes are provisioned. If not specified, the default is overcloud.
    • Include the --network-config argument to provide the network definitions to the cli-overcloud-node-network-config.yaml Ansible playbook.
    • Replace <deployment_file> with the name of the heat environment file to generate for inclusion in the deployment command, for example /home/stack/templates/overcloud-baremetal-deployed.yaml.

      Note

      If you upgraded from Red Hat OpenStack Platform 16.2 to 17.1, you must include the YAML file that you created or updated during the upgrade process in the openstack overcloud node provision command. For example, use the /home/stack/tripleo-[stack]-baremetal-deployment.yaml file instead of the /home/stack/templates/overcloud-baremetal-deployed.yaml file. For more information, see Performing the overcloud adoption and preparation in Framework for upgrades (16.2 to 17.1).

  9. Monitor the provisioning progress in a separate terminal. When provisioning is successful, the node state changes from available to active:

    $ watch openstack baremetal node list
  10. Add the generated overcloud-baremetal-deployed.yaml file to the stack with your other environment files and deploy the overcloud:

    $ openstack overcloud deploy --templates \
      -e [your environment files] \
      -e /home/stack/templates/overcloud-baremetal-deployed.yaml \
      --disable-validations \
      ...
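When the deployment completes, you can verify that the additional nodes were provisioned. Each new node reports a provisioning state of active, and if the metalsmith client is available on the undercloud, you can also list the provisioned instances and their hostnames:

$ openstack baremetal node list -c Name -c "Provisioning State"
$ metalsmith list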

10.3. Scaling down bare-metal nodes

To scale down the number of bare-metal nodes in your overcloud, tag the nodes that you want to delete from the stack in the node definition file, redeploy the overcloud, and then delete the bare-metal node from the overcloud.

Prerequisites

  • A successful undercloud installation. For more information, see Installing director on the undercloud.
  • A successful overcloud deployment. For more information, see Configuring a basic overcloud with pre-provisioned nodes.
  • If you are replacing an Object Storage node, replicate data from the node you are removing to the new replacement node. Wait for a replication pass to finish on the new node. Check the replication pass progress in the /var/log/swift/swift.log file. When the pass finishes, the Object Storage service (swift) adds entries to the log similar to the following example:

    Mar 29 08:49:05 localhost object-server: Object replication complete.
    Mar 29 08:49:11 localhost container-server: Replication run OVER
    Mar 29 08:49:13 localhost account-server: Replication run OVER
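For example, you can follow the replication pass on the new node while you wait. This check assumes the default log location shown in the example above:

$ sudo tail -f /var/log/swift/swift.log | grep -Ei 'replication (complete|run over)'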

Procedure

  1. Log in to the undercloud host as the stack user.
  2. Source the stackrc undercloud credentials file:

    $ source ~/stackrc
  3. Decrement the count parameter in the overcloud-baremetal-deploy.yaml file, for the roles that you want to scale down.
  4. Define the hostname and name of each node that you want to remove from the stack, if they are not already defined in the instances attribute for the role.
  5. Add the attribute provisioned: false to the node that you want to remove. For example, to remove the node overcloud-objectstorage-1 from the stack, include the following snippet in your overcloud-baremetal-deploy.yaml file:

    - name: ObjectStorage
      count: 3
      instances:
      - hostname: overcloud-objectstorage-0
        name: node00
      - hostname: overcloud-objectstorage-1
        name: node01
        # Removed from cluster due to disk failure
        provisioned: false
      - hostname: overcloud-objectstorage-2
        name: node02
      - hostname: overcloud-objectstorage-3
        name: node03

    After you redeploy the overcloud, the nodes that you define with the provisioned: false attribute are no longer present in the stack. However, these nodes are still running in a provisioned state.

    Note

    To remove a node from the stack temporarily, deploy the overcloud with the attribute provisioned: false. To return the node to the stack, redeploy the overcloud with the attribute provisioned: true.

  6. Delete the node from the overcloud:

    $ openstack overcloud node delete \
      --stack <stack> \
      --baremetal-deployment \
       /home/stack/templates/overcloud-baremetal-deploy.yaml
    • Replace <stack> with the name of the stack for which the bare-metal nodes are provisioned. If not specified, the default is overcloud.

      Note

      Do not include the nodes that you want to remove from the stack as command arguments in the openstack overcloud node delete command.

  7. Delete the ironic node:

    $ openstack baremetal node delete <ironic_node_uuid>

    Replace <ironic_node_uuid> with the UUID of the node.

  8. Delete the network agents for the node that you deleted:

    (overcloud)$ for AGENT in $(openstack network agent list \
      --host <node_hostname> -c ID -f value) ; \
      do openstack network agent delete $AGENT ; done
    • Replace <node_hostname> with the host name of the node that you deleted.
  9. Provision the overcloud nodes to generate an updated heat environment file for inclusion in the deployment command:

    $ openstack overcloud node provision \
      --stack <stack> \
      --output <deployment_file> \
      /home/stack/templates/overcloud-baremetal-deploy.yaml
    • Replace <deployment_file> with the name of the heat environment file to generate for inclusion in the deployment command, for example /home/stack/templates/overcloud-baremetal-deployed.yaml.
  10. Add the overcloud-baremetal-deployed.yaml file generated by the provisioning command to the stack with your other environment files, and deploy the overcloud:

    $ openstack overcloud deploy \
      ...
      -e /usr/share/openstack-tripleo-heat-templates/environments \
      -e /home/stack/templates/overcloud-baremetal-deployed.yaml \
      --disable-validations \
      ...
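When the redeployment completes, you can confirm that the scale-down succeeded. The removed node no longer appears in the Bare Metal Provisioning service, and the overcloud stack reports UPDATE_COMPLETE:

$ openstack baremetal node list
$ openstack stack list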

10.3.1. Removing a Compute node manually

If the openstack overcloud node delete command fails due to an unreachable node, then you must manually complete the removal of the Compute node from the overcloud.

Prerequisites

Procedure

  1. Source the undercloud configuration:

    (overcloud)$ source ~/stackrc
  2. Use the openstack tripleo launch heat command to launch the ephemeral heat process:

    (undercloud)$ openstack tripleo launch heat --heat-dir /home/stack/overcloud-deploy/overcloud/heat-launcher --restore-db

    The command exits after launching the heat process. The heat process continues to run in the background as a podman pod.

  3. Use the podman pod ps command to verify that the ephemeral-heat process is running:

    (undercloud)$ sudo podman pod ps
    POD ID        NAME            STATUS      CREATED        INFRA ID      # OF CONTAINERS
    958b141609b2  ephemeral-heat  Running     2 minutes ago  44447995dbcf  3
  4. Use the export command to export the OS_CLOUD environment:

    (undercloud)$ export OS_CLOUD=heat
  5. Use the openstack stack list command to list the installed stacks:

    (undercloud)$ openstack stack list
    +--------------------------------------+------------+---------+-----------------+----------------------+--------------+
    | ID                                   | Stack Name | Project | Stack Status    | Creation Time        | Updated Time |
    +--------------------------------------+------------+---------+-----------------+----------------------+--------------+
    | 761e2a54-c6f9-4e0f-abe6-c8e0ad51a76c | overcloud  | admin   | CREATE_COMPLETE | 2022-08-29T20:48:37Z | None         |
    +--------------------------------------+------------+---------+-----------------+----------------------+--------------+
  6. Use the export command to export the OS_CLOUD environment:

    (undercloud)$ export OS_CLOUD=undercloud
  7. Identify the UUID of the node that you want to manually delete:

    (undercloud)$ openstack baremetal node list
  8. Move the node that you want to delete to maintenance mode:

    (undercloud)$ openstack baremetal node maintenance set <node_uuid>
  9. Wait for the Compute service to synchronize its state with the Bare Metal service. This can take up to four minutes.
  10. Source the overcloud configuration:

    (undercloud)$ source ~/overcloudrc
  11. Delete the network agents for the node that you deleted:

    (overcloud)$ for AGENT in $(openstack network agent list --host <scaled_down_node> -c ID -f value) ; do openstack network agent delete $AGENT ; done
    • Replace <scaled_down_node> with the name of the node to remove.
  12. Confirm that the Compute service is disabled on the deleted node on the overcloud, to prevent the node from scheduling new instances:

    (overcloud)$ openstack compute service list
  13. If the Compute service is not disabled, disable it:

    (overcloud)$ openstack compute service set <hostname> nova-compute --disable
  14. Remove the deleted Compute service as a resource provider from the Placement service:

    (overcloud)$ openstack resource provider list
    (overcloud)$ openstack resource provider delete <uuid>
  15. Log in as the root user on the Compute node that you want to delete.
  16. Delete the System Profile of the system that is registered with Red Hat Subscription Management:

    # subscription-manager remove --all
    # subscription-manager unregister
    # subscription-manager clean
    Note

    If you cannot reach the Compute node, you can delete the System Profile on the Red Hat Customer Portal. For more information, see How to delete System Profiles of the systems registered with Red Hat Subscription Management (RHSM)?.

  17. Source the undercloud configuration:

    (overcloud)$ source ~/stackrc
  18. Delete the Compute node from the stack:

    (undercloud)$ openstack overcloud node delete --stack <overcloud> <node> --baremetal-deployment <baremetal_deployment_file>
    • Replace <overcloud> with the name or UUID of the overcloud stack.
    • Replace <node> with the Compute service host name or UUID of the Compute node that you want to delete.
    • Replace <baremetal_deployment_file> with the name of the bare metal deployment file.

      Note

      If the node has already been powered off, this command returns a WARNING message:

      Ansible failed, check log at `~/ansible.log`
      WARNING: Scale-down configuration error. Manual cleanup of some actions may be necessary. Continuing with node removal.

      You can ignore this message.

  19. Wait for the overcloud node to delete.
  20. Use the export command to export the OS_CLOUD environment:

    (undercloud)$ export OS_CLOUD=heat
  21. Check the status of the overcloud stack when the node deletion is complete:

    (undercloud)$ openstack stack list
    Table 10.2. Result

    Status           Description
    UPDATE_COMPLETE  The delete operation completed successfully.
    UPDATE_FAILED    The delete operation failed. If the overcloud node fails to delete while in maintenance mode, the problem might be with the hardware.

  22. If Instance HA is enabled, perform the following actions:

    1. Clean up the Pacemaker resources for the node:

      $ sudo pcs resource delete <scaled_down_node>
      $ sudo cibadmin -o nodes --delete --xml-text '<node id="<scaled_down_node>"/>'
      $ sudo cibadmin -o fencing-topology --delete --xml-text '<fencing-level target="<scaled_down_node>"/>'
      $ sudo cibadmin -o status --delete --xml-text '<node_state id="<scaled_down_node>"/>'
      $ sudo cibadmin -o status --delete-all --xml-text '<node id="<scaled_down_node>"/>' --force
    2. Delete the STONITH device for the node:

      $ sudo pcs stonith delete <device-name>
  23. If you are not replacing the removed Compute node on the overcloud, then decrease the ComputeCount parameter in the environment file that contains your node counts. This file is usually named overcloud-baremetal-deploy.yaml. For example, decrease the node count from four nodes to three nodes if you removed one node:

    parameter_defaults:
      ...
      ComputeCount: 3
      ...

    Decreasing the node count ensures that director does not provision any new nodes when you run openstack overcloud deploy.

    Note

    To replace a Compute node after you remove it from your deployment, see Scaling up bare-metal nodes.
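As a final check, you can confirm on the overcloud that the removed node no longer appears as a Compute service host or as a resource provider:

(undercloud)$ source ~/overcloudrc
(overcloud)$ openstack compute service list
(overcloud)$ openstack resource provider list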

10.4. Scaling up pre-provisioned nodes

When scaling up the overcloud with pre-provisioned nodes, you must configure the orchestration agent on each node to correspond to the director node count.

Procedure

  1. Prepare the new pre-provisioned nodes. For more information, see Pre-provisioned node requirements.
  2. Scale up the nodes. For more information, see Scaling overcloud nodes.

10.5. Scaling down pre-provisioned nodes

When scaling down an overcloud that has pre-provisioned nodes, follow the scale down instructions in Scaling overcloud nodes.

In scale-down operations, you can use host names for both Red Hat OpenStack Platform (RHOSP)-provisioned and pre-provisioned nodes. You can also use the UUID for RHOSP-provisioned nodes. However, pre-provisioned nodes have no UUID, so you must always use the host name.

Procedure

  1. Retrieve the names of the nodes that you want to remove:

    $ openstack stack resource list overcloud -n5 --filter type=OS::TripleO::ComputeDeployedServerServer
  2. Delete the nodes:

    $ openstack overcloud node delete --stack <overcloud> <node> [... <node>]
    • Replace <overcloud> with the name or UUID of the overcloud stack.
    • Replace <node> with the host names of the nodes that you want to remove, retrieved from the stack_name column returned in step 1.
  3. Ensure that the node is deleted:

    $ openstack stack list

    The status of the overcloud stack shows UPDATE_COMPLETE when the delete operation is complete.

  4. Power off the removed nodes. In a standard deployment, the bare-metal services on director power off the removed nodes. With pre-provisioned nodes, you must either manually shut down the removed nodes or use the power management control for each physical system. If you do not power off the nodes after removing them from the stack, they might remain operational and reconnect as part of the overcloud environment.
  5. Re-provision the removed nodes to a base operating system configuration so that they do not unintentionally join the overcloud in the future.

    Note

    Do not attempt to reuse nodes previously removed from the overcloud without first reprovisioning them with a fresh base operating system. The scale down process only removes the node from the overcloud stack and does not uninstall any packages.
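If your pre-provisioned nodes have a baseboard management controller, one way to power them off out of band, as described in step 4, is with a tool such as ipmitool. The address and credentials are placeholders for illustration:

$ ipmitool -I lanplus -H <bmc_address> -U <bmc_user> -P <bmc_password> chassis power off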

10.6. Replacing Red Hat Ceph Storage nodes

You can use director to replace Red Hat Ceph Storage nodes in a director-created cluster. For more information, see the Deploying Red Hat Ceph Storage and Red Hat OpenStack Platform together with director guide.

10.7. Using skip deploy identifier

By default, Puppet reapplies all manifests during a stack update operation. This can be time consuming and might not be required.

To override the default behavior, use the skip-deploy-identifier option:

openstack overcloud deploy --skip-deploy-identifier

Use this option if you do not want the deployment command to generate a unique identifier for the DeployIdentifier parameter. The software configuration deployment steps only trigger if there is an actual change to the configuration. Use this option with caution and only if you are confident that you do not need to run the software configuration, such as scaling out certain roles.

Note

If there is a change to the Puppet manifest or hieradata, Puppet reapplies all manifests even when you specify --skip-deploy-identifier.
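For example, when you scale out Compute nodes and you are confident that no other configuration has changed, you might add the option to your usual deployment command. The environment files shown are placeholders for your own set:

$ openstack overcloud deploy --templates \
  -e [your environment files] \
  -e /home/stack/templates/overcloud-baremetal-deployed.yaml \
  --skip-deploy-identifier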

10.8. Blacklisting nodes

You can exclude overcloud nodes from receiving an updated deployment. This is useful in scenarios where you want to scale new nodes and exclude existing nodes from receiving an updated set of parameters and resources from the core heat template collection. This means that the blacklisted nodes are isolated from the effects of the stack operation.

Use the DeploymentServerBlacklist parameter in an environment file to create a blacklist.

Setting the blacklist

The DeploymentServerBlacklist parameter is a list of server names. Write a new environment file, or add the parameter value to an existing custom environment file and pass the file to the deployment command:

parameter_defaults:
  DeploymentServerBlacklist:
    - overcloud-compute-0
    - overcloud-compute-1
    - overcloud-compute-2
Note

The server names in the parameter value are the names according to OpenStack Orchestration (heat), not the actual server hostnames.

Include this environment file with your openstack overcloud deploy command:

$ source ~/stackrc
(undercloud) $ openstack overcloud deploy --templates \
  -e server-blacklist.yaml \
  [OTHER OPTIONS]

Heat blacklists any servers in the list from receiving updated heat deployments. After the stack operation completes, any blacklisted servers remain unchanged. You can also power off or stop the os-collect-config agents during the operation.

Warning
  • Exercise caution when you blacklist nodes. Only use a blacklist if you fully understand how to apply the requested change with a blacklist in effect. It is possible to create a hung stack or configure the overcloud incorrectly when you use the blacklist feature. For example, if cluster configuration changes apply to all members of a Pacemaker cluster, blacklisting a Pacemaker cluster member during this change can cause the cluster to fail.
  • Do not use the blacklist during update or upgrade procedures. Those procedures have their own methods for isolating changes to particular servers.
  • When you add servers to the blacklist, further changes to those nodes are not supported until you remove the server from the blacklist. This includes updates, upgrades, scale up, scale down, and node replacement. For example, when you blacklist existing Compute nodes while scaling out the overcloud with new Compute nodes, the blacklisted nodes miss the information added to /etc/hosts and /etc/ssh/ssh_known_hosts. This can cause live migration to fail, depending on the destination host. The Compute nodes are updated with the information added to /etc/hosts and /etc/ssh/ssh_known_hosts during the next overcloud deployment where they are no longer blacklisted. Do not modify the /etc/hosts and /etc/ssh/ssh_known_hosts files manually. To modify the /etc/hosts and /etc/ssh/ssh_known_hosts files, run the overcloud deploy command as described in the Clearing the Blacklist section.

Clearing the blacklist

To clear the blacklist for subsequent stack operations, edit the DeploymentServerBlacklist to use an empty array:

parameter_defaults:
  DeploymentServerBlacklist: []
Warning

Do not omit the DeploymentServerBlacklist parameter. If you omit the parameter, the overcloud deployment uses the previously saved value.
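To apply the cleared blacklist, include the updated environment file in your next deployment command so that the empty value overwrites the previously saved list, for example:

$ source ~/stackrc
(undercloud) $ openstack overcloud deploy --templates \
  -e server-blacklist.yaml \
  [OTHER OPTIONS]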
