Chapter 12. Scaling overcloud nodes
Warning: Do not use openstack server delete to remove nodes from the overcloud. Follow the procedures in this section to remove and replace nodes correctly.
There might be situations where you need to add or remove nodes after the creation of the overcloud. For example, you might need to add more Compute nodes to the overcloud. This situation requires updating the overcloud.
Use the following table to determine support for scaling each node type:
Node Type            | Scale Up? | Scale Down? | Notes
Controller           | N         | N           | You can replace Controller nodes using the procedures in Chapter 13, Replacing Controller Nodes.
Compute              | Y         | Y           |
Ceph Storage Nodes   | Y         | N           | You must have at least 1 Ceph Storage node from the initial overcloud creation.
Object Storage Nodes | Y         | Y           |
Ensure that you have at least 10 GB of free space before scaling the overcloud. This free space accommodates image conversion and caching during the node provisioning process.
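For example, you can check the available space on the undercloud with a standard df command; /var is an assumption about where director caches images and might differ in your environment:
$ df -h /var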
12.1. Adding nodes to the overcloud
Complete the following steps to add more nodes to the director node pool.
Procedure
Create a new JSON file (newnodes.json) containing the new node details to register:
{
    "nodes":[
        {
            "mac":[
                "dd:dd:dd:dd:dd:dd"
            ],
            "cpu":"4",
            "memory":"6144",
            "disk":"40",
            "arch":"x86_64",
            "pm_type":"ipmi",
            "pm_user":"admin",
            "pm_password":"p@55w0rd!",
            "pm_addr":"192.168.24.207"
        },
        {
            "mac":[
                "ee:ee:ee:ee:ee:ee"
            ],
            "cpu":"4",
            "memory":"6144",
            "disk":"40",
            "arch":"x86_64",
            "pm_type":"ipmi",
            "pm_user":"admin",
            "pm_password":"p@55w0rd!",
            "pm_addr":"192.168.24.208"
        }
    ]
}
Run the following command to register the new nodes:
$ source ~/stackrc
(undercloud) $ openstack overcloud node import newnodes.json
After registering the new nodes, run the following commands to launch the introspection process for each new node:
(undercloud) $ openstack baremetal node manage [NODE UUID]
(undercloud) $ openstack overcloud node introspect [NODE UUID] --provide
This process detects and benchmarks the hardware properties of the nodes.
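If you registered several nodes, you can wrap both commands in a shell loop; a minimal sketch, where the UUID list is a hypothetical placeholder:
(undercloud) $ for uuid in <node1_uuid> <node2_uuid>; do
    openstack baremetal node manage $uuid
    openstack overcloud node introspect $uuid --provide
done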
Configure the image properties for the node:
(undercloud) $ openstack overcloud node configure [NODE UUID]
12.2. Increasing node counts for roles
Complete the following steps to scale overcloud nodes for a specific role, such as a Compute node.
Procedure
Tag each new node with the role you want. For example, to tag a node with the Compute role, run the following command:
(undercloud) $ openstack baremetal node set --property capabilities='profile:compute,boot_option:local' [NODE UUID]
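You can confirm that the capability was applied by inspecting the node properties; the --fields filter is assumed to be available in your version of the bare metal client:
(undercloud) $ openstack baremetal node show [NODE UUID] --fields properties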
Scaling the overcloud requires that you edit the environment file that contains your node counts and re-deploy the overcloud. For example, to scale your overcloud to 5 Compute nodes, edit the ComputeCount parameter:
parameter_defaults:
  ...
  ComputeCount: 5
  ...
Rerun the deployment command with the updated file, which in this example is called node-info.yaml:
(undercloud) $ openstack overcloud deploy --templates -e /home/stack/templates/node-info.yaml [OTHER_OPTIONS]
Ensure you include all environment files and options from your initial overcloud creation. This includes the same scale parameters for non-Compute nodes.
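For example, if your initial deployment also used network and storage environment files, pass them all again; the file names here are hypothetical placeholders:
(undercloud) $ openstack overcloud deploy --templates \
  -e /home/stack/templates/node-info.yaml \
  -e /home/stack/templates/network-environment.yaml \
  -e /home/stack/templates/storage-environment.yaml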
Wait until the deployment operation completes.
12.3. Removing Compute nodes
There might be situations where you need to remove Compute nodes from the overcloud. For example, you might need to replace a problematic Compute node.
Before removing a Compute node from the overcloud, migrate the workload from the node to other Compute nodes. For more information, see Migrating virtual machine instances between Compute nodes.
Prerequisites
- The Placement service package, python3-osc-placement, is installed on the undercloud.
Procedure
Source the overcloud configuration:
$ source ~/overcloudrc
Disable the Compute service on the outgoing node to prevent the node from scheduling new instances:
(overcloud) $ openstack compute service list
(overcloud) $ openstack compute service set <hostname> nova-compute --disable
Tip: Use the --disable-reason option to add a short explanation of why the service is being disabled. This is useful if you intend to redeploy the Compute service at a later point, as shown in the example after this tip.
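For example, a hedged illustration with a hypothetical reason string:
(overcloud) $ openstack compute service set <hostname> nova-compute --disable --disable-reason "node scheduled for removal"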
Source the undercloud configuration:
(overcloud) $ source ~/stackrc
Identify the UUID of the overcloud stack:
(undercloud) $ openstack stack list
Identify the UUIDs of the nodes to delete:
(undercloud) $ openstack server list
Delete the nodes from the overcloud stack and update the plan accordingly:
(undercloud) $ openstack overcloud node delete --stack <stack_uuid> [node1_uuid] [node2_uuid] [node3_uuid]
Ensure the openstack overcloud node delete command runs to completion:
(undercloud) $ openstack stack list
The status of the overcloud stack shows UPDATE_COMPLETE when the delete operation is complete.
Important: If you intend to redeploy the Compute service using the same host name, you must use the existing service records for the redeployed node. In this case, skip the remaining steps in this procedure and proceed with the instructions in Redeploying the Compute service using the same host name.
Remove the Compute service from the node:
(undercloud) $ source ~/overcloudrc
(overcloud) $ openstack compute service list
(overcloud) $ openstack compute service delete <service-id>
Remove the Open vSwitch agent from the node:
(overcloud) $ openstack network agent list
(overcloud) $ openstack network agent delete <openvswitch-agent-id>
Remove the deleted Compute service as a resource provider from the Placement service:
(overcloud) $ openstack resource provider list
(overcloud) $ openstack resource provider delete <uuid>
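The resource provider name typically matches the Compute node host name, so you can narrow the list with grep; the host name in this sketch is hypothetical:
(overcloud) $ openstack resource provider list | grep compute-1.localdomain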
Decrease the ComputeCount parameter in the environment file that contains your node counts. This file is usually named node-info.yaml. For example, decrease the node count from five nodes to three nodes if you removed two nodes:
parameter_defaults:
  ...
  ComputeCount: 3
  ...
Decreasing the node count ensures director provisions no new nodes when you run openstack overcloud deploy.
You are now free to remove the node from the overcloud and re-provision it for other purposes.
Redeploying the Compute service using the same host name
To redeploy a disabled Compute service, re-enable it once a Compute node with the same host name is up again.
Procedure
Remove the deleted Compute service as a resource provider from the Placement service:
(undercloud) $ source ~/overcloudrc
(overcloud) $ openstack resource provider list
(overcloud) $ openstack resource provider delete <uuid>
Check the status of the Compute service:
(overcloud) $ openstack compute service list --long
...
| ID | Binary       | Host                  | Zone | Status   | State | Updated At                 | Disabled Reason     |
| 80 | nova-compute | compute-1.localdomain | nova | disabled | up    | 2018-07-13T14:35:04.000000 | gets re-provisioned |
...
Once the service state of the redeployed Compute node is "up" again, re-enable the service:
(overcloud) $ openstack compute service set compute-1.localdomain nova-compute --enable
12.4. Replacing Ceph Storage nodes
You can use the director to replace Ceph Storage nodes in a director-created cluster. You can find these instructions in the Deploying an Overcloud with Containerized Red Hat Ceph guide.
12.5. Replacing Object Storage nodes
Follow the instructions in this section to understand how to replace Object Storage nodes while maintaining the integrity of the cluster. This example involves a three-node Object Storage cluster in which the node overcloud-objectstorage-1 must be replaced. The goal of the procedure is to add one more node and then remove overcloud-objectstorage-1, effectively replacing it.
Procedure
Increase the Object Storage count using the ObjectStorageCount parameter. This parameter is usually located in node-info.yaml, the environment file that contains your node counts:
parameter_defaults:
  ObjectStorageCount: 4
The ObjectStorageCount parameter defines the quantity of Object Storage nodes in your environment. In this situation, you scale from three to four nodes.
Run the deployment command with the updated ObjectStorageCount parameter:
$ source ~/stackrc
(undercloud) $ openstack overcloud deploy --templates -e node-info.yaml ENVIRONMENT_FILES
After the deployment command completes, the overcloud contains an additional Object Storage node.
Replicate data to the new node. Before removing a node (in this case, overcloud-objectstorage-1), wait for a replication pass to finish on the new node. Check the replication pass progress in the /var/log/swift/swift.log file. When the pass finishes, the Object Storage service should log entries similar to the following example:
Mar 29 08:49:05 localhost object-server: Object replication complete.
Mar 29 08:49:11 localhost container-server: Replication run OVER
Mar 29 08:49:13 localhost account-server: Replication run OVER
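To watch for these messages in real time, you can follow the log on the new node over SSH; this sketch assumes the default heat-admin user and that you know the node's control plane IP address:
$ ssh heat-admin@<new-node-ip> "sudo tail -f /var/log/swift/swift.log"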
To remove the old node from the ring, reduce the ObjectStorageCount parameter to omit the old node. In this case, reduce it to 3:
parameter_defaults:
  ObjectStorageCount: 3
Create a new environment file named remove-object-node.yaml. This file identifies and removes the specified Object Storage node. The following content specifies the removal of overcloud-objectstorage-1:
parameter_defaults:
  ObjectStorageRemovalPolicies: [{'resource_list': ['1']}]
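The '1' in resource_list corresponds to the index suffix in the node name, overcloud-objectstorage-1 in this example. If you are unsure of the index, you can list the Object Storage servers from the undercloud; the grep filter is a convenience, not part of the formal procedure:
(undercloud) $ openstack server list | grep objectstorage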
Include both the node-info.yaml and remove-object-node.yaml files in the deployment command:
(undercloud) $ openstack overcloud deploy --templates -e node-info.yaml ENVIRONMENT_FILES -e remove-object-node.yaml
The director deletes the Object Storage node from the overcloud and updates the rest of the nodes on the overcloud to accommodate the node removal.
Make sure to include all environment files and options from your initial overcloud creation. This includes the same scale parameters for the other node roles.
12.6. Blacklisting nodes
You can exclude overcloud nodes from receiving an updated deployment. This is useful in scenarios where you aim to scale new nodes while excluding existing nodes from receiving an updated set of parameters and resources from the core Heat template collection. In other words, the blacklisted nodes are isolated from the effects of the stack operation.
Use the DeploymentServerBlacklist parameter in an environment file to create a blacklist.
Setting the Blacklist
The DeploymentServerBlacklist parameter is a list of server names. Write a new environment file, or add the parameter value to an existing custom environment file and pass the file to the deployment command:
parameter_defaults:
  DeploymentServerBlacklist:
    - overcloud-compute-0
    - overcloud-compute-1
    - overcloud-compute-2
The server names in the parameter value are the names according to OpenStack Orchestration (heat), not the actual server hostnames.
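To see the names that heat uses, list the overcloud servers from the undercloud; the -c option is the standard OpenStack client column filter:
$ source ~/stackrc
(undercloud) $ openstack server list -c Name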
Include this environment file with your openstack overcloud deploy command:
$ source ~/stackrc
(undercloud) $ openstack overcloud deploy --templates \
  -e server-blacklist.yaml \
  [OTHER OPTIONS]
Heat blacklists any servers in the list from receiving updated Heat deployments. After the stack operation completes, any blacklisted servers remain unchanged. You can also power off or stop the os-collect-config agents during the operation.
- Exercise caution when blacklisting nodes. Only use a blacklist if you fully understand how to apply the requested change with a blacklist in effect. It is possible to create a hung stack or configure the overcloud incorrectly using the blacklist feature. For example, if a cluster configuration change applies to all members of a Pacemaker cluster, blacklisting a Pacemaker cluster member during this change can cause the cluster to fail.
- Do not use the blacklist during update or upgrade procedures. Those procedures have their own methods for isolating changes to particular servers. See the Upgrading Red Hat OpenStack Platform documentation for more information.
- When you add servers to the blacklist, further changes to those nodes are not supported until you remove the server from the blacklist. This includes updates, upgrades, scale up, scale down, and node replacement. For example, when you blacklist existing Compute nodes while scaling out the overcloud with new Compute nodes, the blacklisted nodes miss the information added to /etc/hosts and /etc/ssh/ssh_known_hosts. This can cause live migration to fail, depending on the destination host. The Compute nodes are updated with the information added to /etc/hosts and /etc/ssh/ssh_known_hosts during the next overcloud deployment where they are no longer blacklisted.
Clearing the Blacklist
To clear the blacklist for subsequent stack operations, edit the DeploymentServerBlacklist parameter to use an empty array:
parameter_defaults:
  DeploymentServerBlacklist: []
Warning: Do not just omit the DeploymentServerBlacklist parameter. If you omit the parameter, the overcloud deployment uses the previously saved value.