Chapter 2. Scaling data plane nodes
You can scale out your data plane by adding new nodes to existing node sets and by adding new node sets. You can scale in your data plane by removing nodes from node sets and by removing node sets.
2.1. Adding nodes to a node set
You can scale out your data plane by adding new nodes to the nodes section of an existing OpenStackDataPlaneNodeSet custom resource (CR).
Prerequisites
- If you are adding unprovisioned nodes, a BareMetalHost CR must be registered and inspected for each bare-metal data plane node. Each bare-metal node must be in the Available state after inspection.
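Before you proceed, you can spot any hosts that are not yet Available by filtering the host list. The sketch below runs the filter against a saved sample rather than a live cluster, and assumes tabular output in which the second column is the provisioning state (as produced by oc get bmh); the host names are hypothetical:

```shell
# Sample 'oc get bmh -n openstack' style output (hypothetical hosts); in a
# real cluster you would pipe the live command output into the same filter.
cat > /tmp/bmh_list.txt <<'EOF'
NAME            STATE         CONSUMER   ONLINE   ERROR
edpm-compute-0  available                true
edpm-compute-1  available                true
edpm-compute-2  inspecting               true
EOF

# Print hosts whose provisioning state (second column) is not 'available'.
awk 'NR > 1 && $2 != "available" { print $1 }' /tmp/bmh_list.txt
# -> edpm-compute-2
```

Any host name that this prints still needs inspection to finish before you add it to the node set.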
Procedure
- Open the OpenStackDataPlaneNodeSet CR definition file for the node set that you want to update, for example, openstack_data_plane.yaml, and add the new node to the node set:
Pre-Provisioned:
```yaml
apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneNodeSet
metadata:
  name: openstack-node-set
spec:
  preProvisioned: True
  nodes:
    ...
    edpm-compute-2:
      hostName: edpm-compute-2
      ansible:
        ansibleHost: 192.168.122.102
      networks:
        - name: ctlplane
          subnetName: subnet1
          defaultRoute: true
          fixedIP: 192.168.122.102
        - name: internalapi
          subnetName: subnet1
        - name: storage
          subnetName: subnet1
        - name: tenant
          subnetName: subnet1
    ...
```

Unprovisioned:

```yaml
apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneNodeSet
metadata:
  name: openstack-node-set
spec:
  preProvisioned: False
  nodes:
    ...
    edpm-compute-2:
      hostName: edpm-compute-2
    ...
```
For information about the properties you can use to configure common node attributes, see OpenStackDataPlaneNodeSet CR spec properties in Deploying Red Hat OpenStack Services on OpenShift.
- Save the OpenStackDataPlaneNodeSet CR definition file.
- Apply the updated OpenStackDataPlaneNodeSet CR configuration:

```shell
$ oc apply -f openstack_data_plane.yaml
```

- Verify that the data plane resource has been updated by confirming that the status is SetupReady:

```shell
$ oc wait openstackdataplanenodeset openstack-node-set --for condition=SetupReady --timeout=10m
```

When the status is SetupReady, the command returns a "condition met" message; otherwise it returns a timeout error. For information about the data plane conditions and states, see Data plane conditions and states in Deploying Red Hat OpenStack Services on OpenShift.

- Create a file on your workstation to define the OpenStackDataPlaneDeployment CR:

```yaml
apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneDeployment
metadata:
  name: <node_set_deployment_name>
```
- Replace <node_set_deployment_name> with the name of the OpenStackDataPlaneDeployment CR. The name must be unique, must consist of lower-case alphanumeric characters, - (hyphen) or . (period), and must start and end with an alphanumeric character.
Tip: Give the definition file and the OpenStackDataPlaneDeployment CR unique and descriptive names that indicate the purpose of the modified node set.
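The naming rules above are the standard Kubernetes RFC 1123 subdomain rules. As a quick local sanity check before you apply the CR, you could validate a candidate name with a small shell function; the function name is ours, not part of any tooling, and length limits (253 characters) are not checked:

```shell
# Check that a CR name is a valid RFC 1123 subdomain: lower-case
# alphanumeric labels joined by '.', each label may contain '-' but must
# start and end with an alphanumeric character.
valid_cr_name() {
  printf '%s' "$1" | grep -Eq '^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$'
}

valid_cr_name "compute-scale-out.v2" && echo "ok"
valid_cr_name "Bad_Name" || echo "invalid"
```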
- Add the OpenStackDataPlaneNodeSet CR that you modified:

```yaml
spec:
  nodeSets:
    - <nodeSet_name>
```
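Assembled from the fragments above, a minimal deployment CR for this step might look like the following; the metadata and node set names here are placeholders, not required values:

```yaml
apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneDeployment
metadata:
  name: openstack-node-set-scale-out   # example name; choose your own
spec:
  nodeSets:
    - openstack-node-set               # the node set you modified
```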
- Save the OpenStackDataPlaneDeployment CR deployment file.
- Deploy the modified OpenStackDataPlaneNodeSet CR:

```shell
$ oc create -f openstack_data_plane_deploy.yaml -n openstack
```

You can view the Ansible logs while the deployment executes:

```shell
$ oc get pod -l app=openstackansibleee -w
$ oc logs -l app=openstackansibleee -f --max-log-requests 10
```

If the oc logs command returns an error similar to the following, increase the --max-log-requests value:

```
error: you are attempting to follow 19 log streams, but maximum allowed concurrency is 10, use --max-log-requests to increase the limit
```

- Verify that the modified OpenStackDataPlaneNodeSet CR is deployed:

```shell
$ oc get openstackdataplanedeployment -n openstack
NAME                   NODESETS                   STATUS   MESSAGE
openstack-data-plane   ["openstack-data-plane"]   True     Setup Complete

$ oc get openstackdataplanenodeset -n openstack
NAME                   STATUS   MESSAGE
openstack-data-plane   True     NodeSet Ready
```

For information about the meaning of the returned status, see Data plane conditions and states in Deploying Red Hat OpenStack Services on OpenShift.
If the status indicates that the data plane has not been deployed, then troubleshoot the deployment. For information, see Troubleshooting the data plane creation and deployment in Deploying Red Hat OpenStack Services on OpenShift.
If the new nodes are Compute nodes, you must bring them online:
Map the Compute nodes to the Compute cell that they are connected to:
```shell
$ oc rsh nova-cell0-conductor-0 nova-manage cell_v2 discover_hosts --verbose
```

If you did not create additional cells, this command maps the Compute nodes to cell1.

Verify that the hypervisor hostname is a fully qualified domain name (FQDN):

```shell
$ hostname -f
```

If the hypervisor hostname is not an FQDN, for example, if it was registered as a short name or full name instead, contact Red Hat Support.
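A hostname counts as an FQDN when it includes at least one dot-separated domain component. As a rough pre-check on the node, you could test the output of hostname -f with a helper like this; it is a sketch, not a supported validation tool:

```shell
# Report whether a hostname looks fully qualified (contains a domain part).
is_fqdn() {
  case "$1" in
    *.*) return 0 ;;  # has at least one dot: looks like an FQDN
    *)   return 1 ;;  # bare short name
  esac
}

is_fqdn "edpm-compute-2.example.com" && echo "fqdn"
is_fqdn "edpm-compute-2" || echo "short name only"
```

On the node itself you would pass in "$(hostname -f)".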
Access the remote shell for the openstackclient pod and verify that the deployed Compute nodes are visible on the control plane:

```shell
$ oc rsh -n openstack openstackclient
$ openstack hypervisor list
```
2.2. Adding a new node set to the data plane
You can scale out your data plane by adding a new OpenStackDataPlaneNodeSet CR to the data plane. To add the new node set to an existing data plane, you must create a new OpenStackDataPlaneDeployment CR that deploys the new OpenStackDataPlaneNodeSet CR. If you want to perform move operations, such as instance migration and resize, between your new node set and other node sets on your data plane, then you must also create an additional OpenStackDataPlaneDeployment CR that runs the ssh-known-hosts service on all the node sets between which move operations must be possible.
Procedure
- Create a file on your workstation to define the new OpenStackDataPlaneNodeSet CR, and define the node set. For details about how to create a node set, see one of the following procedures:
- Create a file on your workstation to define the OpenStackDataPlaneDeployment CR to deploy the new node set:

```yaml
apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneDeployment
metadata:
  name: <node_set_deployment_name>
```
- Replace <node_set_deployment_name> with the name of the OpenStackDataPlaneDeployment CR. The name must be unique, must consist of lower-case alphanumeric characters, - (hyphen) or . (period), and must start and end with an alphanumeric character.
Tip: Give the definition file and the OpenStackDataPlaneDeployment CR unique and descriptive names that indicate the purpose of the new node set.
- Add your new OpenStackDataPlaneNodeSet CR to the list of node sets to deploy:

```yaml
spec:
  nodeSets:
    - <nodeSet_name>
```
- Save the OpenStackDataPlaneDeployment CR deployment file.
- Deploy the new OpenStackDataPlaneNodeSet CR:

```shell
$ oc create -f openstack_data_plane_deploy.yaml -n openstack
```

You can view the Ansible logs while the deployment executes:

```shell
$ oc get pod -l app=openstackansibleee -w
$ oc logs -l app=openstackansibleee -f --max-log-requests 10
```

If the oc logs command returns an error similar to the following, increase the --max-log-requests value:

```
error: you are attempting to follow 19 log streams, but maximum allowed concurrency is 10, use --max-log-requests to increase the limit
```

- Verify that the new OpenStackDataPlaneNodeSet CR is deployed:

```shell
$ oc get openstackdataplanedeployment -n openstack
NAME                   NODESETS                   STATUS   MESSAGE
openstack-data-plane   ["openstack-data-plane"]   True     Setup Complete

$ oc get openstackdataplanenodeset -n openstack
NAME                   STATUS   MESSAGE
openstack-data-plane   True     NodeSet Ready
```

For information about the meaning of the returned status, see Data plane conditions and states in Deploying Red Hat OpenStack Services on OpenShift.
If the status indicates that the data plane has not been deployed, then troubleshoot the deployment. For information, see Troubleshooting the data plane creation and deployment in Deploying Red Hat OpenStack Services on OpenShift.
If you want to be able to migrate workloads between your new node set and other node sets on your data plane, or perform resize operations, then you must create an additional OpenStackDataPlaneDeployment CR that runs the ssh-known-hosts service on all the node sets between which the move operations must work:

- Create a file on your workstation to define an OpenStackDataPlaneDeployment CR that enables move operations:

```yaml
apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneDeployment
metadata:
  name: enable-move-operations
```

- Add the new OpenStackDataPlaneNodeSet CR to the list of node sets, together with all the existing OpenStackDataPlaneNodeSet CRs that move operations must be able to be performed between:

```yaml
spec:
  nodeSets:
    - ...
    - <nodeSet_name>
```

- Specify that only the ssh-known-hosts service is executed on the specified node sets when deploying the node sets in the move operations OpenStackDataPlaneDeployment CR:

```yaml
spec:
  ...
  servicesOverride:
    - ssh-known-hosts
```
- Save the OpenStackDataPlaneDeployment CR deployment file.
- Deploy the ssh-known-hosts service to enable move operations between the new node set and the other specified node sets on the data plane:

```shell
$ oc create -f enable_move_operations.yaml -n openstack
```
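Assembled from the fragments above, a complete move-operations deployment CR might look like the following; the node set names are placeholders for your own existing and new node sets:

```yaml
apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneDeployment
metadata:
  name: enable-move-operations
spec:
  nodeSets:
    - <existing_nodeSet_name>   # every node set move operations must reach
    - <new_nodeSet_name>        # the node set you just added
  servicesOverride:
    - ssh-known-hosts           # run only this service on these node sets
```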
If the new nodes are Compute nodes, you must bring them online:
Map the Compute nodes to the Compute cell that they are connected to:
```shell
$ oc rsh nova-cell0-conductor-0 nova-manage cell_v2 discover_hosts --verbose
```

If you did not create additional cells, this command maps the Compute nodes to cell1.

Verify that the hypervisor hostname is a fully qualified domain name (FQDN):

```shell
$ hostname -f
```

If the hypervisor hostname is not an FQDN, for example, if it was registered as a short name or full name instead, contact Red Hat Support.
Access the remote shell for the openstackclient pod and verify that the deployed Compute nodes are visible on the control plane:

```shell
$ oc rsh -n openstack openstackclient
$ openstack hypervisor list
```
2.3. Removing a Compute node from the data plane
You can remove a Compute node from a node set on the data plane. If you remove all the nodes from a node set, then you must also remove the node set from the data plane.
Prerequisites
- You are logged in to the RHOCP cluster as a user with cluster-admin privileges.
- The workloads on the Compute nodes have been migrated to other Compute nodes.
Procedure
- Access the remote shell for the openstackclient pod:

```shell
$ oc rsh -n openstack openstackclient
```

- Retrieve the IP address of the Compute node that you want to remove:

```shell
$ openstack hypervisor list
```

- Retrieve a list of your Compute nodes to identify the name and UUID of the node that you want to remove:

```shell
$ openstack compute service list
```

- Disable the nova-compute service on the Compute node to be removed:

```shell
$ openstack compute service set <hostname> nova-compute --disable
```

Tip: Use the --disable-reason option to add a short explanation of why the service is being disabled. This is useful if you intend to redeploy the Compute service.

- Exit the openstackclient pod:

```shell
$ exit
```

- SSH into the Compute node to be removed and stop the ovn and nova-compute containers:

```shell
$ ssh -i <key_file_name> cloud-admin@<node_IP_address>
[cloud-admin@<hostname> ~]$ sudo systemctl stop edpm_ovn_controller
[cloud-admin@<hostname> ~]$ sudo systemctl stop edpm_ovn_metadata_agent
[cloud-admin@<hostname> ~]$ sudo systemctl stop edpm_nova_compute
```
- Replace <key_file_name> with the name and location of the SSH key pair file that you created to enable Ansible to manage the RHEL nodes.
- Replace <node_IP_address> with the IP address of the Compute node that you retrieved in step 2.
- Remove the systemd unit files that manage the ovn and nova-compute containers, to prevent the agents from being automatically started and registered in the database if the removed node is rebooted:

```shell
[cloud-admin@<hostname> ~]$ sudo rm -f /etc/systemd/system/edpm_ovn_controller
[cloud-admin@<hostname> ~]$ sudo rm -f /etc/systemd/system/edpm_ovn_metadata_agent
[cloud-admin@<hostname> ~]$ sudo rm -f /etc/systemd/system/edpm_nova_compute
```

- Disconnect from the Compute node:

```shell
$ exit
```

- Access the remote shell for openstackclient:

```shell
$ oc rsh -n openstack openstackclient
```

- Delete the network agents for the Compute node to be removed:

```shell
$ openstack network agent list [--host <hostname>]
$ openstack network agent delete <agent_id>
```

- Delete the nova-compute service for the Compute node to be removed:

```shell
$ openstack compute service delete <node_uuid>
```
- Replace <node_uuid> with the UUID of the node to be removed that you retrieved in step 3.
- Exit the openstackclient pod:

```shell
$ exit
```

- Remove the node from the OpenStackDataPlaneNodeSet CR:

```shell
$ oc patch openstackdataplanenodeset/<node_set_name> --type json --patch '[{ "op": "remove", "path": "/spec/nodes/<node_name>" }]'
```
- Replace <node_set_name> with the name of the OpenStackDataPlaneNodeSet CR that the node belongs to.
- Replace <node_name> with the name of the node defined in the nodes section of the OpenStackDataPlaneNodeSet CR.
- Create a file on your workstation to define the OpenStackDataPlaneDeployment CR to update the node set with the Compute node removed:

```yaml
apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneDeployment
metadata:
  name: <node_set_deployment_name>
```
- Replace <node_set_deployment_name> with the name of the OpenStackDataPlaneDeployment CR. The name must be unique, must consist of lower-case alphanumeric characters, - (hyphen) or . (period), and must start and end with an alphanumeric character.
Tip: Give the definition file and the OpenStackDataPlaneDeployment CR unique and descriptive names that indicate the purpose of the modified node set.
- Add the OpenStackDataPlaneNodeSet CR that you removed the node from:

```yaml
spec:
  nodeSets:
    - <nodeSet_name>
```
- Save the OpenStackDataPlaneDeployment CR deployment file.
- Deploy the OpenStackDataPlaneDeployment CR to delete the removed nodes:

```shell
$ oc create -f openstack_data_plane_deploy.yaml -n openstack
```

You can view the Ansible logs while the deployment executes:

```shell
$ oc get pod -l app=openstackansibleee -w
$ oc logs -l app=openstackansibleee -f --max-log-requests 10
```

If the oc logs command returns an error similar to the following, increase the --max-log-requests value:

```
error: you are attempting to follow 19 log streams, but maximum allowed concurrency is 10, use --max-log-requests to increase the limit
```

- Verify that the modified OpenStackDataPlaneNodeSet CR is deployed:

```shell
$ oc get openstackdataplanedeployment -n openstack
NAME                   NODESETS                   STATUS   MESSAGE
openstack-data-plane   ["openstack-data-plane"]   True     Setup Complete

$ oc get openstackdataplanenodeset -n openstack
NAME                   STATUS   MESSAGE
openstack-data-plane   True     NodeSet Ready
```

For information about the meaning of the returned status, see Data plane conditions and states in Deploying Red Hat OpenStack Services on OpenShift.
If the status indicates that the data plane has not been deployed, then troubleshoot the deployment. For information, see Troubleshooting the data plane creation and deployment in Deploying Red Hat OpenStack Services on OpenShift.
2.4. Removing an OpenStackDataPlaneNodeSet resource
You can remove a whole node set from the data plane. To remove an OpenStackDataPlaneNodeSet resource, you must perform the following tasks:
- Stop the ovn and nova-compute containers running on each Compute node in the node set.
- Disable and delete the nova-compute service from each Compute node in the node set.
- Delete the network agent from each Compute node in the node set.
- Remove the SSH host keys of the removed nodes from the nodes in the remaining node sets.
- Delete the node set and remove the node set from the data plane.
Prerequisites
- You are logged in to the RHOCP cluster as a user with cluster-admin privileges.
- The workloads on the node set Compute nodes have been migrated to Compute nodes on another node set.
Procedure
- Access the remote shell for the openstackclient pod:

```shell
$ oc rsh -n openstack openstackclient
```

- Retrieve the IP address of each Compute node that you want to remove:

```shell
$ openstack hypervisor list
```

- Retrieve a list of your Compute nodes to identify the name and UUID of each node that you want to remove:

```shell
$ openstack compute service list
```

- Disable the nova-compute service on each Compute node to be removed:

```shell
$ openstack compute service set <hostname> nova-compute --disable
```

Tip: Use the --disable-reason option to add a short explanation of why the service is being disabled. This is useful if you intend to redeploy the Compute service.

- Exit the openstackclient pod:

```shell
$ exit
```

- Perform the following operations on each Compute node to be removed:
SSH into the Compute node to be removed:
```shell
$ ssh -i <key_file_name> cloud-admin@<node_IP_address>
```
- Replace <key_file_name> with the name and location of the SSH key pair file that you created to enable Ansible to manage the RHEL nodes.
- Replace <node_IP_address> with the IP address of the Compute node that you retrieved in step 2.
- Stop the ovn and nova-compute containers:

```shell
[cloud-admin@<hostname> ~]$ sudo systemctl stop edpm_ovn_controller
[cloud-admin@<hostname> ~]$ sudo systemctl stop edpm_ovn_metadata_agent
[cloud-admin@<hostname> ~]$ sudo systemctl stop edpm_nova_compute
```

- Remove the systemd unit files that manage the ovn and nova-compute containers, to prevent the agents from being automatically started and registered in the database if the removed node is rebooted:

```shell
[cloud-admin@<hostname> ~]$ sudo rm -f /etc/systemd/system/edpm_ovn_controller
[cloud-admin@<hostname> ~]$ sudo rm -f /etc/systemd/system/edpm_ovn_metadata_agent
[cloud-admin@<hostname> ~]$ sudo rm -f /etc/systemd/system/edpm_nova_compute
```

- Disconnect from the Compute node:

```shell
$ exit
```
- Access the remote shell for openstackclient:

```shell
$ oc rsh -n openstack openstackclient
```

- Delete the network agents for each Compute node in the node set:

```shell
$ openstack network agent list [--host <hostname>]
$ openstack network agent delete <agent_id>
```

- Delete the nova-compute service for each Compute node in the node set:

```shell
$ openstack compute service delete <node_uuid>
```
- Replace <node_uuid> with the UUID of the node to be removed that you retrieved in step 3.
- Exit the openstackclient pod:

```shell
$ exit
```

- Search for the string secretHashes in the output of the following command to find the secrets in the node set to be deleted:

```shell
$ oc get openstackdataplanenodeset <node_set_name> -n openstack -o yaml
```

The secretHashes field lists all the node set secrets in key-value pair format, <key>: <value>. The following example illustrates the secretHashes format in YAML output:

```yaml
secretHashes:
  cert-libvirt-default-compute-4drna21w-0: n68chbfh678h5dfhcfh576h546h566h5c4h5cdh679hffh67h79h98h56fhc4h588h58fhb4h548h59fh554h54fh5cdh646h577hffhbdh569h5f9h68bq
  cert-libvirt-default-compute-4drna21w-1: n68dhdbh65chdbh5f7h695h7h54chcdh654h59ch564h5d6hdch66h54bh66ch556h649h666h76h55hc7h564h65dh5fch5c7h5fbh8bh55hcbh5b5q
```

- Delete the node set:
```shell
$ oc delete openstackdataplanenodeset/<node_set_name> -n openstack
```

- Replace <node_set_name> with the name of the OpenStackDataPlaneNodeSet CR to be deleted.
Delete the node set secrets:
```shell
$ oc delete secret <secret_name>
```
- Replace <secret_name> with the key of the secretHashes key-value pair that you retrieved in the previous step, for example, cert-libvirt-default-compute-4drna21w-0.
Note: You can ensure that secrets created by cert-manager are removed automatically by setting the --enable-certificate-owner-ref flag for the cert-manager Operator for Red Hat OpenShift. For more information, see Deleting a TLS secret automatically upon Certificate removal.
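To collect the secret names from the secretHashes output without copying them by hand, you could filter a saved copy of the YAML with awk. This is a sketch: the sample file below mimics the secretHashes listing shown earlier, and the secret names are hypothetical:

```shell
# Save the secretHashes portion of the node set YAML, then extract the keys.
# Sample data standing in for real 'oc get openstackdataplanenodeset' output.
cat > /tmp/secret_hashes.yaml <<'EOF'
secretHashes:
  cert-libvirt-default-compute-4drna21w-0: n68chbfh678h5dfhcfh576h546h5c4q
  cert-libvirt-default-compute-4drna21w-1: n68dhdbh65chdbh5f7h695h7h54chq
EOF

# Keep only the indented key lines and strip the trailing colon and value,
# leaving one secret name per line.
awk '/^  [^ ]+:/ { sub(/:.*/, "", $1); print $1 }' /tmp/secret_hashes.yaml
```

Each printed name can then be passed to oc delete secret.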
- If the node set that you removed listed the global ssh-known-hosts service, then you must add the ssh-known-hosts service to one of the remaining OpenStackDataPlaneNodeSet CRs listed in the OpenStackDataPlaneDeployment CR. Open the definition file for one of the remaining OpenStackDataPlaneNodeSet CRs from your workstation and add the ssh-known-hosts service to the services field in the order that it should be executed relative to the other services:

```yaml
apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneNodeSet
metadata:
  name: <node_set_name>
spec:
  services:
    - download-cache
    - bootstrap
    - configure-network
    - validate-network
    - install-os
    - configure-os
    - ssh-known-hosts
    - run-os
    - libvirt
    - nova
    - ovn
    - neutron-metadata
    - telemetry
```

Note: When you add the ssh-known-hosts service to the services list in a node set definition, you must include all the required services, including the default services. If you include only the ssh-known-hosts service in the services list, then that is the only service that is deployed.
- Save the updated OpenStackDataPlaneNodeSet CR definition file.
- Apply the updated OpenStackDataPlaneNodeSet CR configuration:

```shell
$ oc apply -f <node_set_name>.yaml
```

- Create a file on your workstation to define the OpenStackDataPlaneDeployment CR that removes the node set from the data plane:

```yaml
apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneDeployment
metadata:
  name: <node_set_deployment_name>
```
- Replace <node_set_deployment_name> with the name of the OpenStackDataPlaneDeployment CR. The name must be unique, must consist of lower-case alphanumeric characters, - (hyphen) or . (period), and must start and end with an alphanumeric character.
Tip: Give the definition file and the OpenStackDataPlaneDeployment CR unique and descriptive names that indicate the purpose of the modified node set.
- Add the remaining OpenStackDataPlaneNodeSet CRs in the data plane to the list of node sets to deploy:

```yaml
spec:
  nodeSets:
    - <nodeSet_name>
```

- Specify that the OpenStackDataPlaneDeployment CR runs only the ssh-known-hosts service when deploying the listed node sets:

```yaml
spec:
  ...
  servicesOverride:
    - ssh-known-hosts
```
- Save the OpenStackDataPlaneDeployment CR deployment file.
- Deploy the ssh-known-hosts service to delete the removed nodes from the known hosts lists on the remaining nodes:

```shell
$ oc create -f openstack_data_plane_deploy.yaml -n openstack
```

You can view the Ansible logs while the deployment executes:

```shell
$ oc get pod -l app=openstackansibleee -w
$ oc logs -l app=openstackansibleee -f --max-log-requests 10
```

If the oc logs command returns an error similar to the following, increase the --max-log-requests value:

```
error: you are attempting to follow 19 log streams, but maximum allowed concurrency is 10, use --max-log-requests to increase the limit
```

- Verify that the modified OpenStackDataPlaneNodeSet CR is deployed:

```shell
$ oc get openstackdataplanedeployment -n openstack
NAME                   NODESETS                   STATUS   MESSAGE
openstack-data-plane   ["openstack-data-plane"]   True     Setup Complete

$ oc get openstackdataplanenodeset -n openstack
NAME                   STATUS   MESSAGE
openstack-data-plane   True     NodeSet Ready
```

For information about the meaning of the returned status, see Data plane conditions and states in Deploying Red Hat OpenStack Services on OpenShift.
If the status indicates that the data plane has not been deployed, then troubleshoot the deployment. For information, see Troubleshooting the data plane creation and deployment in Deploying Red Hat OpenStack Services on OpenShift.