Planning a large-scale RHOSO deployment
Hardware requirements and recommendations for large deployments
Abstract
Providing feedback on Red Hat documentation
We appreciate your input on our documentation. Tell us how we can make it better.
Use the Create Issue form to provide feedback on the documentation for Red Hat OpenStack Services on OpenShift (RHOSO) or earlier releases of Red Hat OpenStack Platform (RHOSP). When you create an issue for RHOSO or RHOSP documents, the issue is recorded in the RHOSO Jira project, where you can track the progress of your feedback.
To complete the Create Issue form, ensure that you are logged in to Jira. If you do not have a Red Hat Jira account, you can create an account at https://issues.redhat.com.
- Click the following link to open a Create Issue page: Create Issue
- Complete the Summary and Description fields. In the Description field, include the documentation URL, chapter or section number, and a detailed description of the issue. Do not modify any other fields in the form.
- Click Create.
Chapter 1. Tested specifications for large-scale RHOSO deployment
Deploy Red Hat OpenStack Services on OpenShift (RHOSO) on a Red Hat OpenShift Container Platform (RHOCP) cluster.
RHOSO consists of a control plane and a data plane. The control plane comprises the RHOSO services that are installed on RHOCP. The data plane consists of one or more node sets, which are deployed by the control plane.
A node set is a group of RHOSO Compute nodes or RHOSO Network nodes that share common node properties. A node set is comparable to composable roles in previous releases.
Red Hat has tested RHOSO 18 with a deployment of up to 250 bare-metal nodes with various node sets.
For information on RHOSO deployment steps, see Deploying Red Hat OpenStack Services on OpenShift.
1.1. Control plane requirements for RHOSO at scale
Ensure that your Red Hat OpenShift Container Platform (RHOCP) cluster meets the minimum tested requirements for hosting Red Hat OpenStack Services on OpenShift (RHOSO).
| System requirement | Description |
|---|---|
| Node count | 3-node RHOCP cluster at version 4.16 |
| CPUs | 40 total cores, 80 threads |
| Disk | 500 GB root disk (1x SSD or 2x 7200 RPM hard drives; RAID 1) |
| | 500 GB dedicated disk for Swift (1x SSD or 1x NVMe) |
| | Optional: 500 GB disk for image caching (1x SSD or 2x 7200 RPM hard drives; RAID 1) |
| Memory | 384 GB |
| Network | 10 Gbps or higher |
1.2. Data plane requirements for RHOSO at scale
Ensure that your Compute nodes meet the tested requirements before you deploy a Red Hat OpenStack Services on OpenShift (RHOSO) data plane.
| Resource | Constraints |
|---|---|
| Compute nodes in a node set | Up to 50 nodes |
| Total Compute node count | Up to 250 nodes |
| CPUs | At least 2 sockets, each with 12 cores and 24 threads |
| Disk | At least a 500 GB root disk (1x SSD or 2x 7200 RPM hard drives; RAID 1) |
| Memory | At least 128 GB (64 GB per NUMA node). By default, 2 GB of RAM is reserved for the host. With Distributed Virtual Routing (DVR), increase the reserved RAM to 5 GB. |
| Network interfaces | 2 x 10 Gbps or faster |
1.3. Red Hat Ceph Storage node system requirements
For information about system requirements for Red Hat Ceph Storage, see the following resources:
- General principles for selecting hardware in the Red Hat Ceph Storage 7 Hardware Guide.
- Integrating Red Hat Ceph Storage with RHOSO.
- Pools, placement groups, and CRUSH Configuration reference in the Red Hat Ceph Storage Configuration Guide.
Chapter 2. RHOSO deployment best practices
The deployment of Red Hat OpenStack Services on OpenShift (RHOSO) can be a network-intensive activity. Take steps to reduce the chances of network saturation and to avoid unnecessary troubleshooting.
2.1. RHOSO deployment preparation
Prepare to deploy Red Hat OpenStack Services on OpenShift at scale by reviewing the networking requirements for deployment and operation.
- Dedicate separate NICs for RHOCP and RHOSO networks
You must have a minimum of two NICs for each control plane worker node:
- One NIC is dedicated to OpenShift, facilitating communication between OpenShift components within the cluster network.
- The other NIC is designated for OpenStack, enabling connectivity between OpenStack services on worker nodes and the isolated networks in the RHOSO data plane.
- Limit the number of nodes for Bare Metal (ironic) introspection
When you run introspection on many nodes at once, the introspection process can fail. To avoid failures, perform introspection on no more than 50 nodes at a time.
Ensure that the provisioning network has enough IPs allocated for the number of nodes that you expect to have in the environment.
- Enable Jumbo Frames for networks with heavy traffic
A standard frame has a maximum MTU of 1500 bytes. Jumbo frames can be as large as 9000 bytes. Jumbo frames can reduce CPU overhead on high throughput network connections because fewer datagrams must be processed per gigabyte of transferred data.
Enable jumbo frames only for networks that have a network switch that supports them. Networks that are known to have better performance with jumbo frames include the following (see the MTU sketch after this list):
- Tenant network
- Storage network
- Management network
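For illustration only, the following is a minimal sketch of raising the MTU on a storage network in a `NetConfig` CR. The network and subnet values are hypothetical, and the assumption that the MTU is set with an `mtu` field per network should be verified against your environment:

```yaml
apiVersion: network.openstack.org/v1beta1
kind: NetConfig
metadata:
  name: netconfig
  namespace: openstack
spec:
  networks:
  - name: storage
    # Assumes that the switch ports carrying this network accept 9000-byte frames
    mtu: 9000
    # Other required network fields from your existing NetConfig are omitted for brevity
    subnets:
    - name: subnet1
      cidr: 172.18.0.0/24
      vlan: 21
      allocationRanges:
      - start: 172.18.0.100
        end: 172.18.0.250
```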
2.2. RHOSO deployment configurations
Ensure the successful operation of Red Hat OpenStack Services on OpenShift (RHOSO) by understanding the constraints, behaviors, and networking properties of RHOSO.
- Validate your custom resources (CRs) with a small scale deployment
- Deploy a small environment with a control plane hosted on a three-node RHOCP cluster, one data plane node, and three Red Hat Ceph Storage nodes. Use this configuration to ensure your CR configurations are correct.
- Limit the number of data plane nodes that are provisioned at the same time
- You can typically fit 50 servers in an average enterprise-level rack. Using this assumption, deploy one rack at a time, using one node set per rack. Red Hat has successfully tested a deployment with 5 node sets totaling 250 nodes. When you limit the number of nodes in a single node set, you minimize the debugging necessary to diagnose potential deployment issues.
- Power off unused Bare Metal Provisioning (ironic) nodes
When Bare Metal Provisioning (ironic) automated cleaning is enabled and a node fails cleaning, that node is set to maintenance mode. Nodes in maintenance mode can remain powered on but be incorrectly reported by Bare Metal Provisioning (ironic) as powered off. This can cause problems with ongoing deployments.
If you redeploy after a failed deployment, ensure that you power off all unused nodes by using the power management device of each node.
- Improve instance distribution across Compute
The Compute scheduler updates Compute resources only after instances that have been scheduled are confirmed for the Compute node. To help prevent the uneven distribution of instances on Compute nodes, perform the following actions:
- Set the value of the `[filter_scheduler] shuffle_best_same_weighed_hosts` parameter to `true`.
- To ensure that a Compute node is not overloaded with instances, set `max_instances_per_host` to the maximum number of instances that any Compute node can spawn, and ensure that the `NumInstancesFilter` parameter is enabled. When a Compute node reaches this instance count, the scheduler no longer selects it for further instance scheduling. A configuration sketch follows this list.
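The following is a minimal sketch of these scheduler settings, assuming the Compute service (nova) exposes a scheduler service template with a `customServiceConfig` hook in the `OpenStackControlPlane` CR; the field name shown here and the `max_instances_per_host` value are illustrative, so verify them against your CRD version and capacity planning:

```yaml
apiVersion: core.openstack.org/v1beta1
kind: OpenStackControlPlane
metadata:
  name: openstack-control-plane
spec:
  nova:
    template:
      # Assumed location of the customServiceConfig hook for scheduler options;
      # verify the exact field against your OpenStackControlPlane CRD version.
      schedulerServiceTemplate:
        customServiceConfig: |
          [filter_scheduler]
          shuffle_best_same_weighed_hosts = true
          # Illustrative limit; choose a value that matches your Compute node capacity
          max_instances_per_host = 50
          # Append NumInstancesFilter to the filters that you already enable, for example:
          # enabled_filters = <existing filters>,NumInstancesFilter
```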
Additionally, set configurations for the Networking service (neutron) to improve performance at scale.
The ovsdb-server sends probes at specific intervals to the following clients:
- neutron
- ovn-controller
- ovn-metadata-agent
If ovsdb-server does not receive a reply from one of these clients before a timeout is reached, it disconnects from the client and forces a reconnect. A client can be slow to respond after the initial connection, while it loads a copy of the database into memory. If the timeout is too low, ovsdb-server can disconnect the client during this process. When the client reconnects, the process starts over and repeats continuously. If the maximum timeout interval does not work, disable the probes by setting the interval to 0.
If the client-side probe intervals are disabled, the clients use TCP keepalive messages to monitor their connections to ovsdb-server.
The following settings are tested and validated to improve performance and stability in a large-scale RHOSO environment.
- OVN Southbound server-side inactivity probe
Increase the probe interval to 180000 ms in the `OpenStackControlPlane` CR file:
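The following is a minimal sketch, assuming the OVN database cluster template in the `OpenStackControlPlane` CR exposes the server-side inactivity probe as an `inactivityProbe` field in milliseconds; confirm the field name against your CRD version:

```yaml
apiVersion: core.openstack.org/v1beta1
kind: OpenStackControlPlane
metadata:
  name: openstack-control-plane
spec:
  ovn:
    template:
      ovnDBCluster:
        ovndbcluster-sb:
          dbType: SB
          # Assumed field for the server-side inactivity probe, in milliseconds
          inactivityProbe: 180000
```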
- OVN Northbound server-side inactivity probe
Increase the probe interval to 60000 ms:
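Under the same assumptions, a sketch of the corresponding entry for the Northbound database, added alongside the Southbound entry in the same CR:

```yaml
spec:
  ovn:
    template:
      ovnDBCluster:
        ovndbcluster-nb:
          dbType: NB
          # Assumed field for the server-side inactivity probe, in milliseconds
          inactivityProbe: 60000
```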
- OVN controller remote probe interval on Compute nodes
Increase the probe interval to 180000 ms by using the `edpm_ovn_remote_probe_interval` variable in your node set CR file:
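A sketch of the node set change, assuming the variable is set under the node template Ansible variables of an `OpenStackDataPlaneNodeSet` CR (the node set name is hypothetical):

```yaml
apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneNodeSet
metadata:
  name: openstack-edpm
  namespace: openstack
spec:
  nodeTemplate:
    ansible:
      ansibleVars:
        # Probe interval from ovn-controller to the OVN Southbound database, in milliseconds
        edpm_ovn_remote_probe_interval: 180000
```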
- Networking service client-side probe interval
Increase the probe interval to 180000 ms by using the `customServiceConfig` hook in the `neutron/template` section of your `OpenStackControlPlane` CR file:
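A sketch of the hook; the `[ovn] ovsdb_probe_interval` option name is an assumption based on the upstream Networking service ML2/OVN configuration:

```yaml
apiVersion: core.openstack.org/v1beta1
kind: OpenStackControlPlane
metadata:
  name: openstack-control-plane
spec:
  neutron:
    template:
      customServiceConfig: |
        [ovn]
        # Client-side probe interval toward ovsdb-server, in milliseconds
        ovsdb_probe_interval = 180000
```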
- Networking service `api_workers`
Increase the default number of separate API worker processes to 16 or more, based on the load on the `neutron-server`.
- Networking service `agent_down_time`
Set `agent_down_time` to the maximum permissible value of 2147483 for very large clusters. Use the `customServiceConfig` hook in the `neutron/template` section of your `OpenStackControlPlane` CR file:
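A sketch of the same hook; `agent_down_time` is a `[DEFAULT]` option, and in practice you combine all of your Networking service options into a single `customServiceConfig` block:

```yaml
spec:
  neutron:
    template:
      customServiceConfig: |
        [DEFAULT]
        # Maximum permissible agent_down_time for very large clusters, in seconds
        agent_down_time = 2147483
        # The api_workers option from the previous item can be raised in the same block:
        # api_workers = 16
```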
- OVN metadata client-side probe interval on Compute nodes
Increase the probe interval to 180000 ms by using the `edpm_neutron_metadata_agent_ovn_ovsdb_probe_interval` variable in your node set CR file:
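A sketch under the same assumption about where `edpm_` Ansible variables are set in the `OpenStackDataPlaneNodeSet` CR:

```yaml
spec:
  nodeTemplate:
    ansible:
      ansibleVars:
        # Probe interval from the OVN metadata agent to ovsdb-server, in milliseconds
        edpm_neutron_metadata_agent_ovn_ovsdb_probe_interval: 180000
```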
2.3. Tuning the control plane for large-scale deployments
When you scale your Red Hat OpenStack Services on OpenShift (RHOSO) deployment, consider tuning your custom resources (CRs) to allocate more resources.
Procedure
- Edit the `rabbitmq-cell1` section of the `OpenStackControlPlane` manifest file and configure resources to the following values:
  - persistence storage: 20Gi
  - replicas: 3
  - cpu: 8
  - memory: 20Gi

  Example
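A minimal sketch of these values, assuming the RabbitMQ cell template in the `OpenStackControlPlane` CR exposes `replicas`, `persistence`, and standard Kubernetes resource requests; the field layout is illustrative:

```yaml
apiVersion: core.openstack.org/v1beta1
kind: OpenStackControlPlane
metadata:
  name: openstack-control-plane
spec:
  rabbitmq:
    templates:
      rabbitmq-cell1:
        replicas: 3
        persistence:
          # Persistent storage per RabbitMQ replica
          storage: 20Gi
        resources:
          requests:
            cpu: 8
            memory: 20Gi
```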
- Edit the `galera` section of the `OpenStackControlPlane` manifest file and configure resources to the following values:
  - replicas: 3
  - storageRequest: 20G

  Example
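A sketch under similar assumptions for the Galera templates; the template names shown follow the default naming and might differ in your deployment:

```yaml
spec:
  galera:
    templates:
      openstack:
        replicas: 3
        storageRequest: 20G
      openstack-cell1:
        replicas: 3
        storageRequest: 20G
```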
- Edit the `nova`, `neutron`, `keystone`, and `glance` sections of the `OpenStackControlPlane` manifest file and configure those services to have three replicas.
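For example, a sketch of raising the Identity service (keystone) replicas; the other services follow the same pattern in their own `template` sections, although the exact sub-field that carries `replicas` can differ per service:

```yaml
spec:
  keystone:
    template:
      # Run three replicas of the service for a large-scale deployment
      replicas: 3
```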