Chapter 3. Recommended deployment practices
3.1. Deployment preparation considerations
Set root password for overcloud image
- Set the root password on your overcloud image so that you have console access to nodes deployed from that image. Use the console to troubleshoot failed deployments when networking is configured incorrectly. See Installing virt-customize to the director and Setting the Root Password in the Partner Integration Guide.
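The following is a minimal sketch of setting the root password with virt-customize; the image file name overcloud-full.qcow2 and the example password are assumptions, and the full procedure is in the Partner Integration Guide:

    # Run on the director node after installing virt-customize; replace the
    # password placeholder with your own value
    virt-customize -a overcloud-full.qcow2 --root-password password:MyRootPassword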
Assign specific node IDs
- Use scheduler hints to assign hardware to a role, such as Controller, Compute, CephStorage, and others. Scheduler hints allow for easier identification of deployment issues that affect only a specific piece of hardware; see the sketch after this list.
- The nova-scheduler, which is a single process, can become overloaded when scheduling a large number of nodes. Scheduler hints reduce the load on nova-scheduler through tag matching, so nova-scheduler encounters fewer scheduling errors during the deployment and the deployment in general takes less time.
- Do not use profile tagging when using scheduler hints.
- In performance testing, use identical hardware for specific roles in order to reduce variability in testing and performance results.
- See Assigning Specific Node IDs in the Advanced Overcloud Customization Guide.
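The following is a minimal sketch of the scheduler hints approach; the node UUID, capability value, and environment file name are placeholders, and the authoritative steps are in the Advanced Overcloud Customization Guide:

    # Tag a specific bare metal node so that it matches the first Controller
    openstack baremetal node set \
      --property capabilities='node:controller-0,boot_option:local' <node UUID>

    # scheduler-hints.yaml, passed to the deployment command with -e
    parameter_defaults:
      ControllerSchedulerHints:
        'capabilities:node': 'controller-%index%'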
Set root disk hints
- When nodes contain multiple disks, use the introspection data to set the WWN as the root disk hint for each node. This prevents the node from using the wrong disk during deployment and booting. See Defining the Root Disk in the Director Installation and Usage Guide.
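A minimal sketch of setting the WWN as the root device hint; the WWN and node UUID values are placeholders taken from your own introspection data:

    # Set the WWN reported by introspection as the root device hint for the node
    openstack baremetal node set \
      --property root_device='{"wwn": "0x4000cca77fc4dba1"}' <node UUID>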
Use OpenStack Bare Metal service (ironic) cleaning
- It is highly recommended to use ironic automated cleaning to erase metadata on nodes that have more than one disk and are likely to have multiple bootloaders. In some cases, a node becomes inconsistent about which disk is the boot disk because multiple bootloaders are present on its disks, which causes the node to fail to deploy when it attempts to pull the metadata from the wrong URL.
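A minimal sketch of enabling automated cleaning on the undercloud, assuming the setting is applied by rerunning openstack undercloud install after editing undercloud.conf:

    [DEFAULT]
    # Erase disk metadata whenever a node is made available
    clean_nodes = true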
Limit the number of nodes for ironic introspection
- Introspecting all nodes at once can result in failure. The recommendation is to introspect no more than 20 nodes at a time. Make sure that the dhcp_start and dhcp_end range in the undercloud.conf file is large enough for the number of nodes you expect to have in the environment. If not enough IP addresses are available, issue no more simultaneous introspection operations than the size of the range allows. After introspection completes, wait a few minutes before issuing further operations so that the introspection DHCP leases can expire.
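A minimal sketch of sizing the provisioning DHCP range and introspecting nodes in batches; the IP addresses and node names are placeholders for your environment:

    # undercloud.conf: make the range large enough for the expected node count
    [DEFAULT]
    dhcp_start = 192.168.24.100
    dhcp_end = 192.168.24.200

    # Introspect no more than 20 nodes per batch
    openstack overcloud node introspect --provide node-00 node-01 node-02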
Ceph preparation
The following list is a set of recommendations for different types of configurations:
- All-flash OSD configuration
- Each OSD requires additional CPU according to the IOPS capacity of the device type, so Ceph IOPS are CPU-limited at a lower number of OSDs. This is especially true for NVMe SSDs, which can have two orders of magnitude higher IOPS capacity than traditional HDDs. For SATA/SAS SSDs, expect one order of magnitude greater random IOPS per OSD than HDDs, but only about a two to four times increase in sequential IOPS. You can supply fewer CPU resources to Ceph than it needs for its OSD devices, but all-flash configurations are expensive.
- Hyper Converged Infrastructure (HCI)
- It is recommended to reserve at least half of your CPU, memory, and network resources for the OpenStack Compute (nova) guests. Plan to have enough CPU and memory to support both OpenStack Compute (nova) guests and Ceph Storage. Observe memory consumption because Ceph Storage memory consumption is not elastic. On a multi-socket system, limit Ceph CPU consumption by NUMA-pinning Ceph to a single socket, for example with the numactl -N 0 -p 0 command. Do not hard-pin Ceph memory consumption to one socket.
- Latency-sensitive applications such as NFV
- Place Ceph on the same CPU socket as the network card that Ceph uses and, if possible, limit the network card interrupts to that CPU socket, with the latency-sensitive network application running on a different NUMA socket and network card.
If you use dual bootloaders, it is recommended to use disk-by-path for the OSD map. This gives you consistent deployments, unlike using the device name. The following snippet is an example of the CephAnsibleDisksConfig parameter for a disk-by-path mapping:

    CephAnsibleDisksConfig:
      osd_scenario: non-collocated
      devices:
        - /dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:0:0
        - /dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:1:0
      dedicated_devices:
        - /dev/nvme0n1
        - /dev/nvme0n1
      journal_size: 512
3.2. Deployment considerations
Validate the deployment command with small scale
- Deploy a small environment that consists of at least 3 Controllers, 1 Compute, and 3 Ceph Storage nodes. Use this configuration to ensure that all of your Heat templates are correct. Adding more nodes increases the amount of time to deploy, so running a small deployment with this recommended node layout and any other node types you might have confirms if an issue exists in your Heat templates.
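A minimal sketch of the node counts for this small validation deployment, set in an environment file that you pass to the deployment command with -e; the file name node-info.yaml is a placeholder:

    # node-info.yaml
    parameter_defaults:
      ControllerCount: 3
      ComputeCount: 1
      CephStorageCount: 3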
Limit the number of nodes provisioned at the same time
- Red Hat recommends deploying 32 nodes at the same time. 32 is the typical number of servers that fit in an average enterprise rack, which allows you to deploy approximately one rack of nodes simultaneously. Deploy no more than 32 nodes at a time to minimize the debugging necessary to diagnose issues with the deployment. If you are comfortable deploying a higher number of nodes, Red Hat has tested up to 100 nodes deployed simultaneously with a high success rate.
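One way to stay within this limit is to scale role counts incrementally across successive deploy runs; this is a sketch under that assumption, not a dedicated batching option, and the Compute counts are placeholders:

    # First deploy run: provision the first batch of Compute nodes
    parameter_defaults:
      ComputeCount: 32

    # After the first batch succeeds, rerun the deploy command with the count
    # raised to add the next batch
    parameter_defaults:
      ComputeCount: 64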
Disable unused NICs
- If the overcloud has any unused NICs during the deployment, you must define the unused interfaces in the NIC configuration templates and set the interfaces to use_dhcp: false and defroute: false. Failing to do so causes routing issues and IP allocation problems during introspection and scaling operations. By default, the NICs set BOOTPROTO=dhcp, which means the unused overcloud NICs consume IP addresses meant for PXE provisioning. This can reduce the pool of available IP addresses for your nodes.
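A minimal sketch of an unused interface entry in a NIC configuration template; the interface name nic4 is a placeholder for whichever NIC is unused in your environment:

    # In the network_config section of the role's NIC template
    - type: interface
      name: nic4
      use_dhcp: false
      defroute: false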
Power off unused ironic nodes
- Ensure that you power off any unused ironic nodes in maintenance mode. Red Hat has identified cases where nodes from previous deployments are left in maintenance mode in a powered on state. This can occur with OpenStack Bare Metal (ironic) automated cleaning where a node that fails cleaning is put into maintenance mode. Since ironic does not track the power state of nodes in maintenance mode, ironic incorrectly reports the power state as off. This can cause problems with ongoing deployments. When redeploying after a failed deployment, ensure that you power off any unused nodes using the node’s power management device.
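A minimal sketch of finding and powering off such nodes; the node UUID is a placeholder:

    # List nodes that are currently in maintenance mode
    openstack baremetal node list --maintenance

    # Power off an unused node through its power management interface
    openstack baremetal node power off <node UUID>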
3.3. Undercloud tuning considerations
Increase Keystone Worker count
- Red Hat recommends that you have more than 8 keystone admin processes and 4 keystone main processes on your undercloud. The configuration files are /etc/httpd/conf.d/10-keystone_wsgi_admin.conf and /etc/httpd/conf.d/10-keystone_wsgi_main.conf. To make a persistent change across upgrades or when you rerun openstack undercloud install, inject a custom hieradata file by setting hieradata_override in the undercloud.conf file. Add the following lines to the custom hieradata file:

    keystone::wsgi::apache::custom_wsgi_process_options_admin: { processes: "8" }
    keystone::wsgi::apache::custom_wsgi_process_options_main: { processes: "4" }
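The hieradata_override setting in undercloud.conf points to that custom hieradata file; the path below is a placeholder:

    [DEFAULT]
    hieradata_override = /home/stack/undercloud-hieradata-override.yaml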
Increase the response timeout for Heat API calls
- The default rpc_response_timeout is set to 600 seconds in /etc/heat/heat.conf. In cases of severe resource contention, increase the timeout. If you see the deployment exiting with messaging timeouts, that is an indicator to increase this setting, although this should not be a common issue. To make a persistent change across upgrades or when you rerun openstack undercloud install, add the following line to the custom hieradata file and specify a suitable timeout value:

    heat::rpc_response_timeout: 600
Increase the Keystone token expiration time
- If you increase the overcloud deploy timeout to more than 14,400 seconds, you must update the keystone token expiration timeout in keystone.conf to the equivalent value in seconds. The default Keystone token expiration time is 14,400 seconds. To make a persistent change across upgrades or when you rerun openstack undercloud install, add the following line to the custom hieradata file and specify a suitable timeout value:

    keystone::token_expiration: 14400
If Telemetry is not used, disable it
- If you do not require metric data, which is used for billing purposes, disable Telemetry. To disable Telemetry on the undercloud, edit the undercloud.conf file, change the enable_telemetry value to false, and rerun the openstack undercloud install command; see the sketch after this list.
- To disable Telemetry during openstack overcloud deploy, see Telemetry in the Deployment Recommendations for Specific Red Hat OpenStack Platform Services Guide.
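A minimal sketch of the undercloud.conf change referenced in the first item above; rerun openstack undercloud install after editing the file:

    [DEFAULT]
    enable_telemetry = false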