Chapter 3. Red Hat OpenStack deployment best practices
Review the following best practices when you plan and prepare to deploy OpenStack. You can apply one or more of these practices in your environment.
3.1. Red Hat OpenStack deployment preparation
Before you deploy Red Hat OpenStack Platform (RHOSP), review the following list of deployment preparation tasks. You can apply one or more of the deployment preparation tasks in your environment:
- Set a subnet range for introspection to accommodate the maximum number of overcloud nodes on which you want to perform introspection at a time
- When you use director to deploy and configure RHOSP, use CIDR notation for the control plane network to accommodate all overcloud nodes that you add now or in the future.
- Set the root password on your overcloud image to allow console access to the overcloud image
- Use the console to troubleshoot failed deployments when networking is set incorrectly. Adhere to the information security policies of your organization for password management when you implement this recommendation.
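For example, one common way to set the root password on the overcloud image before you upload it is the virt-customize tool; the image name and password here are placeholders:
$ virt-customize -a overcloud-full.qcow2 --root-password password:<password>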
- Use scheduler hints to assign hardware to a role
- Use scheduler hints to assign hardware to a role, such as Controller, Compute, CephStorage, and others. Scheduler hints provide easier identification of deployment issues that affect only a specific piece of hardware.
- The nova-scheduler, which is a single process, can become overloaded when it schedules a large number of nodes. Scheduler hints reduce the load on nova-scheduler by implementing tag matching. As a result, nova-scheduler encounters fewer scheduling errors during the deployment and the deployment takes less time when you use scheduler hints.
- Do not use profile tagging when you use scheduler hints.
- In performance testing, use identical hardware for specific roles to reduce variability in testing and performance results.
- Set the World Wide Name (WWN) as the root disk hint for each node to prevent nodes from using the wrong disk during deployment and booting
- When nodes contain multiple disks, use the introspection data to set the WWN as the root disk hint for each node. This prevents the node from using the wrong disk during deployment and booting. For more information, see Defining the Root Disk for Multi-Disk Clusters in the Director Installation and Usage guide.
- Enable the Bare Metal service (ironic) automated cleaning on nodes that have more than one disk
Use the Bare Metal service automated cleaning to erase metadata on nodes that have more than one disk and are likely to have multiple boot loaders. Nodes might become inconsistent with the boot disk due to the presence of multiple bootloaders on disks, which can cause node deployment to fail when the node attempts to pull the metadata from the wrong URL.
To enable the Bare Metal service automated cleaning, on the undercloud node, edit the undercloud.conf file and add the following line:
clean_nodes = true
- Limit the number of nodes for Bare Metal (ironic) introspection
If you perform introspection on all nodes at the same time, failures might occur due to network constraints. Perform introspection on up to 50 nodes at a time.
Ensure that the dhcp_start and dhcp_end range in the undercloud.conf file is large enough for the number of nodes that you expect to have in the environment. If there are insufficient available IPs, do not issue more than the size of the range. This limits the number of simultaneous introspection operations. To allow the introspection DHCP leases to expire, do not issue more IP addresses for a few minutes after the introspection completes.
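For example, a minimal sketch of the relevant undercloud.conf settings, assuming the default ctlplane-subnet section; the addresses are illustrative and must be sized for your expected node count:
[ctlplane-subnet]
cidr = 192.168.24.0/24
dhcp_start = 192.168.24.10
dhcp_end = 192.168.24.120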
- Prepare Ceph for different types of configurations
The following list is a set of recommendations for different types of configurations:
All-flash OSD configuration
Each OSD requires additional CPUs according to the IOPS capacity of the device type, so Ceph IOPS are CPU-limited at a lower number of OSDs. This is true for NVM SSDs, which can have two orders of magnitude higher IOPS capacity than traditional HDDs. For SATA/SAS SSDs, expect one order of magnitude greater random IOPS per OSD than HDDs, but only about a two to four times increase in sequential IOPS. It is possible to supply fewer CPU resources to Ceph than it needs for the OSD devices, in which case Ceph IOPS are CPU-limited.
Hyper Converged Infrastructure (HCI)
It is recommended to reserve at least half of your CPU capacity, memory, and network for the OpenStack Compute (nova) guests. Ensure that you have enough CPU capacity and memory to support both OpenStack Compute (nova) guests and Ceph Storage. Observe memory consumption because Ceph Storage memory consumption is not elastic. On a multi-socket system, limit Ceph CPU consumption by NUMA-pinning Ceph to a single socket, for example, with the numactl -N 0 -p 0 command. Do not hard-pin Ceph memory consumption to one socket.
Latency-sensitive applications such as NFV
If possible, place Ceph on the same CPU socket as the network card that Ceph uses, and limit the network card interrupts to that CPU socket, while the latency-sensitive network application runs on a different NUMA socket and network card.
If you use dual bootloaders, use disk-by-path for the OSD map. This gives the user consistent deployments, unlike using the device name. The following snippet is an example of the CephAnsibleDisksConfig for a disk-by-path mapping.
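The osd_scenario value and device paths in this sketch are illustrative; substitute the by-path values from the introspection data of your own nodes:
parameter_defaults:
  CephAnsibleDisksConfig:
    osd_scenario: lvm
    devices:
      - /dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:0:0
      - /dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:1:0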
3.2. Red Hat OpenStack deployment configuration
Review the following list of recommendations for your Red Hat OpenStack Platform (RHOSP) deployment configuration:
- Validate the heat templates with a small-scale deployment
- Deploy a small environment that consists of at least three Controller nodes, one Compute node, and three Ceph Storage nodes. You can use this configuration to ensure that all of your heat templates are correct.
- Disable telemetry notifications on the undercloud
You can disable telemetry notifications on the undercloud for the following OpenStack services to reduce the RabbitMQ queue load:
- Compute (nova)
- Networking (neutron)
- Orchestration (heat)
- Identity (keystone)
To disable the notifications, in the /usr/share/openstack-tripleo-heat-templates/environments/disable-telemetry.yaml template, set the notification driver setting to noop.
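For example, a minimal sketch of a deployment command that applies this environment file; the other environment file is a placeholder:
$ openstack overcloud deploy --templates \
  -e <environment_file> \
  -e /usr/share/openstack-tripleo-heat-templates/environments/disable-telemetry.yaml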
- Limit the number of nodes that are provisioned at the same time
Fifty is the typical number of servers that can fit within an average enterprise-level rack; therefore, you can deploy an average of one rack of nodes at one time.
To minimize the debugging necessary to diagnose issues with the deployment, deploy no more than 50 nodes at one time. However, if you want to deploy a higher number of nodes, Red Hat has successfully tested up to 100 nodes simultaneously.
To scale Compute nodes in batches, use the openstack overcloud deploy command with the --limit option. This can result in saved time and lower resource consumption on the undercloud.
The --limit option is in Technology Preview.
To run the deployment with a specific set of config-download tasks, use the --tags option with a comma-separated list of tags from the config-download playbook.
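For example, a minimal sketch of scaling Compute nodes in a batch; the node names and environment file are placeholders:
$ openstack overcloud deploy --templates \
  -e <environment_file> \
  --limit overcloud-novacompute-50,overcloud-novacompute-51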
- Disable unused NICs
If the overcloud has any unused NICs during the deployment, you must define the unused interfaces in the NIC configuration templates and set the interfaces to use_dhcp: false and defroute: false. If you do not define unused interfaces, there might be routing issues and IP allocation problems during introspection and scaling operations. By default, the NICs set BOOTPROTO=dhcp, which means the unused overcloud NICs consume IP addresses that are needed for PXE provisioning. This can reduce the pool of available IP addresses for your nodes.
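The following is a minimal sketch of an unused-interface entry in the network_config section of an os-net-config NIC template; the interface name nic4 is an assumption:
- type: interface
  name: nic4
  use_dhcp: false
  defroute: false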
- Power off unused Bare Metal Provisioning (ironic) nodes
- Ensure that you power off any unused Bare Metal Provisioning (ironic) nodes in maintenance mode. Red Hat has identified cases where nodes from previous deployments are left in maintenance mode in a powered-on state. This can occur with Bare Metal automated cleaning, where a node that fails cleaning is set to maintenance mode. Bare Metal Provisioning does not track the power state of nodes in maintenance mode and incorrectly reports the power state as off. This can cause problems with ongoing deployments. When you redeploy after a failed deployment, ensure that you power off all unused nodes by using the power management device of each node.
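For example, a sketch of how to list nodes in maintenance mode and power one off; the node name is a placeholder:
$ openstack baremetal node list --maintenance
$ openstack baremetal node power off <node>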
3.3. Tuning the undercloud
Review this section when you plan to scale your Red Hat OpenStack Platform (RHOSP) deployment and tune the default undercloud settings.
- If you use the Telemetry service (ceilometer), improve the performance of the service
- Because the Telemetry service is CPU-intensive, telemetry is not enabled by default in RHOSP 16.1. If you want to use Telemetry, you can improve the performance of the service.
- Separate the provisioning and configuration processes
- To create only the stack and associated RHOSP resources, you can run the deployment command with the --stack-only option.
- Red Hat recommends separating the stack and config-download steps when deploying more than 100 nodes.
Include any environment files that are required for your overcloud:
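For example, a minimal sketch of the stack-only step with placeholder environment files:
$ openstack overcloud deploy --templates \
  -e <environment_file_1> \
  -e <environment_file_2> \
  --stack-only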
After you have provisioned the stack, you can enable SSH access for the tripleo-admin user from the undercloud to the overcloud. The config-download process uses the tripleo-admin user to perform the Ansible-based configuration:
$ openstack overcloud admin authorize
To disable the overcloud stack creation and run only the config-download workflow to apply the software configuration, you can run the deployment command with the --config-download-only option. Include any environment files that are required for your overcloud:
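For example, a minimal sketch of the config-download-only step with placeholder environment files:
$ openstack overcloud deploy --templates \
  -e <environment_file_1> \
  -e <environment_file_2> \
  --config-download-only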
- To limit the config-download playbook execution to a specific node or set of nodes, you can use the --limit option. You can use the --limit option to separate nodes into different roles, to limit the number of nodes to deploy, or to separate nodes with a specific hardware type. For scale-up operations, when you want to apply software configuration on the new nodes only, use the --limit option with the --config-download-only option.
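A minimal sketch that combines both options; the Compute node name is a placeholder, and Undercloud and Controller are included for the reasons described in the note that follows:
$ openstack overcloud deploy --templates \
  -e <environment_file> \
  --config-download-only \
  --limit Undercloud,Controller,<compute_node>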
If you use the --limit option, always include <Controller> and <Undercloud> in the list. Tasks that use the external_deploy_steps interface, for example all Ceph configurations, are only executed if <Undercloud> is included in the options list. All external_deploy_steps tasks run on the undercloud.
For example, if you run a scale-up task to add a Compute node that requires a connection to Ceph and you do not include <Undercloud> in the list, the Ceph configuration and cephx key files are missing, and the task fails. Do not use the --skip-tags external_deploy_steps option or the task fails.