Chapter 7. Initial steps for overcloud preparation
You must complete some initial steps to prepare for the overcloud upgrade.
7.1. Preparing for overcloud service downtime
The overcloud upgrade process disables the main control plane services at key points. You cannot use any overcloud services to create new resources when these key points are reached. Workloads that are running in the overcloud remain active during the upgrade process, which means instances continue to run during the upgrade of the control plane. During an upgrade of Compute nodes, these workloads can be live migrated to Compute nodes that are already upgraded.
It is important to plan a maintenance window to ensure that no users can access the overcloud services during the upgrade.
Affected by overcloud upgrade
- OpenStack Platform services
Unaffected by overcloud upgrade
- Instances running during the upgrade
- Ceph Storage OSDs (backend storage for instances)
- Linux networking
- Open vSwitch networking
- Undercloud
7.2. Selecting Compute nodes for upgrade testing
The overcloud upgrade process allows you to either:
- Upgrade all nodes in a role
- Upgrade individual nodes separately
To ensure a smooth overcloud upgrade process, it is useful to test the upgrade on a few individual Compute nodes in your environment before upgrading all Compute nodes. Testing on a small set of nodes first helps you catch major issues early while keeping downtime for your workloads to a minimum.
Use the following recommendations to help choose test nodes for the upgrade:
- Select two or three Compute nodes for upgrade testing
- Select nodes without any critical instances running
- If necessary, migrate critical instances from the selected test Compute nodes to other Compute nodes
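The recommendations above can be sketched as a small shell helper. This is only an illustration: the function name `pick_test_nodes` and its input format (one `<compute_hostname> <instance_count>` pair per line) are assumptions made for this example, not part of the RHOSP tooling. How you gather the per-node instance counts is up to you.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of RHOSP.
# Reads lines of "<compute_hostname> <instance_count>" on stdin and prints
# up to N hostnames with the fewest running instances (default N=3).
pick_test_nodes() {
    local limit="${1:-3}"
    # Sort numerically by instance count, then by hostname to break ties.
    sort -k2,2n -k1,1 | head -n "$limit" | awk '{print $1}'
}

# Example usage with a hand-written list of counts:
#   printf 'compute-0 12\ncompute-1 0\ncompute-2 3\n' | pick_test_nodes 2
```

The helper only ranks nodes by instance count. Whether an instance is critical is a judgment that still has to be made by someone who knows the workloads.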
7.3. Creating an overcloud inventory file
Generate an Ansible inventory file of all nodes in your environment with the tripleo-ansible-inventory command.
Procedure
- Log in to the undercloud as the stack user.
- Source the stackrc file:
  $ source ~/stackrc
- Create a static inventory file of all nodes:
  $ tripleo-ansible-inventory --static-yaml-inventory ~/inventory.yaml --stack <STACK_NAME> --ansible_ssh_user heat-admin
  If you are not using the default overcloud stack name, replace <STACK_NAME> with the name of your stack.
- To execute Ansible playbooks on your environment, run the ansible-playbook command and include the full path of the inventory file with the -i option. For example:
  (undercloud) $ ansible-playbook -i ~/inventory.yaml <PLAYBOOK>
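As a minimal sketch of the procedure above, the following shell functions build the inventory command with the default overcloud stack name and confirm that the resulting file exists and is not empty before you pass it to ansible-playbook. The function names `inventory_command` and `inventory_ready` are hypothetical and introduced only for this example. The command-line options are the ones shown in the procedure.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of RHOSP.

# Prints the inventory command for a stack name (default: overcloud)
# and an output path (default: ~/inventory.yaml) so you can review it.
inventory_command() {
    local stack="${1:-overcloud}"
    local output="${2:-$HOME/inventory.yaml}"
    printf 'tripleo-ansible-inventory --static-yaml-inventory %s --stack %s --ansible_ssh_user heat-admin\n' \
        "$output" "$stack"
}

# Returns 0 only if the inventory file exists and is not empty.
inventory_ready() {
    [ -s "$1" ]
}

# Example usage on the undercloud, after sourcing ~/stackrc:
#   inventory_command mystack
#   inventory_ready ~/inventory.yaml && ansible-playbook -i ~/inventory.yaml <PLAYBOOK>
```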
7.4. Validating the pre-upgrade requirements
Run the pre-upgrade validation group to check the pre-upgrade requirements.
For more information about the Red Hat OpenStack Platform (RHOSP) validation framework, see Using the validation framework in the Director Installation and Usage guide.
Procedure
- Source the stackrc file:
  $ source ~/stackrc
- Run the openstack tripleo validator run command with the --group pre-upgrade option and include the /usr/libexec/platform-python Python runtime environment:
  $ openstack tripleo validator run --group pre-upgrade --python-interpreter /usr/libexec/platform-python -i inventory.yaml
  Note: Ensure that you include the list of packages that you want to check. You can use --extra-vars or --extra-vars-file in the command to supply the list through the CLI. For more information, see Command arguments for tripleo validator run.
- Review the results of the validation report. To view detailed output from a specific validation, run the openstack tripleo validator show run --full command against the UUID of the specific validation from the report:
  $ openstack tripleo validator show run --full <UUID>
A FAILED validation does not prevent you from deploying or running RHOSP. However, a FAILED validation can indicate a potential issue with a production environment.
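If you save the validation report to a text file, a short shell function can surface the rows that need attention. This is a sketch under assumptions: it assumes that each failed validation appears on its own line containing the word FAILED, and the function name `failed_validations` and the report path are hypothetical. The layout of the real report might differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of RHOSP.
# Prints every line of a saved report that contains the word FAILED.
# Returns 1 if at least one such line exists, 0 if there are none.
failed_validations() {
    local report="$1"
    if grep -w 'FAILED' "$report"; then
        return 1
    fi
    return 0
}

# Example usage, assuming the report was saved with tee:
#   openstack tripleo validator run --group pre-upgrade \
#       --python-interpreter /usr/libexec/platform-python -i inventory.yaml \
#       | tee ~/pre-upgrade-report.txt
#   failed_validations ~/pre-upgrade-report.txt || echo "Review the failures above"
```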
7.5. Disabling fencing in the overcloud
Before you upgrade the overcloud, ensure that fencing is disabled.
When you upgrade the overcloud, you upgrade each Controller node individually to retain high availability functionality. If fencing is deployed in your environment, the overcloud might detect certain nodes as disabled and attempt fencing operations, which can cause unintended results.
If you have enabled fencing in the overcloud, you must temporarily disable fencing for the duration of the upgrade to avoid any unintended results.
Procedure
- Log in to the undercloud as the stack user.
- Source the stackrc file:
  $ source ~/stackrc
- Log in to a Controller node and run the Pacemaker command to disable fencing:
  $ ssh heat-admin@<controller_ip> "sudo pcs property set stonith-enabled=false"
  Replace <controller_ip> with the IP address of a Controller node. You can find the IP addresses of your Controller nodes with the openstack server list command.
- In the fencing.yaml environment file, set the EnableFencing parameter to false to ensure that fencing stays disabled during the upgrade process.
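To confirm the last step, you can check the environment file from the shell. The sketch below assumes that EnableFencing appears on a single line in the form `EnableFencing: false`. The function name `fencing_disabled_in_file` and the file path in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of RHOSP.
# Returns 0 if the given environment file sets EnableFencing to false.
# The match is case-insensitive, so "False" is also accepted.
fencing_disabled_in_file() {
    grep -Eqi '^[[:space:]]*EnableFencing:[[:space:]]*false[[:space:]]*$' "$1"
}

# Example usage (adjust the path to where you keep fencing.yaml):
#   fencing_disabled_in_file ~/templates/fencing.yaml \
#       || echo "EnableFencing is not set to false"
```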
7.6. Checking custom Puppet parameters
If you use the ExtraConfig interfaces for customizations of Puppet parameters, Puppet might report duplicate declaration errors during the upgrade. This is due to changes in the interfaces provided by the Puppet modules themselves.
This procedure shows how to check for any custom ExtraConfig hieradata parameters in your environment files.
Procedure
- Select an environment file and check whether it has an ExtraConfig parameter:
  $ grep ExtraConfig ~/templates/custom-config.yaml
- If the results show an ExtraConfig parameter for any role (for example, ControllerExtraConfig) in the chosen file, check the full parameter structure in that file.
- If the parameter contains any Puppet hieradata with a SECTION/parameter syntax followed by a value, it might have been replaced with a parameter in an actual Puppet class. For example:
  parameter_defaults:
    ExtraConfig:
      neutron::config::dhcp_agent_config:
        'DEFAULT/dnsmasq_local_resolv':
          value: 'true'
- Check the director’s Puppet modules to see if the parameter now exists within a Puppet class. For example:
  $ grep dnsmasq_local_resolv
  If so, change to the new interface.
The following are examples to demonstrate the change in syntax:
Example 1:
  parameter_defaults:
    ExtraConfig:
      neutron::config::dhcp_agent_config:
        'DEFAULT/dnsmasq_local_resolv':
          value: 'true'
Changes to:
  parameter_defaults:
    ExtraConfig:
      neutron::agents::dhcp::dnsmasq_local_resolv: true
Example 2:
  parameter_defaults:
    ExtraConfig:
      ceilometer::config::ceilometer_config:
        'oslo_messaging_rabbit/rabbit_qos_prefetch_count':
          value: '32'
Changes to:
  parameter_defaults:
    ExtraConfig:
      oslo::messaging::rabbit::rabbit_qos_prefetch_count: '32'
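To find candidates for this change across several environment files at once, you can search for quoted keys in the SECTION/parameter form. The following is a sketch only: the function name `find_legacy_hieradata` is hypothetical, and the pattern assumes that keys are single-quoted as in the examples above. A match is a prompt to check the key by hand, not proof that a replacement Puppet class parameter exists.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of RHOSP.
# Prints "file:line:text" for each single-quoted 'SECTION/parameter' key
# found in the given files. Returns non-zero if nothing matches.
find_legacy_hieradata() {
    grep -HnE "^[[:space:]]*'[A-Za-z0-9_]+/[A-Za-z0-9_]+':" "$@"
}

# Example usage against your own environment files:
#   find_legacy_hieradata ~/templates/*.yaml
```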