Chapter 2. Planning and preparation for an in-place upgrade
Before you conduct an in-place upgrade of your OpenStack Platform environment, create a plan for the upgrade and accommodate any potential obstacles that might block a successful upgrade.
2.1. Familiarize yourself with Red Hat OpenStack Platform 16.2
Before you perform an upgrade, familiarize yourself with Red Hat OpenStack Platform 16.2 to help you understand the resulting environment and any potential version-to-version changes that might affect your upgrade. To familiarize yourself with Red Hat OpenStack Platform 16.2, follow these suggestions:
Read the release notes for all versions across the upgrade path and identify any potential aspects that require planning:
- Components that contain new features
- Known issues
Open the release notes for each version using these links:
- Red Hat OpenStack Platform 13, which is your current version
- Red Hat OpenStack Platform 14
- Red Hat OpenStack Platform 15
- Red Hat OpenStack Platform 16.0
- Red Hat OpenStack Platform 16.1
- Red Hat OpenStack Platform 16.2, which is your target version
- Read the Director Installation and Usage guide for version 16.2 and familiarize yourself with any new requirements and processes in this guide.
- Install a proof-of-concept Red Hat OpenStack Platform 16.2 undercloud and overcloud. Develop hands-on experience of the target OpenStack Platform version and investigate potential differences between the target version and your current version.
2.2. High-level changes in Red Hat OpenStack Platform 16.2
The following high-level changes occur during the upgrade to Red Hat OpenStack Platform 16.2:
- OpenStack Platform director 16.2 configures the overcloud using an Ansible-driven method called config-download. This replaces the standard heat-based configuration method. Director still uses heat to orchestrate provisioning operations.
- The director installation uses the same method as the overcloud deployment. Therefore, the undercloud also uses openstack-tripleo-heat-templates as a blueprint for installing and configuring each service.
- The undercloud runs OpenStack services in containers.
- The undercloud pulls and stores container images through a new method. Instead of pulling container images before deploying the overcloud, the undercloud pulls all relevant container images during the deployment process.
- The overcloud deployment process includes an Advanced Subscription Management method to register nodes. This method incorporates an Ansible role to register OpenStack Platform nodes. The new method also applies different subscriptions to different node roles if necessary.
- The overcloud now uses Open Virtual Network (OVN) as the default ML2 mechanism driver. It is possible to migrate your Open vSwitch (OVS) service to OVN, which you perform after the completion of a successful upgrade.
- The undercloud and overcloud both run on Red Hat Enterprise Linux 8.
- openstack-tripleo-heat-templates includes a unified composable service template collection in the deployment directory. This directory now includes templates with merged content from both the containerized service and Puppet-based composable service templates.
- The OpenStack Data Processing service (sahara) is no longer supported.
Important: If you have sahara enabled in your Red Hat OpenStack Platform 13 environment, do not continue with this upgrade and contact Red Hat Global Support Services.
- The OpenStack Telemetry components are deprecated in favor of the Service Telemetry Framework (STF).
Starting with Red Hat Enterprise Linux (RHEL) version 8.3, support for the Intel Transactional Synchronization Extensions (TSX) feature is disabled by default. This causes issues with instance live migration between hosts when migrating from hosts that run Red Hat OpenStack Platform 13 with RHEL version 7.9, to hosts that run Red Hat OpenStack Platform 16.2 with RHEL version 8.4.
Instance live migration fails after you reboot the Compute nodes. To ensure that the upgraded nodes are booted with the TSX feature enabled and that you can successfully live migrate your instances, add tsx=off to your KernelArgs role parameter for the Compute node and reboot the node.
For more information, see the Red Hat Knowledgebase solution Guidance on Intel TSX impact on OpenStack guests (applies for RHEL 8.3 and above).
2.3. Changes in Red Hat Enterprise Linux 8
The undercloud and overcloud both run on Red Hat Enterprise Linux 8. This includes new tools and functions relevant to the undercloud and overcloud:
- The undercloud and overcloud use the Red Hat Container Toolkit. Instead of docker to build and control the container lifecycle, Red Hat Enterprise Linux 8 includes buildah to build new container images and podman for container management.
- Red Hat Enterprise Linux 8 does not include the docker-distribution package. The undercloud now includes a private HTTP registry to provide container images to overcloud nodes.
- The upgrade process from Red Hat Enterprise Linux 7 to 8 uses the leapp tool.
- Red Hat Enterprise Linux 8 does not use the ntp service. Instead, Red Hat Enterprise Linux 8 uses chronyd.
- Red Hat Enterprise Linux 8 includes new versions of high availability tools.
Red Hat OpenStack Platform 16.2 uses Red Hat Enterprise Linux 8.4 as the base operating system. As a part of the upgrade process, you will upgrade the base operating system of nodes to Red Hat Enterprise Linux 8.4.
For more information about the key differences between Red Hat Enterprise Linux 7 and 8, see Considerations in adopting RHEL 8. For general information about Red Hat Enterprise Linux 8, see Product Documentation for Red Hat Enterprise Linux 8.
2.4. Leapp upgrade usage in Red Hat OpenStack Platform
The long-life Red Hat OpenStack Platform upgrade requires a base operating system upgrade from Red Hat Enterprise Linux 7 to Red Hat Enterprise Linux 8. Red Hat Enterprise Linux 7 uses the Leapp utility to perform the upgrade to Red Hat Enterprise Linux 8. To ensure that Leapp and its dependencies are available, verify that the following Red Hat Enterprise Linux 7 repositories are enabled:
- Red Hat Enterprise Linux 7 Server RPMs x86_64 7Server or Red Hat Enterprise Linux 7 Server RPMs x86_64 7.9: rhel-7-server-rpms x86_64 7Server or rhel-7-server-rpms x86_64 7.9
- Red Hat Enterprise Linux 7 Server - Extras RPMs x86_64: rhel-7-server-extras-rpms x86_64
For more information, see Preparing a RHEL 7 system for the upgrade.
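The repository check can be scripted. The following is a minimal sketch, not part of the documented procedure: the function name missing_leapp_repos is hypothetical, and the sketch assumes that the list of enabled repositories arrives on standard input in the "Repo ID:" form that subscription-manager repos --list-enabled prints.

```shell
#!/bin/bash
# Hypothetical helper: print each required RHEL 7 repository ID that is
# missing from a list of enabled repositories read from standard input.
# Input is assumed to use the "Repo ID:   <id>" lines that
# `subscription-manager repos --list-enabled` prints.
missing_leapp_repos() {
    local enabled repo
    # Keep only the repository IDs from the "Repo ID:" lines.
    enabled=$(awk '/^Repo ID:/ {print $3}')
    for repo in rhel-7-server-rpms rhel-7-server-extras-rpms; do
        # -x matches the whole line and -F treats the ID as a fixed string,
        # so a longer ID that merely contains the name does not count.
        if ! printf '%s\n' "$enabled" | grep -qxF -- "$repo"; then
            printf '%s\n' "$repo"
        fi
    done
}

# Example use on a registered RHEL 7 node (not run here):
#   sudo subscription-manager repos --list-enabled | missing_leapp_repos
```

Empty output means that both repositories are enabled. Any ID that the function prints still needs to be enabled before you run Leapp.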
The undercloud and overcloud use separate processes for performing the operating system upgrade.
Undercloud process
Run the leapp upgrade manually before you run the openstack undercloud upgrade command. The undercloud upgrade includes instructions for performing the leapp upgrade.
Overcloud process
The overcloud upgrade framework automatically runs the leapp upgrade.
Limitations
For information about potential limitations that might affect your upgrade, see the following sections from the Upgrading from RHEL 7 to RHEL 8 guide:
In particular, you cannot perform a Leapp upgrade on nodes that use encryption of the whole disk or a partition, such as LUKS encryption, or file-system encryption. This limitation affects Ceph OSD nodes that you have configured with the dmcrypt: true parameter.
If any known limitations affect your environment, seek advice from the Red Hat Technical Support Team.
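To find out whether the dmcrypt limitation applies, you can search your custom templates for the parameter. The following is a rough sketch under stated assumptions: the function name find_dmcrypt_enabled and the ~/templates path in the example are hypothetical, and the pattern only catches the plain dmcrypt: true spelling.

```shell
#!/bin/bash
# Hypothetical helper: list files under a directory that set "dmcrypt: true".
# Returns a non-zero status when no file matches.
find_dmcrypt_enabled() {
    local dir=$1
    # -r recurses, -l prints only the names of matching files, and -E enables
    # extended regular expressions. The pattern tolerates leading indentation
    # and spacing around the colon.
    grep -rlE '^[[:space:]]*dmcrypt[[:space:]]*:[[:space:]]*true' "$dir"
}

# Example (the directory is an assumption, adjust it to your environment):
#   find_dmcrypt_enabled ~/templates
```

A match does not prove that the OSDs are encrypted on disk, and an empty result does not prove that they are not. Treat the output as a prompt to inspect the Ceph OSD nodes.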
Troubleshooting
For information about troubleshooting potential Leapp issues, see Troubleshooting in Upgrading from RHEL 7 to RHEL 8.
2.5. Supported upgrade scenarios
Before proceeding with the upgrade, check that your overcloud is supported.
If you are uncertain whether a particular scenario not mentioned in these lists is supported, seek advice from the Red Hat Technical Support Team.
Supported scenarios
The following in-place upgrade scenarios are tested and supported.
- Standard environments with default role types: Controller, Compute, and Ceph Storage OSD
- Split-Controller composable roles
- Ceph Storage composable roles, including Ceph Storage custom configurations, such as CephConfigOverrides and CephAnsibleExtraConfig
- Hyper-Converged Infrastructure: Compute and Ceph Storage OSD services on the same node
- Environments with Network Functions Virtualization (NFV) technologies: Single-root input/output virtualization (SR-IOV) and Data Plane Development Kit (DPDK)
- Environments with Instance HA enabled
Note: During an upgrade procedure, nova live migrations are supported. However, evacuations initiated by Instance HA are not supported. When you upgrade a Compute node, the node is shut down cleanly and any workload running on the node is not evacuated by Instance HA automatically. Instead, you must perform live migration manually.
Technology preview scenarios
The framework for upgrades is considered a Technology Preview when you use it in conjunction with these features, and therefore is not fully supported by Red Hat. You should only test this scenario in a proof-of-concept environment and not upgrade in a production environment. For more information about Technology Preview features, see Scope of Coverage Details.
- Edge and Distributed Compute Node (DCN) scenarios
2.6. Considerations for upgrading with external Ceph deployments
If you have deployed a Red Hat Ceph Storage system separately and then used director to deploy and configure OpenStack, you can use the Red Hat OpenStack Platform framework for upgrades to perform an in-place upgrade with external Ceph deployments. This scenario is different from upgrading a Ceph cluster that was deployed using director.
The differences that you must take into account when planning and preparing for an in-place upgrade with external Ceph deployments are the following:
- Before you can upgrade your Red Hat OpenStack Platform deployment from version 13 to version 16.2, you must upgrade your Red Hat Ceph Storage cluster from version 3 to version 4. For more information, see Upgrading a Red Hat Ceph Storage cluster in the Red Hat Ceph Storage 4 Installation Guide.
- After you upgrade your Red Hat Ceph Storage cluster from version 3 to version 4, Red Hat OpenStack Platform 13 might still run RHCSv3 client components. However, these components are compatible with the RHCSv4 cluster.
- You can follow the upgrade path described in the Framework For Upgrades (13 to 16.2) document, and where applicable, you must complete the conditional steps that support this particular scenario. A conditional step starts with the following statement: "If you are upgrading with external Ceph deployments".
- When you upgrade with external Ceph deployments, you install RHCSv4 ceph-ansible as part of the overcloud upgrade process. When you upgrade a Ceph cluster that was deployed using director, you install RHCSv4 ceph-ansible after the overcloud upgrade process is complete.
When you upgrade a Red Hat Ceph Storage cluster from a previous supported version to version 4.2z2, the upgrade completes with the storage cluster in a HEALTH_WARN state with a warning message that states monitors are allowing insecure global_id reclaim. This is due to the patched CVE (CVE-2021-20288). For more information, see Ceph HEALTH_WARN with 'mons are allowing insecure global_id reclaim' after install/upgrade to RHCS 4.2z2 (or newer).
Because the HEALTH_WARN state is displayed due to the CVE, it is possible to mute health warnings temporarily. However, if you mute warnings, you lose visibility of potential older and unpatched clients connected to your cluster. For more information about muting health warnings, see Upgrading a Red Hat Ceph Storage cluster in the Red Hat Ceph Storage documentation.
2.7. Known issues that might block an upgrade
Review the following known issues that might affect a successful upgrade.
- BZ#2228414 - Missing service_user for nova_compute causes nova hybrid state to fail
A service token is now required for OpenStack Compute (nova) and OpenStack Block Storage (cinder) services. During an upgrade from Red Hat OpenStack Platform (RHOSP) 13 to 16.2, if the service token is not configured, live migrations fail with the following error in nova-compute.log:
"2023-xx-xx xx:xx:xx.xxx 8 ERROR oslo_messaging.rpc.server […] Exception during message handling: cinderclient.exceptions.ClientException: ConflictNovaUsingAttachment: Detach volume from instance XXXXXX using the Compute API (HTTP 409) (Request-ID: req-XXXXXX)"
To avoid this issue, apply the fix from RHBA-2023:5163 - Bug Fix Advisory. You must apply the fix after the undercloud upgrade, but before starting the overcloud adoption.
- BZ#1902849 - osp13-osp16.1 ffu fails on clusters previously upgraded from osp8, osp10
Red Hat OpenStack Platform (RHOSP) environments that have been previously upgraded from RHOSP 10 require the python-docker package to avoid BZ#1902849. For more information, see the Red Hat Knowledgebase solution osp13-osp16.1 ffu fails on older environments missing python-docker package.
- BZ#1925078 - RHOSP13-16.1 FFU: Overcloud upgrade hangs in controller after failed attempt with reference to wrong ceph image
Systems that use UEFI boot and a UEFI bootloader in OSP13 might run into a UEFI issue that results in:
- /etc/fstab not being updated
- grub-install is incorrectly used on EFI system
For more information, see the Red Hat Knowledgebase solution FFU 13 to 16.1: Leapp fails to update the kernel on UEFI based systems and /etc/fstab does not contain the EFI partition.
If your systems use UEFI, contact Red Hat Technical Support.
- BZ#1895887 - ovs+dpdk fail to attach device OvsDpdkHCI
After upgrading with the Leapp utility, the Compute node with OVS-DPDK workload does not function properly. To resolve this issue, perform one of the following steps:
- Remove the /etc/modules-load.d/vfio-pci.conf file before you upgrade the Compute node.
- Restart the ovs-vswitchd service on the Compute node after you upgrade the Compute node.
This issue affects RHOSP 16.1.3. For more information, see the Red Hat Knowledgebase solution OVS-DPDK errors after Framework Upgrade from OSP 13 to 16.1 on HCI compute node.
- BZ#1923165 - OSP-16.2 (Upgrades)(TripleO) Add a config to disable Intel "TSX" on RHEL-8.3 kernel
Starting with Red Hat Enterprise Linux (RHEL) version 8.3, support for the Intel Transactional Synchronization Extensions (TSX) feature is disabled by default. This causes issues with instance live migration between hosts in the following migration scenario:
- Migrating from hosts where the TSX kernel argument is enabled to hosts where the TSX kernel argument is disabled.
Live migration can be unsuccessful in Intel hosts that support the TSX feature. For more information about the CPUs that are affected by this issue, see Affected Configurations.
For more information, see the Red Hat Knowledgebase solution Guidance on Intel TSX impact on OpenStack guests.
- BZ#2016144 - FFU 13-16.1: During Leapp upgrade reboot, openvswitch failed to start with error Starting ovsdb-server ovsdb-server: /var/run/openvswitch/ovsdb-server.pid.tmp: create failed (Permission denied)
Red Hat OpenStack Platform (RHOSP) environments that have been upgraded from previous versions might contain unnecessary files in /etc/systemd/system/ovs*. You must remove these files before you begin the overcloud upgrade process from RHOSP 13 to RHOSP 16.2.
- BZ#2021525 - openstack overcloud upgrade run times out / HAProxy container fails to start
An upgrade from Red Hat OpenStack Platform (RHOSP) 13 to RHOSP 16.2 might fail during the deployment step because of invalid SELinux labels. For a resolution and more information, see the Red Hat Knowledgebase solution Pacemaker managed services might not restart during an OSP13 - OSP16.x FFU.
- BZ#2027787 - Undercloud upgrade to 16.2 fails because of missing dependencies of swtpm
There is a known issue with the advanced-virt-for-rhel-8-x86_64-rpms and advanced-virt-for-rhel-8-x86_64-eus-rpms repositories that prevents a successful upgrade. To disable these repositories before upgrading, see the Red Hat Knowledgebase solution advanced-virt-for-rhel-8-x86_64-rpms are no longer required in OSP 16.2.
- BZ#2024447 - Identity service (keystone) password for the placement user was overridden by NovaPassword during FFU RHOSP 13 to 16
During an upgrade from Red Hat OpenStack Platform 13 to 16.2, if you define a value for the NovaPassword parameter but not the PlacementPassword parameter, the NovaPassword parameter overrides the OpenStack Identity service (keystone) password for the placement user. To preserve the Identity service password, do not set the NovaPassword or the PlacementPassword in the parameter_defaults section.
If you set both passwords in the parameter_defaults section, the Compute nodes might not be able to communicate with the control plane until they are upgraded. For more information about upgrading Compute nodes, see Upgrading Compute nodes.
Additionally, if you deployed the overcloud on RHOSP 13 by using the NovaPassword, PlacementPassword, or both, you must remove those passwords from the template and run the openstack overcloud deploy command on RHOSP 13 before upgrading to RHOSP 16.2.
- BZ#2141186 - Live migration fails due to qemu error during in-place upgrade
During or after an in-place upgrade from Red Hat OpenStack Platform (RHOSP) 13 to RHOSP 16.2, live migration between 16.2 Compute nodes fails on instances with the following configuration:
- Multi-queue is enabled.
- The number of allocated vCPUs is 9 or more.
- The instance is running on RHOSP 13.
To successfully migrate your Compute nodes during an upgrade, add the following parameter to your custom environment file:
parameter_defaults:
  ComputeExtraConfig:
    nova::compute::libvirt::max_queues: 8
Include your updated custom environment file when you run the following commands during the upgrade:
- openstack overcloud upgrade prepare
- openstack overcloud upgrade converge
Optionally, after you complete the upgrade, include the custom environment file with the parameter when you run the openstack overcloud deploy command.
For more information, see the Red Hat Knowledgebase solution Live migration fails due to qemu error in in-place upgrades environment.
- BZ#2141393 - cephvolumescan actor fails
If your environment includes both Ceph and non-Ceph containers, the Leapp upgrade fails because the cephvolumescan actor cannot retrieve the ceph volumes list.
To disable the cephvolumescan actor and complete the Leapp upgrade, add the following parameter to your template:
parameter_defaults:
  LeappActorsToRemove: ['cephvolumescan']
- BZ#2164396 - FFU: Redhat satellite tools repository to be enabled for FFU (13 to 16.2)
- If you are using Satellite version 6.7, the upgrade fails when you enable the Red Hat Satellite Tools for RHEL 8 Server RPMs x86_64 repository. The failure occurs because the appropriate packages cannot be installed. The Red Hat engineering team is investigating a solution to this issue.
- BZ#2245602 - Upgrade (OSP16.2 →OSP17.1) controller-0 does not perform leapp upgrade due to packages missing ovn2.15 openvswitch2.15
If you upgrade from Red Hat OpenStack Platform (RHOSP) 13 to 16.1 or 16.2, or from RHOSP 16.2 to 17.1, do not include the system_upgrade.yaml file in the --answers-file answer-upgrade.yaml file. If the system_upgrade.yaml file is included in that file, the environments/lifecycle/upgrade-prepare.yaml file overwrites the parameters in the system_upgrade.yaml file. To avoid this issue, append the system_upgrade.yaml file to the openstack overcloud upgrade prepare command. For example:
$ openstack overcloud upgrade prepare --answers-file answer-upgrade.yaml \
    -r roles-data.yaml \
    -n networking-data.yaml \
    -e system_upgrade.yaml \
    -e upgrade_environment.yaml
With this workaround, the parameters that are configured in the system_upgrade.yaml file overwrite the default parameters in the environments/lifecycle/upgrade-prepare.yaml file.
Red Hat Ceph Storage Issues
- BZ#1855813 - Ceph tools repository should be switched from RHCS3 to RHCS4 only after converge, before running external-upgrade
The ceph-ansible playbook collection on the undercloud deploys Red Hat Ceph Storage containers on the overcloud. To upgrade your environment, you must have the Red Hat Ceph Storage 3 version of ceph-ansible to maintain Ceph Storage 3 containers through the upgrade. This guide includes instructions on how to retain ceph-ansible version 3 over the course of the upgrade until you are ready to upgrade to Ceph Storage 4. Before performing the 13 to 16.2 upgrade, you must perform a minor version update of your Red Hat OpenStack Platform 13 environment and ensure you have ceph-ansible version 3.2.46 or later.
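One way to script the minimum-version check is a small comparison helper. This is a sketch only: the function name version_at_least is hypothetical, it relies on the version sort (-V) option of GNU sort, and it checks only the lower bound, not that the package is still a version 3 release.

```shell
#!/bin/bash
# Hypothetical helper: succeed when version $1 is greater than or equal to
# version $2. Relies on the version sort (-V) option of GNU sort.
version_at_least() {
    # After a version sort, the smaller version is listed first. If that
    # first entry is the required minimum, the installed version is at
    # least the minimum.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example use on the undercloud (not run here):
#   installed=$(rpm -q --queryformat '%{VERSION}' ceph-ansible)
#   version_at_least "$installed" 3.2.46 || echo "Update ceph-ansible first"
```

A plain string comparison would rank 3.2.9 above 3.2.46, which is why the sketch uses a version-aware sort.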
2.8. Backup and restore
Before you upgrade your Red Hat OpenStack Platform 13 environment, back up the undercloud and overcloud control plane. For more information about backing up nodes with the Relax-and-recover (ReaR) utility, see the Undercloud and Control Plane Back Up and Restore guide.
- Back up your nodes before you perform an upgrade. For more information about backing up nodes before you upgrade, see Red Hat OpenStack Platform 13 Undercloud and Control Plane Back Up and Restore.
- You can back up each node after it has been upgraded. For more information about backing up upgraded nodes, see Red Hat OpenStack Platform 16.2 Backing up and restoring the undercloud and control plane nodes.
- You can back up the database that runs on the undercloud node after you perform the undercloud upgrade and before you perform the overcloud upgrade. For more information about backing up the undercloud database, see Creating a database backup of the undercloud node in the Red Hat OpenStack Platform 16.2 Backing up and restoring the undercloud and control plane nodes guide.
2.9. Minor version update
Before you upgrade your Red Hat OpenStack Platform environment, update the environment to the latest minor version of your current release. For example, update your Red Hat OpenStack Platform 13 environment to the latest 13 minor version before running the upgrade to Red Hat OpenStack Platform 16.2.
For instructions on performing a minor version update for Red Hat OpenStack Platform 13, see Keeping Red Hat OpenStack Platform Updated.
2.10. Proxy configuration
If you use a proxy with your Red Hat OpenStack Platform 13 environment, the proxy configuration in the /etc/environment file will persist past the operating system upgrade and the Red Hat OpenStack Platform 16.2 upgrade.
- For more information about proxy configuration for Red Hat OpenStack Platform 13, see Considerations when running the undercloud with a proxy.
- For more information about proxy configuration for Red Hat OpenStack Platform 16.2, see Considerations when running the undercloud with a proxy.
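To record which proxy settings a node carries through the upgrade, you can print them from /etc/environment before and after the upgrade and compare the results. The following is a minimal sketch: the function name show_proxy_settings is hypothetical, and the list of variable names it matches is an assumption about what your environment defines.

```shell
#!/bin/bash
# Hypothetical helper: print proxy-related variable assignments from a file
# in /etc/environment format. The file defaults to /etc/environment.
show_proxy_settings() {
    local file=${1:-/etc/environment}
    # -i ignores case so that both http_proxy and HTTP_PROXY are matched.
    # An optional leading "export" is tolerated.
    grep -iE '^[[:space:]]*(export[[:space:]]+)?(http|https|no|ftp)_proxy=' "$file"
}

# Example (not run here): save the settings before the upgrade.
#   show_proxy_settings > ~/proxy-before-upgrade.txt
```

The function returns a non-zero status when the file contains no proxy assignments.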
2.11. Deleting RHEL registration resources
If the DeleteOnRHELUnregistration parameter is set to true in an existing environment file or the rhel-registration.yaml template, the overcloud upgrade cannot proceed. In this case, when you perform a minor update to the latest Red Hat OpenStack Platform 13z version, set the DeleteOnRHELUnregistration parameter to false.
Procedure
- In the parameter_defaults section of your environment file, if the DeleteOnRHELUnregistration parameter is set to true, set the parameter to false.
- Run the openstack overcloud update prepare command.
- Run the openstack overcloud update run command.
2.12. Validating Red Hat OpenStack Platform 13 before the upgrade
Before you upgrade to Red Hat OpenStack Platform 16.2, validate your undercloud and overcloud with the tripleo-validations playbooks. In Red Hat OpenStack Platform 13, you run these playbooks through the OpenStack Workflow Service (mistral).
If you use CDN or Satellite as repository sources, the validation fails. To resolve this issue, see the Red Hat Knowledgebase solution, repos validation fails because of SSL certificate error.
Prerequisites
Confirm that you can ping the overcloud nodes:
$ source ~/stackrc
$ tripleo-ansible-inventory --static-yaml-inventory ~/inventory.yaml --stack <stack> --ansible_ssh_user heat-admin
$ ansible -i ~/inventory.yaml all -m ping
- Replace <stack> with the name of the stack.
Procedure
- Log in to the undercloud as the stack user.
- Source the stackrc file:
$ source ~/stackrc
- Create a bash script called pre-upgrade-validations.sh and include the following content in the script:
#!/bin/bash
for VALIDATION in $(openstack action execution run tripleo.validations.list_validations '{"groups": ["pre-upgrade"]}' | jq ".result[] | .id")
do
  echo "=== Running validation: $VALIDATION ==="
  STACK_NAME=$(openstack stack list -f value -c 'Stack Name')
  ID=$(openstack workflow execution create -f value -c ID tripleo.validations.v1.run_validation "{\"validation_name\": $VALIDATION, \"plan\": \"$STACK_NAME\"}")
  while [ $(openstack workflow execution show $ID -f value -c State) == "RUNNING" ]
  do
    sleep 1
  done
  echo ""
  openstack workflow execution output show $ID | jq -r ".stdout"
  echo ""
done
- Add permission to run the script:
$ chmod +x pre-upgrade-validations.sh
- Run the script:
$ ./pre-upgrade-validations.sh
- Review the script output to determine which validations succeed and fail:
=== Running validation: "check-ftype" ===

Success! The validation passed for all hosts:
* undercloud
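
When the script runs many validations, a one-line summary per validation can make the output easier to review. The following sketch is an illustration, not part of the documented procedure: the function name summarize_validations is hypothetical, and it assumes only the two line shapes shown in the sample output, the "=== Running validation:" header and a line that starts with "Success!". Any validation without a "Success!" line is flagged for manual review and is not labeled as failed.

```shell
#!/bin/bash
# Hypothetical helper: read the output of pre-upgrade-validations.sh on
# standard input and print one "<validation>: passed" or
# "<validation>: check output" line per validation.
summarize_validations() {
    awk '
        # A header line starts a new validation. Report the previous one first.
        /^=== Running validation: / {
            if (name != "") print name ": " (ok ? "passed" : "check output")
            name = $4; gsub(/"/, "", name); ok = 0
        }
        # A line that starts with Success! marks the current validation as passed.
        /^Success!/ { ok = 1 }
        # Report the last validation at the end of the input.
        END {
            if (name != "") print name ": " (ok ? "passed" : "check output")
        }
    '
}

# Example (not run here):
#   ./pre-upgrade-validations.sh | tee validations.log | summarize_validations
```

Keeping the full log with tee preserves the details for any validation that the summary flags.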