Chapter 2. Preparing for an OpenStack Platform upgrade


This process prepares your OpenStack Platform environment for the upgrade. It involves the following steps:

  • Backing up both the undercloud and overcloud.
  • Updating the undercloud to the latest minor version of OpenStack Platform 10, including the latest Open vSwitch.
  • Rebooting the undercloud in case a newer kernel or newer system packages are installed.
  • Updating the overcloud to the latest minor version of OpenStack Platform 10, including the latest Open vSwitch.
  • Rebooting the overcloud nodes in case a newer kernel or newer system packages are installed.
  • Performing validation checks on both the undercloud and overcloud.

These procedures ensure your OpenStack Platform environment is in the best possible state before proceeding with the upgrade.

2.1. Creating a baremetal undercloud backup

A full undercloud backup includes the following databases and files:

  • All MariaDB databases on the undercloud node
  • MariaDB configuration file on the undercloud (so that you can accurately restore databases)
  • The configuration data: /etc
  • Log data: /var/log
  • Image data: /var/lib/glance
  • Certificate generation data if using SSL: /var/lib/certmonger
  • Any container image data: /var/lib/docker and /var/lib/registry
  • All swift data: /srv/node
  • All data in the stack user home directory: /home/stack
Note

Confirm that you have sufficient disk space available on the undercloud before performing the backup process. Expect the archive file to be at least 3.5 GB, if not larger.
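
For example, to compare the available space against the approximate size of the data to be archived, you can run a quick check before you start. This is a minimal sketch that sums only the largest directories included in the backup:

    [root@director ~]# df -h /
    [root@director ~]# du -csh /etc /var/log /var/lib/glance /srv/node /home/stack 2>/dev/null | tail -n 1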

Procedure

  1. Log into the undercloud as the root user.
  2. Back up the database:

    [root@director ~]# mysqldump --opt --all-databases > /root/undercloud-all-databases.sql
  3. Create a backup directory and change the user ownership of the directory to the stack user:

    [root@director ~]# mkdir /backup
    [root@director ~]# chown stack: /backup

    You will use this directory to store the archive containing the undercloud database and file system.

  4. Change to the backup directory:

    [root@director ~]# cd /backup
  5. Archive the database backup and the configuration files:

    [root@director ~]# tar --xattrs --xattrs-include='*.*' --ignore-failed-read -cf \
        undercloud-backup-$(date +%F).tar \
        /root/undercloud-all-databases.sql \
        /etc \
        /var/log \
        /var/lib/glance \
        /var/lib/certmonger \
        /var/lib/docker \
        /var/lib/registry \
        /srv/node \
        /root \
        /home/stack
    • The --ignore-failed-read option skips any directories that do not exist on your undercloud, so the backup does not fail if a listed path is missing.
    • The --xattrs and --xattrs-include='*.*' options include extended attributes, which are required to store metadata for Object Storage (swift) and SELinux.

    This creates a file named undercloud-backup-<date>.tar, where <date> is the system date. Copy this tar file to a secure location.
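
    Before you copy the archive, you can optionally verify that it is readable. The destination host in this sketch is hypothetical; replace it with your own backup server:

    [root@director ~]# tar -tf /backup/undercloud-backup-$(date +%F).tar > /dev/null
    [root@director ~]# scp /backup/undercloud-backup-$(date +%F).tar backup-server.example.com:/srv/backups/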

2.2. Backing up the overcloud control plane services

The following procedure creates a backup of the overcloud databases and configuration. A backup of the overcloud database and services gives you a snapshot of a working environment, which helps if you need to restore the overcloud to its original state after an operational failure.

Important

This procedure only includes crucial control plane services. It does not include backups of Compute node workloads, data on Ceph Storage nodes, or any additional services.

Procedure

  1. Perform the database backup:

    1. Log into a Controller node. You can access the overcloud from the undercloud:

      $ ssh heat-admin@192.0.2.100
    2. Change to the root user:

      $ sudo -i
    3. Create a temporary directory to store the backups:

      # mkdir -p /var/tmp/mysql_backup/
    4. Obtain the database password and store it in the MYSQLDBPASS environment variable. The password is stored in the mysql::server::root_password variable within the /etc/puppet/hieradata/service_configs.json file. Use the following command to store the password:

      # MYSQLDBPASS=$(sudo hiera -c /etc/puppet/hiera.yaml mysql::server::root_password)
    5. Back up the database:

      # mysql -uroot -p$MYSQLDBPASS -s -N -e "select distinct table_schema from information_schema.tables where engine='innodb' and table_schema != 'mysql';" | xargs mysqldump -uroot -p$MYSQLDBPASS --single-transaction --databases > /var/tmp/mysql_backup/openstack_databases-$(date +%F)-$(date +%T).sql

      This dumps a database backup called /var/tmp/mysql_backup/openstack_databases-<date>.sql where <date> is the system date and time. Copy this database dump to a secure location.
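
      You can optionally confirm that the dump contains the databases that you expect before you copy it. This is a minimal sketch, assuming the default mysqldump output format:

      # grep "^-- Current Database:" /var/tmp/mysql_backup/openstack_databases-*.sql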

    6. Back up all the users and permissions information:

      # mysql -uroot -p$MYSQLDBPASS -s -N -e "SELECT CONCAT('\"SHOW GRANTS FOR ''',user,'''@''',host,''';\"') FROM mysql.user where (length(user) > 0 and user NOT LIKE 'root')" | xargs -n1 mysql -uroot -p$MYSQLDBPASS -s -N -e | sed 's/$/;/' > /var/tmp/mysql_backup/openstack_databases_grants-$(date +%F)-$(date +%T).sql

      This dumps a database backup called /var/tmp/mysql_backup/openstack_databases_grants-<date>.sql where <date> is the system date and time. Copy this database dump to a secure location.

  2. Back up the Pacemaker configuration:

    1. Log into a Controller node.
    2. Run the following command to create an archive of the current Pacemaker configuration:

      # sudo pcs config backup pacemaker_controller_backup
    3. Copy the resulting archive (pacemaker_controller_backup.tar.bz2) to a secure location.
  3. Back up the OpenStack Telemetry database:

    1. Connect to any controller and get the IP of the MongoDB primary instance:

      # MONGOIP=$(sudo hiera -c /etc/puppet/hiera.yaml mongodb::server::bind_ip)
    2. Create the backup:

      # mkdir -p /var/tmp/mongo_backup/
      # mongodump --oplog --host $MONGOIP --out /var/tmp/mongo_backup/
    3. Copy the database dump in /var/tmp/mongo_backup/ to a secure location.
  4. Back up the Redis cluster:

    1. Obtain the Redis endpoint from HAProxy:

      # REDISIP=$(sudo hiera -c /etc/puppet/hiera.yaml redis_vip)
    2. Obtain the master password for the Redis cluster:

      # REDISPASS=$(sudo hiera -c /etc/puppet/hiera.yaml redis::masterauth)
    3. Check connectivity to the Redis cluster:

      # redis-cli -a $REDISPASS -h $REDISIP ping
    4. Dump the Redis database:

      # redis-cli -a $REDISPASS -h $REDISIP bgsave

      This stores the database backup in the default /var/lib/redis/ directory. Copy this database dump to a secure location.
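
      The bgsave command runs asynchronously. Before you copy the dump, you can optionally check that the background save has completed. This is a minimal sketch; the lastsave command returns the Unix timestamp of the last successful save, and the dump file location assumes the default Redis configuration referenced above:

      # redis-cli -a $REDISPASS -h $REDISIP lastsave
      # ls -l /var/lib/redis/dump.rdb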

  5. Back up the file system on each Controller node:

    1. Create a directory for the backup:

      # mkdir -p /var/tmp/filesystem_backup/
    2. Run the following tar command:

      # tar --acls --ignore-failed-read --xattrs --xattrs-include='*.*' \
          -zcvf /var/tmp/filesystem_backup/`hostname`-filesystem-`date '+%Y-%m-%d-%H-%M-%S'`.tar.gz \
          /etc \
          /srv/node \
          /var/log \
          /var/lib/nova \
          --exclude /var/lib/nova/instances \
          /var/lib/glance \
          /var/lib/keystone \
          /var/lib/cinder \
          /var/lib/heat \
          /var/lib/heat-config \
          /var/lib/heat-cfntools \
          /var/lib/rabbitmq \
          /var/lib/neutron \
          /var/lib/haproxy \
          /var/lib/openvswitch \
          /var/lib/redis \
          /var/lib/os-collect-config \
          /usr/libexec/os-apply-config \
          /usr/libexec/os-refresh-config \
          /home/heat-admin

      The --ignore-failed-read option ignores any missing directories, which is useful if certain services are not used or are separated onto their own custom roles.

    3. Copy the resulting tar file to a secure location.
  6. Archive deleted rows on the overcloud:

    1. Check for deleted instances that are not yet archived:

      $ source ~/overcloudrc
      $ nova list --all-tenants --deleted
    2. If the command returns any deleted instances, archive them by entering the following command on one of the overcloud Controller nodes:

      # su - nova -s /bin/bash -c "nova-manage --debug db archive_deleted_rows --max_rows 1000"

      Rerun this command until you have archived all deleted instances.

    3. Purge all the archived deleted instances by entering the following command on one of the overcloud Controller nodes:

      # su - nova -s /bin/bash -c "nova-manage --debug db purge --all --all-cells"
    4. Verify that no deleted instances remain:

      $ nova list --all-tenants --deleted

2.3. Updating the current undercloud packages for OpenStack Platform 10.z

The director provides commands to update the packages on the undercloud node. Use these commands to perform a minor update within your current version of the OpenStack Platform environment, in this case a minor update within OpenStack Platform 10.

Note

This step also updates the undercloud operating system to the latest version of Red Hat Enterprise Linux 7 and Open vSwitch to version 2.9.

Procedure

  1. Log in to the undercloud as the stack user.
  2. Stop the main OpenStack Platform services:

    $ sudo systemctl stop 'openstack-*' 'neutron-*' httpd
    Note

    This causes a short period of downtime for the undercloud. The overcloud is still functional during the undercloud upgrade.

  3. Set the RHEL version to RHEL 7.7:

    $ sudo subscription-manager release --set=7.7
  4. Update the python-tripleoclient package and its dependencies to ensure you have the latest scripts for the minor version update:

    $ sudo yum update -y python-tripleoclient
  5. Run the openstack undercloud upgrade command:

    $ openstack undercloud upgrade
  6. Wait until the command completes its execution.
  7. Reboot the undercloud to update the operating system’s kernel and other system packages:

    $ sudo reboot
  8. Wait until the node boots.
  9. Log into the undercloud as the stack user.

In addition to the undercloud package updates, it is recommended to keep your overcloud images up to date so that the image configuration remains in sync with the latest openstack-tripleo-heat-templates package. This ensures successful deployment and scaling operations between the current preparation stage and the actual fast forward upgrade. The next section shows how to update your images in this scenario. If you plan to upgrade your environment immediately after preparing it, you can skip the next section.

2.4. Preparing updates for NFV-enabled environments

If your environment has network function virtualization (NFV) enabled, follow these steps after you update your undercloud, and before you update your overcloud.

Procedure

  1. Change the vhost user socket directory in a custom environment file, for example, network-environment.yaml:

    parameter_defaults:
      NeutronVhostuserSocketDir: "/var/lib/vhost_sockets"
  2. Add the ovs-dpdk-permissions.yaml file to your openstack overcloud deploy command to configure the qemu group setting as hugetlbfs for OVS-DPDK:

     -e environments/ovs-dpdk-permissions.yaml
  3. Ensure that vHost user ports for all instances are in dpdkvhostuserclient mode. For more information, see Manually changing the vhost user port mode.

2.5. Updating the current overcloud images for OpenStack Platform 10.z

The undercloud update process might download new image archives from the rhosp-director-images and rhosp-director-images-ipa packages. This process updates these images on your undercloud within Red Hat OpenStack Platform 10.

Prerequisites

  • You have updated to the latest minor release of your current undercloud version.

Procedure

  1. Check the yum log to determine if new image archives are available:

    $ sudo grep "rhosp-director-images" /var/log/yum.log
  2. If new archives are available, replace your current images with new images. To install the new images, first remove any existing images from the images directory on the stack user’s home (/home/stack/images):

    $ rm -rf ~/images/*
  3. On the undercloud node, source the undercloud credentials:

    $ source ~/stackrc
  4. Extract the archives:

    $ cd ~/images
    $ for i in /usr/share/rhosp-director-images/overcloud-full-latest-10.0.tar /usr/share/rhosp-director-images/ironic-python-agent-latest-10.0.tar; do tar -xvf $i; done
  5. Import the latest images into the director and configure nodes to use the new images:

    $ cd ~/images
    $ openstack overcloud image upload --update-existing --image-path /home/stack/images/
    $ openstack overcloud node configure $(openstack baremetal node list -c UUID -f csv --quote none | sed "1d" | paste -s -d " ")
  6. To finalize the image update, verify the existence of the new images:

    $ openstack image list
    $ ls -l /httpboot

    Director also retains the old images and renames them using the timestamp of when they were updated. If you no longer need these images, delete them.
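
    For example, after you confirm that the new images work, you can identify an old image in the image list and delete it by name or ID. The identifier in this sketch is a placeholder:

    $ openstack image list
    $ openstack image delete <old_image_name_or_ID>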

Director is now updated and using the latest images. You do not need to restart any services after the update.

The undercloud is now using updated OpenStack Platform 10 packages. Next, update the overcloud to the latest minor release.

2.6. Updating the current overcloud packages for OpenStack Platform 10.z

The director provides commands to update the packages on all overcloud nodes. Use these commands to perform a minor update within your current version of the OpenStack Platform environment, in this case a minor update within Red Hat OpenStack Platform 10.

Note

This step also updates the overcloud nodes' operating system to the latest version of Red Hat Enterprise Linux 7 and Open vSwitch to version 2.9.

Prerequisites

  • You have updated to the latest minor release of your current undercloud version.
  • You have performed a backup of the overcloud.

Procedure

  1. Check your subscription management configuration for the rhel_reg_release parameter. If this parameter is not set, you must include it and set it to version 7.7:

    parameter_defaults:
      ...
      rhel_reg_release: "7.7"

    Ensure that you save the changes to the overcloud subscription management environment file.

  2. Update the current plan using your original openstack overcloud deploy command and including the --update-plan-only option. For example:

    $ openstack overcloud deploy --update-plan-only \
      --templates  \
      -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
      -e /home/stack/templates/network-environment.yaml \
      -e /home/stack/templates/storage-environment.yaml \
      -e /home/stack/templates/rhel-registration/environment-rhel-registration.yaml \
      [-e <environment_file>|...]

    The --update-plan-only option only updates the Overcloud plan stored in the director. Use the -e option to include environment files relevant to your Overcloud and its update path. The order of the environment files is important as the parameters and resources defined in subsequent environment files take precedence. Use the following list as an example of the environment file order:

    • Any network isolation files, including the initialization file (environments/network-isolation.yaml) from the heat template collection and then your custom NIC configuration file.
    • Any external load balancing environment files.
    • Any storage environment files.
    • Any environment files for Red Hat CDN or Satellite registration.
    • Any other custom environment files.
  3. Create a static inventory file of your overcloud:

    $ tripleo-ansible-inventory --ansible_ssh_user heat-admin --static-yaml-inventory ~/inventory.yaml

    If you use an overcloud name that is different from the default name of overcloud, set the name of your overcloud with the --plan option, as shown in the example below.
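
    For example, if your overcloud plan is named mycloud (a hypothetical name), the command looks like this:

    $ tripleo-ansible-inventory --plan mycloud --ansible_ssh_user heat-admin --static-yaml-inventory ~/inventory.yaml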

  4. Create a playbook that contains a task to set the operating system version to Red Hat Enterprise Linux 7.7 on all nodes:

    $ cat > ~/set_release.yaml <<'EOF'
    - hosts: all
      gather_facts: false
      tasks:
        - name: set release to 7.7
          command: subscription-manager release --set=7.7
          become: true
    EOF
  5. Run the set_release.yaml playbook:

    $ ansible-playbook -i ~/inventory.yaml -f 25 ~/set_release.yaml --limit undercloud,Controller,Compute

    Use the --limit option to restrict the playbook run to specific groups of nodes. In this example, it targets the undercloud, Controller, and Compute nodes.

  6. Perform a package update on all nodes using the openstack overcloud update command:

    $ openstack overcloud update stack -i overcloud

    The -i option runs an interactive mode that updates each node sequentially. When the update process completes a node update, the script provides a breakpoint for you to confirm. Without the -i option, the update remains paused at the first breakpoint, so you must include the -i option.

    The script performs the following functions:

    1. The script runs on nodes one-by-one:

      1. For Controller nodes, this means a full package update.
      2. For other nodes, this means an update of Puppet modules only.
    2. Puppet runs on all nodes at once:

      1. For Controller nodes, the Puppet run synchronizes the configuration.
      2. For other nodes, the Puppet run updates the rest of the packages and synchronizes the configuration.
  7. The update process starts. During this process, the director reports an IN_PROGRESS status and periodically prompts you to clear breakpoints. For example:

    starting package update on stack overcloud
    IN_PROGRESS
    IN_PROGRESS
    WAITING
    on_breakpoint: [u'overcloud-compute-0', u'overcloud-controller-2', u'overcloud-controller-1', u'overcloud-controller-0']
    Breakpoint reached, continue? Regexp or Enter=proceed (will clear 49913767-e2dd-4772-b648-81e198f5ed00), no=cancel update, C-c=quit interactive mode:

    Press Enter to clear the breakpoint from the last node on the on_breakpoint list. This begins the update for that node.

  8. The script automatically predefines the update order of nodes:

    • Each Controller node individually
    • Each Compute node individually
    • Each Ceph Storage node individually
    • All other nodes individually

    It is recommended to use this order to ensure a successful update, specifically:

    1. Clear the breakpoint of each Controller node individually. Each Controller node requires an individual package update in case the node’s services must restart after the update. This reduces disruption to highly available services on other Controller nodes.
    2. After the Controller node update, clear the breakpoints for each Compute node. You can also type a Compute node name to clear a breakpoint on a specific node or use a Python-based regular expression to clear breakpoints on multiple Compute nodes at once, as shown in the example after this list.
    3. Clear the breakpoints for each Ceph Storage node. You can also type a Ceph Storage node name to clear a breakpoint on a specific node or use a Python-based regular expression to clear breakpoints on multiple Ceph Storage nodes at once.
    4. Clear any remaining breakpoints to update the remaining nodes. You can also type a node name to clear a breakpoint on a specific node or use a Python-based regular expression to clear breakpoints on multiple nodes at once.
    5. Wait until all nodes have completed their update.
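
    For example, to clear the breakpoints on all Compute nodes at once, type a regular expression such as the following at the breakpoint prompt. This is a minimal sketch that assumes the default overcloud-compute-<N> node naming:

      overcloud-compute-.*
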
  9. The update command reports a COMPLETE status when the update completes:

    ...
    IN_PROGRESS
    IN_PROGRESS
    IN_PROGRESS
    COMPLETE
    update finished with status COMPLETE
  10. If you configured fencing for your Controller nodes, the update process might disable it. When the update process completes, re-enable fencing with the following command on one of the Controller nodes:

    $ sudo pcs property set stonith-enabled=true

The update process does not reboot any nodes in the Overcloud automatically. Updates to the kernel and other system packages require a reboot. Check the /var/log/yum.log file on each node to see if either the kernel or openvswitch packages have updated their major or minor versions. If they have, reboot each node using the following procedures.
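
For example, the following command, run on each node, shows whether either package was updated during the minor update. This is a minimal sketch:

    $ sudo grep -E "kernel|openvswitch" /var/log/yum.log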

2.7. Rebooting controller and composable nodes

The following procedure reboots Controller nodes and standalone nodes based on composable roles. This excludes Compute nodes and Ceph Storage nodes.

Procedure

  1. Log in to the node that you want to reboot.
  2. Optional: If the node uses Pacemaker resources, stop the cluster:

    [heat-admin@overcloud-controller-0 ~]$ sudo pcs cluster stop
  3. Reboot the node:

    [heat-admin@overcloud-controller-0 ~]$ sudo reboot
  4. Wait until the node boots.
  5. Check the services. For example:

    1. If the node uses Pacemaker services, check that the node has rejoined the cluster:

      [heat-admin@overcloud-controller-0 ~]$ sudo pcs status
    2. If the node uses Systemd services, check that all services are enabled:

      [heat-admin@overcloud-controller-0 ~]$ sudo systemctl status
  6. Repeat these steps for all Controller and composable nodes.

2.8. Rebooting a Ceph Storage (OSD) cluster

The following procedure reboots a cluster of Ceph Storage (OSD) nodes.

Procedure

  1. Log in to a Ceph MON or Controller node and disable Ceph Storage cluster rebalancing temporarily:

    $ sudo ceph osd set noout
    $ sudo ceph osd set norebalance
  2. Select the first Ceph Storage node to reboot and log into it.
  3. Reboot the node:

    $ sudo reboot
  4. Wait until the node boots.
  5. Log in to a Ceph MON or Controller node and check the cluster status:

    $ sudo ceph -s

    Check that the pgmap reports all pgs as normal (active+clean).

  6. Log out of the Ceph MON or Controller node, reboot the next Ceph Storage node, and check its status. Repeat this process until you have rebooted all Ceph storage nodes.
  7. When complete, log into a Ceph MON or Controller node and enable cluster rebalancing again:

    $ sudo ceph osd unset noout
    $ sudo ceph osd unset norebalance
  8. Perform a final status check to verify the cluster reports HEALTH_OK:

    $ sudo ceph status

2.9. Rebooting Compute nodes

Rebooting a Compute node involves the following workflow:

  • Select a Compute node to reboot and disable it so that it does not provision new instances.
  • Migrate the instances to another Compute node to minimize instance downtime.
  • Reboot the empty Compute node and enable it.

Procedure

  1. Log in to the undercloud as the stack user.
  2. To identify the Compute node that you intend to reboot, list all Compute nodes:

    $ source ~/stackrc
    (undercloud) $ openstack server list --name compute
  3. From the overcloud, select a Compute node and disable it:

    $ source ~/overcloudrc
    (overcloud) $ openstack compute service list
    (overcloud) $ openstack compute service set <hostname> nova-compute --disable
  4. List all instances on the Compute node:

    (overcloud) $ openstack server list --host <hostname> --all-projects
  5. Migrate your instances. For more information on migration strategies, see Migrating virtual machines between Compute nodes.
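    For example, one common strategy is to live migrate each instance to a specific target host. This is a minimal sketch and assumes that your environment and client version support live migration; <instance> and <target_host> are placeholders:

    (overcloud) $ openstack server migrate --live <target_host> <instance>
    (overcloud) $ openstack server show <instance> -c status
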
  6. Log into the Compute node and reboot it:

    [heat-admin@overcloud-compute-0 ~]$ sudo reboot
  7. Wait until the node boots.
  8. Enable the Compute node:

    $ source ~/overcloudrc
    (overcloud) $ openstack compute service set <hostname> nova-compute --enable
  9. Verify that the Compute node is enabled:

    (overcloud) $ openstack compute service list

2.10. Verifying system packages

Before the upgrade, the undercloud node and all overcloud nodes should be using the latest versions of the following packages:

Package                 Version
----------------------  -------------
openvswitch             At least 2.9
qemu-img-rhev           At least 2.10
qemu-kvm-common-rhev    At least 2.10
qemu-kvm-rhev           At least 2.10
qemu-kvm-tools-rhev     At least 2.10

Procedure

  1. Log into a node.
  2. Run yum to check the system packages:

    $ sudo yum list qemu-img-rhev qemu-kvm-common-rhev qemu-kvm-rhev qemu-kvm-tools-rhev openvswitch
  3. Run ovs-vsctl to check the version currently running:

    $ sudo ovs-vsctl --version
  4. Repeat these steps for all nodes.
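
Instead of logging in to each node manually, you can run the same checks from the undercloud across all overcloud nodes, following the pattern used in the validation steps later in this chapter. This is a minimal sketch; packages that do not apply to a particular role are reported as not installed:

    $ source ~/stackrc
    $ for NODE in $(openstack server list -f value -c Networks | cut -d= -f2); do echo "=== $NODE ===" ; ssh heat-admin@$NODE "rpm -q openvswitch qemu-img-rhev qemu-kvm-rhev ; sudo ovs-vsctl --version" ; done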

The undercloud and overcloud now use updated OpenStack Platform 10 packages. Use the next few procedures to check that the environment is in a working state.

2.11. Validating an OpenStack Platform 10 undercloud

The following is a set of steps to check the functionality of your Red Hat OpenStack Platform 10 undercloud before an upgrade.

Procedure

  1. Source the undercloud access details:

    $ source ~/stackrc
  2. Check for failed Systemd services:

    $ sudo systemctl list-units --state=failed 'openstack*' 'neutron*' 'httpd' 'docker'
  3. Check the undercloud free space:

    $ df -h

    Use the "Undercloud Requirements" as a basis to determine if you have adequate free space.

  4. If you have NTP installed on the undercloud, check that the clock is synchronized:

    $ sudo ntpstat
  5. Check the undercloud network services:

    $ openstack network agent list

    All agents should be Alive and their state should be UP.

  6. Check the undercloud compute services:

    $ openstack compute service list

    All agents' status should be enabled and their state should be up.

2.12. Validating an OpenStack Platform 10 overcloud

The following is a set of steps to check the functionality of your Red Hat OpenStack Platform 10 overcloud before an upgrade.

Procedure

  1. Source the undercloud access details:

    $ source ~/stackrc
  2. Check the status of your bare metal nodes:

    $ openstack baremetal node list

    All nodes should have a valid power state (on) and maintenance mode should be false.

  3. Check for failed Systemd services:

    $ for NODE in $(openstack server list -f value -c Networks | cut -d= -f2); do echo "=== $NODE ===" ; ssh heat-admin@$NODE "sudo systemctl list-units --state=failed 'openstack*' 'neutron*' 'httpd' 'docker' 'ceph*'" ; done
  4. Check the HAProxy connection to all services. Obtain the Control Plane VIP address and authentication information for the haproxy.stats service:

    $ NODE=$(openstack server list --name controller-0 -f value -c Networks | cut -d= -f2); ssh heat-admin@$NODE sudo 'grep "listen haproxy.stats" -A 6 /etc/haproxy/haproxy.cfg'
  5. Use the connection and authentication information obtained from the previous step to check the connection status of RHOSP services.

    If SSL is not enabled, use these details in the following cURL request:

    $ curl -s -u admin:<PASSWORD> "http://<IP ADDRESS>:1993/;csv" | egrep -vi "(frontend|backend)" | awk -F',' '{ print $1" "$2" "$18 }'

    If SSL is enabled, use these details in the following cURL request:

    $ curl -s -u admin:<PASSWORD> "https://<HOSTNAME>:1993/;csv" | egrep -vi "(frontend|backend)" | awk -F',' '{ print $1" "$2" "$18 }'

    Replace the <PASSWORD> and <IP ADDRESS> or <HOSTNAME> values with the respective information from the haproxy.stats service. The resulting list shows the OpenStack Platform services on each node and their connection status.

  6. Check overcloud database replication health:

    $ for NODE in $(openstack server list --name controller -f value -c Networks | cut -d= -f2); do echo "=== $NODE ===" ; ssh heat-admin@$NODE "sudo clustercheck" ; done
  7. Check RabbitMQ cluster health:

    $ for NODE in $(openstack server list --name controller -f value -c Networks | cut -d= -f2); do echo "=== $NODE ===" ; ssh heat-admin@$NODE "sudo rabbitmqctl node_health_check" ; done
  8. Check Pacemaker resource health:

    $ NODE=$(openstack server list --name controller-0 -f value -c Networks | cut -d= -f2); ssh heat-admin@$NODE "sudo pcs status"

    Look for:

    • All cluster nodes online.
    • No resources stopped on any cluster nodes.
    • No failed pacemaker actions.
  9. Check the disk space on each overcloud node:

    $ for NODE in $(openstack server list -f value -c Networks | cut -d= -f2); do echo "=== $NODE ===" ; ssh heat-admin@$NODE "sudo df -h --output=source,fstype,avail -x overlay -x tmpfs -x devtmpfs" ; done
  10. Check overcloud Ceph Storage cluster health. The following command runs the ceph tool on a Controller node to check the cluster:

    $ NODE=$(openstack server list --name controller-0 -f value -c Networks | cut -d= -f2); ssh heat-admin@$NODE "sudo ceph -s"
  11. Check Ceph Storage OSD for free space. The following command runs the ceph tool on a Controller node to check the free space:

    $ NODE=$(openstack server list --name controller-0 -f value -c Networks | cut -d= -f2); ssh heat-admin@$NODE "sudo ceph df"
    Important

    The number of placement groups (PGs) for each Ceph object storage daemon (OSD) must not exceed 250 by default. Upgrading Ceph nodes with more PGs per OSD results in a warning state and might fail the upgrade process. You can increase the number of PGs per OSD before you start the upgrade process. For more information about diagnosing and troubleshooting this issue, see the article OpenStack FFU from 10 to 13 times out because Ceph PGs allocated in one or more OSDs is higher than 250.
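
    To see how many placement groups each OSD currently holds, you can run the ceph osd df command from a Controller node, following the same pattern as the previous steps; the PGS column in the output shows the per-OSD placement group count. This is a minimal sketch:

    $ NODE=$(openstack server list --name controller-0 -f value -c Networks | cut -d= -f2); ssh heat-admin@$NODE "sudo ceph osd df"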

  12. Check that clocks are synchronized on overcloud nodes:

    $ for NODE in $(openstack server list -f value -c Networks | cut -d= -f2); do echo "=== $NODE ===" ; ssh heat-admin@$NODE "sudo ntpstat" ; done
  13. Source the overcloud access details:

    $ source ~/overcloudrc
  14. Check the overcloud network services:

    $ openstack network agent list

    All agents should be Alive and their state should be UP.

  15. Check the overcloud compute services:

    $ openstack compute service list

    All agents' status should be enabled and their state should be up.

  16. Check the overcloud volume services:

    $ openstack volume service list

    All agents' status should be enabled and their state should be up.

2.13. Finalizing updates for NFV-enabled environments

If your environment has network function virtualization (NFV) enabled, you need to follow these steps after updating your undercloud and overcloud.

Procedure

You need to migrate your existing OVS-DPDK instances to ensure that the vhost socket mode changes from dpdkvhostuser to dpdkvhostuserclient mode in the OVS ports. We recommend that you snapshot existing instances and build a new instance based on that snapshot image. See Manage Instance Snapshots for complete details on instance snapshots.

To snapshot an instance and boot a new instance from the snapshot:

  1. Source the overcloud access details:

    $ source ~/overcloudrc
  2. Find the server ID for the instance you want to take a snapshot of:

    $ openstack server list
  3. Shut down the source instance before you take the snapshot to ensure that all data is flushed to disk:

    $ openstack server stop SERVER_ID
  4. Create the snapshot image of the instance:

    $ openstack server image create --name SNAPSHOT_NAME SERVER_ID
  5. Boot a new instance with this snapshot image:

    $ openstack server create --flavor DPDK_FLAVOR --nic net-id=DPDK_NET_ID --image SNAPSHOT_NAME INSTANCE_NAME
  6. Optionally, verify that the new instance status is ACTIVE:

    $ openstack server list

Repeat this procedure for all instances that you need to snapshot and relaunch.

2.14. Retaining YUM history

After completing a minor update of the overcloud, retain the yum history. This information is useful if you need to undo yum transactions for any rollback operations.

Procedure

  1. On each node, run the following command to save the entire yum history of the node in a file:

    $ sudo yum history list all > /home/heat-admin/$(hostname)-yum-history-all
  2. On each node, run the following command to save the ID of the last yum history item:

    $ sudo yum history list all | head -n 5 | tail -n 1 | awk '{print $1}' > /home/heat-admin/$(hostname)-yum-history-all-last-id
  3. Copy these files to a secure location.
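
If you later need to roll back the update on a node, the saved transaction ID gives yum the reference point for the operation. This is a minimal sketch; <transaction_ID> is the value stored in the file created in step 2:

    $ sudo yum history undo <transaction_ID>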

2.15. Next Steps

With the preparation stage complete, you can now perform an upgrade of the undercloud from 10 to 13 using the steps in Chapter 3, Upgrading the undercloud.
