Chapter 17. Migrating virtual machine instances between Compute nodes
You might need to migrate instances from one Compute node to another in the data plane to perform maintenance, rebalance the workload, or replace a failed or failing node.
- Compute node maintenance
- If you need to temporarily take a Compute node out of service, for example, to perform hardware maintenance or repair, kernel upgrades, or software updates, you can migrate the instances running on that Compute node to another Compute node.
- Failing Compute node
- If a Compute node is about to fail and you need to service it or replace it, you can migrate instances from the failing Compute node to a healthy Compute node.
- Failed Compute nodes
- If a Compute node has already failed, you can evacuate the instances. You can rebuild instances from the original image on another Compute node, using the same name, UUID, network addresses, and any other allocated resources the instance had before the Compute node failed.
- Workload rebalancing
- You can migrate one or more instances to another Compute node to rebalance the workload. For example, you can consolidate instances on a Compute node to conserve power, migrate instances to a Compute node that is physically closer to other networked resources to reduce latency, or distribute instances across Compute nodes to avoid hot spots and increase resiliency.
All Compute nodes provide secure migration. All Compute nodes also require a shared SSH key to provide the user on each host with access to the other Compute nodes during the migration process.
17.1. Migration types
Red Hat OpenStack Services on OpenShift (RHOSO) supports the following types of migration.
- Cold migration
Cold migration, or non-live migration, involves shutting down a running instance before migrating it from the source Compute node to the destination Compute node.
Cold migration involves some downtime for the instance. The migrated instance maintains access to the same volumes and IP addresses.
Note: Cold migration requires that both the source and destination Compute nodes are running.
- Live migration
Live migration involves moving the instance from the source Compute node to the destination Compute node without shutting it down, and while maintaining state consistency.
Live migrating an instance involves little or no perceptible downtime. However, live migration does impact performance for the duration of the migration operation. Therefore, instances should be taken out of the critical path while being migrated.
Important: Live migration impacts the performance of the workload being moved. Red Hat does not provide support for increased packet loss, network latency, or memory latency, or for a reduction in network bandwidth, memory bandwidth, storage I/O, or CPU performance during live migration.
Note: Live migration requires that both the source and destination Compute nodes are running.
- Evacuation
- If you need to migrate instances because the source Compute node has already failed, you can evacuate the instances.
17.2. Migration constraints
Migration constraints typically arise with block migration, configuration disks, or when one or more instances access physical hardware on the Compute node.
- CPU constraints
The source and destination Compute nodes must have the same CPU architecture. For example, Red Hat does not support migrating an instance from a ppc64le CPU to an x86_64 CPU. Migration between different CPU models is not supported. In some cases, the CPU of the source and destination Compute node must match exactly, such as for instances that use CPU host passthrough. In all cases, the CPU features of the destination node must be a superset of the CPU features on the source node.
- Memory constraints
- The destination Compute node must have sufficient available RAM. Memory oversubscription can cause migration to fail.
- Block migration constraints
Migrating instances that use disks that are stored locally on a Compute node takes significantly longer than migrating volume-backed instances that use shared storage, such as Red Hat Ceph Storage. This latency arises because OpenStack Compute (nova) migrates local disks block-by-block between the Compute nodes over the control plane network by default. By contrast, volume-backed instances that use shared storage, such as Red Hat Ceph Storage, do not have to migrate the volumes, because each Compute node already has access to the shared storage.
Note: Network congestion in the control plane network caused by migrating local disks or instances that consume large amounts of RAM might impact the performance of other systems that use the control plane network, such as RabbitMQ.
- Read-only drive migration constraints
- Migrating a drive is supported only if the drive has both read and write capabilities. For example, OpenStack Compute (nova) cannot migrate a CD-ROM drive or a read-only config drive. vfat is the only config drive format that OpenStack Compute (nova) can migrate.
- Live migration constraints
In some cases, live migrating instances involves additional constraints.
Important: Live migration impacts the performance of the workload being moved. Red Hat does not provide support for increased packet loss, network latency, or memory latency, or for a reduction in network bandwidth, memory bandwidth, storage I/O, or CPU performance during live migration.
No new operations during migration
To achieve state consistency between the copies of the instance on the source and destination nodes, RHOSO must prevent new operations during live migration. Otherwise, live migration might take a long time or potentially never end if writes to memory occur faster than live migration can replicate the state of the memory.
CPU pinning with NUMA
The enabled_filters parameter in the [filter_scheduler] configuration of the OpenStackControlPlane custom resource (CR) must include the values AggregateInstanceExtraSpecsFilter and NUMATopologyFilter.
Multi-cell clouds
In a multi-cell cloud, you can live migrate instances to a different host in the same cell, but not across cells.
Floating instances
When you migrate shared (floating) instances, the value of the cpu_shared_set field must match between the source and destination Compute nodes, so that the instances are allocated to CPUs configured for shared (unpinned) instances at the destination. Therefore, if you need to live migrate floating instances, ensure that all the Compute nodes have the same CPU mappings for dedicated (pinned) and shared instances, or use a host aggregate for the shared instances.
Destination Compute node capacity
The destination Compute node must have sufficient capacity to host the instance that you want to migrate.
SR-IOV live migration
Instances with SR-IOV-based network interfaces can be live migrated. Live migrating instances with direct mode SR-IOV network interfaces incurs network downtime. This is because the direct mode interfaces need to be detached and re-attached during the migration.
Live migration on ML2/OVS deployments
During the live migration process, when the virtual machine is unpaused on the destination host, the metadata service might not be available because the metadata server proxy has not yet spawned. This unavailability is brief; the service becomes available again shortly, and the live migration succeeds.
Constraints that preclude live migration
You cannot live migrate an instance that uses the following features.
PCI passthrough
QEMU/KVM hypervisors support attaching PCI devices on the Compute node to an instance. Use PCI passthrough to give an instance exclusive access to PCI devices, which appear and behave as if they are physically attached to the operating system of the instance. However, because PCI passthrough involves direct access to the physical devices, QEMU/KVM does not support live migration of instances using PCI passthrough.
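The CPU feature superset requirement described above can be sanity-checked by comparing the flag sets of the two hosts. The following is a minimal sketch; the flag lists are illustrative stand-ins for what you would collect from the "Flags" line of lscpu on each Compute node:

```shell
# Illustrative flag sets; on real hosts you would collect these from the
# "Flags" line of `lscpu` on the source and destination Compute nodes.
src_flags="fpu vme sse sse2 ssse3"
dst_flags="fpu vme sse sse2 ssse3 sse4_1 sse4_2"

# Migration is only safe if every source flag is also present on the
# destination, that is, the destination set is a superset of the source set.
missing=""
for f in $src_flags; do
  case " $dst_flags " in
    *" $f "*) ;;                   # flag present on the destination
    *) missing="$missing $f" ;;    # flag missing on the destination
  esac
done

if [ -z "$missing" ]; then
  echo "destination CPU features are a superset of the source: OK"
else
  echo "missing on destination:$missing"
fi
```

A non-empty missing list means the destination cannot present all of the source CPU features, and live migration to that node is unsafe.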
17.3. Preparing to migrate
Before you migrate one or more instances, you need to determine the Compute node names and the IDs of the instances to migrate.
Prerequisites
- You are logged on to a workstation that has access to the RHOSO control plane as a user with cluster-admin privileges.
- The oc command line tool is installed on the workstation.
Procedure
Access the remote shell for the OpenStackClient pod from your workstation:

$ oc rsh -n openstack openstackclient

List the instances on the source Compute node and locate the ID of the instance or instances that you want to migrate:

$ openstack server list --host <source> --all-projects

Replace <source> with the name or ID of the source Compute node.

Optional: If you are migrating instances from a source Compute node to perform maintenance on the node, you must disable the node to prevent the scheduler from assigning new instances to the node during maintenance:

$ openstack compute service set <source> nova-compute --disable

Replace <source> with the host name of the source Compute node.

Exit the OpenStackClient pod:

$ exit
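When you need the instance IDs for scripting, you can extract them from the table output of the server list step above. A small sketch that parses a captured sample table rather than a live cloud; note that the openstack client can also print bare values directly with -f value -c ID:

```shell
# Captured sample of `openstack server list` table output; in practice this
# would be the live output of:
#   openstack server list --host <source> --all-projects
sample_output='+--------------------------------------+--------+--------+
| ID                                   | Name   | Status |
+--------------------------------------+--------+--------+
| d1df1b5a-70c4-4fed-98b7-423362f2c47c | vm-one | ACTIVE |
| 6f0c3db2-8b1e-4f6a-9c0d-2e5b7a1c4d9e | vm-two | ACTIVE |
+--------------------------------------+--------+--------+'

# Keep only data rows (lines that start with "|" but are not the header)
# and print the first column, which holds the instance ID.
ids=$(printf '%s\n' "$sample_output" |
  awk -F'|' '/^\|/ && $2 !~ /ID/ { gsub(/ /, "", $2); print $2 }')

printf '%s\n' "$ids"
```

The instance names and the second UUID here are hypothetical sample data; only the table layout matches what the client prints.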
Next step
You are now ready to perform the migration. Follow the required procedure detailed in Cold migrating an instance or Live migrating an instance.
17.4. Cold migrating an instance
Cold migrating an instance involves stopping the instance and moving it to another Compute node. Cold migration enables migration scenarios that live migration cannot, such as migrating instances that use PCI passthrough.
The scheduler automatically selects the destination Compute node. For more information, see Migration constraints.
Procedure
Access the remote shell for the OpenStackClient pod from your workstation:

$ oc rsh -n openstack openstackclient

To cold migrate an instance, enter the following command to power off and move the instance:

$ openstack server migrate <instance> --wait

Replace <instance> with the name or ID of the instance to migrate.
Specify the --block-migration flag if migrating a locally stored volume.
Specify the --wait flag to wait for the migration to complete.
- While you wait for the instance migration to complete, you can open another terminal window and check the migration status. For more information, see Checking migration status.
Check the status of the instance:

$ openstack server list --all-projects

A status of "VERIFY_RESIZE" indicates that you need to confirm or revert the migration:

If the migration worked as expected, confirm it:

$ openstack server resize --confirm <instance>

Replace <instance> with the name or ID of the instance to migrate. A status of "ACTIVE" indicates that the instance is ready to use.

If the migration did not work as expected, revert it:

$ openstack server resize --revert <instance>

Replace <instance> with the name or ID of the instance.

Restart the instance:

$ openstack server start <instance>

Replace <instance> with the name or ID of the instance.

Optional: If you disabled the source Compute node for maintenance, you must re-enable the node so that new instances can be assigned to it:

$ openstack compute service set <source> nova-compute --enable

Replace <source> with the host name of the source Compute node.

Exit the OpenStackClient pod:

$ exit
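The confirm-or-revert decision after a cold migration can be scripted. A hedged sketch; the status value here is a stand-in for what you would read from the real client with openstack server show <instance> -f value -c status:

```shell
# Stand-in for the live query:
#   status=$(openstack server show <instance> -f value -c status)
status="VERIFY_RESIZE"

case "$status" in
  VERIFY_RESIZE)
    # The instance has been migrated and is waiting for you to confirm
    # (openstack server resize --confirm) or revert (--revert) it.
    action="confirm-or-revert" ;;
  ACTIVE)
    action="done" ;;          # migration already confirmed; instance usable
  *)
    action="investigate" ;;   # MIGRATING, ERROR, and so on
esac

echo "$action"
```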
17.5. Live migrating an instance
Live migration moves an instance from a source Compute node to a destination Compute node with a minimal amount of downtime. Live migration might not be appropriate for all instances. For more information, see Migration constraints.
Procedure
Access the remote shell for the OpenStackClient pod from your workstation:

$ oc rsh -n openstack openstackclient

To live migrate an instance, specify the instance and the destination Compute node:

$ openstack server migrate <instance> --live-migration [--host <dest>] --wait

Replace <instance> with the name or ID of the instance. Replace <dest> with the name or ID of the destination Compute node.

Note: The openstack server migrate command covers migrating instances with shared storage, which is the default. Specify the --block-migration flag to migrate a locally stored volume:

$ openstack server migrate <instance> --live-migration [--host <dest>] --wait --block-migration

Confirm that the instance is migrating:

$ openstack server show <instance>
+----------------------+--------------------------------------+
| Field                | Value                                |
+----------------------+--------------------------------------+
| ...                  | ...                                  |
| status               | MIGRATING                            |
| ...                  | ...                                  |
+----------------------+--------------------------------------+

Wait for the migration to complete. While you wait, you can check the migration status. For more information, see Checking migration status.

Check the status of the instance to confirm whether the migration was successful:

$ openstack server list --host <dest> --all-projects

Replace <dest> with the name or ID of the destination Compute node.

Optional: If you disabled the source Compute node for maintenance, you must re-enable the node so that new instances can be assigned to it:

$ openstack compute service set <source> nova-compute --enable

Replace <source> with the host name of the source Compute node.

Exit the OpenStackClient pod:

$ exit
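The wait step can be scripted as a polling loop. A minimal sketch with a stubbed status sequence standing in for repeated openstack server show calls, so the example is self-contained:

```shell
# Stubbed status sequence standing in for repeated calls to:
#   openstack server show <instance> -f value -c status
statuses="MIGRATING MIGRATING ACTIVE"

final=""
for status in $statuses; do
  echo "current status: $status"
  if [ "$status" != "MIGRATING" ]; then
    final="$status"    # ACTIVE on success, ERROR on failure
    break
  fi
  # On a live cloud you would sleep here between checks, for example:
  # sleep 30
done

echo "migration finished with status: $final"
```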
17.6. Enabling parallel connections for live migrations
By default, live migration uses one network connection to transfer an instance to a destination Compute node. Factors such as single-threaded TLS speed can limit performance. To increase migration speed, enable multiple connections per live migration. This increases CPU usage.
To use more than one connection, you must disable the live_migration_permit_post_copy parameter and configure the live_migration_parallel_connections parameter with a value appropriate for your environment. The default value is 1; increase it in small increments, for example two at a time, and test after each change.
Prerequisites
- You have the oc command line tool installed on your workstation.
- You are logged on to a workstation that has access to the RHOSO control plane as a user with cluster-admin privileges.
- You have selected the OpenStackDataPlaneNodeSet CR that you want to configure for parallel connections for live migrations. For more information about creating an OpenStackDataPlaneNodeSet CR, see Creating the data plane in Deploying Red Hat OpenStack Services on OpenShift.
- Ensure that you have installed QEMU version 10.1.0 or later.
Procedure
Create or update the ConfigMap CR named nova-extra-config and set the value of [libvirt] live_migration_parallel_connections to a value greater than 1, for example 4, and set [libvirt] live_migration_permit_post_copy to false:

apiVersion: v1
kind: ConfigMap
metadata:
  name: nova-extra-config
  namespace: openstack
data:
  42-nova-parallel-connections.conf: |
    [libvirt]
    live_migration_parallel_connections = 4
    live_migration_permit_post_copy = false

For more information about creating ConfigMap objects, see Creating and using config maps in Nodes.

Create a new OpenStackDataPlaneDeployment CR to configure the services on the data plane nodes and deploy the data plane, and save it to a file named compute-parallel-connections_deploy.yaml on your workstation:

apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneDeployment
metadata:
  name: compute-parallel-connections

For more information about creating an OpenStackDataPlaneDeployment CR, see Deploying the data plane in Deploying Red Hat OpenStack Services on OpenShift.

In compute-parallel-connections_deploy.yaml, specify nodeSets to include all the OpenStackDataPlaneNodeSet CRs that you want to deploy. Ensure that you include the OpenStackDataPlaneNodeSet CR that you selected as a prerequisite. That OpenStackDataPlaneNodeSet CR defines the nodes that you want to designate for parallel connections.

Warning: If your deployment has more than one node set, changes to the nova-extra-config ConfigMap might directly affect more than one node set, depending on how the node sets and the DataPlaneServices are configured. To check whether a node set uses the nova-extra-config ConfigMap, and therefore will be affected by the reconfiguration, complete the following steps:

- Check the services list of the node set and find the name of the DataPlaneService that points to nova.
- Ensure that the value of the edpmServiceType field of the DataPlaneService is set to nova.
- If the dataSources list of the DataPlaneService contains a configMapRef named nova-extra-config, then the node set uses this ConfigMap and is affected by the configuration changes in this ConfigMap. If some of the affected node sets should not be reconfigured, you must create a new DataPlaneService that points to a separate ConfigMap for these node sets.

apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneDeployment
metadata:
  name: compute-parallel-connections
spec:
  nodeSets:
    - openstack-edpm
    - compute-parallel-connections
    - ...
    - <nodeSet_name>

Replace <nodeSet_name> with the names of the OpenStackDataPlaneNodeSet CRs that you want to include in your data plane deployment.

Save the compute-parallel-connections_deploy.yaml deployment file.

Deploy the data plane:

$ oc create -f compute-parallel-connections_deploy.yaml

Verify that the data plane is deployed:

$ oc get openstackdataplanenodeset
NAME                          STATUS   MESSAGE
compute-parallel-connections  True     Deployed

Access the remote shell for openstackclient and verify that the deployed Compute nodes are visible on the control plane:

$ oc rsh -n openstack openstackclient
$ openstack resource provider list
17.7. Checking migration status
Instance migration moves through a sequence of five states to transition an instance and manage resource releases between Compute nodes:
- Queued: The Compute service has accepted the request to migrate an instance, and migration is pending.
- Preparing: The Compute service is preparing to migrate the instance.
- Running: The Compute service is migrating the instance.
- Post-migrating: The Compute service has built the instance on the destination Compute node and is releasing resources on the source Compute node.
- Completed: The Compute service has completed migrating the instance and finished releasing resources on the source Compute node.
Procedure
Access the remote shell for the OpenStackClient pod from your workstation:

$ oc rsh -n openstack openstackclient

Retrieve the list of migration IDs for the instance:

$ openstack server migration list --server <instance>
+----+-------------+-----------+ (...)
| Id | Source Node | Dest Node | (...)
+----+-------------+-----------+ (...)
| 2  | -           | -         | (...)
+----+-------------+-----------+ (...)

Replace <instance> with the name or ID of the instance.

Show the status of the migration:

$ openstack server migration show <instance> <migration_id>

Replace <instance> with the name or ID of the instance. Replace <migration_id> with the ID of the migration.

Running the openstack server migration show command returns the following example output:

+------------------------+--------------------------------------+
| Property               | Value                                |
+------------------------+--------------------------------------+
| created_at             | 2017-03-08T02:53:06.000000           |
| dest_compute           | controller                           |
| dest_host              | -                                    |
| dest_node              | -                                    |
| disk_processed_bytes   | 0                                    |
| disk_remaining_bytes   | 0                                    |
| disk_total_bytes       | 0                                    |
| id                     | 2                                    |
| memory_processed_bytes | 65502513                             |
| memory_remaining_bytes | 786427904                            |
| memory_total_bytes     | 1091379200                           |
| server_uuid            | d1df1b5a-70c4-4fed-98b7-423362f2c47c |
| source_compute         | compute2                             |
| source_node            | -                                    |
| status                 | running                              |
| updated_at             | 2017-03-08T02:53:47.000000           |
+------------------------+--------------------------------------+

Tip: The Compute service measures progress of the migration by the number of remaining memory bytes to copy. If this number does not decrease over time, the migration might be unable to complete, and the Compute service might abort it.

Exit the OpenStackClient pod:

$ exit

Sometimes instance migration can take a long time or encounter errors. For more information, see Troubleshooting migration.
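The tip about remaining memory bytes can be turned into a quick progress calculation. A minimal sketch using the sample values from the example output; on a live cloud you would read these two fields from openstack server migration show:

```shell
# Values taken from the example `openstack server migration show` output.
memory_total_bytes=1091379200
memory_remaining_bytes=786427904

# Rough percentage of instance memory still to be copied. Dirty pages are
# re-copied during live migration, so this number can rise as well as fall.
percent_remaining=$((100 * memory_remaining_bytes / memory_total_bytes))
echo "memory remaining to copy: ${percent_remaining}%"
```

If this percentage does not fall between two checks, the migration may not be converging, and aborting or forcing completion (described in Troubleshooting migration) may be necessary.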
17.8. Evacuating an instance
If you want to move an instance from a failed or shut-down Compute node to a new host in the same environment, you can evacuate it.
The evacuate process destroys the original instance and rebuilds it on another Compute node using the original image, instance name, UUID, network addresses, and any other resources the original instance had allocated to it.
If the instance uses shared storage, the instance root disk is not rebuilt during the evacuate process, as the disk remains accessible by the destination Compute node. If the instance does not use shared storage, then the instance root disk is also rebuilt on the destination Compute node.
- You can only perform an evacuation when the Compute node is fenced, and the API reports that the state of the Compute node is "down" or "forced-down". If the Compute node is not reported as "down" or "forced-down", the evacuate command fails.
- To perform an evacuation, you must be a cloud administrator.
17.8.1. Evacuating an instance
To evacuate all instances on a host, you must evacuate them one at a time.
After you evacuate an instance, the default state of the instance is STOPPED.
Procedure
Access the remote shell for the OpenStackClient pod from your workstation:

$ oc rsh -n openstack openstackclient

Confirm that the instance is not running:

$ openstack server list --host <node> --all-projects

Replace <node> with the name or UUID of the Compute node that hosts the instance.

Check the instance task state:

$ openstack server show <instance>
+----------------------+--------------------------------------+
| Field                | Value                                |
+----------------------+--------------------------------------+
| ...                  | ...                                  |
| status               | NONE                                 |
| ...                  | ...                                  |
+----------------------+--------------------------------------+

Replace <instance> with the name or UUID of the instance that you want to evacuate.

Note: If the instance task state is not "NONE", the evacuation might fail.

Confirm that the host Compute node is fenced or shut down:

$ openstack baremetal node show <node>

Replace <node> with the name or UUID of the Compute node that hosts the instance to evacuate. To perform an evacuation, the Compute node must have a status of down or forced-down.

Disable the Compute node:

$ openstack compute service set \
    <node> nova-compute --disable --disable-reason <disable_host_reason>

Replace <node> with the name of the Compute node to evacuate the instance from.
Replace <disable_host_reason> with details about why you disabled the Compute node.

Evacuate the instance:

$ openstack server evacuate [--host <dest>] \
    [--password <password>] <instance>

Optional: Replace <dest> with the name of the Compute node to evacuate the instance to. If you do not specify the destination Compute node, the Compute scheduler selects one for you. You can find possible Compute nodes by using the following command:

$ openstack hypervisor list

Optional: Replace <password> with the administrative password required to access the evacuated instance. If a password is not specified, a random password is generated and output when the evacuation is complete.

Note: The password is changed only when ephemeral instance disks are stored on the local hypervisor disk. The password is not changed if the instance is hosted on shared storage or has a Block Storage volume attached, and no error message is displayed to inform you that the password was not changed.

Replace <instance> with the name or ID of the instance to evacuate.

Note: If the evacuation fails and the task state of the instance is not "NONE", contact Red Hat Support for help to recover the instance.

Optional: Enable the Compute node when it is recovered:

$ openstack compute service set \
    <node> nova-compute --enable

Replace <node> with the name of the Compute node to enable.

Exit the OpenStackClient pod:

$ exit
17.9. Troubleshooting migration
Instance migration issues include migration process errors, indefinite execution, and post-migration performance degradation.
Errors during migration
The following issues can send the migration operation into an error state:
- The Compute service is shutting down.
- A race condition occurs.
When live migration enters a failed state, it is typically followed by an error state. The following common issues can cause a failed state:
- A destination Compute host is not available.
- A scheduler exception occurs.
- The rebuild process fails due to insufficient computing resources.
- A server group check fails.
- The instance on the source Compute node gets deleted before migration to the destination Compute node is complete.
Never-ending live migration
A live migration can be left in a perpetual running state when it fails to complete. This can occur when the guest OS of the instance on the source Compute node creates changes faster than the Compute service can replicate them to the destination Compute node.
Use one of the following methods to address this situation:
- Abort the live migration.
- Force the live migration to complete.
17.9.1. Aborting live migration
If the instance state changes faster than the migration procedure can copy it to the destination node, and you do not want to temporarily suspend the instance operations, you can abort the live migration.
Procedure
Access the remote shell for the OpenStackClient pod from your workstation:

$ oc rsh -n openstack openstackclient

Retrieve the list of migrations for the instance:

$ openstack server migration list --server <instance>

Replace <instance> with the name or ID of the instance.

Abort the live migration:

$ openstack server migration abort <instance> <migration_id>

Replace <instance> with the name or ID of the instance.
Replace <migration_id> with the ID of the migration.

Exit the OpenStackClient pod:

$ exit
17.9.2. Forcing live migration to complete
If the instance state changes faster than the migration procedure can copy it to the destination node, and you want to temporarily suspend the instance operations to force migration to complete, you can force the live migration procedure to complete.
Forcing live migration to complete might lead to perceptible downtime.
Procedure
Access the remote shell for the OpenStackClient pod from your workstation:

$ oc rsh -n openstack openstackclient

Retrieve the list of migrations for the instance:

$ openstack server migration list --server <instance>

Replace <instance> with the name or ID of the instance.

Force the live migration to complete:

$ openstack server migration force complete <instance> <migration_id>

Replace <instance> with the name or ID of the instance.
Replace <migration_id> with the ID of the migration.

Exit the OpenStackClient pod:

$ exit