Chapter 10. Live migration
10.1. About live migration
Live migration is the process of moving a running virtual machine (VM) to another node in the cluster without interrupting the virtual workload. Live migration enables smooth transitions during cluster upgrades or any time a node needs to be drained for maintenance or configuration changes.
By default, live migration traffic is encrypted using Transport Layer Security (TLS).
10.1.1. Live migration requirements
Live migration has the following requirements:
- The cluster must have shared storage with ReadWriteMany (RWX) access mode.
- The cluster must have sufficient RAM and network bandwidth.
Note: You must ensure that there is enough memory request capacity in the cluster to support node drains that result in live migrations. You can determine the approximate required spare memory by using the following calculation (see the worked example after this list):
(Maximum number of nodes that can drain in parallel) × (Highest total VM memory request allocations across nodes)
The default number of migrations that can run in parallel in the cluster is 5.
- If a VM uses a host model CPU, the nodes must support the CPU.
- Configuring a dedicated Multus network for live migration is highly recommended. A dedicated network minimizes the effects of network saturation on tenant workloads during migration.
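For example, with hypothetical values: if at most 2 nodes can drain in parallel and the highest total VM memory request allocation on any node is 128 GiB, the cluster needs approximately 2 × 128 GiB = 256 GiB of spare memory request capacity to absorb the resulting live migrations.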
10.1.2. VM migration tuning
You can adjust your cluster-wide live migration settings based on the type of workload and migration scenario. This enables you to control how many VMs migrate at the same time, the network bandwidth you want to use for each migration, and how long OpenShift Virtualization attempts to complete the migration before canceling the process. Configure these settings in the HyperConverged custom resource (CR).
If you are migrating multiple VMs per node at the same time, set a bandwidthPerMigration limit to prevent a large or busy VM from using a large portion of the node's network bandwidth. By default, the bandwidthPerMigration value is 0, which means unlimited.
A large VM running a heavy workload (for example, database processing), with higher memory dirty rates, requires a higher bandwidth to complete the migration.
Post copy mode, when enabled, triggers if the initial pre-copy phase does not complete within the defined timeout. During post copy, the VM CPUs pause on the source host while transferring the minimum required memory pages. Then the VM CPUs activate on the destination host, and the remaining memory pages transfer into the destination node at runtime. This can impact performance during the transfer.
Post copy mode should not be used for critical data or with unstable networks.
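For example, the following excerpt of the HyperConverged CR sets a per-migration bandwidth limit and enables post copy mode. This is a minimal sketch with illustrative values; the complete configuration procedures follow in the next sections.
spec:
  liveMigrationConfig:
    bandwidthPerMigration: 512Mi  # illustrative limit of 512 MiB/s for each migration
    allowPostCopy: true           # allow falling back to post copy if pre-copy does not converge in time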
10.1.3. Common live migration tasks
You can perform the following live migration tasks:
- Configure live migration settings
- Configure live migration for heavy workloads
- Initiate and cancel live migration
- Monitor the progress of all live migrations in the Migration tab of the OpenShift Virtualization web console.
- View VM migration metrics in the Metrics tab of the web console.
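As a command-line alternative, you can list the migration objects across all namespaces. This is a sketch; vmim is the short name for the VirtualMachineInstanceMigration resource, which is also used later in this chapter:
$ oc get vmim -A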
10.1.4. Additional resources
10.2. Configuring live migration
You can configure live migration settings to ensure that the migration processes do not overwhelm the cluster.
You can configure live migration policies to apply different migration configurations to groups of virtual machines (VMs).
10.2.1. Configuring live migration limits and timeouts
Configure live migration limits and timeouts for the cluster by updating the HyperConverged custom resource (CR), which is located in the openshift-cnv namespace.
Procedure
Edit the HyperConverged CR and add the necessary live migration parameters:
$ oc edit hyperconverged kubevirt-hyperconverged -n openshift-cnv
Example configuration file
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  liveMigrationConfig:
    bandwidthPerMigration: 64Mi 1
    completionTimeoutPerGiB: 800 2
    parallelMigrationsPerCluster: 5 3
    parallelOutboundMigrationsPerNode: 2 4
    progressTimeout: 150 5
    allowPostCopy: false 6
1 - Bandwidth limit of each migration, where the value is the quantity of bytes per second. For example, a value of 2048Mi means 2048 MiB/s. Default: 0, which is unlimited.
2 - The migration is canceled if it has not completed in this time, in seconds per GiB of memory. For example, a VM with 6 GiB of memory times out if it has not completed migration in 4800 seconds. If the Migration Method is BlockMigration, the size of the migrating disks is included in the calculation.
3 - Number of migrations running in parallel in the cluster. Default: 5.
4 - Maximum number of outbound migrations per node. Default: 2.
5 - The migration is canceled if memory copy fails to make progress in this time, in seconds. Default: 150.
6 - If a VM is running a heavy workload and the memory dirty rate is too high, this can prevent the migration from one node to another from converging. To prevent this, you can enable post copy mode. By default, allowPostCopy is set to false.
You can restore the default value for any spec.liveMigrationConfig field by deleting that key/value pair and saving the file. For example, delete progressTimeout: <value> to restore the default progressTimeout: 150.
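As an alternative to editing the CR, you can remove a single field with a JSON patch, which also restores its default. This is a sketch; it assumes progressTimeout is currently set in the CR:
$ oc patch hyperconverged kubevirt-hyperconverged -n openshift-cnv --type=json -p='[{"op": "remove", "path": "/spec/liveMigrationConfig/progressTimeout"}]'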
10.2.2. Configuring live migration for heavy workloads
When migrating a VM running a heavy workload (for example, database processing) with higher memory dirty rates, you need a higher bandwidth to complete the migration.
If the dirty rate is too high, the migration from one node to another does not converge. To prevent this, enable post copy mode.
Post copy mode triggers if the initial pre-copy phase does not complete within the defined timeout. During post copy, the VM CPUs pause on the source host while transferring the minimum required memory pages. Then the VM CPUs activate on the destination host, and the remaining memory pages transfer into the destination node at runtime.
Configure live migration for heavy workloads by updating the HyperConverged custom resource (CR), which is located in the openshift-cnv namespace.
Procedure
Edit the HyperConverged CR and add the necessary parameters for migrating heavy workloads:
$ oc edit hyperconverged kubevirt-hyperconverged -n openshift-cnv
Example configuration file
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  liveMigrationConfig:
    bandwidthPerMigration: 0Mi 1
    completionTimeoutPerGiB: 150 2
    parallelMigrationsPerCluster: 5 3
    parallelOutboundMigrationsPerNode: 1 4
    progressTimeout: 150 5
    allowPostCopy: true 6
1 - Bandwidth limit of each migration, where the value is the quantity of bytes per second. The default is 0, which is unlimited.
2 - The migration is canceled if it is not completed in this time, and triggers post copy mode, when post copy is enabled. This value is measured in seconds per GiB of memory. You can lower completionTimeoutPerGiB to trigger post copy mode earlier in the migration process, or raise completionTimeoutPerGiB to trigger post copy mode later in the migration process.
3 - Number of migrations running in parallel in the cluster. The default is 5. Keeping the parallelMigrationsPerCluster setting low is better when migrating heavy workloads.
4 - Maximum number of outbound migrations per node. Configure a single VM per node for heavy workloads.
5 - The migration is canceled if memory copy fails to make progress in this time. This value is measured in seconds. Increase this parameter for large memory sizes running heavy workloads.
6 - Use post copy mode when memory dirty rates are high to ensure that the migration converges. Set allowPostCopy to true to enable post copy mode.
- Optional: If your main network is too busy for the migration, configure a secondary, dedicated migration network.
Post copy mode can impact performance during the transfer, and should not be used for critical data or with unstable networks.
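To confirm which settings are currently applied, you can print the liveMigrationConfig stanza of the CR (a sketch):
$ oc get hyperconverged kubevirt-hyperconverged -n openshift-cnv -o jsonpath='{.spec.liveMigrationConfig}'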
10.2.3. Additional resources
10.2.4. Live migration policies
You can create live migration policies to apply different migration configurations to groups of VMs that are defined by VM or project labels.
You can create live migration policies by using the web console or the command line.
10.2.4.1. Creating a live migration policy by using the command line
You can create a live migration policy by using the command line. KubeVirt applies the live migration policy to selected virtual machines (VMs) by using any combination of labels:
- VM labels such as size, os, or gpu
- Project labels such as priority, bandwidth, or hpc-workload
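Project labels are applied to the namespace that backs the project. For example, the following sketch labels a namespace so that a policy namespaceSelector can match it; the label keys mirror the MigrationPolicy example later in this section, and <namespace> is a placeholder:
$ oc label namespace <namespace> hpc-workloads=True xyz-workloads-type=""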
For the policy to apply to a specific group of VMs, all labels on the group of VMs must match the labels of the policy.
If multiple live migration policies apply to a VM, the policy with the greatest number of matching labels takes precedence.
If multiple policies meet this criteria, the policies are sorted by alphabetical order of the matching label keys, and the first one in that order takes precedence.
Procedure
Edit the VM object to which you want to apply a live migration policy, and add the corresponding VM labels.
Open the YAML configuration of the resource:
$ oc edit vm <vm_name>
Adjust the required label values in the .spec.template.metadata.labels section of the configuration. For example, to mark the VM as a production VM for the purposes of migration policies, add the kubevirt.io/environment: production line:
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: <vm_name>
  namespace: default
  labels:
    app: my-app
    environment: production
spec:
  template:
    metadata:
      labels:
        kubevirt.io/domain: <vm_name>
        kubevirt.io/size: large
        kubevirt.io/environment: production
# ...
- Save and exit the configuration.
Configure a MigrationPolicy object with the corresponding labels. The following example configures a policy that applies to all VMs that are labeled as production:
apiVersion: migrations.kubevirt.io/v1alpha1
kind: MigrationPolicy
metadata:
  name: <migration_policy>
spec:
  selectors:
    namespaceSelector: 1
      hpc-workloads: "True"
      xyz-workloads-type: ""
    virtualMachineInstanceSelector: 2
      kubevirt.io/environment: "production"
1 - Project (namespace) labels. The policy applies to VMs in namespaces that have all of these labels.
2 - VM labels. The policy applies to VMs that have all of these labels.
Create the migration policy by running the following command:
$ oc create -f <migration_policy>.yaml
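Verification
Confirm that the policy was created by running the following command (a sketch; MigrationPolicy is a cluster-scoped resource, so no namespace is required):
$ oc get migrationpolicy <migration_policy> -o yaml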
10.2.5. Additional resources
10.3. Initiating and canceling live migration
You can initiate the live migration of a virtual machine (VM) to another node by using the OpenShift Container Platform web console or the command line.
You can cancel a live migration by using the web console or the command line. The VM remains on its original node.
You can also initiate and cancel live migration by using the virtctl migrate <vm_name> and virtctl migrate-cancel <vm_name> commands.
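For example, assuming the virtctl client is installed and you are logged in to the cluster:
$ virtctl migrate <vm_name>
$ virtctl migrate-cancel <vm_name>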
10.3.1. Initiating live migration
10.3.1.1. Initiating live migration by using the web console
You can live migrate a running virtual machine (VM) to a different node in the cluster by using the OpenShift Container Platform web console.
The Migrate action is visible to all users but only cluster administrators can initiate a live migration.
Prerequisites
- The VM must be migratable.
- If the VM is configured with a host model CPU, the cluster must have an available node that supports the CPU model.
Procedure
- Navigate to Virtualization → VirtualMachines in the web console.
- Select Migrate from the Options menu beside a VM.
- Click Migrate.
10.3.1.2. Initiating live migration by using the command line
You can initiate the live migration of a running virtual machine (VM) by using the command line to create a VirtualMachineInstanceMigration object for the VM.
Procedure
Create a VirtualMachineInstanceMigration manifest for the VM that you want to migrate:
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstanceMigration
metadata:
  name: <migration_name>
spec:
  vmiName: <vm_name>
Create the object by running the following command:
$ oc create -f <migration_name>.yaml
The VirtualMachineInstanceMigration object triggers a live migration of the VM. This object exists in the cluster for as long as the virtual machine instance is running, unless manually deleted.
Verification
Obtain the VM status by running the following command:
$ oc describe vmi <vm_name> -n <namespace>
Example output
# ...
Status:
  Conditions:
    Last Probe Time:       <nil>
    Last Transition Time:  <nil>
    Status:                True
    Type:                  LiveMigratable
  Migration Method:  LiveMigration
  Migration State:
    Completed:                    true
    End Timestamp:                2018-12-24T06:19:42Z
    Migration UID:                d78c8962-0743-11e9-a540-fa163e0c69f1
    Source Node:                  node2.example.com
    Start Timestamp:              2018-12-24T06:19:35Z
    Target Node:                  node1.example.com
    Target Node Address:          10.9.0.18:43891
    Target Node Domain Detected:  true
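You can also check the phase of the migration object directly. This is a sketch; it assumes the VirtualMachineInstanceMigration object created earlier is named <migration_name>:
$ oc get vmim <migration_name> -n <namespace> -o jsonpath='{.status.phase}'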
10.3.2. Canceling live migration
10.3.2.1. Canceling live migration by using the web console
You can cancel the live migration of a virtual machine (VM) by using the OpenShift Container Platform web console.
Procedure
- Navigate to Virtualization → VirtualMachines in the web console.
- Select Cancel Migration on the Options menu beside a VM.
10.3.2.2. Canceling live migration by using the command line
Cancel the live migration of a virtual machine by deleting the VirtualMachineInstanceMigration object associated with the migration.
Procedure
Delete the VirtualMachineInstanceMigration object that triggered the live migration, migration-job in this example:
$ oc delete vmim migration-job
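To confirm that the migration was canceled, you can check that the migration object no longer exists and that the virtual machine instance is still running on its original node (a sketch):
$ oc get vmim
$ oc get vmi <vm_name>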