Chapter 10. Troubleshooting

This section describes resources for troubleshooting the Migration Toolkit for Containers (MTC).

For known issues, see the MTC release notes.

10.1. MTC workflow

You can migrate Kubernetes resources, persistent volume data, and internal container images to OpenShift Container Platform 4.6 by using the Migration Toolkit for Containers (MTC) web console or the Kubernetes API.

MTC migrates the following resources:

A namespace specified in a migration plan.
Namespace-scoped resources: When MTC migrates a namespace, it migrates all the objects and resources associated with that namespace, such as services or pods. Additionally, if a namespace-scoped resource depends on a cluster-scoped resource, MTC migrates both resources.
For example, a security context constraint (SCC) is a resource that exists at the cluster level and a service account (SA) is a resource that exists at the namespace level. If an SA exists in a namespace that MTC migrates, MTC automatically locates any SCCs that are linked to the SA and also migrates those SCCs. Similarly, MTC migrates persistent volumes that are linked to the persistent volume claims of the namespace.
Note
Cluster-scoped resources might have to be migrated manually, depending on the resource.
Custom resources (CRs) and custom resource definitions (CRDs): MTC automatically migrates CRs and CRDs at the namespace level.

Migrating an application with the MTC web console involves the following steps:

Install the Migration Toolkit for Containers Operator on all clusters.
You can install the Migration Toolkit for Containers Operator in a restricted environment with limited or no internet access. The source and target clusters must have network access to each other and to a mirror registry.
Configure the replication repository, an intermediate object storage that MTC uses to migrate data.
The source and target clusters must have network access to the replication repository during migration. If you are using a proxy server, you must configure it to allow network traffic between the replication repository and the clusters.
Add the source cluster to the MTC web console.
Add the replication repository to the MTC web console.
Create a migration plan, with one of the following data migration options:
- Copy: MTC copies the data from the source cluster to the replication repository, and from the replication repository to the target cluster.
  Note
  If you are using direct image migration or direct volume migration, the images or volumes are copied directly from the source cluster to the target cluster.
  
- Move: MTC unmounts a remote volume, for example, NFS, from the source cluster, creates a PV resource on the target cluster pointing to the remote volume, and then mounts the remote volume on the target cluster. Applications running on the target cluster use the same remote volume that the source cluster was using. The remote volume must be accessible to the source and target clusters.
  Note
  Although the replication repository does not appear in this diagram, it is required for migration.
  
Run the migration plan, with one of the following options:
- Stage copies data to the target cluster without stopping the application.
  A stage migration can be run multiple times so that most of the data is copied to the target before migration. Running one or more stage migrations reduces the duration of the cutover migration.
- Cutover stops the application on the source cluster and moves the resources to the target cluster.
  Optional: You can clear the Halt transactions on the source cluster during migration checkbox.

About MTC custom resources

The Migration Toolkit for Containers (MTC) creates the following custom resources (CRs):

MigCluster (configuration, MTC cluster): Cluster definition

MigStorage (configuration, MTC cluster): Storage definition

MigPlan (configuration, MTC cluster): Migration plan

The MigPlan CR describes the source and target clusters, replication repository, and namespaces being migrated. It is associated with 0, 1, or many MigMigration CRs.

Note

Deleting a MigPlan CR deletes the associated MigMigration CRs.

BackupStorageLocation (configuration, MTC cluster): Location of Velero backup objects

VolumeSnapshotLocation (configuration, MTC cluster): Location of Velero volume snapshots

MigMigration (action, MTC cluster): Migration, created every time you stage or migrate data. Each MigMigration CR is associated with a MigPlan CR.

Backup (action, source cluster): When you run a migration plan, the MigMigration CR creates two Velero backup CRs on each source cluster:

Backup CR #1 for Kubernetes objects
Backup CR #2 for PV data

Restore (action, target cluster): When you run a migration plan, the MigMigration CR creates two Velero restore CRs on the target cluster:

Restore CR #1 (using Backup CR #2) for PV data
Restore CR #2 (using Backup CR #1) for Kubernetes objects

10.2. MTC custom resource manifests

Migration Toolkit for Containers (MTC) uses the following custom resource (CR) manifests for migrating applications.

10.2.1. DirectImageMigration

The DirectImageMigration CR copies images directly from the source cluster to the destination cluster.

apiVersion: migration.openshift.io/v1alpha1
kind: DirectImageMigration
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
  name: <direct_image_migration>
spec:
  srcMigClusterRef:
    name: <source_cluster>
    namespace: openshift-migration
  destMigClusterRef:
    name: <destination_cluster>
    namespace: openshift-migration
  namespaces: # <1>
    - <source_namespace_1>
    - <source_namespace_2>:<destination_namespace_3> # <2>

1: One or more namespaces containing images to be migrated. By default, the destination namespace has the same name as the source namespace.
2: Source namespace mapped to a destination namespace with a different name.
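The namespace mapping can be easier to see when the manifest is generated from arguments. The following is a minimal sketch, not part of MTC: the `render_dim` function name and its argument order are hypothetical, and the manifest fields simply mirror the example above. Each namespace argument is either `source` or `source:destination`.

```shell
# Render a DirectImageMigration manifest on stdout.
# Usage: render_dim <cr_name> <source_cluster> <destination_cluster> <namespace>...
# render_dim is a hypothetical helper; it only prints text and does not contact a cluster.
render_dim() {
  dim_name=$1
  dim_src=$2
  dim_dest=$3
  shift 3
  cat <<EOF
apiVersion: migration.openshift.io/v1alpha1
kind: DirectImageMigration
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
  name: ${dim_name}
spec:
  srcMigClusterRef:
    name: ${dim_src}
    namespace: openshift-migration
  destMigClusterRef:
    name: ${dim_dest}
    namespace: openshift-migration
  namespaces:
EOF
  # One list entry per remaining argument, either "src" or "src:dest".
  for ns in "$@"; do
    printf '    - %s\n' "$ns"
  done
}
```

For example, `render_dim images-demo source-cluster host app1 app2:app2-new` prints a manifest in which `app1` keeps its name and `app2` is mapped to `app2-new`. The output could then be reviewed and piped to `oc apply -f -`.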

10.2.2. DirectImageStreamMigration

The DirectImageStreamMigration CR copies image stream references directly from the source cluster to the destination cluster.

apiVersion: migration.openshift.io/v1alpha1
kind: DirectImageStreamMigration
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
  name: <direct_image_stream_migration>
spec:
  srcMigClusterRef:
    name: <source_cluster>
    namespace: openshift-migration
  destMigClusterRef:
    name: <destination_cluster>
    namespace: openshift-migration
  imageStreamRef:
    name: <image_stream>
    namespace: <source_image_stream_namespace>
  destNamespace: <destination_image_stream_namespace>

10.2.3. DirectVolumeMigration

The DirectVolumeMigration CR copies persistent volumes (PVs) directly from the source cluster to the destination cluster.

apiVersion: migration.openshift.io/v1alpha1
kind: DirectVolumeMigration
metadata:
  name: <direct_volume_migration>
  namespace: openshift-migration
spec:
  createDestinationNamespaces: false # <1>
  deleteProgressReportingCRs: false # <2>
  destMigClusterRef:
    name: <host_cluster> # <3>
    namespace: openshift-migration
  persistentVolumeClaims:
  - name: <pvc> # <4>
    namespace: <pvc_namespace>
  srcMigClusterRef:
    name: <source_cluster>
    namespace: openshift-migration

1: Set to true to create namespaces for the PVs on the destination cluster.
2: Set to true to delete DirectVolumeMigrationProgress CRs after migration. The default is false so that DirectVolumeMigrationProgress CRs are retained for troubleshooting.
3: Update the cluster name if the destination cluster is not the host cluster.
4: Specify one or more PVCs to be migrated.

10.2.4. DirectVolumeMigrationProgress

The DirectVolumeMigrationProgress CR shows the progress of the DirectVolumeMigration CR.

apiVersion: migration.openshift.io/v1alpha1
kind: DirectVolumeMigrationProgress
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
  name: <direct_volume_migration_progress>
spec:
  clusterRef:
    name: <source_cluster>
    namespace: openshift-migration
  podRef:
    name: <rsync_pod>
    namespace: openshift-migration

10.2.5. MigAnalytic

The MigAnalytic CR collects the number of images, Kubernetes resources, and the persistent volume (PV) capacity from an associated MigPlan CR.

You can configure the data that it collects.

apiVersion: migration.openshift.io/v1alpha1
kind: MigAnalytic
metadata:
  annotations:
    migplan: <migplan>
  name: <miganalytic>
  namespace: openshift-migration
  labels:
    migplan: <migplan>
spec:
  analyzeImageCount: true # <1>
  analyzeK8SResources: true # <2>
  analyzePVCapacity: true # <3>
  listImages: false # <4>
  listImagesLimit: 50 # <5>
  migPlanRef:
    name: <migplan>
    namespace: openshift-migration

1: Optional: Returns the number of images.
2: Optional: Returns the number, kind, and API version of the Kubernetes resources.
3: Optional: Returns the PV capacity.
4: Returns a list of image names. The default is false so that the output is not excessively long.
5: Optional: Specify the maximum number of image names to return if listImages is true.

10.2.6. MigCluster

The MigCluster CR defines a host, local, or remote cluster.

apiVersion: migration.openshift.io/v1alpha1
kind: MigCluster
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
  name: <host_cluster> # <1>
  namespace: openshift-migration
spec:
  isHostCluster: true # <2>
# The 'azureResourceGroup' parameter is relevant only for Microsoft Azure.
  azureResourceGroup: <azure_resource_group> # <3>
  caBundle: <ca_bundle_base64> # <4>
  insecure: false # <5>
  refresh: false # <6>
# The 'restartRestic' parameter is relevant for a source cluster.
  restartRestic: true # <7>
# The following parameters are relevant for a remote cluster.
  exposedRegistryPath: <registry_route> # <8>
  url: <destination_cluster_url> # <9>
  serviceAccountSecretRef:
    name: <source_secret> # <10>
    namespace: openshift-config

1: Update the cluster name if the migration-controller pod is not running on this cluster.
2: The migration-controller pod runs on this cluster if true.
3: Microsoft Azure only: Specify the resource group.
4: Optional: If you created a certificate bundle for self-signed CA certificates and if the insecure parameter value is false, specify the base64-encoded certificate bundle.
5: Set to true to disable SSL verification.
6: Set to true to validate the cluster.
7: Set to true to restart the Restic pods on the source cluster after the Stage pods are created.
8: Remote cluster and direct image migration only: Specify the exposed secure registry path.
9: Remote cluster only: Specify the URL.
10: Remote cluster only: Specify the name of the Secret CR.

10.2.7. MigHook

The MigHook CR defines a migration hook that runs custom code at a specified stage of the migration. You can create up to four migration hooks. Each hook runs during a different phase of the migration.

You can configure the hook name, runtime duration, a custom image, and the cluster where the hook will run.

The migration phases and namespaces of the hooks are configured in the MigPlan CR.

apiVersion: migration.openshift.io/v1alpha1
kind: MigHook
metadata:
  generateName: <hook_name_prefix> # <1>
  name: <mighook> # <2>
  namespace: openshift-migration
spec:
  activeDeadlineSeconds: 1800 # <3>
  custom: false # <4>
  image: <hook_image> # <5>
  playbook: <ansible_playbook_base64> # <6>
  targetCluster: source # <7>

1: Optional: A unique hash is appended to the value for this parameter so that each migration hook has a unique name. You do not need to specify the value of the name parameter.
2: Specify the migration hook name, unless you specify the value of the generateName parameter.
3: Optional: Specify the maximum number of seconds that a hook can run. The default is 1800.
4: The hook is a custom image if true. The custom image can include Ansible or it can be written in a different programming language.
5: Specify the custom image, for example, quay.io/konveyor/hook-runner:latest. Required if custom is true.
6: Base64-encoded Ansible playbook. Required if custom is false.
7: Specify the cluster on which the hook will run. Valid values are source or destination.
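Because the playbook parameter expects a base64-encoded Ansible playbook, it can help to encode the file in a way that yields a single line. The following is a minimal sketch, assuming a local playbook file; the `encode_playbook` function name is hypothetical and is not an MTC command.

```shell
# Encode an Ansible playbook file as a single-line base64 string
# that can be pasted into the 'playbook' field of a MigHook CR.
# encode_playbook is a hypothetical helper name.
encode_playbook() {
  # Some base64 implementations wrap long output; tr removes the line breaks.
  base64 < "$1" | tr -d '\n'
}
```

For example, `encode_playbook hook.yml` prints the encoded string for a file named `hook.yml` in the current directory. Piping the result through `base64 --decode` should return the original playbook, which is a quick way to check the value before applying the CR.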

10.2.8. MigMigration

The MigMigration CR runs a MigPlan CR.

You can configure a MigMigration CR to run a stage or incremental migration, to cancel a migration in progress, or to roll back a completed migration.

apiVersion: migration.openshift.io/v1alpha1
kind: MigMigration
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
  name: <migmigration>
  namespace: openshift-migration
spec:
  canceled: false # <1>
  rollback: false # <2>
  stage: false # <3>
  quiescePods: true # <4>
  keepAnnotations: true # <5>
  verify: false # <6>
  migPlanRef:
    name: <migplan>
    namespace: openshift-migration

1: Set to true to cancel a migration in progress.
2: Set to true to roll back a completed migration.
3: Set to true to run a stage migration. Data is copied incrementally and the pods on the source cluster are not stopped.
4: Set to true to stop the application during migration. The pods on the source cluster are scaled to 0 after the Backup stage.
5: Set to true to retain the labels and annotations applied during the migration.
6: Set to true to check the status of the migrated pods on the destination cluster and to return the names of pods that are not in a Running state.
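The difference between a stage run and a cutover run comes down to two of the parameters above. The following is a minimal sketch that prints a MigMigration manifest for either mode. The `render_migmigration` function name is hypothetical, and pairing `stage: true` with `quiescePods: false` is an assumption based on the parameter descriptions, not a confirmed requirement.

```shell
# Render a MigMigration manifest on stdout.
# Usage: render_migmigration <cr_name> <migplan_name> <stage|cutover>
# render_migmigration is a hypothetical helper; it only prints text.
render_migmigration() {
  case "$3" in
    # Assumption: a stage run leaves the source pods running.
    stage)   mm_stage=true;  mm_quiesce=false ;;
    # Assumption: a cutover run stops the application on the source cluster.
    cutover) mm_stage=false; mm_quiesce=true ;;
    *) echo "mode must be 'stage' or 'cutover'" >&2; return 1 ;;
  esac
  cat <<EOF
apiVersion: migration.openshift.io/v1alpha1
kind: MigMigration
metadata:
  name: $1
  namespace: openshift-migration
spec:
  stage: ${mm_stage}
  quiescePods: ${mm_quiesce}
  migPlanRef:
    name: $2
    namespace: openshift-migration
EOF
}
```

For example, `render_migmigration stage-run-1 my-plan stage` prints a manifest for a stage migration of the plan `my-plan`. Both names are placeholders.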

10.2.9. MigPlan

The MigPlan CR defines the parameters of a migration plan.

You can configure destination namespaces, hook phases, and direct or indirect migration.

Note

By default, a destination namespace has the same name as the source namespace. If you configure a different destination namespace, you must ensure that the namespaces are not duplicated on the source or the destination clusters because the UID and GID ranges are copied during migration.

apiVersion: migration.openshift.io/v1alpha1
kind: MigPlan
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
  name: <migplan>
  namespace: openshift-migration
spec:
  closed: false # <1>
  srcMigClusterRef:
    name: <source_cluster>
    namespace: openshift-migration
  destMigClusterRef:
    name: <destination_cluster>
    namespace: openshift-migration
  hooks: # <2>
    - executionNamespace: <namespace> # <3>
      phase: <migration_phase> # <4>
      reference:
        name: <hook> # <5>
        namespace: <hook_namespace> # <6>
      serviceAccount: <service_account> # <7>
  indirectImageMigration: true # <8>
  indirectVolumeMigration: false # <9>
  migStorageRef:
    name: <migstorage>
    namespace: openshift-migration
  namespaces:
    - <source_namespace_1> # <10>
    - <source_namespace_2>
    - <source_namespace_3>:<destination_namespace_4> # <11>
  refresh: false # <12>

1: The migration has completed if true. You cannot create another MigMigration CR for this MigPlan CR.
2: Optional: You can specify up to four migration hooks. Each hook must run during a different migration phase.
3: Optional: Specify the namespace in which the hook will run.
4: Optional: Specify the migration phase during which a hook runs. One hook can be assigned to one phase. Valid values are PreBackup, PostBackup, PreRestore, and PostRestore.
5: Optional: Specify the name of the MigHook CR.
6: Optional: Specify the namespace of MigHook CR.
7: Optional: Specify a service account with cluster-admin privileges.
8: Direct image migration is disabled if true. Images are copied from the source cluster to the replication repository and from the replication repository to the destination cluster.
9: Direct volume migration is disabled if true. PVs are copied from the source cluster to the replication repository and from the replication repository to the destination cluster.
10: Specify one or more source namespaces. If you specify only the source namespace, the destination namespace is the same.
11: Specify the destination namespace if it is different from the source namespace.
12: The MigPlan CR is validated if true.

10.2.10. MigStorage

The MigStorage CR describes the object storage for the replication repository.

Amazon Web Services (AWS), Microsoft Azure, Google Cloud Storage, Multi-Cloud Object Gateway, and generic S3-compatible cloud storage are supported.

AWS and the snapshot copy method have additional parameters.

apiVersion: migration.openshift.io/v1alpha1
kind: MigStorage
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
  name: <migstorage>
  namespace: openshift-migration
spec:
  backupStorageProvider: <backup_storage_provider> # <1>
  volumeSnapshotProvider: <snapshot_storage_provider> # <2>
  backupStorageConfig:
    awsBucketName: <bucket> # <3>
    awsRegion: <region> # <4>
    credsSecretRef:
      namespace: openshift-config
      name: <storage_secret> # <5>
    awsKmsKeyId: <key_id> # <6>
    awsPublicUrl: <public_url> # <7>
    awsSignatureVersion: <signature_version> # <8>
  volumeSnapshotConfig:
    awsRegion: <region> # <9>
    credsSecretRef:
      namespace: openshift-config
      name: <storage_secret> # <10>
  refresh: false # <11>

1: Specify the storage provider.
2: Snapshot copy method only: Specify the storage provider.
3: AWS only: Specify the bucket name.
4: AWS only: Specify the bucket region, for example, us-east-1.
5: Specify the name of the Secret CR that you created for the storage.
6: AWS only: If you are using the AWS Key Management Service, specify the unique identifier of the key.
7: AWS only: If you granted public access to the AWS bucket, specify the bucket URL.
8: AWS only: Specify the AWS signature version for authenticating requests to the bucket, for example, 4.
9: Snapshot copy method only: Specify the geographical region of the clusters.
10: Snapshot copy method only: Specify the name of the Secret CR that you created for the storage.
11: Set to true to validate the cluster.

10.3. Logs and debugging tools

This section describes logs and debugging tools that you can use for troubleshooting.

10.3.1. Viewing migration plan resources

You can view migration plan resources to monitor a running migration or to troubleshoot a failed migration by using the MTC web console and the command line interface (CLI).

Procedure

In the MTC web console, click Migration Plans.
Click the Migrations number next to a migration plan to view the Migrations page.
Click a migration to view the Migration details.
Expand Migration resources to view the migration resources and their status in a tree view.
Note
To troubleshoot a failed migration, start with a high-level resource that has failed and then work down the resource tree towards the lower-level resources.
Click the Options menu next to a resource and select one of the following options:
- Copy oc describe command copies the command to your clipboard.
  - Log in to the relevant cluster and then run the command.
    The conditions and events of the resource are displayed in YAML format.
- Copy oc logs command copies the command to your clipboard.
  - Log in to the relevant cluster and then run the command.
    If the resource supports log filtering, a filtered log is displayed.
- View JSON displays the resource data in JSON format in a web browser.
  The data is the same as the output for the oc get <resource> command.

10.3.2. Viewing a migration plan log

You can view an aggregated log for a migration plan. You use the MTC web console to copy a command to your clipboard and then run the command from the command line interface (CLI).

The command displays the filtered logs of the following pods:

Migration Controller
Velero
Restic
Rsync
Stunnel
Registry

Procedure

In the MTC web console, click Migration Plans.
Click the Migrations number next to a migration plan.
Click View logs.
Click the Copy icon to copy the oc logs command to your clipboard.
Log in to the relevant cluster and enter the command on the CLI.
The aggregated log for the migration plan is displayed.

10.3.3. Using the migration log reader

You can use the migration log reader to display a single filtered view of all the migration logs.

Procedure

Get the mig-log-reader pod:

oc -n openshift-migration get pods | grep log

Enter the following command to display a single migration log:
```
oc -n openshift-migration logs -f <mig-log-reader-pod> -c color
```
The -c plain option displays the log without colors.
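The two steps above can be combined into one helper. The following is a minimal sketch, assuming that exactly one pod in the openshift-migration namespace has "log" in its name; the `mig_logs` function name is hypothetical. It uses `oc get pods -o name`, which returns names in `pod/<name>` form, and passes that form directly to `oc logs`.

```shell
# Stream the migration log reader output.
# Usage: mig_logs [color|plain]
# mig_logs is a hypothetical helper name.
mig_logs() {
  log_container=${1:-color}
  # Pick the first pod whose name contains "log", mirroring the grep step.
  log_pod=$(oc -n openshift-migration get pods -o name | grep log | head -n 1)
  if [ -z "$log_pod" ]; then
    echo "no log reader pod found in openshift-migration" >&2
    return 1
  fi
  oc -n openshift-migration logs -f "$log_pod" -c "$log_container"
}
```

After logging in to the cluster, `mig_logs plain` would stream the log without colors and `mig_logs` with no argument would use the color view.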

10.3.4. Accessing performance metrics

The MigrationController custom resource (CR) records metrics and pulls them into on-cluster monitoring storage. You can query the metrics by using Prometheus Query Language (PromQL) to diagnose migration performance issues. All metrics are reset when the Migration Controller pod restarts.

You can access the performance metrics and run queries by using the OpenShift Container Platform web console.

Procedure

In the OpenShift Container Platform web console, click Monitoring → Metrics.
Enter a PromQL query, select a time window to display, and click Run Queries.
If your web browser does not display all the results, use the Prometheus console.

10.3.4.1. Provided metrics

The MigrationController custom resource (CR) provides metrics for the MigMigration CR count and for its API requests.

10.3.4.1.1. cam_app_workload_migrations

This metric is a count of MigMigration CRs over time. It is useful for viewing alongside the mtc_client_request_count and mtc_client_request_elapsed metrics to collate API request information with migration status changes. This metric is included in Telemetry.

Table 10.1. cam_app_workload_migrations metric
Queryable label name	Sample label values	Label description
status	`running`, `idle`, `failed`, `completed`	Status of the `MigMigration` CR
type	`stage`, `final`	Type of the `MigMigration` CR

10.3.4.1.2. mtc_client_request_count

This metric is a cumulative count of Kubernetes API requests that MigrationController issued. It is not included in Telemetry.

Table 10.2. mtc_client_request_count metric
Queryable label name	Sample label values	Label description
cluster	`https://migcluster-url:443`	Cluster that the request was issued against
component	`MigPlan`, `MigCluster`	Sub-controller API that issued request
function	`(*ReconcileMigPlan).Reconcile`	Function that the request was issued from
kind	`SecretList`, `Deployment`	Kubernetes kind the request was issued for

10.3.4.1.3. mtc_client_request_elapsed

This metric is a cumulative latency, in milliseconds, of Kubernetes API requests that MigrationController issued. It is not included in Telemetry.

Table 10.3. mtc_client_request_elapsed metric
Queryable label name	Sample label values	Label description
cluster	`https://cluster-url.com:443`	Cluster that the request was issued against
component	`migplan`, `migcluster`	Sub-controller API that issued request
function	`(*ReconcileMigPlan).Reconcile`	Function that the request was issued from
kind	`SecretList`, `Deployment`	Kubernetes resource that the request was issued for

10.3.4.1.4. Useful queries

The table lists some helpful queries that can be used for monitoring performance.

Table 10.4. Useful queries
Query	Description
`mtc_client_request_count`	Number of API requests issued, sorted by request type
`sum(mtc_client_request_count)`	Total number of API requests issued
`mtc_client_request_elapsed`	API request latency, sorted by request type
`sum(mtc_client_request_elapsed)`	Total latency of API requests
`sum(mtc_client_request_elapsed) / sum(mtc_client_request_count)`	Average latency of API requests
`mtc_client_request_elapsed / mtc_client_request_count`	Average latency of API requests, sorted by request type
`cam_app_workload_migrations{status="running"} * 100`	Count of running migrations, multiplied by 100 for easier viewing alongside request counts
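The queries in the table can also be sent from a shell to the Prometheus HTTP API, which serves instant queries at `/api/v1/query`. The following is a minimal sketch, assuming an unauthenticated Prometheus endpoint such as a local instance. The `prom_query` function name and the `PROM_URL` variable are hypothetical, and the on-cluster monitoring stack typically requires authentication that this sketch does not handle.

```shell
# Run a PromQL instant query against the Prometheus HTTP API.
# Usage: prom_query '<promql_expression>'
# PROM_URL is an assumed base URL; it defaults to a local Prometheus instance.
prom_query() {
  # -G sends the URL-encoded query as a GET parameter.
  curl -sS -G "${PROM_URL:-http://localhost:9090}/api/v1/query" \
    --data-urlencode "query=$1"
}
```

For example, `prom_query 'sum(mtc_client_request_elapsed) / sum(mtc_client_request_count)'` would request the average API request latency and print the JSON response.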

10.3.5. Using the must-gather tool

You can collect logs, metrics, and information about MTC custom resources by using the must-gather tool.

The must-gather data must be attached to all customer cases.

You can collect data for a one-hour or a 24-hour period and view the data with the Prometheus console.

Prerequisites

You must be logged in to the OpenShift Container Platform cluster as a user with the cluster-admin role.
You must have the OpenShift CLI installed.

Procedure

Navigate to the directory where you want to store the must-gather data.
Run the oc adm must-gather command for one of the following data collection options:
- To collect data for the past hour:
  $ oc adm must-gather --image=registry.redhat.io/rhmtc/openshift-migration-must-gather-rhel8:v1.7
  The data is saved as must-gather/must-gather.tar.gz. You can upload this file to a support case on the Red Hat Customer Portal.
- To collect data for the past 24 hours:
  $ oc adm must-gather --image=registry.redhat.io/rhmtc/openshift-migration-must-gather-rhel8:v1.7 \
    -- /usr/bin/gather_metrics_dump
  This operation can take a long time. The data is saved as must-gather/metrics/prom_data.tar.gz.

Viewing metrics data with the Prometheus console

You can view the metrics data with the Prometheus console.

Procedure

Decompress the prom_data.tar.gz file:

tar -xvzf must-gather/metrics/prom_data.tar.gz

Create a local Prometheus instance:
```
make prometheus-run
```
The command outputs the Prometheus URL.
Output
```
Started Prometheus on http://localhost:9090
```
Launch a web browser and navigate to the URL to view the data by using the Prometheus web console.
After you have viewed the data, delete the Prometheus instance and data:
```
make prometheus-cleanup
```

10.3.6. Debugging Velero resources with the Velero CLI tool

You can debug Backup and Restore custom resources (CRs) and retrieve logs with the Velero CLI tool.

The Velero CLI tool provides more detailed information than the OpenShift CLI tool.

Syntax

Use the oc exec command to run a Velero CLI command:

oc -n openshift-migration exec deployment/velero -c velero -- ./velero \
  <backup_restore_cr> <command> <cr_name>

Example

$ oc -n openshift-migration exec deployment/velero -c velero -- ./velero \
  backup describe 0e44ae00-5dc3-11eb-9ca8-df7e5254778b-2d8ql

Help option

Use the velero --help option to list all Velero CLI commands:

$ oc -n openshift-migration exec deployment/velero -c velero -- ./velero \
  --help

Describe command

Use the velero describe command to retrieve a summary of warnings and errors associated with a Backup or Restore CR:

$ oc -n openshift-migration exec deployment/velero -c velero -- ./velero \
  <backup_restore_cr> describe <cr_name>

Example

$ oc -n openshift-migration exec deployment/velero -c velero -- ./velero \
  backup describe 0e44ae00-5dc3-11eb-9ca8-df7e5254778b-2d8ql

Logs command

Use the velero logs command to retrieve the logs of a Backup or Restore CR:

$ oc -n openshift-migration exec deployment/velero -c velero -- ./velero \
  <backup_restore_cr> logs <cr_name>

Example

$ oc -n openshift-migration exec deployment/velero -c velero -- ./velero \
  restore logs ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf

10.3.7. Debugging a partial migration failure

You can debug a partial migration failure warning message by using the Velero CLI to examine the Restore custom resource (CR) logs.

A partial failure occurs when Velero encounters an issue that does not cause a migration to fail. For example, if a custom resource definition (CRD) is missing or if there is a discrepancy between CRD versions on the source and target clusters, the migration completes but the CR is not created on the target cluster.

Velero logs the issue as a partial failure and then processes the rest of the objects in the Backup CR.

Procedure

Check the status of a MigMigration CR:

$ oc get migmigration <migmigration> -o yaml

Example output

status:
  conditions:
  - category: Warn
    durable: true
    lastTransitionTime: "2021-01-26T20:48:40Z"
    message: 'Final Restore openshift-migration/ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf: partially failed on destination cluster'
    status: "True"
    type: VeleroFinalRestorePartiallyFailed
  - category: Advisory
    durable: true
    lastTransitionTime: "2021-01-26T20:48:42Z"
    message: The migration has completed with warnings, please look at `Warn` conditions.
    reason: Completed
    status: "True"
    type: SucceededWithWarnings

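
When a MigMigration CR carries many conditions, it can be easier to list only the Warn conditions. The following is a minimal sketch and not part of MTC: the helper name warn_types and the temporary file are assumptions for illustration. It assumes that the status was saved with oc get migmigration <migmigration> -o yaml and that category is the first key of each condition, as in the example output. The sample input is trimmed from that output.

```shell
#!/bin/sh
# Hypothetical helper: print the type of every condition whose category is Warn.
# Assumes each condition starts with "- category:", as in the example output.
warn_types() {
  awk '
    $1 == "-" && $2 == "category:" { category = $3 }
    $1 == "type:" && category == "Warn" { print $2 }
  ' "$1"
}

# Sample input trimmed from the example output in this procedure.
status_file=$(mktemp)
cat > "$status_file" << 'EOF'
status:
  conditions:
  - category: Warn
    message: 'Final Restore openshift-migration/ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf: partially failed on destination cluster'
    status: "True"
    type: VeleroFinalRestorePartiallyFailed
  - category: Advisory
    message: The migration has completed with warnings, please look at `Warn` conditions.
    reason: Completed
    status: "True"
    type: SucceededWithWarnings
EOF

warn_list=$(warn_types "$status_file")
echo "$warn_list"
rm -f "$status_file"
```

The Warn condition message names the Restore CR to inspect in the next step.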

Check the status of the Restore CR by using the Velero describe command:

$ oc -n openshift-migration exec deployment/velero -c velero -- ./velero \
  restore describe <restore>

Example output

Phase:  PartiallyFailed (run 'velero restore logs ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf' for more information)

Errors:
  Velero:     <none>
  Cluster:    <none>
  Namespaces:
    migration-example:  error restoring example.com/migration-example/migration-example: the server could not find the requested resource


Check the Restore CR logs by using the Velero logs command:

$ oc -n openshift-migration exec deployment/velero -c velero -- ./velero \
  restore logs <restore>

Example output

time="2021-01-26T20:48:37Z" level=info msg="Attempting to restore migration-example: migration-example" logSource="pkg/restore/restore.go:1107" restore=openshift-migration/ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf
time="2021-01-26T20:48:37Z" level=info msg="error restoring migration-example: the server could not find the requested resource" logSource="pkg/restore/restore.go:1170" restore=openshift-migration/ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf


The Restore CR log error message, the server could not find the requested resource, indicates the cause of the partially failed migration.
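
In a long restore log, the failing items can be hard to spot, especially because the example output records the failure at level=info rather than level=error. The following is a minimal sketch and not part of MTC or Velero: the helper name restore_errors and the temporary file are assumptions for illustration, and the sample lines are copied from the example output in this procedure.

```shell
#!/bin/sh
# Hypothetical helper: print only the "error restoring" lines of a saved
# Velero restore log. The log is assumed to have been saved first, for example:
#   oc -n openshift-migration exec deployment/velero -c velero -- ./velero \
#     restore logs <restore> > restore.log
restore_errors() {
  # grep exits non-zero when nothing matches; "|| true" keeps an empty
  # result from being treated as a failure.
  grep 'msg="error restoring' "$1" || true
}

# Sample input copied from the example output in this procedure.
log_file=$(mktemp)
cat > "$log_file" << 'EOF'
time="2021-01-26T20:48:37Z" level=info msg="Attempting to restore migration-example: migration-example" logSource="pkg/restore/restore.go:1107" restore=openshift-migration/ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf
time="2021-01-26T20:48:37Z" level=info msg="error restoring migration-example: the server could not find the requested resource" logSource="pkg/restore/restore.go:1170" restore=openshift-migration/ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf
EOF

error_count=$(restore_errors "$log_file" | wc -l | tr -d ' ')
restore_errors "$log_file"
echo "items that failed to restore: $error_count"
rm -f "$log_file"
```

Matching on the message text instead of the log level is a deliberate choice here, because a filter on level=error would miss the line shown in the example.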

10.3.8. Using MTC custom resources for troubleshooting

You can check the following Migration Toolkit for Containers (MTC) custom resources (CRs) to troubleshoot a failed migration:

MigCluster
MigStorage
MigPlan
BackupStorageLocation
The BackupStorageLocation CR contains a migrationcontroller label to identify the MTC instance that created the CR:
```
    labels:
      migrationcontroller: ebe13bee-c803-47d0-a9e9-83f380328b93
```
VolumeSnapshotLocation
The VolumeSnapshotLocation CR contains a migrationcontroller label to identify the MTC instance that created the CR:
```
    labels:
      migrationcontroller: ebe13bee-c803-47d0-a9e9-83f380328b93
```
MigMigration
Backup
MTC changes the reclaim policy of migrated persistent volumes (PVs) to Retain on the target cluster. The Backup CR contains an openshift.io/orig-reclaim-policy annotation that indicates the original reclaim policy. You can manually restore the reclaim policy of the migrated PVs.
Restore

Procedure

List the MigMigration CRs in the openshift-migration namespace:

$ oc get migmigration -n openshift-migration

Example output

NAME                                   AGE
88435fe0-c9f8-11e9-85e6-5d593ce65e10   6m42s


Inspect the MigMigration CR:

$ oc describe migmigration 88435fe0-c9f8-11e9-85e6-5d593ce65e10 -n openshift-migration

The output is similar to the following examples.

MigMigration example output

name:         88435fe0-c9f8-11e9-85e6-5d593ce65e10
namespace:    openshift-migration
labels:       <none>
annotations:  touch: 3b48b543-b53e-4e44-9d34-33563f0f8147
apiVersion:  migration.openshift.io/v1alpha1
kind:         MigMigration
metadata:
  creationTimestamp:  2019-08-29T01:01:29Z
  generation:          20
  resourceVersion:    88179
  selfLink:           /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/migmigrations/88435fe0-c9f8-11e9-85e6-5d593ce65e10
  uid:                 8886de4c-c9f8-11e9-95ad-0205fe66cbb6
spec:
  migPlanRef:
    name:        socks-shop-mig-plan
    namespace:   openshift-migration
  quiescePods:  true
  stage:         false
status:
  conditions:
    category:              Advisory
    durable:               True
    lastTransitionTime:  2019-08-29T01:03:40Z
    message:               The migration has completed successfully.
    reason:                Completed
    status:                True
    type:                  Succeeded
  phase:                   Completed
  startTimestamp:         2019-08-29T01:01:29Z
events:                    <none>


Velero backup CR #2 example output that describes the PV data

apiVersion: velero.io/v1
kind: Backup
metadata:
  annotations:
    openshift.io/migrate-copy-phase: final
    openshift.io/migrate-quiesce-pods: "true"
    openshift.io/migration-registry: 172.30.105.179:5000
    openshift.io/migration-registry-dir: /socks-shop-mig-plan-registry-44dd3bd5-c9f8-11e9-95ad-0205fe66cbb6
    openshift.io/orig-reclaim-policy: delete
  creationTimestamp: "2019-08-29T01:03:15Z"
  generateName: 88435fe0-c9f8-11e9-85e6-5d593ce65e10-
  generation: 1
  labels:
    app.kubernetes.io/part-of: migration
    migmigration: 8886de4c-c9f8-11e9-95ad-0205fe66cbb6
    migration-stage-backup: 8886de4c-c9f8-11e9-95ad-0205fe66cbb6
    velero.io/storage-location: myrepo-vpzq9
  name: 88435fe0-c9f8-11e9-85e6-5d593ce65e10-59gb7
  namespace: openshift-migration
  resourceVersion: "87313"
  selfLink: /apis/velero.io/v1/namespaces/openshift-migration/backups/88435fe0-c9f8-11e9-85e6-5d593ce65e10-59gb7
  uid: c80dbbc0-c9f8-11e9-95ad-0205fe66cbb6
spec:
  excludedNamespaces: []
  excludedResources: []
  hooks:
    resources: []
  includeClusterResources: null
  includedNamespaces:
  - sock-shop
  includedResources:
  - persistentvolumes
  - persistentvolumeclaims
  - namespaces
  - imagestreams
  - imagestreamtags
  - secrets
  - configmaps
  - pods
  labelSelector:
    matchLabels:
      migration-included-stage-backup: 8886de4c-c9f8-11e9-95ad-0205fe66cbb6
  storageLocation: myrepo-vpzq9
  ttl: 720h0m0s
  volumeSnapshotLocations:
  - myrepo-wv6fx
status:
  completionTimestamp: "2019-08-29T01:02:36Z"
  errors: 0
  expiration: "2019-09-28T01:02:35Z"
  phase: Completed
  startTimestamp: "2019-08-29T01:02:35Z"
  validationErrors: null
  version: 1
  volumeSnapshotsAttempted: 0
  volumeSnapshotsCompleted: 0
  warnings: 0


Velero restore CR #2 example output that describes the Kubernetes resources

apiVersion: velero.io/v1
kind: Restore
metadata:
  annotations:
    openshift.io/migrate-copy-phase: final
    openshift.io/migrate-quiesce-pods: "true"
    openshift.io/migration-registry: 172.30.90.187:5000
    openshift.io/migration-registry-dir: /socks-shop-mig-plan-registry-36f54ca7-c925-11e9-825a-06fa9fb68c88
  creationTimestamp: "2019-08-28T00:09:49Z"
  generateName: e13a1b60-c927-11e9-9555-d129df7f3b96-
  generation: 3
  labels:
    app.kubernetes.io/part-of: migration
    migmigration: e18252c9-c927-11e9-825a-06fa9fb68c88
    migration-final-restore: e18252c9-c927-11e9-825a-06fa9fb68c88
  name: e13a1b60-c927-11e9-9555-d129df7f3b96-gb8nx
  namespace: openshift-migration
  resourceVersion: "82329"
  selfLink: /apis/velero.io/v1/namespaces/openshift-migration/restores/e13a1b60-c927-11e9-9555-d129df7f3b96-gb8nx
  uid: 26983ec0-c928-11e9-825a-06fa9fb68c88
spec:
  backupName: e13a1b60-c927-11e9-9555-d129df7f3b96-sz24f
  excludedNamespaces: null
  excludedResources:
  - nodes
  - events
  - events.events.k8s.io
  - backups.velero.io
  - restores.velero.io
  - resticrepositories.velero.io
  includedNamespaces: null
  includedResources: null
  namespaceMapping: null
  restorePVs: true
status:
  errors: 0
  failureReason: ""
  phase: Completed
  validationErrors: null
  warnings: 15


10.4. Common issues and concerns

This section describes common issues and concerns that can cause problems during migration.

10.4.1. Direct volume migration does not complete

If direct volume migration does not complete, the target cluster might not have the same node-selector annotations as the source cluster.

Migration Toolkit for Containers (MTC) migrates namespaces with all annotations in order to preserve security context constraints and scheduling requirements. During direct volume migration, MTC creates Rsync transfer pods on the target cluster in the namespaces that were migrated from the source cluster. If a target cluster namespace does not have the same annotations as the source cluster namespace, the Rsync transfer pods cannot be scheduled. The Rsync pods remain in a Pending state.

You can identify and fix this issue by performing the following procedure.

Procedure

Check the status of the MigMigration CR:

$ oc describe migmigration <migmigration> -n openshift-migration

The output includes the following status message:

Example output

Some or all transfer pods are not running for more than 10 mins on destination cluster


On the source cluster, obtain the details of a migrated namespace:
```
$ oc get namespace <namespace> -o yaml
```
Replace <namespace> with the name of the migrated namespace.
On the target cluster, edit the migrated namespace:
```
$ oc edit namespace <namespace>
```

Add the missing openshift.io/node-selector annotations to the migrated namespace as in the following example:

apiVersion: v1
kind: Namespace
metadata:
  annotations:
    openshift.io/node-selector: "region=east"
...


Run the migration plan again.
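
Before editing the target namespace, it can help to confirm that the node-selector annotation is what differs. The following is a minimal sketch and not part of MTC: the helper name node_selector and the temporary files are assumptions for illustration. It assumes that each namespace was saved to a file with oc get namespace <namespace> -o yaml. The sample manifests stand in for those files.

```shell
#!/bin/sh
# Hypothetical helper: print the openshift.io/node-selector value from a
# namespace manifest saved with "oc get namespace <namespace> -o yaml".
# Prints nothing if the annotation is absent.
node_selector() {
  sed -n 's|^ *openshift\.io/node-selector: *||p' "$1" | tr -d '"'
}

# Sample manifests standing in for the source and target cluster output.
src=$(mktemp)
dst=$(mktemp)
cat > "$src" << 'EOF'
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    openshift.io/node-selector: "region=east"
EOF
cat > "$dst" << 'EOF'
apiVersion: v1
kind: Namespace
metadata:
  annotations: {}
EOF

src_selector=$(node_selector "$src")
dst_selector=$(node_selector "$dst")
if [ "$src_selector" = "$dst_selector" ]; then
  result="match"
else
  result="mismatch: source='$src_selector' target='$dst_selector'"
fi
echo "$result"
rm -f "$src" "$dst"
```

A mismatch indicates that the annotation from the source namespace still has to be added on the target cluster.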

10.4.2. Error messages and resolutions

This section describes common error messages you might encounter with the Migration Toolkit for Containers (MTC) and how to resolve their underlying causes.

10.4.2.1. CA certificate error displayed when accessing the MTC console for the first time

If the MTC console displays a CA certificate error message the first time you try to access it, the likely cause is that a cluster uses self-signed CA certificates.

Navigate to the oauth-authorization-server URL in the error message and accept the certificate. To resolve this issue permanently, install the certificate authority so that it is trusted.

If the browser displays an Unauthorized message after you have accepted the CA certificate, navigate to the MTC console and then refresh the web page.

10.4.2.2. OAuth timeout error in the MTC console

If the MTC console displays a connection has timed out message after you have accepted a self-signed certificate, the cause is likely to be one of the following:

Interrupted network access to the OAuth server
Interrupted network access to the OpenShift Container Platform console
Proxy configuration blocking access to the OAuth server. See MTC console inaccessible because of OAuth timeout error for details.

To determine the cause:

Inspect the MTC console web page with a browser web inspector.
Check the Migration UI pod log for errors.

10.4.2.3. Certificate signed by unknown authority error

If you use a self-signed certificate to secure a cluster or a replication repository for the Migration Toolkit for Containers (MTC), certificate verification might fail with the following error message: Certificate signed by unknown authority.

You can create a custom CA certificate bundle file and upload it in the MTC web console when you add a cluster or a replication repository.

Procedure

Download a CA certificate from a remote endpoint and save it as a CA bundle file:

$ echo -n | openssl s_client -connect <host_FQDN>:<port> \
  | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > <ca_bundle.cert>

Replace <host_FQDN>:<port> with the host FQDN and port of the endpoint, for example, api.my-cluster.example.com:6443. Replace <ca_bundle.cert> with the name of the CA bundle file.
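
If the connection to the endpoint fails, the sed filter receives no certificate and the bundle file is created empty. The following is a minimal sketch and not part of MTC for checking the file before you upload it: the helper name count_certs and the temporary files are assumptions for illustration, and the sample bundle contains a placeholder rather than a real certificate.

```shell
#!/bin/sh
# Hypothetical helper: count the PEM certificate blocks in a bundle file.
# "-e" is needed because the pattern itself starts with a dash.
count_certs() {
  grep -c -e '-----BEGIN CERTIFICATE-----' "$1" || true
}

# Placeholder bundle (not a real certificate) and an empty file that
# stands in for the result of a failed download.
bundle=$(mktemp)
empty=$(mktemp)
cat > "$bundle" << 'EOF'
-----BEGIN CERTIFICATE-----
(placeholder body, not a real certificate)
-----END CERTIFICATE-----
EOF

found=$(count_certs "$bundle")
missing=$(count_certs "$empty")
echo "certificates in bundle: $found"
echo "certificates in empty file: $missing"
rm -f "$bundle" "$empty"

# To inspect a real bundle, openssl can print the first certificate's details:
#   openssl x509 -in <ca_bundle.cert> -noout -subject -issuer -enddate
```

A count of 0 means the download step should be repeated before the file is uploaded in the MTC web console.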

10.4.2.4. Backup storage location errors in the Velero pod log

If a Velero Backup custom resource contains a reference to a backup storage location (BSL) that does not exist, the Velero pod log might display the following error messages:

Error checking repository for stale locks

Error getting backup storage location: backupstoragelocation.velero.io \"my-bsl\" not found


You can ignore these error messages. A missing BSL cannot cause a migration to fail.

10.4.2.5. Pod volume backup timeout error in the Velero pod log

If a migration fails because Restic times out, the Velero pod log displays the following error:

level=error msg="Error backing up item" backup=velero/monitoring error="timed out
waiting for all PodVolumeBackups to complete" error.file="/go/src/github.com/
heptio/velero/pkg/restic/backupper.go:165" error.function="github.com/heptio/
velero/pkg/restic.(*backupper).BackupPodVolumes" group=v1


The default value of restic_timeout is one hour. You can increase this parameter for large migrations, keeping in mind that a higher value may delay the return of error messages.

Procedure

In the OpenShift Container Platform web console, navigate to Operators → Installed Operators.
Click Migration Toolkit for Containers Operator.
In the MigrationController tab, click migration-controller.
In the YAML tab, update the following parameter value:
```
spec:
  restic_timeout: 1h
```
Valid units are h (hours), m (minutes), and s (seconds), for example, 3h30m15s.
Click Save.
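
The same change can probably be made from the command line instead of the web console. The following is a minimal sketch under stated assumptions, not a documented MTC procedure: the helper names is_duration and timeout_patch are hypothetical, the duration check is deliberately stricter than necessary (units in h, m, s order only), and the sketch assumes that the MigrationController CR is named migration-controller in the openshift-migration namespace, as in the console steps. The oc patch command is printed rather than executed so that you can review it first.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the value uses only the h, m, and s units
# named in this procedure, in that order (for example, 3h30m15s).
is_duration() {
  [ -n "$1" ] && printf '%s\n' "$1" | grep -Eq '^([0-9]+h)?([0-9]+m)?([0-9]+s)?$'
}

# Hypothetical helper: print a JSON merge patch that sets spec.restic_timeout.
timeout_patch() {
  printf '{"spec":{"restic_timeout":"%s"}}' "$1"
}

new_timeout="3h30m15s"
if is_duration "$new_timeout"; then
  patch=$(timeout_patch "$new_timeout")
  # Printed rather than executed, so the sketch runs without a cluster.
  echo "oc patch migrationcontroller migration-controller -n openshift-migration --type=merge -p '$patch'"
else
  echo "invalid duration: $new_timeout" >&2
fi
```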

10.4.2.6. Restic verification errors in the MigMigration custom resource

If data verification fails when migrating a persistent volume with the file system data copy method, the MigMigration CR displays the following error:

MigMigration CR status

status:
  conditions:
  - category: Warn
    durable: true
    lastTransitionTime: 2020-04-16T20:35:16Z
    message: There were verify errors found in 1 Restic volume restores. See restore `<registry-example-migration-rvwcm>`
      for details 
    status: "True"
    type: ResticVerifyErrors

In this output, the message field identifies the Restore CR name. ResticVerifyErrors is a general error warning type that includes verification errors.

Note

A data verification error does not cause the migration process to fail.

You can check the Restore CR to troubleshoot the data verification error.

Procedure

View the Restore CR:

$ oc describe restore <registry-example-migration-rvwcm> -n openshift-migration

The output identifies the persistent volume with PodVolumeRestore errors.

Example output

status:
  phase: Completed
  podVolumeRestoreErrors:
  - kind: PodVolumeRestore
    name: <registry-example-migration-rvwcm-98t49>
    namespace: openshift-migration
  podVolumeRestoreResticErrors:
  - kind: PodVolumeRestore
    name: <registry-example-migration-rvwcm-98t49>
    namespace: openshift-migration


View the PodVolumeRestore CR:

$ oc describe podvolumerestore <registry-example-migration-rvwcm-98t49> -n openshift-migration

The output identifies the Restic pod that logged the errors.

PodVolumeRestore CR with Restic pod error

  completionTimestamp: 2020-05-01T20:49:12Z
  errors: 1
  resticErrors: 1
  ...
  resticPod: <restic-nr2v5>


View the Restic pod log to locate the errors:
```
$ oc logs -f <restic-nr2v5>
```

10.4.2.7. Restic permission error when migrating from NFS storage with root_squash enabled

If you are migrating data from NFS storage and root_squash is enabled, Restic maps to nfsnobody and does not have permission to perform the migration. The Restic pod log displays the following error:

Restic permission error

backup=openshift-migration/<backup_id> controller=pod-volume-backup error="fork/exec
/usr/bin/restic: permission denied" error.file="/go/src/github.com/vmware-tanzu/
velero/pkg/controller/pod_volume_backup_controller.go:280" error.function=
"github.com/vmware-tanzu/velero/pkg/controller.(*podVolumeBackupController).processBackup"
logSource="pkg/controller/pod_volume_backup_controller.go:280" name=<backup_id>
namespace=openshift-migration


You can resolve this issue by creating a supplemental group for Restic and adding the group ID to the MigrationController CR manifest.

Procedure

Create a supplemental group for Restic on the NFS storage.
Set the setgid bit on the NFS directories so that group ownership is inherited.
Add the restic_supplemental_groups parameter to the MigrationController CR manifest on the source and target clusters:
```
spec:
  restic_supplemental_groups: <group_id>
```
Replace <group_id> with the supplemental group ID.
Wait for the Restic pods to restart so that the changes are applied.
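
The setgid step in this procedure can be illustrated on a scratch directory. The following is a minimal sketch and not an NFS server procedure: the temporary directory stands in for an exported NFS directory, and the commands in the trailing comment use placeholder values that you would have to adapt to your own storage.

```shell
#!/bin/sh
# Scratch directory standing in for an exported NFS directory.
export_dir=$(mktemp -d)

# Set the setgid bit so that files and subdirectories created in the
# directory inherit its group instead of the creating user's primary group.
chmod g+s "$export_dir"

# "-g" tests whether the set-group-ID bit is set on the path.
if [ -g "$export_dir" ]; then
  setgid_state="set"
else
  setgid_state="not set"
fi
echo "setgid bit on scratch directory: $setgid_state"
rm -rf "$export_dir"

# On real storage, the equivalent steps might look like the following,
# where <group_id> and <export_path> are placeholders:
#   chgrp -R <group_id> <export_path>
#   find <export_path> -type d -exec chmod g+s {} +
```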

10.4.3. Known issues

This release has the following known issues:

During migration, the Migration Toolkit for Containers (MTC) preserves the following namespace annotations:
- openshift.io/sa.scc.mcs
- openshift.io/sa.scc.supplemental-groups
- openshift.io/sa.scc.uid-range
  These annotations preserve the UID range, ensuring that the containers retain their file system permissions on the target cluster. There is a risk that the migrated UIDs could duplicate UIDs within an existing or future namespace on the target cluster. (BZ#1748440)
Most cluster-scoped resources are not yet handled by MTC. If your applications require cluster-scoped resources, you might have to create them manually on the target cluster.
If a migration fails, the migration plan does not retain custom PV settings for quiesced pods. You must manually roll back the migration, delete the migration plan, and create a new migration plan with your PV settings. (BZ#1784899)
If a large migration fails because Restic times out, you can increase the restic_timeout parameter value (default: 1h) in the MigrationController custom resource (CR) manifest.
If you select the data verification option for PVs that are migrated with the file system copy method, performance is significantly slower.
If you are migrating data from NFS storage and root_squash is enabled, Restic maps to nfsnobody. The migration fails and a permission error is displayed in the Restic pod log. (BZ#1873641)
You can resolve this issue by adding supplemental groups for Restic to the MigrationController CR manifest:
```
spec:
...
  restic_supplemental_groups:
  - 5555
  - 6666
```
If you perform direct volume migration with nodes that are in different availability zones, the migration might fail because the migrated pods cannot access the PVC. (BZ#1947487)

10.5. Rolling back a migration

You can roll back a migration by using the MTC web console or the CLI.

You can also roll back a migration manually.

10.5.1. Rolling back a migration by using the MTC web console

You can roll back a migration by using the Migration Toolkit for Containers (MTC) web console.

Note

The following resources remain in the migrated namespaces for debugging after a failed direct volume migration (DVM):

Config maps (source and destination clusters)
Secret CRs (source and destination clusters)
Rsync CRs (source cluster)

These resources do not affect rollback. You can delete them manually.

If you later run the same migration plan successfully, the resources from the failed migration are deleted automatically.

If your application was stopped during a failed migration, you must roll back the migration to prevent data corruption in the persistent volume.

Rollback is not required if the application was not stopped during migration because the original application is still running on the source cluster.

Procedure

In the MTC web console, click Migration plans.
Click the Options menu beside a migration plan and select Rollback under Migration.
Click Rollback and wait for rollback to complete.
In the migration plan details, Rollback succeeded is displayed.
Verify that rollback was successful in the OpenShift Container Platform web console of the source cluster:
1. Click Home → Projects.
2. Click the migrated project to view its status.
3. In the Routes section, click Location to verify that the application is functioning, if applicable.
4. Click Workloads → Pods to verify that the pods are running in the migrated namespace.
5. Click Storage → Persistent volumes to verify that the migrated persistent volume is correctly provisioned.

10.5.2. Rolling back a migration from the command line interface

You can roll back a migration by creating a MigMigration custom resource (CR) from the command line interface.

Note

The following resources remain in the migrated namespaces for debugging after a failed direct volume migration (DVM):

Config maps (source and destination clusters)
Secret CRs (source and destination clusters)
Rsync CRs (source cluster)

These resources do not affect rollback. You can delete them manually.

If you later run the same migration plan successfully, the resources from the failed migration are deleted automatically.

If your application was stopped during a failed migration, you must roll back the migration to prevent data corruption in the persistent volume.

Rollback is not required if the application was not stopped during migration because the original application is still running on the source cluster.

Procedure

Create a MigMigration CR based on the following example:

$ cat << EOF | oc apply -f -
apiVersion: migration.openshift.io/v1alpha1
kind: MigMigration
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
  name: <migmigration>
  namespace: openshift-migration
spec:
...
  rollback: true
...
  migPlanRef:
    name: <migplan> 
    namespace: openshift-migration
EOF

<migplan>: Specify the name of the associated MigPlan CR.

In the MTC web console, verify that the migrated project resources have been removed from the target cluster.
Verify that the migrated project resources are present in the source cluster and that the application is running.
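
The rollback can also be followed from the command line by inspecting the MigMigration CR that you created. The following is a minimal sketch, not part of the documented procedure: the `rollback_status` function name and the `OC` override variable are hypothetical helpers introduced here for illustration, and the sketch assumes that you are logged in to the cluster where the MigMigration CR was created.

```shell
# Sketch only: OC can be overridden (for example, with a stub) for dry runs.
OC="${OC:-oc}"

# Describe a MigMigration CR in the openshift-migration namespace so that
# the rollback status and conditions can be inspected.
# Usage: rollback_status <migmigration>
rollback_status() {
  "$OC" describe migmigration "$1" -n openshift-migration
}
```

For example, `rollback_status <migmigration>` prints the description of the rollback CR created in the previous step.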

10.5.3. Rolling back a migration manually

You can roll back a failed migration manually by deleting the stage pods and unquiescing the application.

If you run the same migration plan successfully, the resources from the failed migration are deleted automatically.

Note

The following resources remain in the migrated namespaces after a failed direct volume migration (DVM):

Config maps (source and destination clusters)
Secret CRs (source and destination clusters)
Rsync CRs (source cluster)

These resources do not affect rollback. You can delete them manually.

Procedure

Delete the stage pods on all clusters:

$ oc delete pods -l migration.openshift.io/is-stage-pod -n <namespace>

<namespace>: Specify a namespace listed in the MigPlan CR.

Unquiesce the application on the source cluster by scaling the replicas to their premigration number:

$ oc scale deployment <deployment> --replicas=<premigration_replicas>

The migration.openshift.io/preQuiesceReplicas annotation in the Deployment CR displays the premigration number of replicas:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    migration.openshift.io/preQuiesceReplicas: "1"

Verify that the application pods are running on the source cluster:
```
$ oc get pod -n <namespace>
```
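
The manual steps above can be sketched as a small set of shell functions. This is an illustration under stated assumptions, not an official script: the function names and the `OC` override variable are hypothetical, the stage pods are deleted by label selector, and the premigration replica count is read from the `migration.openshift.io/preQuiesceReplicas` annotation with a JSONPath query instead of being typed by hand.

```shell
# Sketch only: OC can be overridden (for example, with a stub) for dry runs.
OC="${OC:-oc}"

# Delete the stage pods in every namespace passed as an argument.
# Run this against each cluster involved in the migration.
delete_stage_pods() {
  for ns in "$@"; do
    "$OC" delete pods -l migration.openshift.io/is-stage-pod -n "$ns"
  done
}

# Print the replica count recorded before the application was quiesced.
# Usage: prequiesce_replicas <deployment> <namespace>
prequiesce_replicas() {
  "$OC" get deployment "$1" -n "$2" \
    -o jsonpath='{.metadata.annotations.migration\.openshift\.io/preQuiesceReplicas}'
}

# Scale a deployment back to its recorded premigration replica count.
# Usage: unquiesce <deployment> <namespace>
unquiesce() {
  replicas="$(prequiesce_replicas "$1" "$2")"
  if [ -z "$replicas" ]; then
    echo "no preQuiesceReplicas annotation found on deployment $1" >&2
    return 1
  fi
  "$OC" scale deployment "$1" -n "$2" --replicas="$replicas"
}
```

On the source cluster, you might run `delete_stage_pods <namespace>` followed by `unquiesce <deployment> <namespace>`. The `unquiesce` function stops with an error instead of scaling if the annotation is missing.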

Additional resources

Deleting Operators from a cluster using the web console
