1.7. Troubleshooting
You can view the Migration Toolkit for Containers (MTC) custom resources and download logs to troubleshoot a failed migration.
If the application was stopped during the failed migration, you must roll it back manually in order to prevent data corruption.
Manual rollback is not required if the application was not stopped during migration because the original application is still running on the source cluster.
1.7.1. Viewing migration Custom Resources
The Migration Toolkit for Containers (MTC) creates the following custom resources (CRs):
MigCluster (configuration, MTC cluster): Cluster definition
MigStorage (configuration, MTC cluster): Storage definition
MigPlan (configuration, MTC cluster): Migration plan
The MigPlan CR describes the source and target clusters, replication repository, and namespaces being migrated. It is associated with 0, 1, or many MigMigration CRs. Deleting a MigPlan CR deletes the associated MigMigration CRs.
BackupStorageLocation (configuration, MTC cluster): Location of Velero backup objects
VolumeSnapshotLocation (configuration, MTC cluster): Location of Velero volume snapshots
MigMigration (action, MTC cluster): Migration, created every time you stage or migrate data. Each MigMigration CR is associated with a MigPlan CR.
Backup (action, source cluster): When you run a migration plan, the MigMigration CR creates two Velero backup CRs on each source cluster:
- Backup CR #1 for Kubernetes objects
- Backup CR #2 for PV data
Restore (action, target cluster): When you run a migration plan, the MigMigration CR creates two Velero restore CRs on the target cluster:
- Restore CR #1 (using Backup CR #2) for PV data
- Restore CR #2 (using Backup CR #1) for Kubernetes objects
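As a quick orientation, you can list the MTC configuration and action CRs with a single command. This is a minimal sketch, assuming you are logged in to the cluster where the MTC controller runs and that the CRs live in the default openshift-migration namespace:
$ oc get migcluster,migstorage,migplan,migmigration -n openshift-migration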
Procedure
List the MigMigration CRs in the openshift-migration namespace:
$ oc get migmigration -n openshift-migration
Example output
NAME                                   AGE
88435fe0-c9f8-11e9-85e6-5d593ce65e10   6m42s
Inspect the MigMigration CR:
$ oc describe migmigration 88435fe0-c9f8-11e9-85e6-5d593ce65e10 -n openshift-migration
The output is similar to the following examples.
MigMigration example output
Velero backup CR #2 example output that describes the PV data
Velero restore CR #2 example output that describes the Kubernetes resources
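The Velero Backup and Restore CRs created by a MigMigration run can also be listed directly. A minimal sketch, assuming the default openshift-migration namespace; run the first command on the source cluster and the second on the target cluster:
$ oc get backups.velero.io -n openshift-migration
$ oc get restores.velero.io -n openshift-migration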
1.7.2. Using the migration log reader
You can use the migration log reader to display a single filtered view of all the migration logs.
Procedure
Get the mig-log-reader pod:
$ oc -n openshift-migration get pods | grep log
Enter the following command to display a single migration log:
$ oc -n openshift-migration logs -f <mig-log-reader-pod> -c color
The -c plain option displays the log without colors.
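If you only want to see problems, you can pipe the plain log through grep. A sketch, assuming the pod name contains mig-log-reader as shown above and that filtering on the word error is sufficient for your case:
$ oc -n openshift-migration logs -f $(oc -n openshift-migration get pods -o name | grep mig-log-reader) -c plain | grep -i error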
1.7.3. Downloading migration logs
You can download the Velero, Restic, and MigrationController pod logs in the Migration Toolkit for Containers (MTC) web console to troubleshoot a failed migration.
Procedure
- In the MTC console, click Migration plans to view the list of migration plans.
- Click the Options menu of a specific migration plan and select Logs.
- Click Download Logs to download the logs of the MigrationController, Velero, and Restic pods for all clusters.
You can download a single log by selecting the cluster, log source, and pod source, and then clicking Download Selected.
You can access a pod log from the CLI by using the oc logs command:
$ oc logs <pod-name> -f -n openshift-migration
Specify the pod name.
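To find the pod names to pass to oc logs, you can filter the pod list. A sketch, assuming the default MTC pod naming (the velero, restic, and migration-controller name substrings are assumptions; adjust the pattern if your pod names differ):
$ oc get pods -n openshift-migration | grep -E 'velero|restic|migration-controller'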
1.7.4. Updating deprecated APIs
If your source cluster uses deprecated APIs, the following warning message is displayed when you create a migration plan in the Migration Toolkit for Containers (MTC) web console:
Some namespaces contain GVKs incompatible with destination cluster
You can click See details to view the namespace and the incompatible APIs. This warning message does not block the migration.
During migration with the Migration Toolkit for Containers (MTC), the deprecated APIs are saved in Velero Backup #1 for Kubernetes objects. You can download the Velero Backup, extract the deprecated API yaml files, and update them with the oc convert command. Then you can create the updated APIs on the target cluster.
Procedure
- Run the migration plan.
View the MigPlan custom resource (CR):
$ oc describe migplan <migplan_name> -n openshift-migration
Specify the name of the MigPlan CR.
The output is similar to the following:
Get the MigMigration name associated with the MigPlan UID:
$ oc get migmigration -o json | jq -r '.items[] | select(.metadata.ownerReferences[].uid=="<migplan_uid>") | .metadata.name'
Specify the MigPlan CR UID.
Get the MigMigration UID associated with the MigMigration name:
$ oc get migmigration <migmigration_name> -o jsonpath='{.metadata.uid}'
Specify the MigMigration name.
Get the Velero Backup name associated with the MigMigration UID:
$ oc get backup.velero.io --selector migration-initial-backup="<migmigration_uid>" -o jsonpath={.items[*].metadata.name}
Specify the MigMigration UID.
Download the contents of the Velero Backup to your local machine by running the command for your storage provider:
AWS S3:
$ aws s3 cp s3://<bucket_name>/velero/backups/<backup_name> <backup_local_dir> --recursive
Specify the bucket, backup name, and your local backup directory name.
GCP:
$ gsutil cp gs://<bucket_name>/velero/backups/<backup_name> <backup_local_dir> --recursive
Specify the bucket, backup name, and your local backup directory name.
Azure:
$ azcopy copy 'https://velerobackups.blob.core.windows.net/velero/backups/<backup_name>' '<backup_local_dir>' --recursive
Specify the backup name and your local backup directory name.
Extract the Velero Backup archive file:
$ tar -xvf <backup_local_dir>/<backup_name>.tar.gz -C <backup_local_dir>
Run oc convert in offline mode on each deprecated API:
$ oc convert -f <backup_local_dir>/resources/<gvk>.json
Create the converted API on the target cluster:
$ oc create -f <gvk>.json
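The lookup steps above can be chained into a short shell sketch that resolves the Velero Backup name from a MigPlan name. This is a minimal example under the same assumptions as the procedure (jq installed, the default openshift-migration namespace, and the migration-initial-backup label shown above); <migplan_name> is a placeholder:
$ migplan_uid=$(oc get migplan <migplan_name> -n openshift-migration -o jsonpath='{.metadata.uid}')
$ migmigration_name=$(oc get migmigration -n openshift-migration -o json | jq -r ".items[] | select(.metadata.ownerReferences[].uid==\"$migplan_uid\") | .metadata.name")
$ migmigration_uid=$(oc get migmigration $migmigration_name -n openshift-migration -o jsonpath='{.metadata.uid}')
$ oc get backup.velero.io -n openshift-migration --selector migration-initial-backup="$migmigration_uid" -o jsonpath='{.items[*].metadata.name}'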
1.7.5. Error messages and resolutions
This section describes common error messages you might encounter with the Migration Toolkit for Containers (MTC) and how to resolve their underlying causes.
1.7.5.1. CA certificate error displayed when accessing the MTC console for the first time
If a CA certificate error message is displayed the first time you try to access the MTC console, the likely cause is the use of self-signed CA certificates in one of the clusters.
To resolve this issue, navigate to the oauth-authorization-server URL displayed in the error message and accept the certificate. To resolve this issue permanently, add the certificate to the trust store of your web browser.
If an Unauthorized message is displayed after you have accepted the certificate, navigate to the MTC console and refresh the web page.
1.7.5.2. OAuth timeout error in the MTC console
If a connection has timed out message is displayed in the MTC console after you have accepted a self-signed certificate, the causes are likely to be the following:
- Interrupted network access to the OAuth server
- Interrupted network access to the OpenShift Container Platform console
- Proxy configuration that blocks access to the oauth-authorization-server URL. See MTC console inaccessible because of OAuth timeout error for details.
You can determine the cause of the timeout.
Procedure
- Navigate to the MTC console and inspect the elements with the browser web inspector.
Check the MigrationUI pod log:
$ oc logs <MigrationUI_Pod> -n openshift-migration
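If you need to look up the pod name first, you can filter the pod list. A sketch, with the assumption that the MigrationUI pod name contains migration-ui; adjust the pattern to match your cluster:
$ oc get pods -n openshift-migration -o name | grep migration-ui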
1.7.5.3. Restic timeout error
If a migration fails because Restic times out, the following error is displayed in the Velero pod log.
Example output
level=error msg="Error backing up item" backup=velero/monitoring error="timed out waiting for all PodVolumeBackups to complete" error.file="/go/src/github.com/heptio/velero/pkg/restic/backupper.go:165" error.function="github.com/heptio/velero/pkg/restic.(*backupper).BackupPodVolumes" group=v1
The default value of restic_timeout is one hour. You can increase this parameter for large migrations, keeping in mind that a higher value may delay the return of error messages.
Procedure
- In the OpenShift Container Platform web console, navigate to Operators → Installed Operators.
- Click Migration Toolkit for Containers Operator.
- In the MigrationController tab, click migration-controller.
In the YAML tab, update the following parameter value:
spec:
  restic_timeout: 1h
Valid units are h (hours), m (minutes), and s (seconds), for example, 3h30m15s.
- Click Save.
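If you prefer the CLI to the web console, the same change can be applied with oc patch. A minimal sketch, assuming the MigrationController CR is named migration-controller in the openshift-migration namespace, as in the console steps above, and using 3h as an example value:
$ oc patch migrationcontroller migration-controller -n openshift-migration --type merge -p '{"spec":{"restic_timeout":"3h"}}'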
1.7.5.4. ResticVerifyErrors in the MigMigration custom resource
If data verification fails when migrating a persistent volume with the file system data copy method, the following error is displayed in the MigMigration CR.
Example output
A data verification error does not cause the migration process to fail.
You can check the Restore CR to identify the source of the data verification error.
Procedure
- Log in to the target cluster.
View the Restore CR:
$ oc describe restore <registry-example-migration-rvwcm> -n openshift-migration
The output identifies the persistent volume with PodVolumeRestore errors.
Example output
View the PodVolumeRestore CR:
$ oc describe podvolumerestore <migration-example-rvwcm-98t49>
The output identifies the Restic pod that logged the errors.
Example output
completionTimestamp: 2020-05-01T20:49:12Z
errors: 1
resticErrors: 1
...
resticPod: <restic-nr2v5>
View the Restic pod log to locate the errors:
$ oc logs -f <restic-nr2v5>
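If you do not know the PodVolumeRestore CR names, you can list them and look for entries that report errors. A sketch, assuming the CRs are created in the openshift-migration namespace as in the examples above:
$ oc get podvolumerestores.velero.io -n openshift-migration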
1.7.6. Direct volume migration does not complete
If direct volume migration does not complete, the target cluster might not have the same node-selector annotations as the source cluster.
Migration Toolkit for Containers (MTC) migrates namespaces with all annotations in order to preserve security context constraints and scheduling requirements. During direct volume migration, MTC creates Rsync transfer pods on the target cluster in the namespaces that were migrated from the source cluster. If a target cluster namespace does not have the same annotations as the source cluster namespace, the Rsync transfer pods cannot be scheduled. The Rsync pods remain in a Pending state.
You can identify and fix this issue by performing the following procedure.
Procedure
Check the status of the MigMigration CR:
$ oc describe migmigration <migmigration_name> -n openshift-migration
The output includes the following status message:
Example output
... Some or all transfer pods are not running for more than 10 mins on destination cluster ...
On the source cluster, obtain the details of a migrated namespace:
$ oc get namespace <namespace> -o yaml
Specify the migrated namespace.
On the target cluster, edit the migrated namespace:
$ oc edit namespace <namespace>
Add the missing openshift.io/node-selector annotations to the migrated namespace so that they match the source cluster namespace; a hedged example follows this procedure.
- Run the migration plan again.
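The following is a minimal sketch of the annotation to add; the region=east value is only an illustration, so copy the actual openshift.io/node-selector value from the source cluster namespace:
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    openshift.io/node-selector: "region=east"
  name: <namespace>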
1.7.7. Debugging Velero resources with the Velero CLI tool
You can debug the Backup and Restore custom resources (CRs) and partial migration failures with the Velero command line interface (CLI). The Velero CLI runs in the velero pod.
1.7.7.1. Velero command syntax
Velero CLI commands use the following syntax:
$ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -- ./velero <resource> <command> <resource_id>
You can specify velero-<pod> -n openshift-migration in place of $(oc get pods -n openshift-migration -o name | grep velero).
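For example, you can list all Velero Backup CRs from inside the velero pod (backup get is a standard Velero CLI subcommand; the grep pattern mirrors the syntax shown above):
$ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -- ./velero backup get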
1.7.7.2. Help command
The Velero help command lists all the Velero CLI commands:
$ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -- ./velero --help
1.7.7.3. Describe command
The Velero describe command provides a summary of warnings and errors associated with a Velero resource:
$ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -- ./velero <resource> describe <resource_id>
Example
$ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -- ./velero backup describe 0e44ae00-5dc3-11eb-9ca8-df7e5254778b-2d8ql
1.7.7.4. Logs command
The Velero logs command provides the logs associated with a Velero resource:
velero <resource> logs <resource_id>
Example
$ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -- ./velero restore logs ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf
1.7.7.5. Debugging a partial migration failure
You can debug a partial migration failure warning message by using the Velero CLI to examine the Restore custom resource (CR) logs.
A partial failure occurs when Velero encounters an issue that does not cause a migration to fail. For example, if a custom resource definition (CRD) is missing or if there is a discrepancy between CRD versions on the source and target clusters, the migration completes but the CR is not created on the target cluster.
Velero logs the issue as a partial failure and then processes the rest of the objects in the Backup CR.
Procedure
Check the status of a MigMigration CR:
$ oc get migmigration <migmigration> -o yaml
Check the status of the Restore CR by using the Velero describe command:
$ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -n openshift-migration -- ./velero restore describe <restore>
Check the Restore CR logs by using the Velero logs command:
$ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -n openshift-migration -- ./velero restore logs <restore>
Example output
time="2021-01-26T20:48:37Z" level=info msg="Attempting to restore migration-example: migration-example" logSource="pkg/restore/restore.go:1107" restore=openshift-migration/ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf
time="2021-01-26T20:48:37Z" level=info msg="error restoring migration-example: the server could not find the requested resource" logSource="pkg/restore/restore.go:1170" restore=openshift-migration/ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf
The Restore CR log error message, the server could not find the requested resource, indicates the cause of the partially failed migration.
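Because this error typically means that the resource's API type is not registered on the target cluster, you can verify it there. A sketch, with <resource_kind> as a placeholder for the kind named in the log message:
$ oc api-resources | grep -i <resource_kind>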
1.7.8. Using must-gather to collect data
You must run the must-gather tool if you open a customer support case on the Red Hat Customer Portal for the Migration Toolkit for Containers (MTC).
The openshift-migration-must-gather-rhel8 image for MTC collects migration-specific logs and data that are not collected by the default must-gather image.
Procedure
- Navigate to the directory where you want to store the must-gather data.
Run the must-gather command:
$ oc adm must-gather --image=registry.redhat.io/rhmtc/openshift-migration-must-gather-rhel8:v1.4
- Remove authentication keys and other sensitive information.
Create an archive file containing the contents of the must-gather data directory:
$ tar cvaf must-gather.tar.gz must-gather.local.<uid>/
- Upload the compressed file as an attachment to your customer support case.
1.7.9. Rolling back a migration
You can roll back a migration by using the MTC web console or the CLI.
1.7.9.1. Rolling back a migration in the MTC web console
You can roll back a migration by using the Migration Toolkit for Containers (MTC) web console.
If your application was stopped during a failed migration, you must roll back the migration in order to prevent data corruption in the persistent volume.
Rollback is not required if the application was not stopped during migration because the original application is still running on the source cluster.
Procedure
- In the MTC web console, click Migration plans.
- Click the Options menu beside a migration plan and select Rollback.
- Click Rollback and wait for rollback to complete.
In the migration plan details, Rollback succeeded is displayed.
Verify that rollback was successful in the OpenShift Container Platform web console of the source cluster:
- Click Home → Projects.
- Click the migrated project to view its status.
- In the Routes section, click Location to verify that the application is functioning, if applicable.
- Click Workloads → Pods to verify that the pods are running in the migrated namespace.
- Click Storage → Persistent volumes to verify that the migrated persistent volume is correctly provisioned.
1.7.9.2. Rolling back a migration from the CLI
You can roll back a migration by creating a MigMigration custom resource (CR) from the CLI.
If your application was stopped during a failed migration, you must roll back the migration in order to prevent data corruption in the persistent volume.
Rollback is not required if the application was not stopped during migration because the original application is still running on the source cluster.
Procedure
Create a rollback MigMigration CR that references the associated MigPlan CR by name; see the hedged example after this procedure.
- In the MTC web console, verify that the migrated project resources have been removed from the target cluster.
- Verify that the migrated project resources are present in the source cluster and that the application is running.
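A minimal sketch of a rollback MigMigration CR, assuming the migration.openshift.io/v1alpha1 API group and the rollback, stage, and migPlanRef fields used by this MTC release; replace the placeholder names with your own:
$ cat << EOF | oc apply -f -
apiVersion: migration.openshift.io/v1alpha1
kind: MigMigration
metadata:
  name: <migmigration>
  namespace: openshift-migration
spec:
  rollback: true
  stage: false
  migPlanRef:
    name: <migplan>  # name of the associated MigPlan CR
    namespace: openshift-migration
EOF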
1.7.10. Known issues
This release has the following known issues:
- During migration, the Migration Toolkit for Containers (MTC) preserves the following namespace annotations:
  - openshift.io/sa.scc.mcs
  - openshift.io/sa.scc.supplemental-groups
  - openshift.io/sa.scc.uid-range
These annotations preserve the UID range, ensuring that the containers retain their file system permissions on the target cluster. There is a risk that the migrated UIDs could duplicate UIDs within an existing or future namespace on the target cluster. (BZ#1748440)
- Most cluster-scoped resources are not yet handled by MTC. If your applications require cluster-scoped resources, you might have to create them manually on the target cluster.
- If a migration fails, the migration plan does not retain custom PV settings for quiesced pods. You must manually roll back the migration, delete the migration plan, and create a new migration plan with your PV settings. (BZ#1784899)
- If a large migration fails because Restic times out, you can increase the restic_timeout parameter value (default: 1h) in the MigrationController custom resource (CR) manifest.
- If you select the data verification option for PVs that are migrated with the file system copy method, performance is significantly slower.
- If you are migrating data from NFS storage and root_squash is enabled, Restic maps to nfsnobody. The migration fails and a permission error is displayed in the Restic pod log. (BZ#1873641)
You can resolve this issue by adding supplemental groups for Restic to the MigrationController CR manifest:
spec:
  ...
  restic_supplemental_groups:
  - 5555
  - 6666
- If you perform direct volume migration with nodes that are in different availability zones, the migration might fail because the migrated pods cannot access the PVC. (BZ#1947487)