Chapter 11. Troubleshooting common installation problems
If you are experiencing difficulties installing the Red Hat OpenShift AI Operator, read this section to understand what could be causing the problem and how to resolve it.
If the problem is not included here or in the release notes, contact Red Hat Support. When opening a support case, it is helpful to include debugging information about your cluster. You can collect this information by using the must-gather tool as described in Must-Gather for Red Hat OpenShift AI and Gathering data about your cluster.
You can also adjust the log level of OpenShift AI Operator components to increase or reduce log verbosity to suit your use case. For more information, see Configuring the OpenShift AI Operator logger.
Problem
When attempting to retrieve the Red Hat OpenShift AI Operator from the image registry, a "Failure to pull from quay" error message appears. The Red Hat OpenShift AI Operator might be unavailable for retrieval in the following circumstances:
- The image registry is unavailable.
- There is a problem with your network connection.
- Your cluster is not operational and is therefore unable to retrieve the image from the registry.
Diagnosis
Check the logs in the Events section in OpenShift for further information about the Failure to pull from quay error message.
Resolution
- Contact Red Hat support.
Problem
You are deploying on an environment that is not documented as supported by the Red Hat OpenShift AI Operator.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects or redhat-ods-operator.
- Click the rhods-operator-<random string> pod that shows an error in the Status column.
  The Pod details page appears.
- Click Logs.
- Select rhods-operator from the drop-down list.
- Check the log for the "ERROR: Deploying on $infrastructure, which is not supported. Failing Installation" error message.
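The console steps above can also be approximated from the command line. The following Python sketch only builds the command lines and picks out the operator pod names; it does not run anything. It assumes that the oc client is installed and logged in, that the Operator runs in the redhat-ods-operator project, and that the container is named rhods-operator as in the drop-down list. The helper names are hypothetical.

```python
OPERATOR_NAMESPACE = "redhat-ods-operator"  # project named in the steps above
OPERATOR_CONTAINER = "rhods-operator"       # assumed container name (from the drop-down list)


def list_pods_command(namespace=OPERATOR_NAMESPACE):
    """Return the argv that lists the pods of a project as JSON."""
    return ["oc", "get", "pods", "-n", namespace, "-o", "json"]


def operator_pod_names(pod_list):
    """Pick rhods-operator pod names out of parsed `oc get pods -o json` output."""
    names = []
    for item in pod_list.get("items", []):
        name = item.get("metadata", {}).get("name", "")
        if name.startswith("rhods-operator-"):
            names.append(name)
    return names


def logs_command(pod_name, namespace=OPERATOR_NAMESPACE, container=OPERATOR_CONTAINER):
    """Return the argv that prints the log of one container in a pod."""
    return ["oc", "logs", pod_name, "-n", namespace, "-c", container]
```

Each returned list can be passed to subprocess.run(argv, capture_output=True, text=True) on a workstation where oc is available.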
Resolution
- Before proceeding with a new installation, ensure that you have a fully supported environment on which to install OpenShift AI. For more information, see Red Hat OpenShift AI: Supported Configurations.
Problem
During the installation process, the OpenShift AI Custom Resource (CR) does not get created. This issue occurs in unknown circumstances.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects or redhat-ods-operator.
- Click the rhods-operator-<random string> pod that shows an error in the Status column.
  The Pod details page appears.
- Click Logs.
- Select rhods-operator from the drop-down list.
- Check the log for the "ERROR: Attempt to create the ODH CR failed." error message.
Resolution
- Contact Red Hat support.
Problem
During the installation process, the OpenShift AI Notebooks Custom Resource (CR) does not get created. This issue occurs in unknown circumstances.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects or redhat-ods-operator.
- Click the rhods-operator-<random string> pod that shows an error in the Status column.
  The Pod details page appears.
- Click Logs.
- Select rhods-operator from the drop-down list.
- Check the log for the "ERROR: Attempt to create the RHODS Notebooks CR failed." error message.
Resolution
- Contact Red Hat support.
11.5. The OpenShift AI dashboard is not accessible
Problem
After installing OpenShift AI, the redhat-ods-applications, redhat-ods-monitoring, and redhat-ods-operator project namespaces are Active but you cannot access the dashboard due to an error in one of the pods.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects.
- Click Filter and select the checkbox for every status except Running and Completed.
  The page displays the pods that have an error.
Resolution
- To see more information and troubleshooting steps for a pod, on the Pods page, click the link in the Status column for the pod.
- If the Status column does not display a link, click the pod name to open the pod details page and then click the Logs tab.
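The same filtering idea can be sketched in Python against the JSON output of oc get pods -A -o json. This is only an approximation of the console filter: it flags pods whose phase is neither Running nor Succeeded (Succeeded is what the console shows as Completed), plus pods that report a container in a waiting state. The function name is hypothetical, and the console derives its Status column from more fields than this sketch reads.

```python
HEALTHY_PHASES = {"Running", "Succeeded"}  # "Succeeded" is displayed as Completed


def pods_needing_attention(pod_list):
    """Return (namespace, name, reason) tuples for pods that look unhealthy."""
    flagged = []
    for item in pod_list.get("items", []):
        meta = item.get("metadata", {})
        status = item.get("status", {})
        phase = status.get("phase", "Unknown")
        reason = None
        if phase not in HEALTHY_PHASES:
            reason = phase
        else:
            # A pod can be in the Running phase while a container is stuck waiting.
            for container in status.get("containerStatuses", []):
                waiting = container.get("state", {}).get("waiting")
                if waiting:
                    reason = waiting.get("reason", "Waiting")
                    break
        if reason is not None:
            flagged.append((meta.get("namespace", ""), meta.get("name", ""), reason))
    return flagged
```

The input is the dictionary produced by json.loads on the command output; the returned tuples give a starting point for opening the matching pod details page.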
11.6. Reinstalling OpenShift AI fails with an error
Problem
After uninstalling the OpenShift AI Operator and reinstalling it by using the CLI, the reinstallation fails with an unable to find DSCInitialization error in one of the OpenShift AI Operator pod logs. This issue can occur if the Auth custom resource from the previous installation was not deleted after uninstalling the OpenShift AI Operator and before reinstalling it. For more information, see Understanding the uninstallation process.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects or redhat-ods-operator.
- Click the rhods-operator-<random string> pod that shows an error in the Status column.
  The Pod details page appears.
- Click Logs.
- Select rhods-operator from the drop-down list.
- Check the log for an error message similar to the following:
  {"name":"auth"},"namespace":"","name":"auth","reconcileID":"7bff53ae-1252-46fe-831a-fdc824078a1b","error":"unable to find DSCInitialization","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.
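If the pod log has been saved to a file, the check in the last step can be sketched in Python. The example below assumes only that log entries contain an "error":"..." field, as in the excerpt above; it uses a regular expression rather than a JSON parser because lines can be truncated, and it does not handle escaped quotation marks inside the message. The function names are hypothetical.

```python
import re

# Matches the value of an "error":"..." field in a structured log line.
ERROR_FIELD = re.compile(r'"error":"([^"]*)"')


def error_messages(log_text):
    """Return every value found in an "error" field of the log text."""
    return ERROR_FIELD.findall(log_text)


def has_dscinitialization_error(log_text):
    """Report whether any error field mentions the missing DSCInitialization."""
    return any("unable to find DSCInitialization" in message
               for message in error_messages(log_text))
```

A True result points to the leftover Auth custom resource described in the Problem statement of this section.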
Resolution
- Uninstall the OpenShift AI Operator.
- Delete the Auth custom resource:
  - In the OpenShift web console, switch to the Administrator perspective.
  - Click API Explorer.
  - From the All groups drop-down list, select or enter services.platform.opendatahub.io.
  - Click the Auth kind.
  - Click the Instances tab.
  - Click the action menu (⋮) and select Delete Auth.
    The Delete Auth dialog appears.
  - Click Delete.
- Install the OpenShift AI Operator again.
Problem
The role-based access control (RBAC) policy for the dedicated-admins group in the target project cannot be created. This issue occurs in unknown circumstances.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects or redhat-ods-operator.
- Click the rhods-operator-<random string> pod that shows an error in the Status column.
  The Pod details page appears.
- Click Logs.
- Select rhods-operator from the drop-down list.
- Check the log for the "ERROR: Attempt to create the RBAC policy for dedicated admins group in $target_project failed." error message.
Resolution
- Contact Red Hat support.
11.8. The PagerDuty secret does not get created
Problem
An issue with the Managed Tenants SRE automation process prevents the PagerDuty secret from being created.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects or redhat-ods-operator.
- Click the rhods-operator-<random string> pod that shows an error in the Status column.
  The Pod details page appears.
- Click Logs.
- Select rhods-operator from the drop-down list.
- Check the log for the "ERROR: Pagerduty secret does not exist" error message.
Resolution
- Contact Red Hat support.
11.9. The SMTP secret does not exist
Problem
An issue with the Managed Tenants SRE automation process prevents the SMTP secret from being created.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects or redhat-ods-operator.
- Click the rhods-operator-<random string> pod that shows an error in the Status column.
  The Pod details page appears.
- Click Logs.
- Select rhods-operator from the drop-down list.
- Check the log for the "ERROR: SMTP secret does not exist" error message.
Resolution
- Contact Red Hat support.
11.10. The ODH parameter secret does not get created
Problem
An issue with the OpenShift AI Operator's flow can prevent the ODH parameter secret from being created.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects or redhat-ods-operator.
- Click the rhods-operator-<random string> pod that shows an error in the Status column.
  The Pod details page appears.
- Click Logs.
- Select rhods-operator from the drop-down list.
- Check the log for the "ERROR: Addon managed odh parameter secret does not exist." error message.
Resolution
- Contact Red Hat support.
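Because several of the preceding sections come down to finding one specific message in the rhods-operator log, the lookup can be sketched as a small Python scanner over a saved log. The patterns are taken from the error messages quoted in this chapter; the short labels and the function name are hypothetical, and the actual log format might include prefixes or fields that this sketch ignores.

```python
import re

# (pattern, label) pairs; patterns come from the messages quoted in this chapter.
KNOWN_ERRORS = [
    (re.compile(r"ERROR: Deploying on .+, which is not supported"),
     "unsupported infrastructure"),
    (re.compile(r"ERROR: Attempt to create the ODH CR failed"),
     "OpenShift AI CR not created"),
    (re.compile(r"ERROR: Attempt to create the RHODS Notebooks CR failed"),
     "OpenShift AI Notebooks CR not created"),
    (re.compile(r"unable to find DSCInitialization"),
     "Auth custom resource left from a previous installation"),
    (re.compile(r"ERROR: Attempt to create the RBAC policy for dedicated admins group in .+ failed"),
     "dedicated-admins RBAC policy not created"),
    (re.compile(r"ERROR: Pagerduty secret does not exist"),
     "PagerDuty secret not created"),
    (re.compile(r"ERROR: SMTP secret does not exist"),
     "SMTP secret not created"),
    (re.compile(r"ERROR: Addon managed odh parameter secret does not exist"),
     "ODH parameter secret not created"),
]


def diagnose(log_text):
    """Return the labels of known problems found in the log, in order of appearance."""
    findings = []
    for line in log_text.splitlines():
        for pattern, label in KNOWN_ERRORS:
            if pattern.search(line) and label not in findings:
                findings.append(label)
    return findings
```

An empty result means only that none of these messages was found; it does not rule out other installation problems.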
Problem
After installing OpenShift AI 2.9 or later on a cluster that has an Argo Workflows installation that was not installed by OpenShift AI, data science pipelines are not enabled even though the datasciencepipelines component is enabled in the DataScienceCluster object.
Diagnosis
After you install OpenShift AI 2.9 or later, the Data science pipelines tab is not visible on the OpenShift AI dashboard navigation menu.
Resolution
- Delete the separate instance of Argo Workflows from your cluster. After you have removed any Argo Workflows resources that were not created by OpenShift AI, data science pipelines are enabled automatically.