Chapter 10. Troubleshooting common installation problems
If you are experiencing difficulties installing the Red Hat OpenShift AI Add-on, read this section to understand what could be causing the problem and how to resolve it.
If the problem is not included here or in the release notes, contact Red Hat Support. When opening a support case, it is helpful to include debugging information about your cluster. You can collect this information by using the must-gather tool as described in Must-Gather for Red Hat OpenShift AI and Gathering data about your cluster.
You can also adjust the log level of OpenShift AI Operator components to increase or reduce log verbosity to suit your use case. For more information, see Configuring the OpenShift AI Operator logger.
10.1. The Red Hat OpenShift AI Operator cannot be retrieved from the image registry
Problem
When attempting to retrieve the Red Hat OpenShift AI Operator from the image registry, a "Failure to pull from quay" error message appears. The Red Hat OpenShift AI Operator might be unavailable for retrieval in the following circumstances:
- The image registry is unavailable.
- There is a problem with your network connection.
- Your cluster is not operational and is therefore unable to retrieve the image from the registry.
Diagnosis
Check the logs in the Events section in OpenShift for further information about the "Failure to pull from quay" error message.
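If you prefer to review events outside the web console, the check above can be sketched in Python. This is a minimal sketch, not an official tool. It assumes that you have exported events as JSON (for example with `oc get events -n <namespace> -o json`) and that pull failures appear as Warning events whose message mentions "pull". The function name and the sample event are hypothetical.

```python
import json


def find_image_pull_warnings(events_json: str) -> list[str]:
    """Return one summary line per Warning event whose message mentions a pull."""
    events = json.loads(events_json).get("items", [])
    summaries = []
    for event in events:
        message = event.get("message", "")
        # Assumption: pull failures surface as Warning events that mention "pull".
        if event.get("type") == "Warning" and "pull" in message.lower():
            obj = event.get("involvedObject", {})
            summaries.append(f"{obj.get('kind', '?')}/{obj.get('name', '?')}: {message}")
    return summaries


# Hypothetical sample data; real events come from your cluster.
sample_events = json.dumps({
    "items": [
        {"type": "Normal", "reason": "Scheduled",
         "message": "Successfully assigned pod",
         "involvedObject": {"kind": "Pod", "name": "rhods-operator-example"}},
        {"type": "Warning", "reason": "Failed",
         "message": "Failed to pull image from quay.io",
         "involvedObject": {"kind": "Pod", "name": "rhods-operator-example"}},
    ]
})

for line in find_image_pull_warnings(sample_events):
    print(line)
```

The filter is deliberately broad. It narrows the event list for a reader; it does not identify the root cause.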
Resolution
- Contact Red Hat Support.
10.2. OpenShift AI cannot be installed due to insufficient cluster resources
Problem
When attempting to install OpenShift AI, an error message appears stating that installation prerequisites have not been met.
Diagnosis
- Log in to Red Hat OpenShift Cluster Manager (https://console.redhat.com/openshift/).
- Click Clusters. The Clusters page opens.
- Click the name of the cluster you want to install OpenShift AI on. The Details page for the cluster opens.
- Click the Add-ons tab and locate the Red Hat OpenShift AI tile.
- Click Install. The Configure Red Hat OpenShift AI pane appears.
- If the installation fails, click the Prerequisites tab.
- Note down the error message. If the error message states that you require a new machine pool, or that more resources are required, take the appropriate action to resolve the problem.
Resolution
- You might need to add more resources to your cluster, or increase the size of your machine pool. To increase your cluster’s resources, contact your infrastructure administrator. For more information about increasing the size of your machine pool, see Nodes and Allocating additional resources to OpenShift AI users.
10.3. OpenShift AI does not install on unsupported infrastructure
Problem
You are deploying on an environment that is not documented as supported by the Red Hat OpenShift AI Operator.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects or redhat-ods-operator.
- Click the rhods-operator-<random string> pod. The Pod details page appears.
- Click Logs.
- Select rhods-operator from the drop-down list.
- Check the log for the "ERROR: Deploying on $infrastructure, which is not supported. Failing Installation" error message.
Resolution
- Before proceeding with a new installation, ensure that you have a fully supported environment on which to install OpenShift AI. For more information, see Red Hat OpenShift AI: Supported Configurations.
10.4. The creation of the OpenShift AI Custom Resource (CR) fails
Problem
During the installation process, the OpenShift AI Custom Resource (CR) is not created. The circumstances that cause this issue are unknown.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects or redhat-ods-operator.
- Click the rhods-operator-<random string> pod. The Pod details page appears.
- Click Logs.
- Select rhods-operator from the drop-down list.
- Check the log for the "ERROR: Attempt to create the ODH CR failed." error message.
Resolution
- Contact Red Hat Support.
10.5. The creation of the OpenShift AI Notebooks Custom Resource (CR) fails
Problem
During the installation process, the OpenShift AI Notebooks Custom Resource (CR) is not created. The circumstances that cause this issue are unknown.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects or redhat-ods-operator.
- Click the rhods-operator-<random string> pod. The Pod details page appears.
- Click Logs.
- Select rhods-operator from the drop-down list.
- Check the log for the "ERROR: Attempt to create the RHODS Notebooks CR failed." error message.
Resolution
- Contact Red Hat Support.
10.6. The OpenShift AI dashboard is not accessible
Problem
After installing OpenShift AI, the redhat-ods-applications, redhat-ods-monitoring, and redhat-ods-operator project namespaces are Active, but you cannot access the dashboard due to an error in the pod.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects.
- Click Filter and select the checkbox for every status except Running and Completed. The page displays the pods that have an error.
Resolution
- To see more information and troubleshooting steps for a pod, on the Pods page, click the link in the Status column for the pod.
- If the Status column does not display a link, click the pod name to open the pod details page and then click the Logs tab.
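The status filter above can also be approximated outside the console. The following Python sketch assumes that you have exported the pod list as JSON (for example with `oc get pods --all-namespaces -o json`). It assumes that the console's Completed status corresponds to the Succeeded pod phase, and it treats any container in a waiting state (such as CrashLoopBackOff) as a problem even when the pod phase is Running. The function names and sample pods are hypothetical.

```python
import json
from typing import Optional

# Assumption: "Completed" in the console corresponds to the "Succeeded" phase.
HEALTHY_PHASES = {"Running", "Succeeded"}


def pod_problem(pod: dict) -> Optional[str]:
    """Return a short reason if the pod looks unhealthy, otherwise None."""
    status = pod.get("status", {})
    # A waiting container (for example CrashLoopBackOff) can hide behind a Running phase.
    for container in status.get("containerStatuses", []):
        waiting = container.get("state", {}).get("waiting")
        if waiting:
            return waiting.get("reason", "Waiting")
    phase = status.get("phase", "Unknown")
    return None if phase in HEALTHY_PHASES else phase


def pods_needing_attention(pods_json: str) -> list[tuple[str, str, str]]:
    """Return (namespace, name, reason) for every pod that looks unhealthy."""
    result = []
    for pod in json.loads(pods_json).get("items", []):
        reason = pod_problem(pod)
        if reason is not None:
            meta = pod.get("metadata", {})
            result.append((meta.get("namespace", ""), meta.get("name", ""), reason))
    return result


# Hypothetical sample data; pod names are placeholders.
sample_pods = json.dumps({"items": [
    {"metadata": {"namespace": "redhat-ods-applications", "name": "dashboard-example"},
     "status": {"phase": "Running",
                "containerStatuses": [{"state": {"waiting": {"reason": "CrashLoopBackOff"}}}]}},
    {"metadata": {"namespace": "redhat-ods-operator", "name": "rhods-operator-example"},
     "status": {"phase": "Running",
                "containerStatuses": [{"state": {"running": {}}}]}},
    {"metadata": {"namespace": "redhat-ods-monitoring", "name": "job-example"},
     "status": {"phase": "Succeeded",
                "containerStatuses": [{"state": {"terminated": {"reason": "Completed"}}}]}},
]})

for namespace, name, reason in pods_needing_attention(sample_pods):
    print(f"{namespace}/{name}: {reason}")
```

The result is a starting list only. The pod's Logs tab and the link in the Status column remain the place to find the actual cause.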
10.7. The dedicated-admins role-based access control (RBAC) policy cannot be created
Problem
The role-based access control (RBAC) policy for the dedicated-admins group in the target project cannot be created. The circumstances that cause this issue are unknown.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects or redhat-ods-operator.
- Click the rhods-operator-<random string> pod. The Pod details page appears.
- Click Logs.
- Select rhods-operator from the drop-down list.
- Check the log for the "ERROR: Attempt to create the RBAC policy for dedicated admins group in $target_project failed." error message.
Resolution
- Contact Red Hat Support.
10.8. The Dead Man’s Snitch operator’s secret does not get created
Problem
An issue with the Managed Tenants SRE automation process prevents the Dead Man’s Snitch operator’s secret from being created.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects or redhat-ods-operator.
- Click the rhods-operator-<random string> pod. The Pod details page appears.
- Click Logs.
- Select rhods-operator from the drop-down list.
- Check the log for the "ERROR: Dead Man Snitch secret does not exist." error message.
Resolution
- Contact Red Hat Support.
10.9. The PagerDuty secret does not get created
Problem
An issue with the Managed Tenants SRE automation process prevents the PagerDuty secret from being created.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects or redhat-ods-operator.
- Click the rhods-operator-<random string> pod. The Pod details page appears.
- Click Logs.
- Select rhods-operator from the drop-down list.
- Check the log for the "ERROR: Pagerduty secret does not exist" error message.
Resolution
- Contact Red Hat Support.
10.10. The SMTP secret does not exist
Problem
An issue with the Managed Tenants SRE automation process prevents the SMTP secret from being created.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects or redhat-ods-operator.
- Click the rhods-operator-<random string> pod. The Pod details page appears.
- Click Logs.
- Select rhods-operator from the drop-down list.
- Check the log for the "ERROR: SMTP secret does not exist" error message.
Resolution
- Contact Red Hat Support.
10.11. The ODH parameter secret does not get created
Problem
An issue with the OpenShift AI Add-on’s flow could result in a failure to create the ODH parameter secret.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects or redhat-ods-operator.
- Click the rhods-operator-<random string> pod. The Pod details page appears.
- Click Logs.
- Select rhods-operator from the drop-down list.
- Check the log for the "ERROR: Addon managed odh parameter secret does not exist." error message.
Resolution
- Contact Red Hat Support.
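Sections 10.3 to 10.5 and 10.7 to 10.11 all ask you to look for a fixed message in the rhods-operator log. If you have saved that log as text (for example with `oc logs <pod name> -n redhat-ods-operator -c rhods-operator`), the lookup can be sketched in Python as follows. The message templates are copied from this chapter. Everything else is an assumption made for illustration: the helper names, the treatment of `$name` placeholders as single words without spaces, and the sample log lines.

```python
import re

# Message templates copied from this chapter; "$name" marks a value that the
# operator fills in at run time.
KNOWN_ERRORS = {
    "10.3": "ERROR: Deploying on $infrastructure, which is not supported. Failing Installation",
    "10.4": "ERROR: Attempt to create the ODH CR failed.",
    "10.5": "ERROR: Attempt to create the RHODS Notebooks CR failed.",
    "10.7": "ERROR: Attempt to create the RBAC policy for dedicated admins group in $target_project failed.",
    "10.8": "ERROR: Dead Man Snitch secret does not exist.",
    "10.9": "ERROR: Pagerduty secret does not exist",
    "10.10": "ERROR: SMTP secret does not exist",
    "10.11": "ERROR: Addon managed odh parameter secret does not exist.",
}


def template_to_regex(template: str) -> re.Pattern:
    """Escape the literal text and turn each $name into a named capture group."""
    # With one capture group, re.split alternates literal text and placeholder names.
    parts = re.split(r"\$(\w+)", template)
    pattern = "".join(
        re.escape(part) if index % 2 == 0 else f"(?P<{part}>\\S+)"
        for index, part in enumerate(parts)
    )
    return re.compile(pattern)


PATTERNS = {section: template_to_regex(text) for section, text in KNOWN_ERRORS.items()}


def classify_log(log_text: str) -> list[tuple[str, dict]]:
    """Return (section, captured values) for each log line that matches a known error."""
    hits = []
    for line in log_text.splitlines():
        for section, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                hits.append((section, match.groupdict()))
    return hits


# Hypothetical log excerpt; real log lines might carry timestamps or other prefixes.
sample_log = "\n".join([
    "INFO: starting",
    "ERROR: Deploying on examplecloud, which is not supported. Failing Installation",
    "ERROR: SMTP secret does not exist",
])

for section, values in classify_log(sample_log):
    print(f"See section {section}", values)
```

Because the patterns use `search` rather than a full-line match, a prefix such as a timestamp does not prevent a hit. A match only points you to the relevant section; the resolution steps there still apply.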
10.12. Data science pipelines are not enabled after installing OpenShift AI 2.9 or later due to existing Argo Workflows resources
Problem
After installing OpenShift AI 2.9 or later on a cluster that has an existing Argo Workflows installation that was not installed by OpenShift AI, data science pipelines are not enabled even though the datasciencepipelines component is enabled in the DataScienceCluster object.
Diagnosis
After you install OpenShift AI 2.9 or later, the Data Science Pipelines tab is not visible on the OpenShift AI dashboard navigation menu.
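The two conditions behind this problem can be checked from exported cluster data. The Python sketch below is illustrative only and rests on several assumptions: that the DataScienceCluster object stores the component state under `spec.components.datasciencepipelines.managementState` with the value `Managed` when enabled, that an Argo Workflows installation is recognizable by a CustomResourceDefinition named `workflows.argoproj.io`, and that you supply both inputs yourself (for example from `oc get datasciencecluster -o json` and `oc get crd -o name`). The function names are hypothetical.

```python
# Assumption: Argo Workflows registers a CRD with this name.
ARGO_WORKFLOWS_CRD = "workflows.argoproj.io"


def pipelines_component_state(dsc: dict) -> str:
    """Return the managementState of the datasciencepipelines component, or "" if unset."""
    components = dsc.get("spec", {}).get("components", {})
    return components.get("datasciencepipelines", {}).get("managementState", "")


def argo_workflows_present(crd_names: list[str]) -> bool:
    """Return True if the Argo Workflows CRD is in the list of CRD names."""
    return ARGO_WORKFLOWS_CRD in crd_names


def matches_this_problem(dsc: dict, crd_names: list[str]) -> bool:
    """True when pipelines are enabled in the object and an Argo Workflows CRD exists."""
    return pipelines_component_state(dsc) == "Managed" and argo_workflows_present(crd_names)


# Hypothetical inputs for illustration.
sample_dsc = {"spec": {"components": {"datasciencepipelines": {"managementState": "Managed"}}}}
sample_crds = ["workflows.argoproj.io", "routes.route.openshift.io"]

print(matches_this_problem(sample_dsc, sample_crds))
```

This check cannot tell who created the CRD. OpenShift AI might install its own Argo Workflows resources, so a positive result is meaningful only together with the missing Data Science Pipelines tab described above.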
Resolution
- Delete the separate installation of Argo Workflows on your cluster. After you have removed any Argo Workflows resources that were not created by OpenShift AI from your cluster, data science pipelines are enabled automatically.