Chapter 10. Troubleshooting common installation problems
If you are experiencing difficulties installing the Red Hat OpenShift AI Operator, read this section to understand what could be causing the problem and how to resolve it.
If the problem is not included here or in the release notes, contact Red Hat Support. When opening a support case, it is helpful to include debugging information about your cluster. You can collect this information by using the must-gather tool as described in Must-Gather for Red Hat OpenShift AI and Gathering data about your cluster.
You can also adjust the log level of OpenShift AI Operator components to increase or reduce log verbosity to suit your use case. For more information, see Configuring the OpenShift AI Operator logger.
10.1. The Red Hat OpenShift AI Operator cannot be retrieved from the image registry
Problem
When you attempt to retrieve the Red Hat OpenShift AI Operator from the image registry, a Failure to pull from quay error message appears. The Red Hat OpenShift AI Operator might be unavailable for retrieval in the following circumstances:
- The image registry is unavailable.
- There is a problem with your network connection.
- Your cluster is not operational and is therefore unable to reach the image registry.
Diagnosis
Check the logs in the Events section in OpenShift for further information about the Failure to pull from quay error message.
Resolution
- Contact Red Hat Support.
10.2. OpenShift AI does not install on unsupported infrastructure
Problem
You are deploying on an environment that is not documented as supported by the Red Hat OpenShift AI Operator.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects or redhat-ods-operator.
- Click the rhods-operator-<random string> pod that shows an error in the Status column.
  The Pod details page appears.
- Click Logs.
- Select rhods-operator from the drop-down list.
- Check the log for the ERROR: Deploying on $infrastructure, which is not supported. Failing Installation error message.
Resolution
- Before proceeding with a new installation, ensure that you have a fully supported environment on which to install OpenShift AI. For more information, see Supported Configurations for 3.x.
10.3. The creation of the OpenShift AI Custom Resource (CR) fails
Problem
During the installation process, the OpenShift AI Custom Resource (CR) is not created. This issue occurs in unknown circumstances.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects or redhat-ods-operator.
- Click the rhods-operator-<random string> pod that shows an error in the Status column.
  The Pod details page appears.
- Click Logs.
- Select rhods-operator from the drop-down list.
- Check the log for the ERROR: Attempt to create the ODH CR failed. error message.
Resolution
- Contact Red Hat Support.
10.4. The creation of the OpenShift AI Notebooks Custom Resource (CR) fails
Problem
During the installation process, the OpenShift AI Notebooks Custom Resource (CR) is not created. This issue occurs in unknown circumstances.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects or redhat-ods-operator.
- Click the rhods-operator-<random string> pod that shows an error in the Status column.
  The Pod details page appears.
- Click Logs.
- Select rhods-operator from the drop-down list.
- Check the log for the ERROR: Attempt to create the RHODS Notebooks CR failed. error message.
Resolution
- Contact Red Hat Support.
10.5. The OpenShift AI dashboard is not accessible
Problem
After installing OpenShift AI, the redhat-ods-applications, redhat-ods-monitoring, and redhat-ods-operator project namespaces are Active, but you cannot access the dashboard because of an error in one of the pods.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects.
- Click Filter and select the checkbox for every status except Running and Completed.
  The page displays the pods that have an error.
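The same filter can be approximated outside the web console. The following is a minimal sketch, not part of the product: it assumes that you saved the pod list as JSON (for example, from oc get pods --all-namespaces -o json) and that the console status Completed corresponds to the pod phase Succeeded. The function name unhealthy_pods is hypothetical. The console derives its Status column from more than the phase, so a pod that is crash-looping can still report the phase Running and is not caught by this check.

```python
import json

# Phases treated as healthy. Assumption: the console status "Completed"
# corresponds to the Kubernetes pod phase "Succeeded".
HEALTHY_PHASES = {"Running", "Succeeded"}


def unhealthy_pods(pod_list_json: str) -> list[tuple[str, str, str]]:
    """Return (namespace, name, phase) for each pod outside the healthy phases."""
    items = json.loads(pod_list_json).get("items", [])
    result = []
    for pod in items:
        metadata = pod.get("metadata", {})
        # A pod with no reported phase is listed as "Unknown".
        phase = pod.get("status", {}).get("phase", "Unknown")
        if phase not in HEALTHY_PHASES:
            result.append(
                (metadata.get("namespace", ""), metadata.get("name", ""), phase)
            )
    return result
```

Pods that this sketch returns are the ones to inspect first by using the steps in the Resolution section.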
Resolution
- To see more information and troubleshooting steps for a pod, on the Pods page, click the link in the Status column for the pod.
- If the Status column does not display a link, click the pod name to open the pod details page and then click the Logs tab.
10.6. Reinstalling OpenShift AI fails with an error
Problem
After uninstalling the OpenShift AI Operator and reinstalling it by using the CLI, the reinstallation fails with an unable to find DSCInitialization error in one of the OpenShift AI Operator pod logs. This issue can occur if the Auth custom resource from the previous installation was not deleted after uninstalling the OpenShift AI Operator and before reinstalling it. For more information, see Understanding the uninstallation process.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects or redhat-ods-operator.
- Click the rhods-operator-<random string> pod that shows an error in the Status column.
  The Pod details page appears.
- Click Logs.
- Select rhods-operator from the drop-down list.
- Check the log for an error message similar to the following:
{"name":"auth"},"namespace":"","name":"auth","reconcileID":"7bff53ae-1252-46fe-831a-fdc824078a1b","error":"unable to find DSCInitialization","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.
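If you saved the pod log to a file, a short script can locate this message for you. The following is a minimal sketch, assuming that the log uses the compact field layout of the sample line above (no spaces around the colons). The function name find_dsci_errors is hypothetical. Because log lines such as the sample can be truncated, the sketch uses plain text matching instead of parsing each line as JSON.

```python
import re

# Error field exactly as it appears in the sample log line above.
DSCI_ERROR = '"error":"unable to find DSCInitialization"'
# First "name" field on a line; in the sample, this is the resource being reconciled.
NAME_FIELD = re.compile(r'"name":"([^"]+)"')


def find_dsci_errors(log_text: str) -> list[str]:
    """Return the resource name from each log line that reports the error."""
    names = []
    for line in log_text.splitlines():
        if DSCI_ERROR in line:
            match = NAME_FIELD.search(line)
            # Keep an empty string when the line has no name field.
            names.append(match.group(1) if match else "")
    return names
```

A result that contains auth is consistent with the leftover Auth custom resource that the Problem section describes.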
Resolution
- Uninstall the OpenShift AI Operator.
- Delete the Auth custom resource:
  - In the OpenShift web console, switch to the Administrator perspective.
  - Click API Explorer.
  - From the All groups drop-down list, select or enter services.platform.opendatahub.io.
  - Click the Auth kind.
  - Click the Instances tab.
  - Click the action menu (⋮) and select Delete Auth.
    The Delete Auth dialog appears.
  - Click Delete.
- Install the OpenShift AI Operator again.
10.7. The dedicated-admins Role-based access control (RBAC) policy cannot be created
Problem
The role-based access control (RBAC) policy for the dedicated-admins group in the target project cannot be created. This issue occurs in unknown circumstances.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects or redhat-ods-operator.
- Click the rhods-operator-<random string> pod that shows an error in the Status column.
  The Pod details page appears.
- Click Logs.
- Select rhods-operator from the drop-down list.
- Check the log for the ERROR: Attempt to create the RBAC policy for dedicated admins group in $target_project failed. error message.
Resolution
- Contact Red Hat Support.
10.8. The ODH parameter secret does not get created
Problem
An issue with the OpenShift AI Operator’s flow could result in failure to create the ODH parameter secret.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects or redhat-ods-operator.
- Click the rhods-operator-<random string> pod that shows an error in the Status column.
  The Pod details page appears.
- Click Logs.
- Select rhods-operator from the drop-down list.
- Check the log for the ERROR: Addon managed odh parameter secret does not exist. error message.
Resolution
- Contact Red Hat Support.
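Several sections in this chapter ask you to check the rhods-operator log for a specific ERROR message. The following is a minimal sketch of how that check could be done in one pass over a saved copy of the log. It is an illustration, not a Red Hat tool: the names KNOWN_ERRORS and classify_log are hypothetical, and the patterns assume that the messages appear in the log exactly as quoted in this chapter, with $infrastructure and $target_project replaced by actual values.

```python
import re

# Error messages quoted in sections 10.2, 10.3, 10.4, 10.7, and 10.8,
# keyed by section number. The placeholders $infrastructure and
# $target_project are matched as arbitrary text on a single line.
KNOWN_ERRORS = {
    "10.2": re.compile(
        r"ERROR: Deploying on .+, which is not supported\. Failing Installation"
    ),
    "10.3": re.compile(r"ERROR: Attempt to create the ODH CR failed\."),
    "10.4": re.compile(r"ERROR: Attempt to create the RHODS Notebooks CR failed\."),
    "10.7": re.compile(
        r"ERROR: Attempt to create the RBAC policy for dedicated admins group in .+ failed\."
    ),
    "10.8": re.compile(r"ERROR: Addon managed odh parameter secret does not exist\."),
}


def classify_log(log_text: str) -> list[str]:
    """Return the sorted section numbers whose error message appears in the log."""
    return sorted(
        section
        for section, pattern in KNOWN_ERRORS.items()
        if pattern.search(log_text)
    )
```

Each section number that the sketch returns points to the Diagnosis and Resolution steps to follow. An empty result means only that none of these specific messages were found, not that the installation is healthy.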