Installing and uninstalling OpenShift AI Self-Managed
Install and uninstall OpenShift AI Self-Managed
Abstract
Preface
Learn how to use both the OpenShift CLI (oc) and web console to install Red Hat OpenShift AI Self-Managed on your OpenShift cluster. To uninstall the product, learn how to use the recommended command-line interface (CLI) method.
You cannot upgrade from OpenShift AI 2.25 or any earlier version to 3.0. OpenShift AI 3.0 introduces significant technology and component changes and is intended for new installations only. To use OpenShift AI 3.0, install the Red Hat OpenShift AI Operator on a cluster running OpenShift Container Platform 4.19 or later and select the fast-3.x channel.
Support for upgrades will be available in a later release, including upgrades from OpenShift AI 2.25 to a stable 3.x version.
For more information, see the Why upgrades to OpenShift AI 3.0 are not supported Knowledgebase article.
Red Hat does not support installing more than one instance of OpenShift AI on your cluster.
Red Hat does not support installing the Red Hat OpenShift AI Operator on the same cluster as the Red Hat OpenShift AI Add-on.
Chapter 1. Architecture of OpenShift AI Self-Managed
Red Hat OpenShift AI Self-Managed is an Operator that is available in a self-managed environment, such as Red Hat OpenShift Container Platform, or in Red Hat-managed cloud environments such as Red Hat OpenShift Dedicated (with a Customer Cloud Subscription for AWS or GCP), Red Hat OpenShift Service on Amazon Web Services (ROSA classic or ROSA HCP), or Microsoft Azure Red Hat OpenShift.
OpenShift AI integrates the following components and services:
At the service layer:
- OpenShift AI dashboard
- A customer-facing dashboard that shows available and installed applications for the OpenShift AI environment as well as learning resources such as tutorials, quick starts, and documentation. Administrative users can access functionality to manage users, clusters, workbench images, and model-serving runtimes. Data scientists can use the dashboard to create projects to organize their data science work.
- Model serving
- Data scientists can deploy trained machine-learning models to serve intelligent applications in production. After deployment, applications can send requests to the model using its deployed API endpoint.
- AI pipelines
- Data scientists can build portable machine learning (ML) workflows with AI pipelines by using Docker containers. With AI pipelines, data scientists can automate workflows as they develop their data science models.
- Jupyter (self-managed)
- A self-managed application that allows data scientists to configure a basic standalone workbench and develop machine learning models in JupyterLab.
- Distributed workloads
- Data scientists can use multiple nodes in parallel to train machine-learning models or process data more quickly. This approach significantly reduces the task completion time, and enables the use of larger datasets and more complex models.
- Retrieval-Augmented Generation (RAG)
- Data scientists and AI engineers can leverage Retrieval-Augmented Generation (RAG) capabilities provided by the integrated Llama Stack Operator. By combining large language model inference, semantic retrieval, and vector database storage, data scientists and AI engineers can obtain tailored, accurate, and verifiable answers to complex queries based on their own datasets within a project.
At the management layer:
- The Red Hat OpenShift AI Operator
- A meta-operator that deploys and maintains all components and sub-operators that are part of OpenShift AI.
When you install the Red Hat OpenShift AI Operator in the OpenShift cluster using the predefined projects, the following new projects are created:
- The `redhat-ods-operator` project contains the Red Hat OpenShift AI Operator.
- The `redhat-ods-applications` project includes the dashboard and other required components of OpenShift AI.
- The `rhods-notebooks` project is where basic workbenches are deployed by default.
You can specify custom projects if needed. You or your data scientists must also create additional projects for the applications that will use your machine learning models.
Do not install independent software vendor (ISV) applications in namespaces associated with OpenShift AI.
Chapter 2. Understanding update channels
You can use update channels to specify which Red Hat OpenShift AI minor version you intend to update your Operator to. Update channels also allow you to choose the timing and level of support your updates have through the fast, stable, stable-x.y, eus-x.y, and alpha channel options.
The subscription of an installed Operator specifies the update channel, which is used to track and receive updates for the Operator. You can change the update channel to start tracking and receiving updates from a newer channel. For more information about the release frequency and the lifecycle associated with each of the available update channels, see the Red Hat OpenShift AI Self-Managed Life Cycle Knowledgebase article.
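For example, you can inspect or change the channel directly on the Operator's `Subscription` object. The following sketch assumes the default subscription name (`rhods-operator`) and namespace (`redhat-ods-operator`); adjust these values if your installation uses different ones:

```shell
# Show the channel that the installed Operator currently tracks
oc get subscription rhods-operator -n redhat-ods-operator \
  -o jsonpath='{.spec.channel}{"\n"}'

# Switch the subscription to a different channel, for example fast-3.x
oc patch subscription rhods-operator -n redhat-ods-operator \
  --type merge -p '{"spec": {"channel": "fast-3.x"}}'
```

After the patch, Operator Lifecycle Manager (OLM) begins tracking updates from the new channel according to the subscription's approval strategy.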
| Channel | Support | Release frequency | Recommended environment |
|---|---|---|---|
| `fast` | One month of full support | Every month | Production environments with access to the latest product features. Select this streaming channel with automatic updates to avoid manually upgrading every month. NOTE: OpenShift AI 3.0 is available only through the `fast-3.x` channel. |
| `stable` | Three months of full support | Every three months | Production environments with stability prioritized over new feature availability. Select this streaming channel with automatic updates to access the latest stable release and avoid manually upgrading. |
| `stable-x.y` | Seven months of full support | Every three months | Production environments with stability prioritized over new feature availability. Select numbered stable channels (such as `stable-x.y`) to remain on a specific minor version while receiving its updates. |
| `eus-x.y` | Seven months of full support followed by Extended Update Support for eleven months | Every nine months | Enterprise-grade environments that cannot upgrade within a seven-month window. Select this streaming channel if you prioritize stability over new feature availability. |
| `alpha` | One month of full support | Every month | Development environments with early-access features that might not be functionally complete. Select this channel to use early-access features to test functionality and provide feedback during the development process. Early-access features are not supported with Red Hat production service level agreements (SLAs). For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope. For more information about the support scope of Red Hat Developer Preview features, see Developer Preview Features Support Scope. |
The embedded and beta channels are legacy channels that will be removed in a future release. Do not select the embedded or beta channels for a new Operator installation.
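To see which channels the catalog currently publishes for the Operator, you can query its `PackageManifest`. This is a sketch that assumes the package name `rhods-operator` in the default `openshift-marketplace` namespace:

```shell
# List all update channels available for the Red Hat OpenShift AI Operator
oc get packagemanifest rhods-operator -n openshift-marketplace \
  -o jsonpath='{range .status.channels[*]}{.name}{"\n"}{end}'
```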
Chapter 3. Installing and deploying OpenShift AI
You cannot upgrade from OpenShift AI 2.25 or any earlier version to 3.0. OpenShift AI 3.0 introduces significant technology and component changes and is intended for new installations only. To use OpenShift AI 3.0, install the Red Hat OpenShift AI Operator on a cluster running OpenShift Container Platform 4.19 or later and select the fast-3.x channel.
Support for upgrades will be available in a later release, including upgrades from OpenShift AI 2.25 to a stable 3.x version.
For more information, see the Why upgrades to OpenShift AI 3.0 are not supported Knowledgebase article.
Red Hat OpenShift AI is a platform for data scientists and developers of artificial intelligence (AI) applications. It provides a fully supported environment that lets you rapidly develop, train, test, and deploy machine learning models on-premises or in the public cloud.
OpenShift AI is provided as a managed cloud service add-on for Red Hat OpenShift or as self-managed software that you can install on-premises or in the public cloud on OpenShift.
For information about installing OpenShift AI as self-managed software on your OpenShift cluster in a disconnected environment, see Installing and uninstalling OpenShift AI Self-Managed in a disconnected environment. For information about installing OpenShift AI as a managed cloud service add-on, see Installing and uninstalling OpenShift AI Cloud Service.
Installing OpenShift AI involves the following high-level tasks:
- Confirm that your OpenShift cluster meets all requirements. See Requirements for OpenShift AI Self-Managed.
- Install the Red Hat OpenShift AI Operator. See Installing the Red Hat OpenShift AI Operator.
- Install OpenShift AI components. See Installing and managing Red Hat OpenShift AI components.
- Complete any additional configuration required for the components you enabled. See the component-specific configuration sections for details.
- Configure user and administrator groups to provide user access to OpenShift AI. See Adding users to OpenShift AI user groups.
- Access the OpenShift AI dashboard. See Accessing the OpenShift AI dashboard.
3.1. Requirements for OpenShift AI Self-Managed
You must meet the following requirements before you can install Red Hat OpenShift AI on your Red Hat OpenShift cluster.
3.1.1. Platform requirements
Subscriptions
- A subscription for Red Hat OpenShift AI Self-Managed is required.
If you want to install OpenShift AI Self-Managed in a Red Hat-managed cloud environment, you must also have a subscription for one of the following platforms:
- Red Hat OpenShift Dedicated on Amazon Web Services (AWS) or Google Cloud Platform (GCP)
- Red Hat OpenShift Service on Amazon Web Services (ROSA classic)
- Red Hat OpenShift Service on Amazon Web Services with hosted control planes (ROSA HCP)
- Microsoft Azure Red Hat OpenShift
- Red Hat OpenShift Kubernetes Engine (OKE)

Note: While OpenShift Kubernetes Engine (OKE) typically restricts the installation of certain post-installation Operators, Red Hat provides a specific licensing exception for Red Hat OpenShift AI users. This exception exclusively applies to Operators used to support Red Hat OpenShift AI workloads. Installing or using these Operators for purposes unrelated to OpenShift AI is a violation of the OKE service agreement.
Contact your Red Hat account manager to purchase new subscriptions. If you do not yet have an account manager, complete the form at https://www.redhat.com/en/contact to request one.
Cluster administrator access
- Cluster administrator access is required to install OpenShift AI.
- You can use an existing cluster or create a new one that meets the supported version requirements.
Supported OpenShift versions
The following OpenShift versions are supported for installing OpenShift AI:
- OpenShift Container Platform 4.19 to 4.20. See OpenShift Container Platform installation overview.
- To deploy models by using Distributed Inference with llm-d, your cluster must be running version 4.20 or later.
- OpenShift Dedicated 4. See Creating an OpenShift Dedicated cluster.
- ROSA classic 4. See Install ROSA classic clusters.
- ROSA HCP 4. See Install ROSA with HCP clusters.
- OpenShift Kubernetes Engine (OKE). See About OpenShift Kubernetes Engine.

Note: While OpenShift Kubernetes Engine (OKE) typically restricts the installation of certain post-installation Operators, Red Hat provides a specific licensing exception for Red Hat OpenShift AI users. This exception exclusively applies to Operators used to support Red Hat OpenShift AI workloads. Installing or using these Operators for purposes unrelated to OpenShift AI is a violation of the OKE service agreement.
The following Operators are required dependencies for Red Hat OpenShift AI 2.x and 3.x. These Operators are not supported on OKE, but can be installed if given an exception.
| Red Hat OpenShift AI version | Operators (unsupported, exception required) |
|---|---|
| 2.x | Authorino Operator, Service Mesh Operator, Serverless Operator |
| 3.x | Job-set-operator, openshift-custom-metrics-autoscaler-operator, cert-manager Operator, Leader Worker Set Operator, Red Hat Connectivity Link Operator, Kueue Operator (RHBOK), SR-IOV Operator, GPU Operator (with custom configurations), OpenTelemetry, Tempo, Cluster Observability Operator |
In OpenStack, CodeReady Containers (CRC), and other private cloud environments without integrated external DNS, you must manually configure DNS A or CNAME records after installing the Operator and components, when the LoadBalancer IP becomes available. For more information, see Configuring External DNS for RHOAI 3.x on OpenStack and Private Clouds.
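As a quick check of the version requirement, you can print the cluster version from the CLI:

```shell
# Print the current OpenShift Container Platform version;
# it must be 4.19 or later for OpenShift AI 3.0
oc get clusterversion version -o jsonpath='{.status.desired.version}{"\n"}'
```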
Cluster configuration
- A minimum of 2 worker nodes with at least 8 CPUs and 32 GiB RAM each is required to install the Operator.
- For single-node OpenShift clusters, the node must have at least 32 CPUs and 128 GiB RAM.
- Additional resources are required depending on your workloads.
- Open Data Hub must not be installed on the cluster.
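You can verify that your worker nodes meet the minimum CPU and memory requirements with a command such as the following:

```shell
# List worker nodes with their allocatable CPU and memory
oc get nodes -l node-role.kubernetes.io/worker \
  -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory
```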
Storage requirements
Your cluster must have a default storage class that supports dynamic provisioning. To confirm that a default storage class is configured, run the following command:
```shell
oc get storageclass
```

If no storage class is marked as the default, see Changing the default storage class in the OpenShift Container Platform documentation.
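If your cluster has a suitable storage class that is not yet the default, you can mark it as the default by setting the standard annotation. The storage class name below (`gp3-csi`) is only an example; substitute the name of your own storage class:

```shell
# Mark an existing storage class as the cluster default
oc patch storageclass gp3-csi -p \
  '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'
```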
Identity provider configuration
- An identity provider must be configured for your OpenShift cluster, which provides authentication for OpenShift AI. See Understanding identity provider configuration.
You must access the cluster as a user with the `cluster-admin` role; the `kubeadmin` user is not allowed. For more information, see the relevant documentation:
- OpenShift Container Platform: Creating a cluster admin
- OpenShift Dedicated: Managing OpenShift Dedicated administrators
- ROSA: Creating a cluster administrator user for quick cluster access
Internet access
Along with internet access, the following domains must be accessible during the installation of OpenShift AI:
- `cdn.redhat.com`
- `subscription.rhn.redhat.com`
- `registry.access.redhat.com`
- `registry.redhat.io`
- `quay.io`

For environments that build or customize CUDA-based images using NVIDIA’s base images, or that directly pull artifacts from the NVIDIA NGC catalog, the following domains must also be accessible:

- `ngc.download.nvidia.cn`
- `developer.download.nvidia.com`

Access to these NVIDIA domains is not required for standard OpenShift AI installations. The CUDA-based container images used by OpenShift AI are prebuilt and hosted on Red Hat’s registry at `registry.redhat.io`.
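As an optional preinstallation check, you can verify that the required domains are reachable from a machine that shares your cluster's egress path. This is only a connectivity sketch; it tests TCP reachability on port 443, not registry authentication:

```shell
# Check TCP connectivity on port 443 to the required domains
for host in cdn.redhat.com subscription.rhn.redhat.com \
            registry.access.redhat.com registry.redhat.io quay.io; do
  if timeout 5 bash -c "</dev/tcp/${host}/443" 2>/dev/null; then
    echo "${host}: reachable"
  else
    echo "${host}: NOT reachable"
  fi
done
```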
Object storage
- Several components of OpenShift AI require or can use S3-compatible object storage, such as AWS S3, MinIO, Ceph, or IBM Cloud Storage. Object storage provides HTTP-based access to data by using the S3 API, which is the standard interface for most object storage services.
Object storage is required for:
- Single-model serving platform, for storing and deploying models.
- AI pipelines, for storing artifacts, logs, and intermediate results.
Object storage can also be used by:
- Workbenches, for accessing large datasets.
- Kueue-based workloads, for reading input data and writing output results.
- Code executed inside pipelines, for persisting generated models or other artifacts.
Custom namespaces
- By default, OpenShift AI uses predefined namespaces, but you can define custom namespaces for the Operator, applications, and workbenches if needed. Namespaces created by OpenShift AI typically include `openshift` or `redhat` in their name. Do not rename these system namespaces because they are required for OpenShift AI to function properly.
- If you use custom namespaces, create and label them before installing the OpenShift AI Operator. See Configuring custom namespaces.
3.1.2. Component requirements
Meet the requirements for the components and capabilities that you plan to use.
Workbenches (workbenches)
- To use a custom workbench namespace, create the namespace before installing the OpenShift AI Operator. See Configuring custom namespaces.
AI Pipelines (aipipelines)
- To store your pipeline artifacts in an S3-compatible object storage bucket so that you do not consume local storage, configure write access to your S3 bucket on your storage account.
- If your cluster is running in FIPS mode, any custom container images for data science pipelines must be based on UBI 9 or RHEL 9. This ensures compatibility with FIPS-approved pipeline components and prevents errors related to mismatched OpenSSL or GNU C Library (glibc) versions.
- To use your own Argo Workflows instance, after installing the OpenShift AI Operator, see Configuring pipelines with your own Argo Workflows instance.
Kueue-based workloads (kueue, ray, trainingoperator)
- Install the Red Hat build of Kueue Operator.
- Install the cert-manager Operator.
- See Configuring workload management with Kueue and Installing the distributed workloads components.
Model serving platform (kserve)
- Install the cert-manager Operator.
Distributed Inference with llm-d (advanced kserve)
- Install the cert-manager Operator.
- Install the Red Hat Connectivity Link Operator.
- Install the Red Hat Leader Worker Set Operator.
- See Deploying models by using Distributed Inference with llm-d.
Llama Stack and RAG workloads (llamastackoperator)
- Install the Llama Stack Operator.
- Install the Red Hat OpenShift Service Mesh Operator 3.x.
- Install the cert-manager Operator.
- Ensure you have GPU-enabled nodes available on your cluster.
- Install the Node Feature Discovery Operator.
- Install the NVIDIA GPU Operator.
- Configure access to S3-compatible object storage for your model artifacts.
- See Working with Llama Stack.
Model registry (modelregistry)
- Configure access to an external MySQL database 5.x or later; 8.x is recommended.
- Configure access to S3-compatible object storage.
- See Creating a model registry.
3.2. Configuring custom namespaces
By default, OpenShift AI uses the following predefined namespaces:
- `redhat-ods-operator` contains the Red Hat OpenShift AI Operator
- `redhat-ods-applications` includes the dashboard and other required components of OpenShift AI
- `rhods-notebooks` is where basic workbenches are deployed by default
If needed, you can define custom namespaces to use instead of the predefined ones before installing OpenShift AI. This flexibility supports environments with naming policies or conventions and allows cluster administrators to control where components such as workbenches are deployed.
Namespaces created by OpenShift AI typically include `openshift` or `redhat` in their name. Do not rename these system namespaces because they are required for OpenShift AI to function properly.
Prerequisites
- You have access to an OpenShift AI cluster with cluster administrator privileges.
You have installed the OpenShift CLI (`oc`) as described in the appropriate documentation for your cluster:
- Installing the OpenShift CLI for OpenShift Container Platform
- Installing the OpenShift CLI for Red Hat OpenShift Service on AWS
- You have not yet installed the Red Hat OpenShift AI Operator.
Procedure
In a terminal window, if you are not already logged in to your OpenShift cluster as a cluster administrator, log in to the OpenShift CLI (`oc`) as shown in the following example:

```shell
$ oc login <openshift_cluster_url> -u <admin_username> -p <password>
```

Optional: To configure a custom operator namespace:

1. Create a namespace YAML file named `operator-namespace.yaml`:

   ```yaml
   apiVersion: v1
   kind: Namespace
   metadata:
     name: <operator-namespace> # Defines the operator namespace.
   ```

2. Create the namespace in your OpenShift cluster:

   ```shell
   $ oc create -f operator-namespace.yaml
   ```

   You see output similar to the following:

   ```
   namespace/<operator-namespace> created
   ```

3. When you install the Red Hat OpenShift AI Operator, use this namespace instead of `redhat-ods-operator`.
Optional: To configure a custom applications namespace:

1. Create a namespace YAML file named `applications-namespace.yaml`:

   ```yaml
   apiVersion: v1
   kind: Namespace
   metadata:
     name: <applications-namespace> # Defines the applications namespace.
   ```

2. Create the namespace in your OpenShift cluster:

   ```shell
   $ oc create -f applications-namespace.yaml
   ```

   You see output similar to the following:

   ```
   namespace/<applications-namespace> created
   ```
Optional: To configure a custom workbench namespace:

1. Create a namespace YAML file named `workbench-namespace.yaml`:

   ```yaml
   apiVersion: v1
   kind: Namespace
   metadata:
     name: <workbench-namespace> # Defines the workbench namespace.
   ```

2. Create the namespace in your OpenShift cluster:

   ```shell
   $ oc create -f workbench-namespace.yaml
   ```

   You see output similar to the following:

   ```
   namespace/<workbench-namespace> created
   ```

3. When you install the Red Hat OpenShift AI components, specify this namespace for the `spec.workbenches.workbenchNamespace` field. You cannot change the default workbench namespace after you have installed the Red Hat OpenShift AI Operator.
3.3. Installing the Red Hat OpenShift AI Operator
This section shows how to install the Red Hat OpenShift AI Operator on your OpenShift cluster by using either the command-line interface (CLI) or the OpenShift web console.
If your OpenShift cluster uses a proxy to access the Internet, you can configure the proxy settings for the Red Hat OpenShift AI Operator. See Overriding proxy settings of an Operator for more information.
3.3.1. Installing the Red Hat OpenShift AI Operator by using the CLI
The following procedure shows how to use the OpenShift CLI (oc) to install the Red Hat OpenShift AI Operator on your OpenShift cluster. You must install the Operator before you can install OpenShift AI components on the cluster.
Prerequisites
- You have a running OpenShift cluster, version 4.19 or greater, configured with a default storage class that can be dynamically provisioned.
- You have cluster administrator privileges for your OpenShift cluster.
You have installed the OpenShift CLI (`oc`) as described in the appropriate documentation for your cluster:
- Installing the OpenShift CLI for OpenShift Container Platform
- Installing the OpenShift CLI for Red Hat OpenShift Service on AWS
If you are using custom namespaces, you have created and labeled them as required.

Note: The example commands in this procedure use the predefined operator namespace. If you are using a custom operator namespace, replace `redhat-ods-operator` with your namespace.
Procedure
- Open a new terminal window.
Follow these steps to log in to your OpenShift cluster as a cluster administrator:
- In the upper-right corner of the OpenShift web console, click your user name and select Copy login command.
- After you have logged in, click Display token.
Copy the Log in with this token command and paste it in your terminal.
```shell
$ oc login --token=<token> --server=<openshift_cluster_url>
```
Create a namespace for installation of the Operator by performing the following actions:
Note: If you have already created a custom namespace for the Operator, you can skip this step.

1. Create a namespace YAML file named `rhods-operator-namespace.yaml`:

   ```yaml
   apiVersion: v1
   kind: Namespace
   metadata:
     name: redhat-ods-operator # Defines the operator namespace.
   ```

2. Create the namespace in your OpenShift cluster:

   ```shell
   $ oc create -f rhods-operator-namespace.yaml
   ```

   You see output similar to the following:

   ```
   namespace/redhat-ods-operator created
   ```
Create an operator group for installation of the Operator by performing the following actions:
1. Create an `OperatorGroup` object custom resource (CR) file, for example, `rhods-operator-group.yaml`:

   ```yaml
   apiVersion: operators.coreos.com/v1
   kind: OperatorGroup
   metadata:
     name: rhods-operator
     namespace: redhat-ods-operator # Defines the operator namespace.
   ```

2. Create the `OperatorGroup` object in your OpenShift cluster:

   ```shell
   $ oc create -f rhods-operator-group.yaml
   ```

   You see output similar to the following:

   ```
   operatorgroup.operators.coreos.com/rhods-operator created
   ```
Create a subscription for installation of the Operator by performing the following actions:
1. Create a `Subscription` object CR file, for example, `rhods-operator-subscription.yaml`:

   ```yaml
   apiVersion: operators.coreos.com/v1alpha1
   kind: Subscription
   metadata:
     name: rhods-operator
     namespace: redhat-ods-operator   # 1
   spec:
     name: rhods-operator
     channel: <channel>               # 2
     source: redhat-operators
     sourceNamespace: openshift-marketplace
     startingCSV: rhods-operator.x.y.z  # 3
   ```

   1. Defines the operator namespace.
   2. Sets the update channel. You must specify a value of `fast`, `fast-x.y`, `stable`, `stable-x.y`, `eus-x.y`, or `alpha`. For more information, see Understanding update channels.
   3. Optional: Sets the operator version. If you do not specify a value, the subscription defaults to the latest operator version. For more information, see the Red Hat OpenShift AI Self-Managed Life Cycle Knowledgebase article.

2. Create the `Subscription` object in your OpenShift cluster to install the Operator:

   ```shell
   $ oc create -f rhods-operator-subscription.yaml
   ```

   You see output similar to the following:

   ```
   subscription.operators.coreos.com/rhods-operator created
   ```
Verification
In the OpenShift web console, click Operators → Installed Operators and confirm that the Red Hat OpenShift AI Operator shows one of the following statuses:
- Installing - installation is in progress; wait for this to change to Succeeded. This might take several minutes.
- Succeeded - installation is successful.
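You can also verify the installation from the CLI by checking the phase of the Operator's ClusterServiceVersion (CSV). The commands below assume the predefined operator namespace:

```shell
# The PHASE column should change from Installing to Succeeded
oc get csv -n redhat-ods-operator

# Show just the name and phase of each CSV in the namespace
oc get csv -n redhat-ods-operator \
  -o custom-columns=NAME:.metadata.name,PHASE:.status.phase
```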
3.3.2. Installing the Red Hat OpenShift AI Operator by using the web console
The following procedure shows how to use the OpenShift web console to install the Red Hat OpenShift AI Operator on your cluster. You must install the Operator before you can install OpenShift AI components on the cluster.
Prerequisites
- You have a running OpenShift cluster, version 4.19 or greater, configured with a default storage class that can be dynamically provisioned.
- You have cluster administrator privileges for your OpenShift cluster.
- If you are using custom namespaces, you have created and labeled them as required.
Procedure
- Log in to the OpenShift web console as a cluster administrator.
- In the web console, click Operators → OperatorHub.
- On the OperatorHub page, locate the Red Hat OpenShift AI Operator by scrolling through the available Operators or by typing Red Hat OpenShift AI into the Filter by keyword box.
- Click the Red Hat OpenShift AI tile. The Red Hat OpenShift AI information pane opens.
- Select a Channel. For information about subscription update channels, see Understanding update channels.
- Select a Version.
- Click Install. The Install Operator page opens.
- Review or change the selected channel and version as needed.
- For Installation mode, note that the only available value is All namespaces on the cluster (default). This installation mode makes the Operator available to all namespaces in the cluster.
For Installed Namespace, choose one of the following options:
- To use the predefined operator namespace, select the Operator recommended Namespace: redhat-ods-operator option.
- To use the custom operator namespace that you created, select the Select a Namespace option, and then select the namespace from the drop-down list.
For Update approval, select one of the following update strategies:
- Automatic: New updates in the update channel are installed as soon as they become available.
Manual: A cluster administrator must approve any new updates before installation begins.
Important: By default, the Red Hat OpenShift AI Operator follows a sequential update process. This means that if there are several versions between the current version and the target version, Operator Lifecycle Manager (OLM) upgrades the Operator to each of the intermediate versions before it upgrades it to the final, target version.
If you configure automatic upgrades, OLM automatically upgrades the Operator to the latest available version. If you configure manual upgrades, a cluster administrator must manually approve each sequential update between the current version and the final, target version.
For information about supported versions, see the Red Hat OpenShift AI Life Cycle Knowledgebase article.
Click Install.
The Installing Operators pane appears. When the installation finishes, a checkmark appears next to the Operator name.
Verification
In the OpenShift web console, click Operators → Installed Operators and confirm that the Red Hat OpenShift AI Operator shows one of the following statuses:
- Installing - installation is in progress; wait for this to change to Succeeded. This might take several minutes.
- Succeeded - installation is successful.
3.4. Installing and managing Red Hat OpenShift AI components
You can use the OpenShift command-line interface (CLI) or OpenShift web console to install and manage components of Red Hat OpenShift AI on your OpenShift cluster.
3.4.1. Installing Red Hat OpenShift AI components by using the CLI
To install Red Hat OpenShift AI components by using the OpenShift CLI (oc), you must create and configure a DataScienceCluster object.
The following procedure describes how to create and configure a DataScienceCluster object to install Red Hat OpenShift AI components as part of a new installation.
For information about changing the installation status of OpenShift AI components after installation, see Updating the installation status of Red Hat OpenShift AI components by using the web console.
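You can also change a component's installation status from the CLI by patching the `DataScienceCluster` object. The following sketch assumes a `DataScienceCluster` named `default-dsc` and uses the `modelregistry` component as an example; substitute your own object name and component:

```shell
# Disable the model registry component on an existing installation
oc patch datasciencecluster default-dsc --type merge \
  -p '{"spec": {"components": {"modelregistry": {"managementState": "Removed"}}}}'
```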
Prerequisites
- The Red Hat OpenShift AI Operator is installed on your OpenShift cluster. See Installing the Red Hat OpenShift AI Operator.
- You have cluster administrator privileges for your OpenShift cluster.
You have installed the OpenShift CLI (`oc`) as described in the appropriate documentation for your cluster:
- Installing the OpenShift CLI for OpenShift Container Platform
- Installing the OpenShift CLI for Red Hat OpenShift Service on AWS
- If you are using custom namespaces, you have created the namespaces.
Procedure
- Open a new terminal window.
Follow these steps to log in to your OpenShift cluster as a cluster administrator:
- In the upper-right corner of the OpenShift web console, click your user name and select Copy login command.
- After you have logged in, click Display token.
Copy the Log in with this token command and paste it in your terminal.
$ oc login --token=<token> --server=<openshift_cluster_url>
- Create a DataScienceCluster object custom resource (CR) file, for example, rhods-operator-dsc.yaml.
  1. To use your own Argo Workflows instance with the aipipelines component, set argoWorkflowsControllers.managementState to Removed. This allows you to integrate with a managed Argo Workflows installation already on your OpenShift cluster and avoid conflicts with the embedded controller. See Configuring pipelines with your own Argo Workflows instance.
  2. To use the predefined workbench namespace, set this value to rhods-notebooks or omit this line. To use a custom workbench namespace, set this value to your namespace.
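The CR example itself did not survive extraction. The following sketch shows what rhods-operator-dsc.yaml might look like; the apiVersion, the exact component list, and the workbenchNamespace field name are assumptions inferred from the callouts above and from the component table in the distributed workloads chapter, so compare it with the default CR that your Operator version generates before applying it.

```yaml
apiVersion: datasciencecluster.opendatahub.io/v1   # assumed apiVersion
kind: DataScienceCluster
metadata:
  name: default-dsc
spec:
  components:
    dashboard:
      managementState: Managed
    aipipelines:
      managementState: Managed
      argoWorkflowsControllers:
        managementState: Managed   # callout 1: set to Removed to use your own Argo Workflows instance
    workbenches:
      managementState: Managed
      workbenchNamespace: rhods-notebooks   # callout 2: assumed field name; set a custom namespace or omit
    kueue:
      managementState: Unmanaged
    ray:
      managementState: Removed
    trainingoperator:
      managementState: Removed
```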
- In the spec.components section of the CR, for each OpenShift AI component shown, set the value of the managementState field to either Managed or Removed. These values are defined as follows:
  - Managed: The Operator actively manages the component, installs it, and tries to keep it active. The Operator will upgrade the component only if it is safe to do so.
  - Removed: The Operator actively manages the component but does not install it. If the component is already installed, the Operator will try to remove it.
  Important: To learn how to install the distributed workloads components, see Installing the distributed workloads components.
- Create the DataScienceCluster object in your OpenShift cluster to install the specified OpenShift AI components:
  $ oc create -f rhods-operator-dsc.yaml
  You see output similar to the following:
  datasciencecluster.datasciencecluster.opendatahub.io/default created
Verification
Confirm that there is at least one running pod for each component:
- In the OpenShift web console, click Workloads → Pods.
- In the Project list at the top of the page, select redhat-ods-applications.
- In the applications namespace, confirm that there are one or more running pods for each of the OpenShift AI components that you installed.
Confirm the status of all installed components:
- In the OpenShift web console, click Operators → Installed Operators.
- Click the Red Hat OpenShift AI Operator.
- Click the Data Science Cluster tab.
- For the DataScienceCluster object called default-dsc, verify that the status is Phase: Ready.
  Note: When you edit the spec.components section to change the installation status of a component, the default-dsc status also changes. During the initial installation, it might take a few minutes for the status phase to change from Progressing to Ready. You can access the OpenShift AI dashboard before the default-dsc status phase is Ready, but all components might not be ready.
- Click the default-dsc link to display the data science cluster details.
- Select the YAML tab.
- In the status.installedComponents section, confirm that the components you installed have a status value of true.
  Note: If a component shows with the component-name: {} format in the spec.components section of the CR, the component is not installed.
- In the OpenShift AI dashboard, users can view the list of the installed OpenShift AI components, their corresponding source (upstream) components, and the versions of the installed components, as described in Viewing installed OpenShift AI components.
Next steps
- If you are using OpenStack, CodeReady Containers (CRC), or other private cloud environments without integrated external DNS, manually configure DNS A or CNAME records after the LoadBalancer IP becomes available. For more information, see Configuring External DNS for RHOAI 3.x on OpenStack and Private Clouds.
- Complete any additional configuration required for the components you enabled. See the component-specific configuration sections for details.
3.4.2. Installing Red Hat OpenShift AI components by using the web console
To install Red Hat OpenShift AI components by using the OpenShift web console, you must create and configure a DataScienceCluster object.
The following procedure describes how to create and configure a DataScienceCluster object to install Red Hat OpenShift AI components as part of a new installation.
- For information about changing the installation status of OpenShift AI components after installation, see Updating the installation status of Red Hat OpenShift AI components by using the web console.
Prerequisites
- The Red Hat OpenShift AI Operator is installed on your OpenShift cluster. See Installing the Red Hat OpenShift AI Operator.
- You have cluster administrator privileges for your OpenShift cluster.
- If you are using custom namespaces, you have created the namespaces.
Procedure
- Log in to the OpenShift web console as a cluster administrator.
- In the web console, click Operators → Installed Operators and then click the Red Hat OpenShift AI Operator.
- Click the Data Science Cluster tab.
- Click Create DataScienceCluster.
- For Configure via, select YAML view.
  An embedded YAML editor opens, showing a default custom resource (CR) for the DataScienceCluster object.
  1. To use your own Argo Workflows instance with the aipipelines component, set argoWorkflowsControllers.managementState to Removed. This allows you to integrate with a managed Argo Workflows installation already on your OpenShift cluster and avoid conflicts with the embedded controller. See Configuring pipelines with your own Argo Workflows instance.
  2. To use the predefined workbench namespace, set this value to rhods-notebooks or omit this line. To use a custom workbench namespace, set this value to your namespace.
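For reference, the two callouts correspond to fields like the following in the default CR. This is a hedged sketch: the workbenchNamespace field name is an assumption based on the callout, not copied from the product, so verify it against the CR shown in your YAML editor.

```yaml
spec:
  components:
    aipipelines:
      managementState: Managed
      argoWorkflowsControllers:
        managementState: Managed   # callout 1: set to Removed to use your own Argo Workflows instance
    workbenches:
      managementState: Managed
      workbenchNamespace: rhods-notebooks   # callout 2: assumed field name; set a custom namespace or omit
```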
- In the spec.components section of the CR, for each OpenShift AI component shown, set the value of the managementState field to either Managed or Removed. These values are defined as follows:
  - Managed: The Operator actively manages the component, installs it, and tries to keep it active. The Operator will upgrade the component only if it is safe to do so.
  - Removed: The Operator actively manages the component but does not install it. If the component is already installed, the Operator will try to remove it.
  Important:
  - To learn how to fully install the KServe component, which is used by the single-model serving platform to serve large models, see Installing the model serving platform.
  - To learn how to install the distributed workloads components, see Installing the distributed workloads components.
- Click Create.
Verification
Confirm the status of all installed components:
- In the OpenShift web console, click Operators → Installed Operators.
- Click the Red Hat OpenShift AI Operator.
- Click the Data Science Cluster tab.
- For the DataScienceCluster object called default-dsc, verify that the status is Phase: Ready.
  Note: When you edit the spec.components section to change the installation status of a component, the default-dsc status also changes. During the initial installation, it might take a few minutes for the status phase to change from Progressing to Ready. You can access the OpenShift AI dashboard before the default-dsc status phase is Ready, but all components might not be ready.
- Click the default-dsc link to display the data science cluster details.
- Select the YAML tab.
- In the status.installedComponents section, confirm that the components you installed have a status value of true.
  Note: If a component shows with the component-name: {} format in the spec.components section of the CR, the component is not installed.
Confirm that there is at least one running pod for each component:
- In the OpenShift web console, click Workloads → Pods.
- In the Project list at the top of the page, select redhat-ods-applications or your custom applications namespace.
- In the applications namespace, confirm that there are one or more running pods for each of the OpenShift AI components that you installed.
- In the OpenShift AI dashboard, users can view the list of the installed OpenShift AI components, their corresponding source (upstream) components, and the versions of the installed components, as described in Viewing installed OpenShift AI components.
Next steps
- If you are using OpenStack, CodeReady Containers (CRC), or other private cloud environments without integrated external DNS, manually configure DNS A or CNAME records after the LoadBalancer IP becomes available. For more information, see Configuring External DNS for RHOAI 3.x on OpenStack and Private Clouds.
- Complete any additional configuration required for the components you enabled. See the component-specific configuration sections for details.
3.4.3. Updating the installation status of Red Hat OpenShift AI components by using the web console
You can use the OpenShift web console to update the installation status of components of Red Hat OpenShift AI on your OpenShift cluster.
Prerequisites
- The Red Hat OpenShift AI Operator is installed on your OpenShift cluster.
- You have cluster administrator privileges for your OpenShift cluster.
Procedure
- Log in to the OpenShift web console as a cluster administrator.
- In the web console, click Operators → Installed Operators and then click the Red Hat OpenShift AI Operator.
- Click the Data Science Cluster tab.
- On the DataScienceClusters page, click the default-dsc object.
- Click the YAML tab.
  An embedded YAML editor opens, showing the default custom resource (CR) for the DataScienceCluster object.
- In the spec.components section of the CR, for each OpenShift AI component shown, set the value of the managementState field to either Managed or Removed. These values are defined as follows:
  - Managed: The Operator actively manages the component, installs it, and tries to keep it active. The Operator will upgrade the component only if it is safe to do so.
  - Removed: The Operator actively manages the component but does not install it. If the component is already installed, the Operator will try to remove it.
  Important:
  - To learn how to install the KServe component, which is used by the single-model serving platform to serve large models, see Installing the model serving platform.
  - To learn how to install the distributed workloads feature, see Installing the distributed workloads components.
- Click Save.
  For any components that you updated, OpenShift AI initiates a rollout so that all affected pods use the updated image.
- If you are upgrading from OpenShift AI 2.19 or earlier, upgrade the Authorino Operator to the stable update channel, version 1.2.1 or later.
  - Update Authorino to the latest available release in the tech-preview-v1 channel (1.1.2), if you have not done so already.
  - Switch to the stable channel:
    - Navigate to the Subscription settings of the Authorino Operator.
    - Under Update channel, click the highlighted tech-preview-v1.
    - Change the channel to stable.
  - Select the update option for Authorino 1.2.1.
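The channel switch above can also be made by editing the Authorino Operator Subscription object directly. The following is a minimal sketch; the Subscription name and namespace are assumptions (check them with `oc get subscriptions -A`), while the channel names come from the steps above.

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: authorino-operator        # assumed name; verify in your cluster
  namespace: openshift-operators  # assumed namespace; verify in your cluster
spec:
  channel: stable                 # changed from tech-preview-v1
```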
Verification
Confirm that there is at least one running pod for each component:
- In the OpenShift web console, click Workloads → Pods.
- In the Project list at the top of the page, select redhat-ods-applications or your custom applications namespace.
- In the applications namespace, confirm that there are one or more running pods for each of the OpenShift AI components that you installed.
Confirm the status of all installed components:
- In the OpenShift web console, click Operators → Installed Operators.
- Click the Red Hat OpenShift AI Operator.
- Click the Data Science Cluster tab and select the DataScienceCluster object called default-dsc.
- Select the YAML tab.
- In the status.installedComponents section, confirm that the components you installed have a status value of true.
  Note: If a component shows with the component-name: {} format in the spec.components section of the CR, the component is not installed.
- In the OpenShift AI dashboard, users can view the list of the installed OpenShift AI components, their corresponding source (upstream) components, and the versions of the installed components, as described in Viewing installed OpenShift AI components.
3.4.4. Viewing installed OpenShift AI components
In the Red Hat OpenShift AI dashboard, you can view a list of the installed OpenShift AI components, their corresponding source (upstream) components, and the versions of the installed components.
Prerequisites
- OpenShift AI is installed in your OpenShift cluster.
Procedure
- Log in to the OpenShift AI dashboard.
- In the top navigation bar, click the help icon and then select About.
Verification
The About page shows a list of the installed OpenShift AI components along with their corresponding upstream components and upstream component versions.
Additional resources
Chapter 4. Configuring pipelines with your own Argo Workflows instance
You can configure OpenShift AI to use an existing Argo Workflows instance instead of the embedded one included with AI pipelines. This configuration is useful if your OpenShift cluster already includes a managed Argo Workflows instance and you want to integrate it with OpenShift AI pipelines without conflicts. Disabling the embedded Argo Workflows controller allows cluster administrators to manage the lifecycles of OpenShift AI and Argo Workflows independently.
You cannot enable both the embedded Argo Workflows instance and your own Argo Workflows instance on the same cluster.
Prerequisites
- You have cluster administrator privileges for your OpenShift cluster.
- You have installed Red Hat OpenShift AI.
Procedure
- Log in to the OpenShift web console as a cluster administrator.
- In the OpenShift console, click Operators → Installed Operators.
- Search for the Red Hat OpenShift AI Operator, and then click the Operator name to open the Operator details page.
- Click the Data Science Cluster tab.
- Click the default instance name (for example, default-dsc) to open the instance details page.
- Click the YAML tab to show the instance specifications.
- Disable the embedded Argo Workflows controllers that are managed by the OpenShift AI Operator:
  - In the spec.components section, set the value of the managementState field for the aipipelines component to Managed.
  - In the spec.components.aipipelines section, set the value of the managementState field for argoWorkflowsControllers to Removed.
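The settings described above can be sketched as the following aipipelines specification. Only the relevant fields are shown; the rest of the DataScienceCluster spec is unchanged.

```yaml
spec:
  components:
    aipipelines:
      managementState: Managed     # the pipelines component itself stays managed
      argoWorkflowsControllers:
        managementState: Removed   # disables the embedded Argo Workflows controller
```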
- Click Save to apply your changes.
- Install and configure a compatible version of Argo Workflows on your cluster. For compatible version information, see Supported Configurations for 3.x. For installation information, see the Argo Workflows Installation documentation.
Verification
- On the Details tab of the DataScienceCluster instance (for example, default-dsc), verify that AIPipelinesReady has a Status of True.
- Verify that the ds-pipeline-workflow-controller pod does not exist:
  - Go to Workloads → Pods.
  - Search for the ds-pipeline-workflow-controller pod.
  - Verify that this pod does not exist. The absence of this pod confirms that the embedded Argo Workflows controller is disabled.
Chapter 5. Installing the distributed workloads components
To use the distributed workloads feature in OpenShift AI, you must install several components.
Prerequisites
- You have logged in to OpenShift with the cluster-admin role and you can access the data science cluster.
- You have installed Red Hat OpenShift AI.
- You have installed the Red Hat build of Kueue Operator on your OpenShift cluster, as described in the Red Hat build of Kueue documentation.
- You have sufficient resources. In addition to the minimum OpenShift AI resources described in Installing and deploying OpenShift AI (for disconnected environments, see Deploying OpenShift AI in a disconnected environment), you need 1.6 vCPU and 2 GiB memory to deploy the distributed workloads infrastructure.
- You have installed the cert-manager Operator in OpenShift by using the web console as described in Installing the cert-manager Operator for Red Hat OpenShift.
If you want to use graphics processing units (GPUs), you have enabled GPU support in OpenShift AI. If you use NVIDIA GPUs, see Enabling NVIDIA GPUs. If you use AMD GPUs, see AMD GPU integration.
Note: In OpenShift AI, Red Hat supports the use of accelerators within the same cluster only.
Starting from Red Hat OpenShift AI 2.19, Red Hat supports remote direct memory access (RDMA) for NVIDIA GPUs only, enabling them to communicate directly with each other by using NVIDIA GPUDirect RDMA across either Ethernet or InfiniBand networks.
If you want to use self-signed certificates, you have added them to a central Certificate Authority (CA) bundle as described in Working with certificates (for disconnected environments, see Working with certificates). No additional configuration is necessary to use those certificates with distributed workloads. The centrally configured self-signed certificates are automatically available in the workload pods at the following mount points:
- Cluster-wide CA bundle:
  /etc/pki/tls/certs/odh-trusted-ca-bundle.crt
  /etc/ssl/certs/odh-trusted-ca-bundle.crt
- Custom CA bundle:
  /etc/pki/tls/certs/odh-ca-bundle.crt
  /etc/ssl/certs/odh-ca-bundle.crt
Procedure
- In the OpenShift console, click Operators → Installed Operators.
- Search for the Red Hat OpenShift AI Operator, and then click the Operator name to open the Operator details page.
- Click the Data Science Cluster tab.
- Click the default instance name (for example, default-dsc) to open the instance details page.
- Click the YAML tab to show the instance specifications.
- Enable the required distributed workloads components. In the spec.components section, set the managementState field correctly for the required components:
  - Set kueue to Unmanaged to allow the Red Hat build of Kueue Operator to manage Kueue.
  - If you want to use the Ray framework to tune models, set ray to Managed.
  - If you want to use the Kubeflow Training Operator to tune models, set trainingoperator to Managed.
  - The list of required components depends on whether the distributed workload is run from a pipeline or workbench or both, as shown in the following table.

  Table 5.1. Components required for distributed workloads

  Component        | Pipelines only | Workbenches only | Pipelines and workbenches
  dashboard        | Managed        | Managed          | Managed
  aipipelines      | Managed        | Removed          | Managed
  kueue            | Unmanaged      | Unmanaged        | Unmanaged
  ray              | Managed        | Managed          | Managed
  trainingoperator | Managed        | Managed          | Managed
  workbenches      | Removed        | Managed          | Managed

- Click Save. After a short time, the components with a Managed state are ready.
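As an example, the "Pipelines and workbenches" combination from Table 5.1 corresponds to a spec.components section like the following sketch (only the managementState fields are shown; other fields in your CR are unchanged).

```yaml
spec:
  components:
    dashboard:
      managementState: Managed
    aipipelines:
      managementState: Managed
    kueue:
      managementState: Unmanaged   # managed by the Red Hat build of Kueue Operator
    ray:
      managementState: Managed
    trainingoperator:
      managementState: Managed
    workbenches:
      managementState: Managed
```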
Verification
Check the status of the kubeflow-training-operator, kuberay-operator, kueue-controller-manager, and openshift-kueue-operator pods, as follows:
- In the OpenShift console, click Workloads → Deployments.
In the Search by name field, enter the following search strings:
- In the redhat-ods-applications project, search for kubeflow-training-operator and kuberay-operator.
- In the openshift-kueue-operator project, search for kueue-controller-manager and openshift-kueue-operator.
In each case, check the status as follows:
- Click the deployment name to open the deployment details page.
- Click the Pods tab.
Check the pod status.
When the status of the pods is Running, the pods are ready to use.
- To see more information about each pod, click the pod name to open the pod details page, and then click the Logs tab.
Next steps
Configure the distributed workloads feature as described in Managing distributed workloads.
Chapter 6. Accessing the dashboard
After you have installed OpenShift AI and added users, you can access the URL for your OpenShift AI console and share the URL with the users to let them log in and work on their models.
Prerequisites
- You have installed OpenShift AI on your OpenShift cluster.
- You have added at least one user to the user group for OpenShift AI.
Procedure
- Log in to the OpenShift web console.
- Click the application launcher.
- Right-click Red Hat OpenShift AI and copy the URL for your OpenShift AI instance.
- Provide this instance URL to your data scientists to let them log in to OpenShift AI.
Verification
- Confirm that you and your users can log in to OpenShift AI by using the instance URL.
Note: In the Red Hat OpenShift AI dashboard, users can view the list of the installed OpenShift AI components, their corresponding source (upstream) components, and the versions of the installed components, as described in Viewing installed OpenShift AI components.
Chapter 7. Enabling accelerators
Before you can use an accelerator in OpenShift AI, you must install the relevant software components. The installation process varies based on the accelerator type.
Prerequisites
- You have logged in to your OpenShift cluster.
- You have the cluster-admin role in your OpenShift cluster.
- You have installed an accelerator and confirmed that it is detected in your environment.
Procedure
Follow the appropriate documentation to enable your accelerator:
- NVIDIA GPUs: See Enabling NVIDIA GPUs.
- Intel Gaudi AI accelerators: See Enabling Intel Gaudi AI accelerators.
- AMD GPUs: See Enabling AMD GPUs.
- After installing your accelerator, create a hardware profile as described in Working with hardware profiles.
Verification
From the Administrator perspective, go to the Operators → Installed Operators page. Confirm that the following Operators appear:
- The Operator for your accelerator
- Node Feature Discovery (NFD)
- Kernel Module Management (KMM)
The accelerator is correctly detected a few minutes after full installation of the Node Feature Discovery (NFD) and the relevant accelerator Operator. The OpenShift CLI (oc) displays the appropriate output for the GPU worker node, confirming that the accelerator (for example, an NVIDIA GPU) is detected.
Chapter 8. Working with certificates
When you install Red Hat OpenShift AI, OpenShift automatically applies a default Certificate Authority (CA) bundle to manage authentication for most OpenShift AI components, such as workbenches and model servers. These certificates are trusted self-signed certificates that help secure communication. However, as a cluster administrator, you might need to configure additional self-signed certificates to use some components, such as the AI pipeline server and object storage solutions. If an OpenShift AI component uses a self-signed certificate that is not part of the existing cluster-wide CA bundle, you have the following options for including the certificate:
- Add it to the OpenShift cluster-wide CA bundle.
- Add it to a custom CA bundle, separate from the cluster-wide CA bundle.
As a cluster administrator, you can also change how to manage authentication for OpenShift AI as follows:
- Manually manage certificate changes, instead of relying on the OpenShift AI Operator to handle them automatically.
- Remove the cluster-wide CA bundle, either from all namespaces or specific ones. If you prefer to implement a different authentication approach, you can override the default OpenShift AI behavior, as described in Removing the CA bundle.
8.1. Understanding how OpenShift AI handles certificates
After installing OpenShift AI, the Red Hat OpenShift AI Operator automatically creates an empty odh-trusted-ca-bundle configuration file (ConfigMap). The Cluster Network Operator (CNO) injects the cluster-wide CA bundle into the odh-trusted-ca-bundle configMap with the label "config.openshift.io/inject-trusted-cabundle".
After the CNO injects the bundle, it updates the ConfigMap with the contents of the ca-bundle.crt file.
The management of CA bundles is configured through the Data Science Cluster Initialization (DSCI) object. Within this object, you can set the spec.trustedCABundle.managementState field to one of the following values:
- Managed: (Default) The Red Hat OpenShift AI Operator manages the odh-trusted-ca-bundle ConfigMap and adds it to all non-reserved existing and new namespaces. It does not add the ConfigMap to any reserved or system namespaces, such as default, openshift-*, or kube-*. The Red Hat OpenShift AI Operator automatically updates the ConfigMap to reflect any changes made to the customCABundle field.
- Unmanaged: The Red Hat OpenShift AI administrator manually manages the odh-trusted-ca-bundle ConfigMap, instead of allowing the Operator to manage it. Changing the managementState from Managed to Unmanaged does not remove the odh-trusted-ca-bundle ConfigMap. However, the ConfigMap is no longer automatically updated if changes are made to the customCABundle field.
  The Unmanaged setting is useful if your organization implements a different method for managing trusted CA bundles, such as Ansible automation, and does not want the Red Hat OpenShift AI Operator to handle certificates automatically. This setting provides greater control, preventing the Operator from overwriting custom configurations.
- Removed: The Red Hat OpenShift AI Operator removes the odh-trusted-ca-bundle ConfigMap, if present, and prevents ConfigMaps from being created in new namespaces. Changing this field from Managed to Removed also deletes the ConfigMap from existing namespaces. This is the default value after upgrading Red Hat OpenShift AI from 2.7 or earlier versions to 3.2.
  The Removed setting reduces complexity and mitigates security risks, such as unauthorized certificate changes. In high-security environments, removing the CA bundle ensures that only approved CAs are trusted, reducing the risk of cyberattacks. For example, your organization might want to restrict cluster administrators from creating trusted CA bundles to prevent OpenShift pods from communicating externally.
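These states are set on the DSCInitialization object, which later sections of this chapter refer to as default-dsci. The following is a minimal sketch; the apiVersion shown is an assumption, so check the object in your cluster before editing it.

```yaml
apiVersion: dscinitialization.opendatahub.io/v1   # assumed apiVersion
kind: DSCInitialization
metadata:
  name: default-dsci
spec:
  trustedCABundle:
    managementState: Managed   # or Unmanaged, or Removed
    customCABundle: ""
```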
8.2. Adding certificates
If you must use a self-signed certificate that is not part of the existing cluster-wide CA bundle, you have two options for configuring the certificate:
Add it to the cluster-wide CA bundle.
This option is useful when the certificate is needed for secure communication across multiple services or when it’s required by security policies to be trusted cluster-wide. This option ensures that all services and components in the cluster trust the certificate automatically. It simplifies management because the certificate is trusted across the entire cluster, avoiding the need to configure the certificate separately for each service.
Add it to a custom CA bundle that is separate from the OpenShift cluster-wide bundle.
Consider this option for the following scenarios:
- Limit scope: Only specific services need the certificate, not the whole cluster.
- Isolation: Keeps custom certificates separate, preventing changes to the global configuration.
- Avoid global impact: Does not affect services that do not need the certificate.
- Easier management: Makes it simpler to manage certificates for specific services.
8.3. Adding certificates to a cluster-wide CA bundle
You can add a self-signed certificate to a cluster-wide Certificate Authority (CA) bundle (ca-bundle.crt).
When the cluster-wide CA bundle is updated, the Cluster Network Operator (CNO) automatically detects the change and injects the updated bundle into the odh-trusted-ca-bundle ConfigMap, making the certificate available to OpenShift AI components.
Note: By default, the management state for the Trusted CA bundle is Managed (that is, the spec.trustedCABundle.managementState field in the Red Hat OpenShift AI Operator’s DSCI object is set to Managed). If you change this setting to Unmanaged, you must manually update the odh-trusted-ca-bundle ConfigMap to include the updated cluster-wide CA bundle.
Alternatively, you can add certificates to a custom CA bundle, as described in Adding certificates to a custom CA bundle.
Prerequisites
- You have created a self-signed certificate and saved the certificate to a file. For example, you have created a certificate using OpenSSL and saved it to a file named example-ca.crt.
- You have cluster administrator access for the OpenShift cluster where Red Hat OpenShift AI is installed.
- You have installed the OpenShift CLI (oc) as described in the appropriate documentation for your cluster:
  - Installing the OpenShift CLI for OpenShift Container Platform
  - Installing the OpenShift CLI for Red Hat OpenShift Service on AWS
Procedure
- Create a ConfigMap that includes the root CA certificate used to sign the certificate, where </path/to/example-ca.crt> is the path to the CA certificate bundle on your local file system:
  $ oc create configmap custom-ca \
      --from-file=ca-bundle.crt=</path/to/example-ca.crt> \
      -n openshift-config
- Update the cluster-wide proxy configuration with the newly created ConfigMap:
  $ oc patch proxy/cluster \
      --type=merge \
      --patch='{"spec":{"trustedCA":{"name":"custom-ca"}}}'
Verification
Run the following command to verify that all non-reserved namespaces contain the odh-trusted-ca-bundle ConfigMap:
$ oc get configmaps --all-namespaces -l app.kubernetes.io/part-of=opendatahub-operator | grep odh-trusted-ca-bundle
Additional resources
- Configuring certificates in the OpenShift Container Platform documentation
- Injecting a custom CA Bundle in the Red Hat OpenShift Service on AWS documentation
- Injecting a custom CA Bundle in the OpenShift Dedicated documentation
8.4. Adding certificates to a custom CA bundle
You can add self-signed certificates to a custom CA bundle that is separate from the OpenShift cluster-wide bundle.
This method is ideal for scenarios where components need access to external resources that require a self-signed certificate. For example, you might need to add self-signed certificates to grant AI pipelines access to S3-compatible object storage.
Prerequisites
- You have created a self-signed certificate and saved the certificate to a file. For example, you have created a certificate using OpenSSL and saved it to a file named example-ca.crt.
- You have cluster administrator access for the OpenShift cluster where Red Hat OpenShift AI is installed.
- You have installed the OpenShift CLI (oc) as described in the appropriate documentation for your cluster:
  - Installing the OpenShift CLI for OpenShift Container Platform
  - Installing the OpenShift CLI for Red Hat OpenShift Service on AWS
Procedure
- Log in to OpenShift.
- Click Operators → Installed Operators and then click the Red Hat OpenShift AI Operator.
- Click the DSC Initialization tab.
- Click the default-dsci object.
- Click the YAML tab.
- In the spec.trustedCABundle section, add the custom certificate to the customCABundle field.
- Click Save.
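The example referenced at this step was lost in conversion; the following sketch shows the intended shape of the default-dsci object, with a hypothetical placeholder certificate value:

```yaml
apiVersion: dscinitialization.opendatahub.io/v1
kind: DSCInitialization
metadata:
  name: default-dsci
spec:
  trustedCABundle:
    managementState: Managed
    # Placeholder PEM content; paste the contents of your certificate
    # file (for example, example-ca.crt) here
    customCABundle: |
      -----BEGIN CERTIFICATE-----
      examplebundle123
      -----END CERTIFICATE-----
```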
The Red Hat OpenShift AI Operator automatically updates the ConfigMap to reflect any changes made to the customCABundle field. It adds the odh-ca-bundle.crt file containing the certificates to the odh-trusted-ca-bundle ConfigMap.
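The ConfigMap example referenced here was lost in conversion; a sketch of the expected result, with hypothetical certificate content:

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: odh-trusted-ca-bundle
  labels:
    app.kubernetes.io/part-of: opendatahub-operator
    config.openshift.io/inject-trusted-cabundle: 'true'
data:
  ca-bundle.crt: |
    <cluster-wide CA certificates injected by OpenShift>
  odh-ca-bundle.crt: |
    -----BEGIN CERTIFICATE-----
    examplebundle123
    -----END CERTIFICATE-----
```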
Verification
Run the following command to verify that a non-reserved namespace contains the odh-trusted-ca-bundle ConfigMap and that the ConfigMap contains your customCABundle value. In the following command, example-namespace is the non-reserved namespace and examplebundle123 is the customCABundle value.
oc get configmap odh-trusted-ca-bundle -n example-namespace -o yaml | grep examplebundle123
8.5. Using self-signed certificates with OpenShift AI components
Some OpenShift AI components have additional options or required configuration for self-signed certificates.
8.5.1. Accessing S3-compatible object storage with self-signed certificates
To securely connect OpenShift AI components to object storage solutions or databases that are deployed within an OpenShift cluster that uses self-signed certificates, you must provide a certificate authority (CA) certificate. Each namespace includes a ConfigMap named kube-root-ca.crt, which contains the CA certificate of the internal API Server.
Prerequisites
- You have cluster administrator privileges for your OpenShift cluster.
- You have installed the OpenShift CLI (oc) as described in the appropriate documentation for your cluster:
  - Installing the OpenShift CLI for OpenShift Container Platform
  - Installing the OpenShift CLI for Red Hat OpenShift Service on AWS
- You have deployed an object storage solution or database in your OpenShift cluster.
Procedure
- In a terminal window, log in to the OpenShift CLI (oc) as shown in the following example:

  oc login api.<cluster_name>.<cluster_domain>:6443 --web

- Retrieve the current OpenShift AI trusted CA configuration and store it in a new file:

  oc get dscinitializations.dscinitialization.opendatahub.io default-dsci -o json | jq -r '.spec.trustedCABundle.customCABundle' > /tmp/my-custom-ca-bundles.crt

- Add the cluster's kube-root-ca.crt ConfigMap to the OpenShift AI trusted CA configuration:

  oc get configmap kube-root-ca.crt -o jsonpath="{['data']['ca\.crt']}" >> /tmp/my-custom-ca-bundles.crt

- Update the OpenShift AI trusted CA configuration to trust certificates issued by the certificate authorities in kube-root-ca.crt:

  oc patch dscinitialization default-dsci --type='json' -p='[{"op":"replace","path":"/spec/trustedCABundle/customCABundle","value":"'"$(awk '{printf "%s\\n", $0}' /tmp/my-custom-ca-bundles.crt)"'"}]'
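The awk invocation embedded in the patch command converts the multi-line PEM file into a single line with literal \n escapes, which is what the JSON patch value requires. A small, self-contained demonstration of that transform (the file content is hypothetical):

```shell
# Create a two-line stand-in for a PEM bundle (hypothetical content)
printf 'line1\nline2\n' > /tmp/demo-bundle.crt

# Replace each real newline with a literal backslash-n, yielding one line
escaped=$(awk '{printf "%s\\n", $0}' /tmp/demo-bundle.crt)

# Print the result without interpreting the escapes
printf '%s\n' "$escaped"
```

The result is the single line line1\nline2\n, with literal backslash-n sequences rather than real newlines.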
Verification
- You can successfully deploy components that are configured to use object storage solutions or databases that are deployed in the OpenShift cluster. For example, a pipeline server that is configured to use a database deployed in the cluster starts successfully.
You can verify your new certificate configuration by following the steps in the OpenShift AI tutorial - Fraud Detection example. Run the script to install local object storage buckets and create connections, and then enable AI pipelines.
For more information about running the script to install local object storage buckets, see Running a script to install local object storage buckets and create connections.
For more information about enabling AI pipelines, see Enabling pipelines.
8.5.2. Configuring a certificate for pipelines
By default, OpenShift AI includes OpenShift cluster-wide certificates in the odh-trusted-ca-bundle ConfigMap. These cluster-wide certificates cover most components, such as workbenches and model servers. However, the pipeline server might require additional Certificate Authority (CA) configuration, especially when interacting with external systems that use self-signed or custom certificates.
You have the following options for adding the certificate for AI pipelines:
- Add them to the cluster-wide CA bundle, as described in Adding certificates to a cluster-wide CA bundle.
- Add them to a custom bundle as described in Adding certificates to a custom CA bundle.
- Provide a CA bundle that is only used for AI pipelines, as described in the following procedure.
Prerequisites
- You have cluster administrator access for the OpenShift cluster where Red Hat OpenShift AI is installed.
- You have created a self-signed certificate and saved the certificate to a file. For example, you have created a certificate using OpenSSL and saved it to a file named example-ca.crt.
- You have configured an AI pipeline server.
Procedure
- Log in to the OpenShift console.
- From Workloads → ConfigMaps, create a ConfigMap with the required bundle in the same project as the target AI pipeline.
- Add a reference to the ConfigMap in the .spec.apiServer.cABundle field of the underlying DataSciencePipelinesApplication (DSPA).
- Save the ConfigMap. The pipeline server pod automatically redeploys with the updated bundle.
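The examples for these steps were lost in conversion. The following multi-document sketch shows one possible shape for both resources; the ConfigMap name, key, and DSPA apiVersion are assumptions and may differ in your environment:

```yaml
# Hypothetical ConfigMap holding the CA bundle, in the same project
# as the target pipeline
apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-ca-bundle
data:
  ca-bundle.crt: |
    -----BEGIN CERTIFICATE-----
    <contents of example-ca.crt>
    -----END CERTIFICATE-----
---
# DSPA snippet referencing that ConfigMap; apiVersion is an assumption
apiVersion: datasciencepipelinesapplications.opendatahub.io/v1alpha1
kind: DataSciencePipelinesApplication
metadata:
  name: dspa
spec:
  apiServer:
    cABundle:
      configMapName: custom-ca-bundle
      configMapKey: ca-bundle.crt
```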
Verification
Confirm that your CA bundle was successfully mounted:
- Log in to the OpenShift console.
- Go to the project that has the target AI pipeline.
- Click the Pods tab.
- Click the pipeline server pod with the ds-pipeline-dspa-<hash> prefix.
- Click Terminal.
- Enter cat /dsp-custom-certs/dsp-ca.crt.
- Verify that your CA bundle is present within this file.
8.5.3. Configuring a certificate for workbenches
By default, self-signed certificates apply to workbenches that you create after configuring cluster-wide certificates. To apply cluster-wide certificates to an existing workbench, stop and then restart the workbench.
Self-signed certificates are stored in /etc/pki/tls/custom-certs/ca-bundle.crt. Workbenches set a preset environment variable that many popular HTTP client packages use to locate certificates. For packages that do not use this variable, you can provide the certificate path explicitly; for example, for the kfp package to connect to the AI pipeline server:
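The kfp example referenced here was lost in conversion; a hedged sketch, where the route URL and token are placeholders for your environment:

```python
from kfp.client import Client

# Point the pipeline client at the workbench certificate bundle;
# host and token values are hypothetical placeholders
client = Client(
    host="https://<pipeline-route>/",
    existing_token="<auth-token>",
    ssl_ca_cert="/etc/pki/tls/custom-certs/ca-bundle.crt",
)
```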
8.5.4. Using the cluster-wide CA bundle for the model serving platform
By default, the model serving platform in OpenShift AI uses a self-signed certificate generated at installation for the endpoints that are created when deploying a server.
If you have configured cluster-wide certificates on your OpenShift cluster, they are used by default for other types of endpoints, such as endpoints for routes.
The following procedure explains how to use the same certificate that you already have for your OpenShift cluster.
Prerequisites
- You have cluster administrator access for the OpenShift cluster where Red Hat OpenShift AI is installed.
- You have configured cluster-wide certificates in OpenShift.
- You have configured the model serving platform, as described in Installing the model serving platform.
Procedure
- Log in to the OpenShift console.
- From the list of projects, open the openshift-ingress project.
- Click YAML.
- Search for "cert" to find a secret with a name that includes "cert". For example, rhods-internal-primary-cert-bundle-secret. The contents of the secret should contain two items that are used for all OpenShift Routes: tls.crt (the certificate) and tls.key (the key).
- Copy the reference to the secret.
- From the list of projects, open the istio-system project.
- Create a YAML file and paste the reference to the secret that you copied from the openshift-ingress YAML file. Edit the YAML code to keep only the relevant content, as shown in the following example. Replace rhods-internal-primary-cert-bundle-secret with the name of your secret.
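The secret example referenced here was lost in conversion; a sketch of the trimmed-down YAML, with the data values as placeholders:

```yaml
kind: Secret
apiVersion: v1
metadata:
  name: rhods-internal-primary-cert-bundle-secret
data:
  tls.crt: <base64-encoded certificate>
  tls.key: <base64-encoded key>
type: kubernetes.io/tls
```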
- Save the YAML file in the istio-system project.
- Navigate to Operators → Installed Operators → Red Hat OpenShift AI.
- Click Data Science Cluster, and then click default-dsc → YAML.
- Edit the kserve configuration section to refer to your secret, as shown in the following example. Replace rhods-internal-primary-cert-bundle-secret with the name of the secret that you created earlier.
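The kserve configuration example was lost in conversion; a sketch of the relevant section of the DataScienceCluster spec. The exact field layout is an assumption and should be checked against your cluster's DataScienceCluster schema:

```yaml
spec:
  components:
    kserve:
      managementState: Managed
      serving:
        ingressGateway:
          certificate:
            # Replace with the name of the secret you created
            secretName: rhods-internal-primary-cert-bundle-secret
            type: Provided
        managementState: Managed
        name: knative-serving
```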
8.6. Managing certificates without the Red Hat OpenShift AI Operator
By default, the Red Hat OpenShift AI Operator manages the odh-trusted-ca-bundle ConfigMap, which contains the trusted CA bundle and is applied to all non-reserved namespaces in the cluster. The Operator automatically updates this ConfigMap whenever changes are made to the CA bundle.
If your organization prefers to manage trusted CA bundles independently, for example, by using Ansible automation, you can disable this default behavior to prevent automatic updates by the Red Hat OpenShift AI Operator.
Prerequisites
- You have cluster administrator privileges for your OpenShift cluster.
Procedure
- In the OpenShift web console, click Operators → Installed Operators and then click the Red Hat OpenShift AI Operator.
- Click the DSC Initialization tab.
- Click the default-dsci object.
- Click the YAML tab.
- In the spec section, change the value of the managementState field for trustedCABundle to Unmanaged, as shown:

  spec:
    trustedCABundle:
      managementState: Unmanaged

- Click Save.

Changing the managementState from Managed to Unmanaged prevents automatic updates when the customCABundle field is modified, but does not remove the odh-trusted-ca-bundle ConfigMap.
Verification
- In the spec section, set the value of the customCABundle field for trustedCABundle, for example:

  spec:
    trustedCABundle:
      managementState: Unmanaged
      customCABundle: example123

- Click Save.
- Click Workloads → ConfigMaps.
- Select a project from the project list.
- Click the odh-trusted-ca-bundle ConfigMap.
- Click the YAML tab and verify that the value of the customCABundle field did not update.
8.7. Removing the CA bundle
If you prefer to implement a different authentication approach for your OpenShift AI installation, you can override the default behavior by removing the CA bundle.
You have two options for removing the CA bundle:
- Remove the CA bundle from all non-reserved projects in OpenShift AI.
- Remove the CA bundle from a specific project.
8.7.1. Removing the CA bundle from all namespaces
You can remove a Certificate Authority (CA) bundle from all non-reserved namespaces in OpenShift AI. This process changes the default configuration and disables the creation of the odh-trusted-ca-bundle configuration file (ConfigMap), as described in Working with certificates (OpenShift AI Self-Managed) or Working with certificates (OpenShift AI Self-Managed in a disconnected environment).
The odh-trusted-ca-bundle ConfigMaps are only deleted from namespaces when you set the managementState of trustedCABundle to Removed; deleting the DSC Initialization does not delete the ConfigMaps.
To remove a CA bundle from a single namespace only, see Removing the CA bundle from a single namespace (OpenShift AI Self-Managed) or Removing the CA bundle from a single namespace (OpenShift AI Self-Managed in a disconnected environment).
Prerequisites
- You have cluster administrator privileges for your OpenShift cluster.
You have installed the OpenShift CLI (
oc) as described in the appropriate documentation for your cluster:- Installing the OpenShift CLI for OpenShift Container Platform
- Installing the OpenShift CLI for Red Hat OpenShift Service on AWS
Procedure
- In the OpenShift web console, click Operators → Installed Operators and then click the Red Hat OpenShift AI Operator.
- Click the DSC Initialization tab.
- Click the default-dsci object.
- Click the YAML tab.
- In the spec section, change the value of the managementState field for trustedCABundle to Removed:

  spec:
    trustedCABundle:
      managementState: Removed

- Click Save.
Verification
Run the following command to verify that the odh-trusted-ca-bundle ConfigMap has been removed from all namespaces:

oc get configmaps --all-namespaces | grep odh-trusted-ca-bundle

The command should not return any ConfigMaps.
8.7.2. Removing the CA bundle from a single namespace
You can remove a custom Certificate Authority (CA) bundle from individual namespaces in OpenShift AI. This process disables the creation of the odh-trusted-ca-bundle configuration file (ConfigMap) for the specified namespace only.
To remove a CA bundle from all namespaces, see Removing the CA bundle from all namespaces (OpenShift AI Self-Managed) or Removing the CA bundle from all namespaces (OpenShift AI Self-Managed in a disconnected environment).
Prerequisites
- You have cluster administrator privileges for your OpenShift cluster.
- You have installed the OpenShift CLI (oc) as described in the appropriate documentation for your cluster:
  - Installing the OpenShift CLI for OpenShift Container Platform
  - Installing the OpenShift CLI for Red Hat OpenShift Service on AWS
Procedure
Run the following command to remove a CA bundle from a namespace. In the following command, example-namespace is the non-reserved namespace.
oc annotate ns example-namespace security.opendatahub.io/inject-trusted-ca-bundle=false
Verification
Run the following command to verify that the CA bundle has been removed from the namespace. In the following command, example-namespace is the non-reserved namespace.
oc get configmap odh-trusted-ca-bundle -n example-namespace
The command should return configmaps "odh-trusted-ca-bundle" not found.
Chapter 9. Viewing logs and audit records
As a cluster administrator, you can use the OpenShift AI Operator logger to monitor and troubleshoot issues. You can also use OpenShift audit records to review a history of changes made to the OpenShift AI Operator configuration.
9.1. Configuring the OpenShift AI Operator logger
You can change the log level for OpenShift AI Operator components by setting the .spec.devFlags.logmode flag of the DSCInitialization (DSCI) custom resource during runtime. If you do not set a logmode value, the logger uses the INFO log level by default.
The log level that you set with .spec.devFlags.logmode applies to all components, not just those in a Managed state.
The following table shows the available log levels:
| Log level | Stacktrace level | Verbosity | Output | Timestamp type |
|---|---|---|---|---|
| devel, development | WARN | INFO | Console | Epoch timestamps |
| "" (no value set) | ERROR | INFO | JSON | Human-readable timestamps |
| prod, production | ERROR | INFO | JSON | Human-readable timestamps |
Logs that are set to devel or development generate in a plain text console format. Logs that are set to prod, production, or which do not have a level set generate in a JSON format.
Prerequisites
- You have administrator access to the DSCInitialization resources in the OpenShift cluster.
- You have installed the OpenShift CLI (oc) as described in the appropriate documentation for your cluster:
  - Installing the OpenShift CLI for OpenShift Container Platform
  - Installing the OpenShift CLI for Red Hat OpenShift Service on AWS
Procedure
- Log in to OpenShift as a cluster administrator.
- Click Operators → Installed Operators and then click the Red Hat OpenShift AI Operator.
- Click the DSC Initialization tab.
- Click the default-dsci object.
- Click the YAML tab.
- In the spec section, update the .spec.devFlags.logmode flag with the log level that you want to set.
- Click Save.
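The YAML example for this step was lost in conversion; a sketch of the default-dsci object with the flag set, here to the hypothetical value development:

```yaml
apiVersion: dscinitialization.opendatahub.io/v1
kind: DSCInitialization
metadata:
  name: default-dsci
spec:
  devFlags:
    # One of: devel, development, prod, production
    logmode: development
```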
You can also configure the log level from the OpenShift CLI (oc) by using the following command with the logmode value set to the log level that you want.
oc patch dsci default-dsci -p '{"spec":{"devFlags":{"logmode":"development"}}}' --type=merge
Verification
- If you set the component log level to devel or development, logs generate more frequently and include logs at WARN level and above.
- If you set the component log level to prod or production, or do not set a log level, logs generate less frequently and include logs at ERROR level and above.
9.1.1. Viewing the OpenShift AI Operator logs
- Log in to the OpenShift CLI (oc).
- Run the following command to stream logs from all Operator pods:

  for pod in $(oc get pods -l name=rhods-operator -n redhat-ods-operator -o name); do oc logs -f "$pod" -n redhat-ods-operator & done

  The Operator pod logs open in your terminal.

  Tip: Press Ctrl+C to stop viewing. To fully stop all log streams, run kill $(jobs -p).
You can also view each Operator pod log in the OpenShift console by navigating to Workloads → Pods, selecting the redhat-ods-operator project, clicking a pod name, and then clicking the Logs tab.
9.2. Viewing audit records
Cluster administrators can use OpenShift auditing to see changes made to the OpenShift AI Operator configuration by reviewing modifications to the DataScienceCluster (DSC) and DSCInitialization (DSCI) custom resources. Audit logging is enabled by default in standard OpenShift cluster configurations. For more information, see Viewing audit logs in the OpenShift documentation.
In Red Hat OpenShift Service on AWS, audit logging is disabled by default because the Elasticsearch log store does not provide secure storage for audit logs. To configure log forwarding, see Logging in the Red Hat OpenShift Service on AWS documentation.
The following example shows how to use the OpenShift audit logs to see the history of changes made (by users) to the DSC and DSCI custom resources.
Prerequisites
- You have cluster administrator privileges for your OpenShift cluster.
You have installed the OpenShift CLI (
oc) as described in the appropriate documentation for your cluster:- Installing the OpenShift CLI for OpenShift Container Platform
- Installing the OpenShift CLI for Red Hat OpenShift Service on AWS
Procedure
- In a terminal window, if you are not already logged in to your OpenShift cluster as a cluster administrator, log in to the OpenShift CLI as shown in the following example:

  oc login <openshift_cluster_url> -u <admin_username> -p <password>

- To access the full content of the changed custom resources, set the OpenShift audit log policy to WriteRequestBodies or a more comprehensive profile. For more information, see Configuring the audit log policy.
- Fetch the audit log files that are available for the relevant control plane nodes.
- Search the files for the DSC and DSCI custom resources. For example:

  jq 'select((.objectRef.apiGroup == "dscinitialization.opendatahub.io" or .objectRef.apiGroup == "datasciencecluster.opendatahub.io") and .user.username != "system:serviceaccount:redhat-ods-operator:redhat-ods-operator-controller-manager" and .verb != "get" and .verb != "watch" and .verb != "list")' < /tmp/kube-apiserver-audit-opendatahub.log
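The example for the fetch step was lost in conversion; a sketch using oc adm node-logs, where the node name is a placeholder and the output file name is an assumption chosen to match the jq command's input:

```shell
# List the audit log files available on the control plane nodes
oc adm node-logs --role=master --path=kube-apiserver/

# Download one file for offline searching
oc adm node-logs <node_name> --path=kube-apiserver/audit.log \
  > /tmp/kube-apiserver-audit-opendatahub.log
```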
Verification
- The commands return relevant log entries.
To configure the log retention time, see the Logging section in the OpenShift documentation.
Chapter 10. Troubleshooting common installation problems
If you are experiencing difficulties installing the Red Hat OpenShift AI Operator, read this section to understand what could be causing the problem and how to resolve it.
If the problem is not included here or in the release notes, contact Red Hat Support. When opening a support case, it is helpful to include debugging information about your cluster. You can collect this information by using the must-gather tool as described in Must-Gather for Red Hat OpenShift AI and Gathering data about your cluster.
You can also adjust the log level of OpenShift AI Operator components to increase or reduce log verbosity to suit your use case. For more information, see Configuring the OpenShift AI Operator logger.
10.1. The Red Hat OpenShift AI Operator cannot be retrieved from the image registry
Problem
When attempting to retrieve the Red Hat OpenShift AI Operator from the image registry, a Failure to pull from quay error message appears. The Red Hat OpenShift AI Operator might be unavailable for retrieval in the following circumstances:
- The image registry is unavailable.
- There is a problem with your network connection.
- Your cluster is not operational and is therefore unable to retrieve the image registry.
Diagnosis
Check the logs in the Events section in OpenShift for further information about the Failure to pull from quay error message.
Resolution
- Contact Red Hat support.
10.2. OpenShift AI does not install on unsupported infrastructure
Problem
You are deploying on an environment that is not documented as supported by the Red Hat OpenShift AI Operator.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects or redhat-ods-operator.
- Click the rhods-operator-<random string> pod that shows an error in the Status column. The Pod details page appears.
- Click Logs.
- Select rhods-operator from the drop-down list.
- Check the log for the ERROR: Deploying on $infrastructure, which is not supported. Failing Installation error message.
Resolution
- Before proceeding with a new installation, ensure that you have a fully supported environment on which to install OpenShift AI. For more information, see Supported Configurations for 3.x.
10.3. The creation of the OpenShift AI Custom Resource (CR) fails
Problem
During the installation process, the OpenShift AI Custom Resource (CR) does not get created. This issue occurs in unknown circumstances.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects or redhat-ods-operator.
- Click the rhods-operator-<random string> pod that shows an error in the Status column. The Pod details page appears.
- Click Logs.
- Select rhods-operator from the drop-down list.
- Check the log for the ERROR: Attempt to create the ODH CR failed. error message.
Resolution
- Contact Red Hat support.
10.4. The creation of the OpenShift AI Notebooks Custom Resource (CR) fails
Problem
During the installation process, the OpenShift AI Notebooks Custom Resource (CR) does not get created. This issue occurs in unknown circumstances.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects or redhat-ods-operator.
- Click the rhods-operator-<random string> pod that shows an error in the Status column. The Pod details page appears.
- Click Logs.
- Select rhods-operator from the drop-down list.
- Check the log for the ERROR: Attempt to create the RHODS Notebooks CR failed. error message.
Resolution
- Contact Red Hat support.
10.5. The OpenShift AI dashboard is not accessible
Problem
After installing OpenShift AI, the redhat-ods-applications, redhat-ods-monitoring, and redhat-ods-operator project namespaces are Active but you cannot access the dashboard due to an error in one of the pods.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects.
Click Filter and select the checkbox for every status except Running and Completed.
The page displays the pods that have an error.
Resolution
- To see more information and troubleshooting steps for a pod, on the Pods page, click the link in the Status column for the pod.
- If the Status column does not display a link, click the pod name to open the pod details page and then click the Logs tab.
10.6. Reinstalling OpenShift AI fails with an error
Problem
After uninstalling the OpenShift AI Operator and reinstalling it by using the CLI, the reinstallation fails with an unable to find DSCInitialization error in one of the OpenShift AI Operator pod logs. This issue can occur if the Auth custom resource from the previous installation was not deleted after uninstalling the OpenShift AI Operator and before reinstalling it. For more information, see Understanding the uninstallation process.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects or redhat-ods-operator.
- Click the rhods-operator-<random string> pod that shows an error in the Status column. The Pod details page appears.
- Click Logs.
- Select rhods-operator from the drop-down list.
Check the log for an error message similar to the following:
{"name":"auth"},"namespace":"","name":"auth","reconcileID":"7bff53ae-1252-46fe-831a-fdc824078a1b","error":"unable to find DSCInitialization","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.{"name":"auth"},"namespace":"","name":"auth","reconcileID":"7bff53ae-1252-46fe-831a-fdc824078a1b","error":"unable to find DSCInitialization","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Resolution
- Uninstall the OpenShift AI Operator.
- Delete the Auth custom resource:
  - In the OpenShift web console, switch to the Administrator perspective.
  - Click API Explorer.
  - From the All groups drop-down list, select or enter services.platform.opendatahub.io.
  - Click the Auth kind.
  - Click the Instances tab.
  - Click the action menu (⋮) and select Delete Auth. The Delete Auth dialog appears.
  - Click Delete.
- Install the OpenShift AI Operator again.
10.7. The dedicated-admins Role-based access control (RBAC) policy cannot be created
Problem
The Role-based access control (RBAC) policy for the dedicated-admins group in the target project cannot be created. This issue occurs in unknown circumstances.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects or redhat-ods-operator.
- Click the rhods-operator-<random string> pod that shows an error in the Status column. The Pod details page appears.
- Click Logs.
- Select rhods-operator from the drop-down list.
- Check the log for the ERROR: Attempt to create the RBAC policy for dedicated admins group in $target_project failed. error message.
Resolution
- Contact Red Hat support.
10.8. The ODH parameter secret does not get created
Problem
An issue with the OpenShift AI Operator’s flow could result in failure to create the ODH parameter secret.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects or redhat-ods-operator.
- Click the rhods-operator-<random string> pod that shows an error in the Status column. The Pod details page appears.
- Click Logs.
- Select rhods-operator from the drop-down list.
- Check the log for the ERROR: Addon managed odh parameter secret does not exist. error message.
Resolution
- Contact Red Hat support.
Chapter 11. Uninstalling Red Hat OpenShift AI Self-Managed
This section shows how to use the OpenShift CLI (oc) to uninstall the Red Hat OpenShift AI Operator and any OpenShift AI components installed and managed by the Operator.
Using the OpenShift CLI (oc) is the recommended way to uninstall the Operator. Depending on your version of OpenShift, using the web console to perform the uninstallation might not prompt you to uninstall all associated components. This could leave you unclear about the final state of your cluster.
11.1. Understanding the uninstallation process
Installing Red Hat OpenShift AI created several custom resource instances on your OpenShift cluster for various components of OpenShift AI. After installation, users likely created several additional resources while using OpenShift AI. Uninstalling OpenShift AI removes the resources that were created by the Operator, but retains the resources created by users to prevent inadvertently deleting information you might want.
What is deleted
Uninstalling OpenShift AI removes the following resources from your OpenShift cluster:
- DataScienceCluster custom resource instance and the custom resource instances it created for each component
- DSCInitialization custom resource instance
- Auth custom resource instance created during or after installation
- FeatureTracker custom resource instances created during or after installation
- ServiceMesh custom resource instance created by the Operator during or after installation
- KNativeServing custom resource instance created by the Operator during or after installation
- redhat-ods-applications, redhat-ods-monitoring, and rhods-notebooks namespaces created by the Operator
- Workloads in the rhods-notebooks namespace
- Subscription, ClusterServiceVersion, and InstallPlan objects
- KfDef object (version 1 Operator only)
What might remain
Uninstalling OpenShift AI retains the following resources in your OpenShift cluster:
- Projects created by users
- Custom resource instances created by users
- Custom resource definitions (CRDs) created by users or by the Operator
While these resources might still remain in your OpenShift cluster, they are not functional. After uninstalling, Red Hat recommends that you review the projects and custom resources in your OpenShift cluster and delete anything no longer in use to prevent potential issues, such as pipelines that cannot run, notebooks that cannot be undeployed, or models that cannot be undeployed.
11.2. Uninstalling OpenShift AI Self-Managed by using the CLI
The following procedure shows how to use the OpenShift CLI (oc) to uninstall the Red Hat OpenShift AI Operator and any OpenShift AI components installed and managed by the Operator.
Prerequisites
- You have cluster administrator privileges for your OpenShift cluster.
- You have installed the OpenShift CLI (oc) as described in the appropriate documentation for your cluster:
  - Installing the OpenShift CLI for OpenShift Container Platform
  - Installing the OpenShift CLI for Red Hat OpenShift Service on AWS
- You have backed up the persistent disks or volumes used by your persistent volume claims (PVCs).
Procedure
- Open a new terminal window.
- Log in to your OpenShift cluster as a cluster administrator, as shown in the following example:

  $ oc login <openshift_cluster_url> -u system:admin

- Create a ConfigMap object for deletion of the Red Hat OpenShift AI Operator.

  $ oc create configmap delete-self-managed-odh -n redhat-ods-operator

- To delete the rhods-operator, set the addon-managed-odh-delete label to true.

  $ oc label configmap/delete-self-managed-odh api.openshift.com/addon-managed-odh-delete=true -n redhat-ods-operator

- When all objects associated with the Operator are removed, delete the redhat-ods-operator project.

  Set an environment variable for the redhat-ods-applications project.

  $ PROJECT_NAME=redhat-ods-applications

  Wait until the redhat-ods-applications project has been deleted.

  while oc get project $PROJECT_NAME &> /dev/null; do
    echo "The $PROJECT_NAME project still exists"
    sleep 1
  done
  echo "The $PROJECT_NAME project no longer exists"

  When the redhat-ods-applications project has been deleted, you see the following output.

  The redhat-ods-applications project no longer exists

  When the redhat-ods-applications project has been deleted, delete the redhat-ods-operator project.

  $ oc delete namespace redhat-ods-operator
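If you prefer the wait loop to give up rather than poll indefinitely, a variant with a timeout might look like the following. This is a sketch, not part of the documented procedure; the 300-second limit and 5-second interval are arbitrary choices:

```shell
# Variant of the wait loop with a timeout (sketch; values are illustrative).
PROJECT_NAME=redhat-ods-applications
TIMEOUT=300   # seconds to wait before giving up
ELAPSED=0
while oc get project "$PROJECT_NAME" &> /dev/null; do
  if [ "$ELAPSED" -ge "$TIMEOUT" ]; then
    echo "Timed out waiting for the $PROJECT_NAME project to be deleted" >&2
    break
  fi
  echo "The $PROJECT_NAME project still exists"
  sleep 5
  ELAPSED=$((ELAPSED + 5))
done
echo "The $PROJECT_NAME project no longer exists"
```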
Verification
- Confirm that the rhods-operator subscription no longer exists.

  $ oc get subscriptions --all-namespaces | grep rhods-operator

- Confirm that the following projects no longer exist.

  - redhat-ods-applications
  - redhat-ods-monitoring
  - redhat-ods-operator
  - rhods-notebooks

  $ oc get namespaces | grep -e redhat-ods* -e rhods*

  The rhods-notebooks project existed only if you installed the workbenches component of OpenShift AI. See Installing and managing Red Hat OpenShift AI components.