Installing and uninstalling OpenShift AI Self-Managed
Install and uninstall OpenShift AI Self-Managed
Abstract
Preface
Learn how to use both the OpenShift command-line interface and web console to install Red Hat OpenShift AI Self-Managed on your OpenShift cluster. To uninstall the product, learn how to use the recommended command-line interface (CLI) method.
Red Hat recommends that you install only one instance of OpenShift AI on your cluster.
Installing the Red Hat OpenShift AI Operator on the same cluster as the Red Hat OpenShift AI Add-on is not recommended or supported.
Chapter 1. Architecture of OpenShift AI Self-Managed
Red Hat OpenShift AI Self-Managed is an Operator that is available in a self-managed environment, such as Red Hat OpenShift Container Platform, or in Red Hat-managed cloud environments such as Red Hat OpenShift Dedicated (with a Customer Cloud Subscription for AWS or GCP), Red Hat OpenShift Service on Amazon Web Services (ROSA Classic or ROSA HCP), or Microsoft Azure Red Hat OpenShift.
OpenShift AI integrates the following components and services:
At the service layer:
- OpenShift AI dashboard
- A customer-facing dashboard that shows available and installed applications for the OpenShift AI environment as well as learning resources such as tutorials, quick starts, and documentation. Administrative users can access functionality to manage users, clusters, notebook images, accelerator profiles, and model-serving runtimes. Data scientists can use the dashboard to create projects to organize their data science work.
- Model serving
- Data scientists can deploy trained machine-learning models to serve intelligent applications in production. After deployment, applications can send requests to the model using its deployed API endpoint.
- Data science pipelines
- Data scientists can build portable machine learning (ML) workflows with data science pipelines 2.0, using Docker containers. With data science pipelines, data scientists can automate workflows as they develop their data science models.
- Jupyter (self-managed)
- A self-managed application that allows data scientists to configure their own notebook server environment and develop machine learning models in JupyterLab.
- Distributed workloads
- Data scientists can use multiple nodes in parallel to train machine-learning models or process data more quickly. This approach significantly reduces the task completion time, and enables the use of larger datasets and more complex models.
At the management layer:
- The Red Hat OpenShift AI Operator
- A meta-operator that deploys and maintains all components and sub-operators that are part of OpenShift AI.
- Monitoring services
- Prometheus gathers metrics from OpenShift AI for monitoring purposes.
When you install the Red Hat OpenShift AI Operator in the OpenShift cluster, the following new projects are created:
- The redhat-ods-operator project contains the Red Hat OpenShift AI Operator.
- The redhat-ods-applications project installs the dashboard and other required components of OpenShift AI.
- The redhat-ods-monitoring project contains services for monitoring.
- The rhods-notebooks project is where notebook environments are deployed by default.
You or your data scientists must create additional projects for the applications that will use your machine learning models.
Do not install independent software vendor (ISV) applications in namespaces associated with OpenShift AI.
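For example, a dedicated project for the model-consuming applications mentioned above can be created with a single oc command; the project name here is purely illustrative:

$ oc new-project my-ml-applications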
Chapter 2. Understanding update channels
You can use update channels to specify which Red Hat OpenShift AI minor version you intend to update your Operator to. Update channels also allow you to choose the timing and level of support your updates have through the fast, stable, stable-x.y, eus-x.y, and alpha channel options.
The subscription of an installed Operator specifies the update channel, which is used to track and receive updates for the Operator. You can change the update channel to start tracking and receiving updates from a newer channel. For more information about the release frequency and the lifecycle associated with each of the available update channels, see Red Hat OpenShift AI Self-Managed Life Cycle.
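For example, an existing installation can be switched to a different channel by patching the Operator's Subscription. This is a sketch that assumes the subscription name and namespace used elsewhere in this guide:

$ oc patch subscription rhods-operator -n redhat-ods-operator --type merge -p '{"spec": {"channel": "stable"}}'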
Channel | Support | Release frequency | Recommended environment |
---|---|---|---|
fast | One month of full support | Every month | Production environments with access to the latest product features. Select this streaming channel with automatic updates to avoid manually upgrading every month. |
stable | Three months of full support | Every three months | Production environments with stability prioritized over new feature availability. Select this streaming channel with automatic updates to access the latest stable release and avoid manually upgrading. |
stable-x.y | Seven months of full support | Every three months | Production environments with stability prioritized over new feature availability. Select a numbered stable channel (such as stable-x.y) to remain on a specific release and control the timing of upgrades. |
eus-x.y | Seven months of full support followed by Extended Update Support for eleven months | Every nine months | Enterprise-grade environments that cannot upgrade within a seven-month window. Select this streaming channel if you prioritize stability over new feature availability. |
alpha | One month of full support | Every month | Development environments with early-access features that might not be functionally complete. Select this channel to use early-access features to test functionality and provide feedback during the development process. Early-access features are not supported with Red Hat production service level agreements (SLAs). For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope. For more information about the support scope of Red Hat Developer Preview features, see Developer Preview Features Support Scope. |
The embedded and beta channels are legacy channels that will be removed in a future release. Do not select the embedded or beta channels for a new Operator installation.
Chapter 3. Installing and deploying OpenShift AI
Red Hat OpenShift AI is a platform for data scientists and developers of artificial intelligence (AI) applications. It provides a fully supported environment that lets you rapidly develop, train, test, and deploy machine learning models on-premises or in the public cloud.
OpenShift AI is provided as a managed cloud service add-on for Red Hat OpenShift or as self-managed software that you can install on-premise or in the public cloud on OpenShift.
For information about installing OpenShift AI as self-managed software on your OpenShift cluster in a disconnected environment, see Installing and uninstalling OpenShift AI Self-Managed in a disconnected environment. For information about installing OpenShift AI as a managed cloud service add-on, see Installing and uninstalling OpenShift AI.
Installing OpenShift AI involves the following high-level tasks:
- Confirm that your OpenShift cluster meets all requirements. See Requirements for OpenShift AI Self-Managed.
- Add administrative users for OpenShift. See Adding administrative users in OpenShift.
- Install the Red Hat OpenShift AI Operator. See Installing the Red Hat OpenShift AI Operator.
- Install OpenShift AI components. See Installing and managing Red Hat OpenShift AI components.
- Configure user and administrator groups to provide user access to OpenShift AI. See Adding users.
- Access the OpenShift AI dashboard. See Accessing the OpenShift AI dashboard.
- Optionally, enable graphics processing units (GPUs) in OpenShift AI to ensure that your data scientists can use compute-heavy workloads in their models. See Enabling GPU support in OpenShift AI.
3.1. Requirements for OpenShift AI Self-Managed
You must meet the following requirements before you can install Red Hat OpenShift AI on your Red Hat OpenShift cluster:
Product subscriptions
- You must have a subscription for Red Hat OpenShift AI Self-Managed.
If you want to install OpenShift AI Self-Managed in a Red Hat-managed cloud environment, you must have a subscription for one of the following platforms:
- Red Hat OpenShift Dedicated on Amazon Web Services (AWS) or Google Cloud Platform (GCP)
- Red Hat OpenShift Service on Amazon Web Services (ROSA Classic)
- Red Hat OpenShift Service on Amazon Web Services with hosted control planes (ROSA HCP)
- Microsoft Azure Red Hat OpenShift
Contact your Red Hat account manager to purchase new subscriptions. If you do not yet have an account manager, complete the form at https://www.redhat.com/en/contact to request one.
Cluster administrator access to your OpenShift cluster
You must have an OpenShift cluster with cluster administrator access. Use an existing cluster, or create a cluster by following the steps in the relevant documentation:
- OpenShift Container Platform 4.12 or later: OpenShift Container Platform installation overview
- OpenShift Dedicated: Creating an OpenShift Dedicated cluster
- ROSA Classic: Install ROSA Classic clusters
- ROSA HCP: Install ROSA with HCP clusters
- Your cluster must have at least 2 worker nodes with at least 8 CPUs and 32 GiB RAM available for OpenShift AI to use when you install the Operator. To ensure that OpenShift AI is usable, additional cluster resources are required beyond the minimum requirements.
- Your cluster is configured with a default storage class that can be dynamically provisioned. Confirm that a default storage class is configured by running the oc get storageclass command. If no storage class shows (default) beside its name, follow the OpenShift Container Platform documentation to configure a default storage class: Changing the default storage class. For more information about dynamic provisioning, see Dynamic provisioning.
- Open Data Hub must not be installed on the cluster.
For more information about managing the machines that make up an OpenShift cluster, see Overview of machine management.
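To check the worker node sizing requirement noted above, you can query node capacity directly. This sketch assumes your worker nodes carry the standard node-role.kubernetes.io/worker label:

$ oc get nodes -l node-role.kubernetes.io/worker -o custom-columns=NAME:.metadata.name,CPU:.status.capacity.cpu,MEMORY:.status.capacity.memory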
An identity provider configured for OpenShift
- Red Hat OpenShift AI uses the same authentication systems as Red Hat OpenShift Container Platform. See Understanding identity provider configuration for more information on configuring identity providers.
- Access to the cluster as a user with the cluster-admin role; the kubeadmin user is not allowed.
Internet access
Along with Internet access, the following domains must be accessible during the installation of OpenShift AI Self-Managed:
- cdn.redhat.com
- subscription.rhn.redhat.com
- registry.access.redhat.com
- registry.redhat.io
- quay.io
For CUDA-based images, the following domains must be accessible:
- ngc.download.nvidia.cn
- developer.download.nvidia.com
Data science pipelines preparation
- Data science pipelines 2.0 contains an installation of Argo Workflows. If there is an existing installation of Argo Workflows that is not installed by data science pipelines on your cluster, data science pipelines will be disabled after you install OpenShift AI. Before installing OpenShift AI, ensure that your cluster does not have an existing installation of Argo Workflows that is not installed by data science pipelines, or remove the separate installation of Argo Workflows from your cluster.
- You can store your pipeline artifacts in an S3-compatible object storage bucket so that you do not consume local storage. To do this, you must first configure write access to your S3 bucket on your storage account.
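One way to check for the separate Argo Workflows installation described above is to look for its CustomResourceDefinitions. This is a sketch; any workflows.argoproj.io CRD that was not created by data science pipelines indicates a conflicting installation:

$ oc get crd | grep argoproj.io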
Install KServe dependencies
- To support the KServe component, which is used by the single-model serving platform to serve large models, you must also install Operators for Red Hat OpenShift Serverless and Red Hat OpenShift Service Mesh and perform additional configuration. For more information, see Serving large models.
- If you want to add an authorization provider for the single-model serving platform, you must install the Red Hat - Authorino Operator. For information, see Adding an authorization provider for the single-model serving platform.
Access to object storage
- Components of OpenShift AI require or can use S3-compatible object storage such as AWS S3, MinIO, Ceph, or IBM Cloud Storage. An object store is a data storage mechanism that enables users to access their data either as an object or as a file. The S3 API is the recognized standard for HTTP-based access to object storage services.
Object storage is required for the following components:
- Single- or multi-model serving platforms, to deploy stored models. See Deploying models on the single-model serving platform or Deploying a model by using the multi-model serving platform.
- Data science pipelines, to store artifacts, logs, and intermediate results. See Configuring a pipeline server and About pipeline logs.
Object storage can be used by the following components:
- Workbenches, to access large datasets. See Adding a data connection to your data science project.
- Distributed workloads, to pull input data from and push results to. See Running distributed data science workloads from data science pipelines.
- Code executed inside a pipeline. For example, to store the resulting model in object storage. See Overview of pipelines in JupyterLab.
3.2. Adding administrative users in OpenShift
Before you can install and configure OpenShift AI for your data scientist users, you must define administrative users. Only users with cluster administrator privileges can install and configure OpenShift AI.
You can create a cluster admin by following the steps in the relevant documentation:
- OpenShift Container Platform: Creating a cluster admin
- OpenShift Dedicated: Managing OpenShift Dedicated administrators
- ROSA: Creating a cluster administrator user for quick cluster access
3.3. Installing the Red Hat OpenShift AI Operator
This section shows how to install the Red Hat OpenShift AI Operator on your OpenShift cluster using the command-line interface (CLI) and the OpenShift web console.
If you want to upgrade from a previous version of OpenShift AI rather than performing a new installation, see Upgrading OpenShift AI.
If your OpenShift cluster uses a proxy to access the Internet, you can configure the proxy settings for the Red Hat OpenShift AI Operator. See Overriding proxy settings of an Operator for more information.
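As a sketch of such an override, Operator Lifecycle Manager supports passing proxy environment variables through the Subscription's spec.config.env field; the proxy URLs below are placeholders:

spec:
  config:
    env:
      - name: HTTP_PROXY
        value: http://<proxy_host>:<port>
      - name: HTTPS_PROXY
        value: http://<proxy_host>:<port>
      - name: NO_PROXY
        value: .cluster.local,.svc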
3.3.1. Installing the Red Hat OpenShift AI Operator by using the CLI
The following procedure shows how to use the OpenShift command-line interface (CLI) to install the Red Hat OpenShift AI Operator on your OpenShift cluster. You must install the Operator before you can install OpenShift AI components on the cluster.
Prerequisites
- You have a running OpenShift cluster, version 4.12 or greater, configured with a default storage class that can be dynamically provisioned.
- You have cluster administrator privileges for your OpenShift cluster.
- You have downloaded and installed the OpenShift command-line interface (CLI). See Installing the OpenShift CLI.
Procedure
- Open a new terminal window.
Follow these steps to log in to your OpenShift cluster as a cluster administrator:
- In the upper-right corner of the OpenShift web console, click your user name and select Copy login command.
- After you have logged in, click Display token.
Copy the Log in with this token command and paste it in the OpenShift command-line interface (CLI).
$ oc login --token=<token> --server=<openshift_cluster_url>
Create a namespace for installation of the Operator by performing the following actions:
Create a namespace YAML file, for example, rhods-operator-namespace.yaml:

apiVersion: v1
kind: Namespace
metadata:
  name: redhat-ods-operator 1

1 Defines the required redhat-ods-operator namespace for installation of the Operator.
Create the namespace in your OpenShift cluster.
$ oc create -f rhods-operator-namespace.yaml
You see output similar to the following:
namespace/redhat-ods-operator created
Create an operator group for installation of the Operator by performing the following actions:
Create an OperatorGroup object custom resource (CR) file, for example, rhods-operator-group.yaml:

apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: rhods-operator
  namespace: redhat-ods-operator 1

1 Defines the required redhat-ods-operator namespace.
Create the OperatorGroup object in your OpenShift cluster.

$ oc create -f rhods-operator-group.yaml
You see output similar to the following:
operatorgroup.operators.coreos.com/rhods-operator created
Create a subscription for installation of the Operator by performing the following actions:
Create a Subscription object CR file, for example, rhods-operator-subscription.yaml:

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: rhods-operator
  namespace: redhat-ods-operator 1
spec:
  name: rhods-operator
  channel: <channel> 2
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  startingCSV: rhods-operator.x.y.z 3

1 Defines the required redhat-ods-operator namespace.
2 Sets the update channel. You must specify a value of fast, stable, stable-x.y, eus-x.y, or alpha. For more information, see Understanding update channels.
3 Optional: Sets the Operator version. If you do not specify a value, the subscription defaults to the latest Operator version. For more information, see Red Hat OpenShift AI Self-Managed Life Cycle.
Create the Subscription object in your OpenShift cluster to install the Operator.

$ oc create -f rhods-operator-subscription.yaml
You see output similar to the following:
subscription.operators.coreos.com/rhods-operator created
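Optionally, you can watch the installation progress from the CLI before moving to the verification steps; the ClusterServiceVersion phase changes to Succeeded when the installation completes:

$ oc get csv -n redhat-ods-operator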
Verification
- In the OpenShift web console, click Operators → Installed Operators and confirm that the Red Hat OpenShift AI Operator shows one of the following statuses:
  - Installing - installation is in progress; wait for this to change to Succeeded. This might take several minutes.
  - Succeeded - installation is successful.
- In the web console, click Home → Projects and confirm that the following project namespaces are visible and listed as Active:
  - redhat-ods-applications
  - redhat-ods-monitoring
  - redhat-ods-operator
3.3.2. Installing the Red Hat OpenShift AI Operator by using the web console
The following procedure shows how to use the OpenShift web console to install the Red Hat OpenShift AI Operator on your cluster. You must install the Operator before you can install OpenShift AI components on the cluster.
Prerequisites
- You have a running OpenShift cluster, version 4.12 or greater, configured with a default storage class that can be dynamically provisioned.
- You have cluster administrator privileges for your OpenShift cluster.
Procedure
- Log in to the OpenShift web console as a cluster administrator.
- In the web console, click Operators → OperatorHub.
- On the OperatorHub page, locate the Red Hat OpenShift AI Operator by scrolling through the available Operators or by typing Red Hat OpenShift AI into the Filter by keyword box.
- Click the Red Hat OpenShift AI tile. The Red Hat OpenShift AI information pane opens.
- Select a Channel. For information about subscription update channels, see Understanding update channels.
- Select a Version.
- Click Install. The Install Operator page opens.
- Review or change the selected channel and version as needed.
- For Installation mode, note that the only available value is All namespaces on the cluster (default). This installation mode makes the Operator available to all namespaces in the cluster.
- For Installed Namespace, select Operator recommended Namespace: redhat-ods-operator.
- For Update approval, select one of the following update strategies:
  - Automatic: New updates in the update channel are installed as soon as they become available.
  - Manual: A cluster administrator must approve any new updates before installation begins.

Important
By default, the Red Hat OpenShift AI Operator follows a sequential update process. This means that if there are several versions between the current version and the target version, Operator Lifecycle Manager (OLM) upgrades the Operator to each of the intermediate versions before it upgrades it to the final, target version.
If you configure automatic upgrades, OLM automatically upgrades the Operator to the latest available version. If you configure manual upgrades, a cluster administrator must manually approve each sequential update between the current version and the final, target version.
For information about supported versions, see Red Hat OpenShift AI Life Cycle.
- Click Install.
  The Installing Operators pane appears. When the installation finishes, a checkmark appears next to the Operator name.
Verification
- In the OpenShift web console, click Operators → Installed Operators and confirm that the Red Hat OpenShift AI Operator shows one of the following statuses:
  - Installing - installation is in progress; wait for this to change to Succeeded. This might take several minutes.
  - Succeeded - installation is successful.
- In the web console, click Home → Projects and confirm that the following project namespaces are visible and listed as Active:
  - redhat-ods-applications
  - redhat-ods-monitoring
  - redhat-ods-operator
3.4. Installing and managing Red Hat OpenShift AI components
The following procedures show how to use the command-line interface (CLI) and OpenShift web console to install and manage components of Red Hat OpenShift AI on your OpenShift cluster.
3.4.1. Installing Red Hat OpenShift AI components by using the CLI
The following procedure shows how to use the OpenShift command-line interface (CLI) to install specific components of Red Hat OpenShift AI on your OpenShift cluster.
The following procedure describes how to create and configure a DataScienceCluster object to install Red Hat OpenShift AI components as part of a new installation. However, if you upgraded from version 1 of OpenShift AI (previously OpenShift Data Science), the upgrade process automatically created a default DataScienceCluster object. If you upgraded from a previous minor version, the upgrade process used the settings from the previous version's DataScienceCluster object. To inspect the default DataScienceCluster object and change the installation status of Red Hat OpenShift AI components, see Updating the installation status of Red Hat OpenShift AI components by using the web console.
Prerequisites
- The Red Hat OpenShift AI Operator is installed on your OpenShift cluster. See Installing the Red Hat OpenShift AI Operator.
- You have cluster administrator privileges for your OpenShift cluster.
- You have downloaded and installed the OpenShift command-line interface (CLI). See Installing the OpenShift CLI.
Procedure
- Open a new terminal window.
Follow these steps to log in to your OpenShift cluster as a cluster administrator:
- In the upper-right corner of the OpenShift web console, click your user name and select Copy login command.
- After you have logged in, click Display token.
Copy the Log in with this token command and paste it in the OpenShift command-line interface (CLI).
$ oc login --token=<token> --server=<openshift_cluster_url>
Create a DataScienceCluster object custom resource (CR) file, for example, rhods-operator-dsc.yaml:

apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: default-dsc
spec:
  components:
    codeflare:
      managementState: Removed
    dashboard:
      managementState: Removed
    datasciencepipelines:
      managementState: Removed
    kserve:
      managementState: Removed 1 2
    kueue:
      managementState: Removed
    modelmeshserving:
      managementState: Removed
    ray:
      managementState: Removed
    workbenches:
      managementState: Removed

1 To fully install the KServe component, which is used by the single-model serving platform to serve large models, you must install Operators for Red Hat OpenShift Service Mesh and Red Hat OpenShift Serverless and perform additional configuration. See Serving large models.
2 If you have not enabled the KServe component (that is, you set the value of the managementState field to Removed), you must also disable the dependent Service Mesh component to avoid errors. See Disabling KServe dependencies.
In the spec.components section of the CR, for each OpenShift AI component shown, set the value of the managementState field to either Managed or Removed. These values are defined as follows:
- Managed: The Operator actively manages the component, installs it, and tries to keep it active. The Operator will upgrade the component only if it is safe to do so.
- Removed: The Operator actively manages the component but does not install it. If the component is already installed, the Operator will try to remove it.

Important
- To learn how to fully install the KServe component, which is used by the single-model serving platform to serve large models, see Serving large models.
- To learn how to configure the distributed workloads feature that uses the CodeFlare, KubeRay, and Kueue components, see Configuring distributed workloads.
Create the DataScienceCluster object in your OpenShift cluster to install the specified OpenShift AI components.

$ oc create -f rhods-operator-dsc.yaml

You see output similar to the following:

datasciencecluster.datasciencecluster.opendatahub.io/default-dsc created
Verification
Confirm that there is a running pod for each component:
- In the OpenShift web console, click Workloads → Pods.
- In the Project list at the top of the page, select redhat-ods-applications.
- In the applications namespace, confirm that there are running pods for each of the OpenShift AI components that you installed.

Confirm the status of all installed components:
- In the OpenShift web console, click Operators → Installed Operators.
- Click the Red Hat OpenShift AI Operator.
- Click the Data Science Cluster tab and select the DataScienceCluster object called default-dsc.
- Select the YAML tab.
- In the installedComponents section, confirm that the components you installed have a status value of true.

Note
If a component shows with the component-name: {} format in the spec.components section of the CR, the component is not installed.
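If you later need to change the state of a single component without opening the full CR in an editor, a merge patch is one option. The following sketch enables the dashboard component; because the DataScienceCluster resource is cluster-scoped, no namespace is needed:

$ oc patch datasciencecluster default-dsc --type merge -p '{"spec": {"components": {"dashboard": {"managementState": "Managed"}}}}'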
3.4.2. Installing Red Hat OpenShift AI components by using the web console
The following procedure shows how to use the OpenShift web console to install specific components of Red Hat OpenShift AI on your cluster.
The following procedure describes how to create and configure a DataScienceCluster object to install Red Hat OpenShift AI components as part of a new installation. However, if you upgraded from version 1 of OpenShift AI (previously OpenShift Data Science), the upgrade process automatically created a default DataScienceCluster object. If you upgraded from a previous minor version, the upgrade process used the settings from the previous version's DataScienceCluster object. To inspect the DataScienceCluster object and change the installation status of Red Hat OpenShift AI components, see Updating the installation status of Red Hat OpenShift AI components by using the web console.
Prerequisites
- The Red Hat OpenShift AI Operator is installed on your OpenShift cluster. See Installing the Red Hat OpenShift AI Operator.
- You have cluster administrator privileges for your OpenShift cluster.
Procedure
- Log in to the OpenShift web console as a cluster administrator.
- In the web console, click Operators → Installed Operators and then click the Red Hat OpenShift AI Operator.
Create a DataScienceCluster object to install OpenShift AI components by performing the following actions:
- Click the Data Science Cluster tab.
- Click Create DataScienceCluster.
- For Configure via, select YAML view.
  An embedded YAML editor opens showing a default custom resource (CR) for the DataScienceCluster object.
- In the spec.components section of the CR, for each OpenShift AI component shown, set the value of the managementState field to either Managed or Removed. These values are defined as follows:
  - Managed: The Operator actively manages the component, installs it, and tries to keep it active. The Operator will upgrade the component only if it is safe to do so.
  - Removed: The Operator actively manages the component but does not install it. If the component is already installed, the Operator will try to remove it.

Important
- To learn how to install the KServe component, which is used by the single-model serving platform to serve large models, see Serving large models.
- If you have not enabled the KServe component (that is, you set the value of the managementState field to Removed), you must also disable the dependent Service Mesh component to avoid errors. See Disabling KServe dependencies.
- To learn how to configure the distributed workloads feature that uses the CodeFlare and KubeRay components, see Configuring distributed workloads.
- Click Create.
Verification
Confirm that there is a running pod for each component:
- In the OpenShift web console, click Workloads → Pods.
- In the Project list at the top of the page, select redhat-ods-applications.
- In the applications namespace, confirm that there are running pods for each of the OpenShift AI components that you installed.

Confirm the status of all installed components:
- In the OpenShift web console, click Operators → Installed Operators.
- Click the Red Hat OpenShift AI Operator.
- Click the Data Science Cluster tab and select the DataScienceCluster object called default-dsc.
- Select the YAML tab.
- In the installedComponents section, confirm that the components you installed have a status value of true.

Note
If a component shows with the component-name: {} format in the spec.components section of the CR, the component is not installed.
3.4.3. Updating the installation status of Red Hat OpenShift AI components by using the web console
The following procedure shows how to use the OpenShift web console to update the installation status of components of Red Hat OpenShift AI on your OpenShift cluster.
If you upgraded from version 1 to version 2 of OpenShift AI, the upgrade process automatically created a default DataScienceCluster object and enabled several components of OpenShift AI. If you upgraded from a previous minor version, the upgrade process used the settings from the previous version's DataScienceCluster object.
The following procedure describes how to edit the DataScienceCluster object to do the following:
- Change the installation status of the existing Red Hat OpenShift AI components.
- Add additional components to the DataScienceCluster object that were not available in the previous version of OpenShift AI.
Prerequisites
- The Red Hat OpenShift AI Operator is installed on your OpenShift cluster.
- You have cluster administrator privileges for your OpenShift cluster.
Procedure
- Log in to the OpenShift web console as a cluster administrator.
- In the web console, click Operators → Installed Operators and then click the Red Hat OpenShift AI Operator.
- Click the Data Science Cluster tab.
- On the DataScienceClusters page, click the default object.
- Click the YAML tab.
  An embedded YAML editor opens showing the custom resource (CR) file for the DataScienceCluster object.
- In the spec.components section of the CR, for each OpenShift AI component shown, set the value of the managementState field to either Managed or Removed. These values are defined as follows:

  Note
  If a component shows with the component-name: {} format in the spec.components section of the CR, the component is not installed.

  - Managed: The Operator actively manages the component, installs it, and tries to keep it active. The Operator will upgrade the component only if it is safe to do so.
  - Removed: The Operator actively manages the component but does not install it. If the component is already installed, the Operator will try to remove it.

Important
- To learn how to install the KServe component, which is used by the single-model serving platform to serve large models, see Serving large models.
- If you have not enabled the KServe component (that is, you set the value of the managementState field to Removed), you must also disable the dependent Service Mesh component to avoid errors. See Disabling KServe dependencies.
- If they are not already present in the CR file, you can install the CodeFlare, KubeRay, and Kueue components by adding the codeflare, ray, and kueue entries to the spec.components section of the CR and setting the managementState field for the components to Managed.
- To learn how to configure the distributed workloads feature that uses the CodeFlare, KubeRay, and Kueue components, see Configuring distributed workloads.
Click Save.
For any components that you updated, OpenShift AI initiates a rollout that affects all pods to use the updated image.
Verification
Confirm that there is a running pod for each component:
- In the OpenShift web console, click Workloads → Pods.
- In the Project list at the top of the page, select redhat-ods-applications.
- In the applications namespace, confirm that there are running pods for each of the OpenShift AI components that you installed.

Confirm the status of all installed components:
- In the OpenShift web console, click Operators → Installed Operators.
- Click the Red Hat OpenShift AI Operator.
- Click the Data Science Cluster tab and select the DataScienceCluster object called default-dsc.
- Select the YAML tab.
- In the installedComponents section, confirm that the components you installed have a status value of true.

Note
If a component shows with the component-name: {} format in the spec.components section of the CR, the component is not installed.
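As an alternative to inspecting the YAML tab, a CLI query such as the following sketch reads the same installedComponents status directly:

$ oc get datasciencecluster default-dsc -o jsonpath='{.status.installedComponents}'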
3.4.4. Disabling KServe dependencies
If you have not enabled the KServe component (that is, you set the value of the managementState field to Removed), you must also disable the dependent Service Mesh component to avoid errors.
Prerequisites
- You have used the OpenShift command-line interface (CLI) or web console to disable the KServe component.
Procedure
- Log in to the OpenShift web console as a cluster administrator.
- In the web console, click Operators → Installed Operators and then click the Red Hat OpenShift AI Operator.
Disable the OpenShift Service Mesh component as follows:
- Click the DSC Initialization tab.
- Click the default-dsci object.
- Click the YAML tab.
In the spec section, add the serviceMesh component (if it is not already present) and configure the managementState field as shown:

spec:
  serviceMesh:
    managementState: Removed
- Click Save.
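If you prefer the CLI, a merge patch along the following lines is an equivalent sketch of the same change:

$ oc patch dscinitialization default-dsci --type merge -p '{"spec": {"serviceMesh": {"managementState": "Removed"}}}'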
Verification
In the web console, click Operators → Installed Operators and then click the Red Hat OpenShift AI Operator.
The Operator details page opens.
- In the Conditions section, confirm that there is no ReconcileComplete condition with a status value of Unknown.
Chapter 4. Preparing OpenShift AI for use in IBM Cloud Pak for Data
The procedures in this section show how to prepare Red Hat OpenShift AI for use in IBM Cloud Pak for Data version 5.0.3 or greater. These versions of Cloud Pak for Data include watsonx.ai.
The procedures in this section apply when you want to prepare a new installation of Red Hat OpenShift AI for use in IBM Cloud Pak for Data version 5.0.3 or greater. The procedures show how to install the KServe component of OpenShift AI in raw mode, which means that KServe does not have any other components as dependencies. However, if you have an existing deployment of OpenShift AI with KServe and its dependencies already installed and enabled, you do not need to modify the configuration for use in Cloud Pak for Data.
4.1. Installing the Red Hat OpenShift AI Operator by using the CLI
Follow this procedure only if you are preparing Red Hat OpenShift AI for use in IBM Cloud Pak for Data version 5.0.3 or greater. These versions of Cloud Pak for Data include watsonx.ai. If this use case does not apply to your organization, see Installing and deploying OpenShift AI for more generally applicable instructions.
This procedure shows how to use the OpenShift command-line interface (CLI) to install the Red Hat OpenShift AI Operator on your OpenShift cluster. You must install the Operator before you can manage the installation of OpenShift AI components.
Prerequisites
- You have a running OpenShift cluster, version 4.12 or greater, configured with a default storage class that can be dynamically provisioned.
- You have cluster administrator privileges for your OpenShift cluster.
- You have downloaded and installed the OpenShift command-line interface (CLI). See Installing the OpenShift CLI.
Procedure
- Open a new terminal window.
In the OpenShift command-line interface (CLI), log in to your OpenShift cluster as a cluster administrator, as shown in the following example:
$ oc login <openshift_cluster_url> -u <admin_username> -p <password>
Create a new YAML file with the following contents:
apiVersion: v1
kind: Namespace 1
metadata:
  name: redhat-ods-operator
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup 2
metadata:
  name: rhods-operator
  namespace: redhat-ods-operator
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription 3
metadata:
  name: rhods-operator
  namespace: redhat-ods-operator
spec:
  name: rhods-operator
  channel: embedded 4
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  config:
    env:
      - name: "DISABLE_DSC_CONFIG" 5

1 Defines the required redhat-ods-operator namespace for installation of the Operator.
2 Defines an Operator group for installation of the Operator. You must specify the same namespace that you defined earlier in the file. Also, another Operator group must not exist in the namespace.
3 Defines a subscription for the Operator. You must specify the same namespace that you defined earlier in the file.
4 Sets the update channel. You must specify the embedded update channel.
5 Specifies that DSCI and DSC custom resources are not dynamically created. This setting is required to install the KServe component of OpenShift AI in raw mode, which means that KServe does not have any other components as dependencies.
Deploy the YAML file to create the namespace, Operator group, and subscription that you defined.
$ oc create -f <file_name>.yaml
You see output similar to the following:
namespace/redhat-ods-operator created
operatorgroup.operators.coreos.com/rhods-operator created
subscription.operators.coreos.com/rhods-operator created
Create another new YAML file with the following contents:
apiVersion: dscinitialization.opendatahub.io/v1
kind: DSCInitialization 1
metadata:
  name: default-dsci
spec:
  applicationsNamespace: redhat-ods-applications
  monitoring:
    managementState: Managed
    namespace: redhat-ods-monitoring
  serviceMesh:
    managementState: Removed 2
  trustedCABundle:
    managementState: Managed
    customCABundle: ""

1 Defines a DSCInitialization object called default-dsci. The DSCInitialization object is used by the Operator to manage resources that are common to all product components.
2 Specifies that the serviceMesh component (which is a dependency for KServe in some setups) is not installed. This setting is required when preparing OpenShift AI for use in IBM products.
Create the DSCInitialization object on your OpenShift cluster.

$ oc create -f <dsci_file_name>.yaml
Verification
In the OpenShift CLI, check that there is a running pod for the Operator by performing the following actions:
Check the pods in the redhat-ods-operator project.

$ oc get pods -n redhat-ods-operator

Confirm that there is a rhods-operator-* pod with a Running status, as shown in the following example:

NAME                              READY   STATUS    RESTARTS   AGE
rhods-operator-56c85d44c9-vtk74   1/1     Running   0          3h57m
In the OpenShift CLI, check that the DSCInitialization object that you created is running by performing the following actions:
Check the cluster for DSCInitialization objects.

$ oc get dscinitialization

Confirm that there is a default-dsci object with a Ready status, as shown in the following example:

NAME           AGE     PHASE
default-dsci   4d18h   Ready
4.2. Managing Red Hat OpenShift AI components by using the CLI
The following procedure shows how to use the OpenShift command-line interface (CLI) to manage the installation of specific components of Red Hat OpenShift AI on your OpenShift cluster.
Follow this procedure only if you are preparing Red Hat OpenShift AI for use in IBM Cloud Pak for Data version 5.0.3 or greater. These versions of Cloud Pak for Data include watsonx.ai. If this use case does not apply to your organization, see Installing and managing Red Hat OpenShift AI components for more generally applicable instructions.
Prerequisites
- The Red Hat OpenShift AI Operator is installed on your OpenShift cluster. See Installing the Red Hat OpenShift AI Operator by using the CLI.
- You have cluster administrator privileges for your OpenShift cluster.
- You have downloaded and installed the OpenShift command-line interface (CLI). See Installing the OpenShift CLI.
Procedure
- Open a new terminal window.
In the OpenShift command-line interface (CLI), log in to your OpenShift cluster as a cluster administrator, as shown in the following example:
$ oc login <openshift_cluster_url> -u <admin_username> -p <password>
Create a new YAML file with the following contents:
apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster 1
metadata:
  name: default-dsc
spec:
  components:
    codeflare:
      managementState: Removed
    dashboard:
      managementState: Removed
    datasciencepipelines:
      managementState: Removed
    kserve:
      managementState: Managed
      defaultDeploymentMode: RawDeployment 2
      serving:
        managementState: Removed 3
        name: knative-serving
    kueue:
      managementState: Removed
    modelmeshserving:
      managementState: Removed
    ray:
      managementState: Removed
    trainingoperator:
      managementState: Managed 4
    trustyai:
      managementState: Removed
    workbenches:
      managementState: Removed

1 Defines a new DataScienceCluster object called default-dsc. The DataScienceCluster object is used by the Operator to manage OpenShift AI components.
2 Specifies that the KServe component is installed and managed by the Operator in raw mode, which means that KServe does not have any other components as dependencies.
3 Specifies that the serving component (which is a dependency for KServe in some setups) is not installed. This setting is required when preparing OpenShift AI for use in Cloud Pak for Data.
4 Specifies that the Training Operator is installed and managed by the Operator. This component is required if you want to use the Kubeflow Training Operator to tune models.
In addition, observe that the value of the managementState field for all other OpenShift AI components is set to Removed. This value means that the components are not installed.
In general, the Managed and Removed values are described as follows:
- Managed: The Operator actively manages the component, installs it, and tries to keep it active. The Operator will upgrade the component only if it is safe to do so.
- Removed: The Operator actively manages the component but does not install it. If the component is already installed, the Operator will try to remove it.
Create the DataScienceCluster object on your OpenShift cluster.

$ oc create -f <file_name>.yaml
You see output similar to the following:
datasciencecluster.datasciencecluster.opendatahub.io/default-dsc created
Verification
In the OpenShift CLI, check that there are running pods for the KServe controller and Kubeflow Training Operator by performing the following actions:
Check the pods in the redhat-ods-applications project.

$ oc get pods -n redhat-ods-applications

Confirm that there is a kserve-controller-manager-* and a kubeflow-training-operator-* pod with a Running status, similar to the following example:

NAME                                          READY   STATUS    RESTARTS   AGE
kserve-controller-manager-57796d5b44-sh9n5    1/1     Running   0          4m57s
kubeflow-training-operator-7b99d5584c-rh5hb   1/1     Running   0          4m57s
4.3. Editing the model inferencing configuration
Particular use cases in IBM Cloud Pak for Data version 5.0.3 or greater (which include watsonx.ai) might require customizations to the model inferencing configuration used by Red Hat OpenShift AI. Before you can make any such customizations, you need to put the inferencing configuration file in an editable state. In addition, you must make a specific configuration update that prevents errors when using OpenShift AI in Cloud Pak for Data and watsonx.ai.
Follow this procedure only if you are preparing Red Hat OpenShift AI for use in IBM Cloud Pak for Data version 5.0.3 or greater. These versions of Cloud Pak for Data include watsonx.ai.
Prerequisites
- The Red Hat OpenShift AI Operator is installed on your OpenShift cluster. See Installing the Red Hat OpenShift AI Operator by using the CLI.
- You have cluster administrator privileges for your OpenShift cluster.
Procedure
- Log in to the OpenShift web console as a cluster administrator.
- In the web console, click Workloads → ConfigMaps.
- In the Project list, click redhat-ods-applications.
- On the list of ConfigMap resources, click the inferenceservice-config resource and then click the YAML tab.
- In the metadata.annotations section of the file, add opendatahub.io/managed: 'false' as shown:

  metadata:
    annotations:
      internal.config.kubernetes.io/previousKinds: ConfigMap
      internal.config.kubernetes.io/previousNames: inferenceservice-config
      internal.config.kubernetes.io/previousNamespaces: opendatahub
      opendatahub.io/managed: 'false'
The additional annotation makes the inferencing configuration file editable.
To prevent errors when using OpenShift AI in Cloud Pak for Data (including watsonx.ai), update the configuration as follows:
In the YAML file, locate the following entry:
"domainTemplate": "{{ .Name }}-{{ .Namespace }}.{{ .IngressDomain }}
Update the value of the
domainTemplate
field as shown:"domainTemplate": "example.com"
This new, explicitly specified value ensures that OpenShift AI cannot generate a value for the
domainTemplate
field that exceeds the maximum length and causes an error. ThedomainTemplate
field is not used by the raw deployment mode that you configured for the KServe component when preparing OpenShift AI for use in Cloud Pak for Data.
- Click Save.
Chapter 5. Accessing the dashboard
After you have installed OpenShift AI and added users, you can access the URL for your OpenShift AI console and share the URL with the users to let them log in and work on their models.
Prerequisites
- You have installed OpenShift AI on your OpenShift cluster.
- You have added at least one user to the user group for OpenShift AI.
Procedure
- Log in to the OpenShift web console.
- Click the application launcher.
- Right-click on Red Hat OpenShift AI and copy the URL for your OpenShift AI instance.
- Provide this instance URL to your data scientists to let them log in to OpenShift AI.
Verification
- Confirm that you and your users can log in to OpenShift AI by using the instance URL.
Chapter 6. Enabling GPU support in OpenShift AI
Optionally, to ensure that your data scientists can use compute-heavy workloads in their models, you can enable graphics processing units (GPUs) in OpenShift AI.
If you are using OpenShift AI in a disconnected self-managed environment, see Enabling GPU support in OpenShift AI instead.
Prerequisites
- You have logged in to your OpenShift cluster.
- You have the cluster-admin role in your OpenShift cluster.
Procedure
- To enable GPU support on an OpenShift cluster, follow the instructions here: NVIDIA GPU Operator on Red Hat OpenShift Container Platform in the NVIDIA documentation.
Delete the migration-gpu-status ConfigMap.
- In the OpenShift web console, switch to the Administrator perspective.
- Set the Project to All Projects or redhat-ods-applications to ensure you can see the appropriate ConfigMap.
- Search for the migration-gpu-status ConfigMap.
Click the action menu (⋮) and select Delete ConfigMap from the list.
The Delete ConfigMap dialog appears.
- Inspect the dialog and confirm that you are deleting the correct ConfigMap.
- Click Delete.
Restart the dashboard replicaset.
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Deployments.
- Set the Project to All Projects or redhat-ods-applications to ensure you can see the appropriate deployment.
- Search for the rhods-dashboard deployment.
- Click the action menu (⋮) and select Restart Rollout from the list.
- Wait until the Status column indicates that all pods in the rollout have fully restarted.
Verification
- The NVIDIA GPU Operator appears on the Operators → Installed Operators page in the OpenShift web console.
- The reset migration-gpu-status instance is present in the Instances tab on the AcceleratorProfile custom resource definition (CRD) details page.
After installing the NVIDIA GPU Operator, create an accelerator profile as described in Working with accelerator profiles.
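As a rough sketch of what an accelerator profile can look like, the following example assumes an NVIDIA GPU exposed through the nvidia.com/gpu resource; see the linked documentation for the authoritative schema and values:

apiVersion: dashboard.opendatahub.io/v1
kind: AcceleratorProfile
metadata:
  name: nvidia-gpu
  namespace: redhat-ods-applications
spec:
  displayName: NVIDIA GPU
  enabled: true
  identifier: nvidia.com/gpu
  tolerations:
    - effect: NoSchedule
      key: nvidia.com/gpu
      operator: Exists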
Chapter 7. Working with certificates
Certificates are used by various components in OpenShift to validate access to the cluster. For clusters that rely on self-signed certificates, you can add those self-signed certificates to a cluster-wide Certificate Authority (CA) bundle and use the CA bundle in Red Hat OpenShift AI. You can also use self-signed certificates in a custom CA bundle that is separate from the cluster-wide bundle. Administrators can add a CA bundle, remove a CA bundle from all namespaces, remove a CA bundle from individual namespaces, or manually manage certificate changes instead of the system.
7.1. Understanding certificates in OpenShift AI
For OpenShift clusters that rely on self-signed certificates, you can add those self-signed certificates to a cluster-wide Certificate Authority (CA) bundle (ca-bundle.crt) and use the CA bundle in Red Hat OpenShift AI. You can also use self-signed certificates in a custom CA bundle (odh-ca-bundle.crt) that is separate from the cluster-wide bundle.
7.1.1. How CA bundles are injected
After installing OpenShift AI, the Red Hat OpenShift AI Operator automatically creates an empty odh-trusted-ca-bundle configuration file (ConfigMap), and the Cluster Network Operator (CNO) injects the cluster-wide CA bundle into the odh-trusted-ca-bundle ConfigMap with the label config.openshift.io/inject-trusted-cabundle. The components deployed in the affected namespaces are responsible for mounting this ConfigMap as a volume in the deployment pods.
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app.kubernetes.io/part-of: opendatahub-operator
    config.openshift.io/inject-trusted-cabundle: 'true'
  name: odh-trusted-ca-bundle
After the CNO injects the bundle, it updates the ConfigMap with the ca-bundle.crt file containing the certificates.
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app.kubernetes.io/part-of: opendatahub-operator
    config.openshift.io/inject-trusted-cabundle: 'true'
  name: odh-trusted-ca-bundle
data:
  ca-bundle.crt: |
    <BUNDLE OF CLUSTER-WIDE CERTIFICATES>
7.1.2. How the ConfigMap is managed
By default, the Red Hat OpenShift AI Operator manages the odh-trusted-ca-bundle ConfigMap. If you want to manage or remove the odh-trusted-ca-bundle ConfigMap, or add a custom CA bundle (odh-ca-bundle.crt) separate from the cluster-wide CA bundle (ca-bundle.crt), you can use the trustedCABundle property in the Operator's DSC Initialization (DSCI) object.
spec:
  trustedCABundle:
    managementState: Managed
    customCABundle: ""
In the Operator's DSCI object, you can set the spec.trustedCABundle.managementState field to the following values:
- Managed: The Red Hat OpenShift AI Operator manages the odh-trusted-ca-bundle ConfigMap and adds it to all non-reserved existing and new namespaces (the ConfigMap is not added to any reserved or system namespaces, such as default, openshift-*, or kube-*). The ConfigMap is automatically updated to reflect any changes made to the customCABundle field. This is the default value after installing Red Hat OpenShift AI.
- Removed: The Red Hat OpenShift AI Operator removes the odh-trusted-ca-bundle ConfigMap (if present) and disables the creation of the ConfigMap in new namespaces. If you change this field from Managed to Removed, the odh-trusted-ca-bundle ConfigMap is also deleted from namespaces. This is the default value after upgrading Red Hat OpenShift AI from 2.7 or earlier versions to 2.11.
- Unmanaged: The Red Hat OpenShift AI Operator does not manage the odh-trusted-ca-bundle ConfigMap, allowing an administrator to manage it instead. Changing the managementState from Managed to Unmanaged does not remove the odh-trusted-ca-bundle ConfigMap, but the ConfigMap is not updated if you make changes to the customCABundle field.
In the Operator's DSCI object, you can add a custom certificate to the spec.trustedCABundle.customCABundle field. This adds the odh-ca-bundle.crt file containing the certificates to the odh-trusted-ca-bundle ConfigMap, as shown in the following example:
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app.kubernetes.io/part-of: opendatahub-operator
    config.openshift.io/inject-trusted-cabundle: 'true'
  name: odh-trusted-ca-bundle
data:
  ca-bundle.crt: |
    <BUNDLE OF CLUSTER-WIDE CERTIFICATES>
  odh-ca-bundle.crt: |
    <BUNDLE OF CUSTOM CERTIFICATES>
7.2. Adding a CA bundle
There are two ways to add a Certificate Authority (CA) bundle to OpenShift AI. You can use one or both of these methods:
- For OpenShift clusters that rely on self-signed certificates, you can add those self-signed certificates to a cluster-wide Certificate Authority (CA) bundle (ca-bundle.crt) and use the CA bundle in Red Hat OpenShift AI. To use this method, log in to OpenShift as a cluster administrator and follow the steps as described in Configuring the cluster-wide proxy during installation.
- You can use self-signed certificates in a custom CA bundle (odh-ca-bundle.crt) that is separate from the cluster-wide bundle. To use this method, follow the steps in this section.
Prerequisites
- You have admin access to the DSCInitialization resources in the OpenShift cluster.
- You installed the OpenShift command-line interface (oc) as described in Get Started with the CLI.
- You are working in a new installation of Red Hat OpenShift AI. If you upgraded Red Hat OpenShift AI, see Adding a CA bundle after upgrading.
Procedure
- Log in to OpenShift.
- Click Operators → Installed Operators and then click the Red Hat OpenShift AI Operator.
- Click the DSC Initialization tab.
- Click the default-dsci object.
- Click the YAML tab.
In the spec section, add the custom certificate to the customCABundle field for trustedCABundle, as shown in the following example:

spec:
  trustedCABundle:
    managementState: Managed
    customCABundle: |
      -----BEGIN CERTIFICATE-----
      examplebundle123
      -----END CERTIFICATE-----
- Click Save.
Verification
- If you are using a cluster-wide CA bundle, run the following command to verify that all non-reserved namespaces contain the odh-trusted-ca-bundle ConfigMap:

  $ oc get configmaps --all-namespaces -l app.kubernetes.io/part-of=opendatahub-operator | grep odh-trusted-ca-bundle

- If you are using a custom CA bundle, run the following command to verify that a non-reserved namespace contains the odh-trusted-ca-bundle ConfigMap and that the ConfigMap contains your customCABundle value. In the following command, example-namespace is the non-reserved namespace and examplebundle123 is the customCABundle value.

  $ oc get configmap odh-trusted-ca-bundle -n example-namespace -o yaml | grep examplebundle123
7.3. Removing a CA bundle
You can remove a Certificate Authority (CA) bundle from all non-reserved namespaces in OpenShift AI. This process changes the default configuration and disables the creation of the odh-trusted-ca-bundle configuration file (ConfigMap), as described in Understanding certificates in OpenShift AI.
The odh-trusted-ca-bundle ConfigMaps are only deleted from namespaces when you set the managementState of trustedCABundle to Removed; deleting the DSC Initialization does not delete the ConfigMaps.
To remove a CA bundle from a single namespace only, see Removing a CA bundle from a namespace.
Prerequisites
- You have cluster administrator privileges for your OpenShift cluster.
- You installed the OpenShift command-line interface (oc) as described in Get Started with the CLI.
Procedure
- In the OpenShift web console, click Operators → Installed Operators and then click the Red Hat OpenShift AI Operator.
- Click the DSC Initialization tab.
- Click the default-dsci object.
- Click the YAML tab.
In the spec section, change the value of the managementState field for trustedCABundle to Removed:

spec:
  trustedCABundle:
    managementState: Removed
- Click Save.
Verification
Run the following command to verify that the odh-trusted-ca-bundle ConfigMap has been removed from all namespaces:

$ oc get configmaps --all-namespaces | grep odh-trusted-ca-bundle

The command should not return any ConfigMaps.
7.4. Removing a CA bundle from a namespace
You can remove a custom Certificate Authority (CA) bundle from individual namespaces in OpenShift AI. This process disables the creation of the odh-trusted-ca-bundle configuration file (ConfigMap) for the specified namespace only.
To remove a certificate bundle from all namespaces, see Removing a CA bundle.
Prerequisites
- You have cluster administrator privileges for your OpenShift cluster.
-
You installed the OpenShift command line interface (
oc
) as described in Get Started with the CLI.
Procedure
Run the following command to remove a CA bundle from a namespace. In the following command, example-namespace is the non-reserved namespace.
$ oc annotate ns example-namespace security.opendatahub.io/inject-trusted-ca-bundle=false
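If you later want to restore injection for that namespace, setting the annotation back to true is expected to work; this is an assumption based on the annotation shown above, not a documented step:
# Assumption: re-enabling injection by flipping the same annotation back to true
$ oc annotate ns example-namespace security.opendatahub.io/inject-trusted-ca-bundle=true --overwrite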
Verification
Run the following command to verify that the CA bundle has been removed from the namespace. In the following command, example-namespace is the non-reserved namespace.
$ oc get configmap odh-trusted-ca-bundle -n example-namespace
The command should return
configmaps "odh-trusted-ca-bundle" not found
.
7.5. Managing certificates
After installing OpenShift AI, the Red Hat OpenShift AI Operator creates the odh-trusted-ca-bundle
configuration file (ConfigMap) that contains the trusted CA bundle and adds it to all new and existing non-reserved namespaces in the cluster. By default, the Red Hat OpenShift AI Operator manages the odh-trusted-ca-bundle
ConfigMap and automatically updates it if any changes are made to the CA bundle. You can choose to manage the odh-trusted-ca-bundle
ConfigMap instead of allowing the Red Hat OpenShift AI Operator to manage it.
Prerequisites
- You have cluster administrator privileges for your OpenShift cluster.
Procedure
- In the OpenShift web console, click Operators → Installed Operators and then click the Red Hat OpenShift AI Operator.
- Click the DSC Initialization tab.
- Click the default-dsci object.
- Click the YAML tab.
In the spec section, change the value of the managementState field for trustedCABundle to Unmanaged, as shown:
spec:
  trustedCABundle:
    managementState: Unmanaged
- Click Save.
Note that changing the managementState from Managed to Unmanaged does not remove the odh-trusted-ca-bundle ConfigMap, but the ConfigMap is no longer updated if you make changes to the customCABundle field.
Verification
In the spec section, set or change the value of the customCABundle field for trustedCABundle, for example:
spec:
  trustedCABundle:
    managementState: Unmanaged
    customCABundle: example123
- Click Save.
- Click Workloads → ConfigMaps.
- Select a project from the project list.
-
Click the
odh-trusted-ca-bundle
ConfigMap. -
Click the YAML tab and verify that the value of the
customCABundle
field did not update.
7.6. Using self-signed certificates with OpenShift AI components
Some OpenShift AI components have additional options or required configuration for self-signed certificates.
7.6.1. Using certificates with data science pipelines
If you want to use self-signed certificates and you have added them to a central Certificate Authority (CA) bundle as described in Working with certificates (for disconnected environments, see Working with certificates), no additional configuration is necessary to use those certificates with data science pipelines.
7.6.1.1. Providing a CA bundle only for data science pipelines
Perform the following steps to provide a Certificate Authority (CA) bundle just for data science pipelines.
Procedure
- Log in to OpenShift.
From Workloads → ConfigMaps, create a ConfigMap with the required bundle in the same data science project or namespace as the target data science pipeline:
kind: ConfigMap
apiVersion: v1
metadata:
  name: custom-ca-bundle
data:
  ca-bundle.crt: |
    # contents of ca-bundle.crt
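If the bundle already exists as a local file, an equivalent CLI sketch (assuming the file is named ca-bundle.crt and <project> is your data science project) is:
# Sketch: create the ConfigMap from a local file; the key defaults to the file name
$ oc create configmap custom-ca-bundle --from-file=ca-bundle.crt -n <project>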
Add the following snippet to the .spec.apiServer.cABundle field of the underlying Data Science Pipelines Application (DSPA):
apiVersion: datasciencepipelinesapplications.opendatahub.io/v1alpha1
kind: DataSciencePipelinesApplication
metadata:
  name: data-science-dspa
spec:
  ...
  apiServer:
    ...
    cABundle:
      configMapName: custom-ca-bundle
      configMapKey: ca-bundle.crt
The pipeline server pod redeploys with the updated bundle and uses it in the newly created pipeline pods.
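Instead of editing the DSPA YAML directly, a merge patch along the following lines should also work; data-science-dspa and <project> are the example names from this procedure:
# Sketch: set the DSPA cABundle fields with a merge patch
$ oc -n <project> patch datasciencepipelinesapplication data-science-dspa --type=merge -p '{"spec":{"apiServer":{"cABundle":{"configMapName":"custom-ca-bundle","configMapKey":"ca-bundle.crt"}}}}'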
Verification
Perform the following steps to confirm that your CA bundle was successfully mounted.
- Log in to the OpenShift console.
- Go to the OpenShift project that corresponds to the data science project.
- Click the Pods tab.
-
Click the pipeline server pod with the
ds-pipeline-dspa-<hash>
prefix. - Click Terminal.
-
Enter
cat /dsp-custom-certs/dsp-ca.crt
. - Verify that your CA bundle is present within this file.
You can also confirm that your CA bundle was successfully mounted by using the CLI:
In a terminal window, log in to the OpenShift cluster where OpenShift AI is deployed.
oc login
Set the dspa value:
dspa=dspa
Set the dsProject value, replacing $YOUR_DS_PROJECT with the name of your data science project:
dsProject=$YOUR_DS_PROJECT
Set the pod value:
pod=$(oc get pod -n ${dsProject} -l app=ds-pipeline-${dspa} --no-headers | awk '{print $1}')
Display the contents of the /dsp-custom-certs/dsp-ca.crt file:
oc -n ${dsProject} exec $pod -- cat /dsp-custom-certs/dsp-ca.crt
- Verify that your CA bundle is present within this file.
7.6.2. Using certificates with workbenches
Self-signed certificates apply to workbenches that you create after configuring self-signed certificates centrally as described in Working with certificates (for disconnected environments, see Working with certificates). There is no change to workbenches that you created before configuring self-signed certificates.
7.6.2.1. Creating data science pipelines with Elyra and self-signed certificates
To create pipelines using a workbench that contains the Elyra extension and which uses self-signed certificates, see the Workbench workaround for executing a pipeline using Elyra in a disconnected environment knowledgebase article.
Chapter 8. Configuring the OpenShift AI Operator logger
You can change the log level for OpenShift AI Operator components by setting the .spec.devFlags.logmode
flag for the DSC Initialization/DSCI
custom resource during runtime. If you do not set a logmode
value, the logger uses the INFO log level by default.
The log level that you set with .spec.devFlags.logmode
applies to all components, not just those in a Managed state.
The following table shows the available log levels:
| Log level | Stacktrace level | Verbosity | Output | Timestamp type |
|---|---|---|---|---|
| devel, development | WARN | INFO | Console | Epoch timestamps |
| "" (no level set) | ERROR | INFO | JSON | Human-readable timestamps |
| prod, production | ERROR | INFO | JSON | Human-readable timestamps |
Logs that are set to devel or development are generated in a plain-text console format. Logs that are set to prod or production, or that do not have a level set, are generated in JSON format.
Prerequisites
-
You have admin access to the
DSCInitialization
resources in the OpenShift cluster. -
You installed the OpenShift command line interface (
oc
) as described in Get Started with the CLI.
Procedure
- Log in to the OpenShift web console as a cluster administrator.
- Click Operators → Installed Operators and then click the Red Hat OpenShift AI Operator.
- Click the DSC Initialization tab.
- Click the default-dsci object.
- Click the YAML tab.
In the spec section, update the .spec.devFlags.logmode flag with the log level that you want to set:
apiVersion: dscinitialization.opendatahub.io/v1
kind: DSCInitialization
metadata:
  name: default-dsci
spec:
  devFlags:
    logmode: development
- Click Save.
You can also configure the log level from the OpenShift CLI by using the following command with the logmode
value set to the log level that you want.
oc patch dsci default-dsci -p '{"spec":{"devFlags":{"logmode":"development"}}}' --type=merge
Verification
-
If you set the component log level to devel or development, logs are generated more frequently and include logs at WARN level and above. -
If you set the component log level to prod or production, or do not set a log level, logs are generated less frequently and include logs at ERROR level or above.
8.1. Viewing the OpenShift AI Operator log
- Log in to the OpenShift CLI.
Run the following command:
oc get pods -l name=rhods-operator -o name -n redhat-ods-operator | xargs -I {} oc logs -f {} -n redhat-ods-operator
The operator pod log opens.
You can also view the operator pod log in the OpenShift web console, under Workloads → Deployments → Pods → redhat-ods-operator → Logs.
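If you only need to follow the current Operator log, a shorter sketch should also work, assuming the Operator Deployment is named rhods-operator, matching the name=rhods-operator pod label in the command above:
# Sketch: follow the Operator log via its Deployment
oc logs -f deployment/rhods-operator -n redhat-ods-operator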
Chapter 9. Troubleshooting common installation problems
If you are experiencing difficulties installing the Red Hat OpenShift AI Operator, read this section to understand what could be causing the problem and how to resolve it.
If the problem is not included here or in the release notes, contact Red Hat Support. When opening a support case, it is helpful to include debugging information about your cluster. You can collect this information by using the must-gather
tool as described in Must-Gather for Red Hat OpenShift AI and Gathering data about your cluster.
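A typical invocation looks like the following sketch; treat the --image value as a placeholder and use the must-gather image listed in the Must-Gather for Red Hat OpenShift AI documentation:
# Sketch: collect debugging data; substitute the documented must-gather image
$ oc adm must-gather --image=<rhoai_must_gather_image>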
You can also adjust the log level of OpenShift AI Operator components to increase or reduce log verbosity to suit your use case. For more information, see Configuring the OpenShift AI Operator logger.
9.1. The Red Hat OpenShift AI Operator cannot be retrieved from the image registry
Problem
When attempting to retrieve the Red Hat OpenShift AI Operator from the image registry, a Failure to pull from quay
error message appears. The Red Hat OpenShift AI Operator might be unavailable for retrieval in the following circumstances:
- The image registry is unavailable.
- There is a problem with your network connection.
- Your cluster is not operational and is therefore unable to retrieve the image registry.
Diagnosis
Check the logs in the Events section in OpenShift for further information about the Failure to pull from quay
error message.
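From the CLI, a sketch like the following should surface the same events, assuming the Operator runs in the redhat-ods-operator namespace:
# Sketch: list recent events and filter for image-pull failures
$ oc get events -n redhat-ods-operator --sort-by=.lastTimestamp | grep -i pull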
Resolution
- Contact Red Hat support.
9.2. OpenShift AI does not install on unsupported infrastructure
Problem
You are deploying on an environment that is not documented as supported by the Red Hat OpenShift AI Operator.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects or redhat-ods-operator.
Click the
rhods-operator-<random string>
pod.The Pod details page appears.
- Click Logs.
- Select rhods-operator from the drop-down list.
-
Check the log for the
ERROR: Deploying on $infrastructure, which is not supported. Failing Installation
error message.
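This check can also be scripted. The following sketch greps the Operator log for the unsupported-infrastructure message, using the name=rhods-operator pod label shown earlier in this document; you can adapt the grep pattern for the similar ERROR messages in the sections that follow:
# Sketch: search the Operator log for the unsupported-infrastructure error
$ oc logs -n redhat-ods-operator -l name=rhods-operator | grep 'not supported'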
Resolution
- Before proceeding with a new installation, ensure that you have a fully supported environment on which to install OpenShift AI. For more information, see Red Hat OpenShift AI: Supported Configurations.
9.3. The creation of the OpenShift AI Custom Resource (CR) fails
Problem
During the installation process, the OpenShift AI Custom Resource (CR) does not get created. This issue occurs in unknown circumstances.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects or redhat-ods-operator.
Click the
rhods-operator-<random string>
pod.The Pod details page appears.
- Click Logs.
- Select rhods-operator from the drop-down list.
-
Check the log for the
ERROR: Attempt to create the ODH CR failed.
error message.
Resolution
- Contact Red Hat support.
9.4. The creation of the OpenShift AI Notebooks Custom Resource (CR) fails
Problem
During the installation process, the OpenShift AI Notebooks Custom Resource (CR) does not get created. This issue occurs in unknown circumstances.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects or redhat-ods-operator.
Click the
rhods-operator-<random string>
pod.The Pod details page appears.
- Click Logs.
- Select rhods-operator from the drop-down list.
-
Check the log for the
ERROR: Attempt to create the RHODS Notebooks CR failed.
error message.
Resolution
- Contact Red Hat support.
9.5. The OpenShift AI dashboard is not accessible
Problem
After installing OpenShift AI, the redhat-ods-applications
, redhat-ods-monitoring
, and redhat-ods-operator
project namespaces are Active
but you cannot access the dashboard due to an error in the pod.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects.
Click Filter and select the checkbox for every status except Running and Completed.
The page displays the pods that have an error.
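An equivalent CLI sketch lists pods in any state other than Running or Completed:
# Sketch: show pods across all namespaces that are not Running or Completed
$ oc get pods --all-namespaces | grep -vE 'Running|Completed'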
Resolution
- To see more information and troubleshooting steps for a pod, on the Pods page, click the link in the Status column for the pod.
- If the Status column does not display a link, click the pod name to open the pod details page and then click the Logs tab.
9.6. The dedicated-admins Role-based access control (RBAC) policy cannot be created
Problem
The Role-based access control (RBAC) policy for the dedicated-admins group in the target project cannot be created. This issue occurs in unknown circumstances.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects or redhat-ods-operator.
Click the
rhods-operator-<random string>
pod.The Pod details page appears.
- Click Logs.
- Select rhods-operator from the drop-down list.
-
Check the log for the
ERROR: Attempt to create the RBAC policy for dedicated admins group in $target_project failed.
error message.
Resolution
- Contact Red Hat support.
9.7. The PagerDuty secret does not get created
Problem
An issue with the Managed Tenants SRE automation process prevents the PagerDuty secret from being created.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects or redhat-ods-operator.
Click the
rhods-operator-<random string>
pod.The Pod details page appears.
- Click Logs.
- Select rhods-operator from the drop-down list.
-
Check the log for the
ERROR: Pagerduty secret does not exist
error message.
Resolution
- Contact Red Hat support.
9.8. The SMTP secret does not exist
Problem
An issue with the Managed Tenants SRE automation process prevents the SMTP secret from being created.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects or redhat-ods-operator.
Click the
rhods-operator-<random string>
pod.The Pod details page appears.
- Click Logs.
- Select rhods-operator from the drop-down list.
-
Check the log for the
ERROR: SMTP secret does not exist
error message.
Resolution
- Contact Red Hat support.
9.9. The ODH parameter secret does not get created
Problem
An issue with the OpenShift AI Operator’s installation flow can cause the creation of the ODH parameter secret to fail.
Diagnosis
- In the OpenShift web console, switch to the Administrator perspective.
- Click Workloads → Pods.
- Set the Project to All Projects or redhat-ods-operator.
Click the
rhods-operator-<random string>
pod.The Pod details page appears.
- Click Logs.
- Select rhods-operator from the drop-down list.
-
Check the log for the
ERROR: Addon managed odh parameter secret does not exist.
error message.
Resolution
- Contact Red Hat support.
9.10. Data science pipelines are not enabled after installing OpenShift AI 2.9 or later due to existing Argo Workflows resources
Problem
After installing OpenShift AI 2.9 or later with an Argo Workflows installation that is not installed by OpenShift AI on your cluster, data science pipelines are not enabled despite the datasciencepipelines
component being enabled in the DataScienceCluster
object.
Diagnosis
After you install OpenShift AI 2.9 or later, the Data Science Pipelines tab is not visible on the OpenShift AI dashboard navigation menu.
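To check whether a separate Argo Workflows installation is present, you can look for its custom resource definitions; Argo Workflows CRDs belong to the argoproj.io API group, so a sketch like the following should list them:
# Sketch: list CRDs from the argoproj.io API group
$ oc get crd | grep argoproj.io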
Resolution
- Delete the separate installation of Argo Workflows on your cluster. After you have removed any Argo Workflows resources that are not created by OpenShift AI from your cluster, data science pipelines are enabled automatically.
Chapter 10. Uninstalling Red Hat OpenShift AI Self-Managed
This section shows how to use the OpenShift command-line interface (CLI) to uninstall the Red Hat OpenShift AI Operator and any OpenShift AI components installed and managed by the Operator.
Using the CLI is the recommended way to uninstall the Operator. Depending on your version of OpenShift, using the web console to perform the uninstallation might not prompt you to uninstall all associated components. This could leave you unclear about the final state of your cluster.
10.1. Understanding the uninstallation process
Installing Red Hat OpenShift AI created several custom resource instances on your OpenShift cluster for various components of OpenShift AI. After installation, users likely created several additional resources while using OpenShift AI. Uninstalling OpenShift AI removes the resources that were created by the Operator, but retains the resources created by users to prevent inadvertently deleting information that you might still need.
What is deleted
Uninstalling OpenShift AI removes the following resources from your OpenShift cluster:
-
DataScienceCluster
custom resource instance -
DSCInitialization
custom resource instance -
FeatureTracker
custom resource instances created during or after installation -
ServiceMesh
custom resource instance created by the Operator during or after installation -
KNativeServing
custom resource instance created by the Operator during or after installation -
redhat-ods-applications
,redhat-ods-monitoring
, andrhods-notebooks
namespaces created by the Operator -
Workloads in the
rhods-notebooks
namespace -
Subscription
,ClusterServiceVersion
, andInstallPlan
objects -
KfDef
object (version 1 Operator only)
What might remain
Uninstalling OpenShift AI retains the following resources in your OpenShift cluster:
- Data science projects created by users
- Custom resource instances created by users
- Custom resource definitions (CRDs) created by users or by the Operator
While these resources might still remain in your OpenShift cluster, they are not functional. After uninstalling, Red Hat recommends that you review the data science projects and custom resources in your OpenShift cluster and delete anything no longer in use to prevent potential issues, such as pipelines that cannot run, notebooks that cannot be undeployed, or models that cannot be undeployed.
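As a starting point for that review, a sketch like the following lists OpenShift AI-related custom resource definitions that remain on the cluster; it assumes the CRDs use API groups containing opendatahub:
# Sketch: list leftover OpenShift AI CRDs after uninstalling
$ oc get crd | grep opendatahub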
10.2. Uninstalling OpenShift AI Self-Managed by using the CLI
The following procedure shows how to use the OpenShift command-line interface (CLI) to uninstall the Red Hat OpenShift AI Operator and any OpenShift AI components installed and managed by the Operator.
Prerequisites
- You have cluster administrator privileges for your OpenShift cluster.
- You have downloaded and installed the OpenShift command-line interface (CLI). See Installing the OpenShift CLI.
- You have backed up the persistent disks or volumes used by your persistent volume claims (PVCs).
Procedure
- Open a new terminal window.
In the OpenShift command-line interface (CLI), log in to your OpenShift cluster as a cluster administrator, as shown in the following example:
$ oc login <openshift_cluster_url> -u system:admin
Create a
ConfigMap
object for deletion of the Red Hat OpenShift AI Operator.$ oc create configmap delete-self-managed-odh -n redhat-ods-operator
To delete the
rhods-operator
, set theaddon-managed-odh-delete
label totrue
.$ oc label configmap/delete-self-managed-odh api.openshift.com/addon-managed-odh-delete=true -n redhat-ods-operator
When all objects associated with the Operator are removed, delete the redhat-ods-operator project.
Set an environment variable for the redhat-ods-applications project.
$ PROJECT_NAME=redhat-ods-applications
Wait until the redhat-ods-applications project has been deleted.
$ while oc get project $PROJECT_NAME &> /dev/null; do
    echo "The $PROJECT_NAME project still exists"
    sleep 1
  done
  echo "The $PROJECT_NAME project no longer exists"
When the redhat-ods-applications project has been deleted, you see the following output.
The redhat-ods-applications project no longer exists
When the redhat-ods-applications project has been deleted, delete the redhat-ods-operator project.
$ oc delete namespace redhat-ods-operator
Verification
Confirm that the rhods-operator subscription no longer exists.
$ oc get subscriptions --all-namespaces | grep rhods-operator
Confirm that the following projects no longer exist.
-
redhat-ods-applications
-
redhat-ods-monitoring
-
redhat-ods-operator
-
rhods-notebooks
$ oc get namespaces | grep -e redhat-ods* -e rhods*
Note
The
rhods-notebooks
project was created only if you installed the workbenches component of OpenShift AI. See Installing and managing Red Hat OpenShift AI components.