Chapter 3. Installing and deploying OpenShift AI
Red Hat OpenShift AI is a platform for data scientists and developers of artificial intelligence (AI) applications. It provides a fully supported environment that lets you rapidly develop, train, test, and deploy machine learning models on-premises and/or in the public cloud.
OpenShift AI is provided as a managed cloud service add-on for Red Hat OpenShift or as self-managed software that you can install on premises or in the public cloud on OpenShift.
For information about installing OpenShift AI as self-managed software on your OpenShift cluster in a disconnected environment, see Installing and uninstalling OpenShift AI Self-Managed in a disconnected environment. For information about installing OpenShift AI as a managed cloud service add-on, see Installing and uninstalling OpenShift AI.
Installing OpenShift AI involves the following high-level tasks:
- Confirm that your OpenShift cluster meets all requirements. See Requirements for OpenShift AI Self-Managed.
- Add administrative users for OpenShift. See Adding administrative users in OpenShift.
- Install the Red Hat OpenShift AI Operator. See Installing the Red Hat OpenShift AI Operator.
- Install OpenShift AI components. See Installing and managing Red Hat OpenShift AI components.
- Configure user and administrator groups to provide user access to OpenShift AI. See Adding users.
- Access the OpenShift AI dashboard. See Accessing the OpenShift AI dashboard.
- Optionally, enable graphics processing units (GPUs) in OpenShift AI to ensure that your data scientists can use compute-heavy workloads in their models. See Enabling GPU support in OpenShift AI.
3.1. Requirements for OpenShift AI Self-Managed
You must meet the following requirements before you can install Red Hat OpenShift AI on your Red Hat OpenShift cluster:
Product subscriptions
- You must have a subscription for Red Hat OpenShift AI Self-Managed.
If you want to install OpenShift AI Self-Managed in a Red Hat-managed cloud environment, you must have a subscription for one of the following platforms:
- Red Hat OpenShift Dedicated on Amazon Web Services (AWS) or Google Cloud Platform (GCP)
- Red Hat OpenShift Service on Amazon Web Services (ROSA Classic)
- Red Hat OpenShift Service on Amazon Web Services with hosted control planes (ROSA HCP)
- Microsoft Azure Red Hat OpenShift
Contact your Red Hat account manager to purchase new subscriptions. If you do not yet have an account manager, complete the form at https://www.redhat.com/en/contact to request one.
Cluster administrator access to your OpenShift cluster
You must have an OpenShift cluster with cluster administrator access. Use an existing cluster, or create a cluster by following the steps in the relevant documentation:
- OpenShift Container Platform 4.12 or later: OpenShift Container Platform installation overview
- OpenShift Dedicated: Creating an OpenShift Dedicated cluster
- ROSA Classic: Install ROSA Classic clusters
- ROSA HCP: Install ROSA with HCP clusters
- Your cluster must have at least 2 worker nodes with at least 8 CPUs and 32 GiB RAM available for OpenShift AI to use when you install the Operator. To ensure that OpenShift AI is usable, additional cluster resources are required beyond the minimum requirements.
- Your cluster must be configured with a default storage class that can be dynamically provisioned.
Confirm that a default storage class is configured by running the following command:
$ oc get storageclass
If no storage class is marked with (default) beside its name, follow the OpenShift Container Platform documentation to configure a default storage class: Changing the default storage class. For more information about dynamic provisioning, see Dynamic provisioning.
- Open Data Hub must not be installed on the cluster.
For more information about managing the machines that make up an OpenShift cluster, see Overview of machine management.
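You can script the storage class check by parsing the JSON output of oc get storageclass -o json instead of looking for the (default) marker. The following Python sketch assumes that the default class is identified by the standard Kubernetes annotation storageclass.kubernetes.io/is-default-class, or its older beta form, set to "true". The function names are hypothetical helpers and are not part of OpenShift AI.

```python
import json
import subprocess

# Annotations that Kubernetes uses to mark the default StorageClass.
# The beta form is still honored on some clusters.
DEFAULT_ANNOTATIONS = (
    "storageclass.kubernetes.io/is-default-class",
    "storageclass.beta.kubernetes.io/is-default-class",
)


def oc_get_json(resource: str, *extra_args: str) -> dict:
    """Run 'oc get <resource> ... -o json' with the current login and parse the result."""
    result = subprocess.run(
        ["oc", "get", resource, *extra_args, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def default_storage_classes(storageclass_list: dict) -> list[str]:
    """Return the names of StorageClasses annotated as the cluster default."""
    defaults = []
    for item in storageclass_list.get("items", []):
        metadata = item.get("metadata", {})
        annotations = metadata.get("annotations") or {}
        if any(annotations.get(key) == "true" for key in DEFAULT_ANNOTATIONS):
            defaults.append(metadata.get("name", ""))
    return defaults
```

For example, default_storage_classes(oc_get_json("storageclass")) returns an empty list when no default storage class is configured.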
An identity provider configured for OpenShift
- Red Hat OpenShift AI uses the same authentication systems as Red Hat OpenShift Container Platform. See Understanding identity provider configuration for more information on configuring identity providers.
- Access to the cluster as a user with the cluster-admin role; the kubeadmin user is not allowed.
Internet access
Along with Internet access, the following domains must be accessible during the installation of OpenShift AI Self-Managed:
- cdn.redhat.com
- subscription.rhn.redhat.com
- registry.access.redhat.com
- registry.redhat.io
- quay.io
For CUDA-based images, the following domains must be accessible:
- ngc.download.nvidia.cn
- developer.download.nvidia.com
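You can check reachability before you start. The following Python sketch tries to open a TCP connection on port 443 to each listed domain. It tests only the network path from the machine where it runs, which might differ from the path your cluster nodes use. It also does not go through an HTTP proxy, so treat a failure as a prompt to investigate rather than a definitive result. The function name unreachable_domains and its connect parameter are hypothetical.

```python
import socket
from typing import Callable, Iterable

# Domains listed in the Internet access requirements above.
REQUIRED_DOMAINS = [
    "cdn.redhat.com",
    "subscription.rhn.redhat.com",
    "registry.access.redhat.com",
    "registry.redhat.io",
    "quay.io",
]
# Needed only for CUDA-based images.
CUDA_DOMAINS = [
    "ngc.download.nvidia.cn",
    "developer.download.nvidia.com",
]


def unreachable_domains(
    domains: Iterable[str],
    port: int = 443,
    timeout: float = 5.0,
    connect: Callable = socket.create_connection,
) -> list[str]:
    """Return the domains to which a TCP connection on the given port could not be opened."""
    failures = []
    for host in domains:
        try:
            # create_connection resolves the name and opens a TCP socket;
            # the context manager closes the socket as soon as it succeeds.
            with connect((host, port), timeout=timeout):
                pass
        except OSError:  # covers DNS errors, refused connections, and timeouts
            failures.append(host)
    return failures
```

For example, unreachable_domains(REQUIRED_DOMAINS + CUDA_DOMAINS) returns an empty list when every domain accepts a connection.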
Data science pipelines preparation
- Data science pipelines 2.0 includes an installation of Argo Workflows. If your cluster already has an installation of Argo Workflows that was not installed by data science pipelines, data science pipelines is disabled after you install OpenShift AI. Before you install OpenShift AI, ensure that your cluster does not have a separately installed Argo Workflows, or remove that separate installation from your cluster.
- You can store your pipeline artifacts in an S3-compatible object storage bucket so that you do not consume local storage. To do this, you must first configure write access to your S3 bucket on your storage account.
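Before you install, you can look for CustomResourceDefinitions (CRDs) registered by an existing Argo Workflows installation. The following Python sketch filters the JSON output of oc get crd -o json for a few well-known Argo Workflows CRD names. The name list is an assumption and might not cover every Argo Workflows version. A match does not tell you who installed the CRDs, so on a cluster that already runs OpenShift AI, check whether they belong to data science pipelines before you remove anything. The function name is hypothetical.

```python
# CRD names registered by upstream Argo Workflows (assumed list).
# Argo CD also uses the argoproj.io API group, so matching on the group alone
# would give false positives on clusters that run Argo CD.
ARGO_WORKFLOWS_CRDS = {
    "workflows.argoproj.io",
    "workflowtemplates.argoproj.io",
    "cronworkflows.argoproj.io",
    "clusterworkflowtemplates.argoproj.io",
}


def argo_workflows_crds(crd_list: dict) -> list[str]:
    """Return the Argo Workflows CRD names found in 'oc get crd -o json' output."""
    names = (item.get("metadata", {}).get("name", "") for item in crd_list.get("items", []))
    return sorted(name for name in names if name in ARGO_WORKFLOWS_CRDS)
```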
Install KServe dependencies
- To support the KServe component, which is used by the single-model serving platform to serve large models, you must also install Operators for Red Hat OpenShift Serverless and Red Hat OpenShift Service Mesh and perform additional configuration. For more information, see Serving large models.
- If you want to add an authorization provider for the single-model serving platform, you must install the Red Hat - Authorino Operator. For information, see Adding an authorization provider for the single-model serving platform.
Access to object storage
- Components of OpenShift AI require or can use S3-compatible object storage such as AWS S3, MinIO, Ceph, or IBM Cloud Storage. An object store is a data storage mechanism that enables users to access their data either as an object or as a file. The S3 API is the recognized standard for HTTP-based access to object storage services.
Object storage is required for the following components:
- Single- or multi-model serving platforms, to deploy stored models. See Deploying models on the single-model serving platform or Deploying a model by using the multi-model serving platform.
- Data science pipelines, to store artifacts, logs, and intermediate results. See Configuring a pipeline server and About pipeline logs.
Object storage can be used by the following components:
- Workbenches, to access large datasets. See Adding a data connection to your data science project.
- Distributed workloads, to pull input data from and push results to object storage. See Running distributed data science workloads from data science pipelines.
- Code executed inside a pipeline, for example, to store the resulting model in object storage. See Overview of pipelines in JupyterLab.
3.2. Adding administrative users in OpenShift
Before you can install and configure OpenShift AI for your data scientist users, you must define administrative users. Only users with cluster administrator privileges can install and configure OpenShift AI.
You can create a cluster admin by following the steps in the relevant documentation:
- OpenShift Container Platform: Creating a cluster admin
- OpenShift Dedicated: Managing OpenShift Dedicated administrators
- ROSA: Creating a cluster administrator user for quick cluster access
3.3. Installing the Red Hat OpenShift AI Operator
This section shows how to install the Red Hat OpenShift AI Operator on your OpenShift cluster by using either the command-line interface (CLI) or the OpenShift web console.
If you want to upgrade from a previous version of OpenShift AI rather than performing a new installation, see Upgrading OpenShift AI.
If your OpenShift cluster uses a proxy to access the Internet, you can configure the proxy settings for the Red Hat OpenShift AI Operator. See Overriding proxy settings of an Operator for more information.
3.3.1. Installing the Red Hat OpenShift AI Operator by using the CLI
The following procedure shows how to use the OpenShift command-line interface (CLI) to install the Red Hat OpenShift AI Operator on your OpenShift cluster. You must install the Operator before you can install OpenShift AI components on the cluster.
Prerequisites
- You have a running OpenShift cluster, version 4.12 or greater, configured with a default storage class that can be dynamically provisioned.
- You have cluster administrator privileges for your OpenShift cluster.
- You have downloaded and installed the OpenShift command-line interface (CLI). See Installing the OpenShift CLI.
Procedure
- Open a new terminal window.
Follow these steps to log in to your OpenShift cluster as a cluster administrator:
- In the upper-right corner of the OpenShift web console, click your user name and select Copy login command.
- After you have logged in, click Display token.
Copy the Log in with this token command and paste it in the OpenShift command-line interface (CLI).
$ oc login --token=<token> --server=<openshift_cluster_url>
Create a namespace for installation of the Operator by performing the following actions:
Create a namespace YAML file, for example, rhods-operator-namespace.yaml:
apiVersion: v1
kind: Namespace
metadata:
  name: redhat-ods-operator # 1
1 - Defines the required redhat-ods-operator namespace for installation of the Operator.
Create the namespace in your OpenShift cluster.
$ oc create -f rhods-operator-namespace.yaml
You see output similar to the following:
namespace/redhat-ods-operator created
Create an operator group for installation of the Operator by performing the following actions:
Create an OperatorGroup object custom resource (CR) file, for example, rhods-operator-group.yaml:
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: rhods-operator
  namespace: redhat-ods-operator # 1
1 - Defines the required redhat-ods-operator namespace.
Create the OperatorGroup object in your OpenShift cluster.
$ oc create -f rhods-operator-group.yaml
You see output similar to the following:
operatorgroup.operators.coreos.com/rhods-operator created
Create a subscription for installation of the Operator by performing the following actions:
Create a Subscription object CR file, for example, rhods-operator-subscription.yaml:
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: rhods-operator
  namespace: redhat-ods-operator # 1
spec:
  name: rhods-operator
  channel: <channel> # 2
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  startingCSV: rhods-operator.x.y.z # 3
1 - Defines the required redhat-ods-operator namespace.
2 - Sets the update channel. You must specify a value of fast, stable, stable-x.y, eus-x.y, or alpha. For more information, see Understanding update channels.
3 - Optional: Sets the Operator version. If you do not specify a value, the subscription defaults to the latest Operator version. For more information, see Red Hat OpenShift AI Self-Managed Life Cycle.
Create the Subscription object in your OpenShift cluster to install the Operator.
$ oc create -f rhods-operator-subscription.yaml
You see output similar to the following:
subscription.operators.coreos.com/rhods-operator created
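If you generate the manifests from a script instead of writing three YAML files, the following Python sketch builds the same Namespace, OperatorGroup, and Subscription objects as a single v1 List. It also rejects channel names that do not match the patterns listed above. The function name and the regular expression are illustrative assumptions, and the channels actually offered depend on the Operator catalog on your cluster.

```python
import re
from typing import Optional

NAMESPACE = "redhat-ods-operator"

# fast, stable, alpha, or stable-x.y / eus-x.y where x.y is a minor version.
CHANNEL_PATTERN = re.compile(r"(fast|stable|alpha|(?:stable|eus)-\d+\.\d+)")


def operator_install_manifests(channel: str, starting_csv: Optional[str] = None) -> dict:
    """Return a v1 List holding the Namespace, OperatorGroup, and Subscription objects."""
    if not CHANNEL_PATTERN.fullmatch(channel):
        raise ValueError(f"unsupported update channel: {channel!r}")
    subscription_spec = {
        "name": "rhods-operator",
        "channel": channel,
        "source": "redhat-operators",
        "sourceNamespace": "openshift-marketplace",
    }
    if starting_csv:
        # Optional: pin the initial Operator version, such as rhods-operator.x.y.z.
        subscription_spec["startingCSV"] = starting_csv
    return {
        "apiVersion": "v1",
        "kind": "List",
        "items": [
            {"apiVersion": "v1", "kind": "Namespace", "metadata": {"name": NAMESPACE}},
            {
                # No spec: the OperatorGroup targets all namespaces.
                "apiVersion": "operators.coreos.com/v1",
                "kind": "OperatorGroup",
                "metadata": {"name": "rhods-operator", "namespace": NAMESPACE},
            },
            {
                "apiVersion": "operators.coreos.com/v1alpha1",
                "kind": "Subscription",
                "metadata": {"name": "rhods-operator", "namespace": NAMESPACE},
                "spec": subscription_spec,
            },
        ],
    }
```

You can write json.dumps(operator_install_manifests("stable"), indent=2) to a file and pass that file to oc create -f, because oc accepts JSON manifests as well as YAML.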
Verification
- In the OpenShift web console, click Operators → Installed Operators and confirm that the Red Hat OpenShift AI Operator shows one of the following statuses:
  - Installing: installation is in progress; wait for this to change to Succeeded. This might take several minutes.
  - Succeeded: installation is successful.
- In the web console, click Home → Projects and confirm that the following project namespaces are visible and listed as Active:
  - redhat-ods-applications
  - redhat-ods-monitoring
  - redhat-ods-operator
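You can run the same checks from the CLI. The following Python sketch reads the ClusterServiceVersion (CSV) list, for example from oc get csv -n redhat-ods-operator -o json, and the namespace list from oc get namespaces -o json. It assumes that the Operator CSV name starts with rhods-operator., as in the startingCSV example in the CLI procedure. The helper names are hypothetical.

```python
from typing import Optional

EXPECTED_NAMESPACES = (
    "redhat-ods-applications",
    "redhat-ods-monitoring",
    "redhat-ods-operator",
)


def operator_phase(csv_list: dict, prefix: str = "rhods-operator.") -> Optional[str]:
    """Return status.phase (for example Installing or Succeeded) of the matching CSV."""
    for item in csv_list.get("items", []):
        if item.get("metadata", {}).get("name", "").startswith(prefix):
            return item.get("status", {}).get("phase")
    return None  # no matching CSV yet


def inactive_namespaces(namespace_list: dict, expected=EXPECTED_NAMESPACES) -> list[str]:
    """Return the expected namespaces that are missing or not in the Active phase."""
    phases = {
        item["metadata"]["name"]: item.get("status", {}).get("phase")
        for item in namespace_list.get("items", [])
    }
    return [name for name in expected if phases.get(name) != "Active"]
```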
3.3.2. Installing the Red Hat OpenShift AI Operator by using the web console
The following procedure shows how to use the OpenShift web console to install the Red Hat OpenShift AI Operator on your cluster. You must install the Operator before you can install OpenShift AI components on the cluster.
Prerequisites
- You have a running OpenShift cluster, version 4.12 or greater, configured with a default storage class that can be dynamically provisioned.
- You have cluster administrator privileges for your OpenShift cluster.
Procedure
- Log in to the OpenShift web console as a cluster administrator.
- In the web console, click Operators → OperatorHub.
- On the OperatorHub page, locate the Red Hat OpenShift AI Operator by scrolling through the available Operators or by typing Red Hat OpenShift AI into the Filter by keyword box.
- Click the Red Hat OpenShift AI tile. The Red Hat OpenShift AI information pane opens.
- Select a Channel. For information about subscription update channels, see Understanding update channels.
- Select a Version.
- Click Install. The Install Operator page opens.
- Review or change the selected channel and version as needed.
- For Installation mode, note that the only available value is All namespaces on the cluster (default). This installation mode makes the Operator available to all namespaces in the cluster.
- For Installed Namespace, select Operator recommended Namespace: redhat-ods-operator.
- For Update approval, select one of the following update strategies:
  - Automatic: New updates in the update channel are installed as soon as they become available.
  - Manual: A cluster administrator must approve any new updates before installation begins.
  Important: By default, the Red Hat OpenShift AI Operator follows a sequential update process. This means that if there are several versions between the current version and the target version, Operator Lifecycle Manager (OLM) upgrades the Operator to each of the intermediate versions before it upgrades it to the final, target version.
  If you configure automatic upgrades, OLM automatically upgrades the Operator to the latest available version. If you configure manual upgrades, a cluster administrator must manually approve each sequential update between the current version and the final, target version.
  For information about supported versions, see Red Hat OpenShift AI Life Cycle.
- Click Install.
  The Installing Operators pane appears. When the installation finishes, a checkmark appears next to the Operator name.
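If you choose Manual, OLM creates an InstallPlan for each update and waits for approval. The following Python sketch lists InstallPlans whose spec.approved field is still false, using the output of oc get installplan -n redhat-ods-operator -o json. It also builds the oc patch command that sets spec.approved to true. It assumes the default redhat-ods-operator namespace, and the helper names are hypothetical. Because updates are sequential, approve one plan at a time and let it finish before you approve the next.

```python
import json


def pending_install_plans(installplan_list: dict) -> list[tuple[str, list[str]]]:
    """Return (InstallPlan name, CSV names) for manual plans that still await approval."""
    pending = []
    for item in installplan_list.get("items", []):
        spec = item.get("spec", {})
        if spec.get("approval") == "Manual" and not spec.get("approved", False):
            pending.append((item["metadata"]["name"], spec.get("clusterServiceVersionNames", [])))
    return pending


def approve_command(plan_name: str, namespace: str = "redhat-ods-operator") -> list[str]:
    """Build the oc command that approves one InstallPlan by setting spec.approved to true."""
    patch = json.dumps({"spec": {"approved": True}})
    return ["oc", "patch", "installplan", plan_name, "-n", namespace, "--type", "merge", "-p", patch]
```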
Verification
- In the OpenShift web console, click Operators → Installed Operators and confirm that the Red Hat OpenShift AI Operator shows one of the following statuses:
  - Installing: installation is in progress; wait for this to change to Succeeded. This might take several minutes.
  - Succeeded: installation is successful.
- In the web console, click Home → Projects and confirm that the following project namespaces are visible and listed as Active:
  - redhat-ods-applications
  - redhat-ods-monitoring
  - redhat-ods-operator
3.4. Installing and managing Red Hat OpenShift AI components
The following procedures show how to use the command-line interface (CLI) and OpenShift web console to install and manage components of Red Hat OpenShift AI on your OpenShift cluster.
3.4.1. Installing Red Hat OpenShift AI components by using the CLI
The following procedure shows how to use the OpenShift command-line interface (CLI) to install specific components of Red Hat OpenShift AI on your OpenShift cluster.
The following procedure describes how to create and configure a DataScienceCluster object to install Red Hat OpenShift AI components as part of a new installation. However, if you upgraded from version 1 of OpenShift AI (previously OpenShift Data Science), the upgrade process automatically created a default DataScienceCluster object. If you upgraded from a previous minor version, the upgrade process used the settings from the previous version’s DataScienceCluster object. To inspect the default DataScienceCluster object and change the installation status of Red Hat OpenShift AI components, see Updating the installation status of Red Hat OpenShift AI components by using the web console.
Prerequisites
- The Red Hat OpenShift AI Operator is installed on your OpenShift cluster. See Installing the Red Hat OpenShift AI Operator.
- You have cluster administrator privileges for your OpenShift cluster.
- You have downloaded and installed the OpenShift command-line interface (CLI). See Installing the OpenShift CLI.
Procedure
- Open a new terminal window.
Follow these steps to log in to your OpenShift cluster as a cluster administrator:
- In the upper-right corner of the OpenShift web console, click your user name and select Copy login command.
- After you have logged in, click Display token.
Copy the Log in with this token command and paste it in the OpenShift command-line interface (CLI).
$ oc login --token=<token> --server=<openshift_cluster_url>
Create a DataScienceCluster object custom resource (CR) file, for example, rhods-operator-dsc.yaml:
apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: default-dsc
spec:
  components:
    codeflare:
      managementState: Removed
    dashboard:
      managementState: Removed
    datasciencepipelines:
      managementState: Removed
    kserve:
      managementState: Removed # 1, 2
    kueue:
      managementState: Removed
    modelmeshserving:
      managementState: Removed
    ray:
      managementState: Removed
    workbenches:
      managementState: Removed
1 - To fully install the KServe component, which is used by the single-model serving platform to serve large models, you must install Operators for Red Hat OpenShift Service Mesh and Red Hat OpenShift Serverless and perform additional configuration. See Serving large models.
2 - If you have not enabled the KServe component (that is, you set the value of the managementState field to Removed), you must also disable the dependent Service Mesh component to avoid errors. See Disabling KServe dependencies.
In the spec.components section of the CR, for each OpenShift AI component shown, set the value of the managementState field to either Managed or Removed. These values are defined as follows:
- Managed: The Operator actively manages the component, installs it, and tries to keep it active. The Operator will upgrade the component only if it is safe to do so.
- Removed: The Operator actively manages the component but does not install it. If the component is already installed, the Operator will try to remove it.
Important
- To learn how to fully install the KServe component, which is used by the single-model serving platform to serve large models, see Serving large models.
- To learn how to configure the distributed workloads feature that uses the CodeFlare, KubeRay, and Kueue components, see Configuring distributed workloads.
Create the DataScienceCluster object in your OpenShift cluster to install the specified OpenShift AI components.
$ oc create -f rhods-operator-dsc.yaml
You see output similar to the following:
datasciencecluster.datasciencecluster.opendatahub.io/default-dsc created
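If you build the CR from a script, the following Python sketch produces the same DataScienceCluster object, with the components you choose set to Managed and all others set to Removed. The component list is copied from the example CR in this procedure and might differ in other OpenShift AI versions. A second helper flags the case where KServe is Removed, which means you must also disable the Service Mesh component (see Disabling KServe dependencies). The function names are hypothetical.

```python
# Components from the example DataScienceCluster CR in this procedure.
COMPONENTS = (
    "codeflare", "dashboard", "datasciencepipelines", "kserve",
    "kueue", "modelmeshserving", "ray", "workbenches",
)


def data_science_cluster(managed: set[str], name: str = "default-dsc") -> dict:
    """Build a DataScienceCluster CR: listed components are Managed, the rest Removed."""
    unknown = set(managed) - set(COMPONENTS)
    if unknown:
        raise ValueError(f"unknown components: {sorted(unknown)}")
    return {
        "apiVersion": "datasciencecluster.opendatahub.io/v1",
        "kind": "DataScienceCluster",
        "metadata": {"name": name},
        "spec": {
            "components": {
                component: {"managementState": "Managed" if component in managed else "Removed"}
                for component in COMPONENTS
            }
        },
    }


def needs_service_mesh_disabled(dsc: dict) -> bool:
    """True when KServe is Removed, so serviceMesh in DSCInitialization must be Removed too."""
    state = dsc["spec"]["components"].get("kserve", {}).get("managementState")
    return state == "Removed"
```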
Verification
Confirm that there is a running pod for each component:
- In the OpenShift web console, click Workloads → Pods.
- In the Project list at the top of the page, select redhat-ods-applications.
- In the applications namespace, confirm that there are running pods for each of the OpenShift AI components that you installed.
Confirm the status of all installed components:
- In the OpenShift web console, click Operators → Installed Operators.
- Click the Red Hat OpenShift AI Operator.
- Click the Data Science Cluster tab and select the DataScienceCluster object called default-dsc.
- Select the YAML tab.
- In the installedComponents section, confirm that the components you installed have a status value of true.
Note: If a component shows with the component-name: {} format in the spec.components section of the CR, the component is not installed.
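You can also script this verification. The following Python sketch checks the DataScienceCluster JSON, for example from oc get datasciencecluster default-dsc -o json, for Managed components whose installedComponents entry is not true. It also checks the pod list from oc get pods -n redhat-ods-applications -o json for pods that are not in the Running phase. Completed job pods report Succeeded and are listed too, so review the output rather than treating every entry as a failure. The function names are hypothetical.

```python
def components_not_installed(dsc: dict) -> list[str]:
    """Return Managed components whose status.installedComponents entry is not true."""
    installed = dsc.get("status", {}).get("installedComponents", {})
    components = dsc.get("spec", {}).get("components", {})
    return sorted(
        name
        for name, cfg in components.items()
        # Entries shown as component-name: {} have no managementState and are skipped.
        if (cfg or {}).get("managementState") == "Managed" and installed.get(name) is not True
    )


def pods_not_running(pod_list: dict) -> list[str]:
    """Return the names of pods whose status.phase is not Running."""
    return [
        item["metadata"]["name"]
        for item in pod_list.get("items", [])
        if item.get("status", {}).get("phase") != "Running"
    ]
```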
3.4.2. Installing Red Hat OpenShift AI components by using the web console
The following procedure shows how to use the OpenShift web console to install specific components of Red Hat OpenShift AI on your cluster.
The following procedure describes how to create and configure a DataScienceCluster object to install Red Hat OpenShift AI components as part of a new installation. However, if you upgraded from version 1 of OpenShift AI (previously OpenShift Data Science), the upgrade process automatically created a default DataScienceCluster object. If you upgraded from a previous minor version, the upgrade process used the settings from the previous version’s DataScienceCluster object. To inspect the DataScienceCluster object and change the installation status of Red Hat OpenShift AI components, see Updating the installation status of Red Hat OpenShift AI components by using the web console.
Prerequisites
- The Red Hat OpenShift AI Operator is installed on your OpenShift cluster. See Installing the Red Hat OpenShift AI Operator.
- You have cluster administrator privileges for your OpenShift cluster.
Procedure
- Log in to the OpenShift web console as a cluster administrator.
- In the web console, click Operators → Installed Operators and then click the Red Hat OpenShift AI Operator.
- Create a DataScienceCluster object to install OpenShift AI components by performing the following actions:
  - Click the Data Science Cluster tab.
  - Click Create DataScienceCluster.
  - For Configure via, select YAML view.
    An embedded YAML editor opens showing a default custom resource (CR) for the DataScienceCluster object.
  - In the spec.components section of the CR, for each OpenShift AI component shown, set the value of the managementState field to either Managed or Removed. These values are defined as follows:
    - Managed: The Operator actively manages the component, installs it, and tries to keep it active. The Operator will upgrade the component only if it is safe to do so.
    - Removed: The Operator actively manages the component but does not install it. If the component is already installed, the Operator will try to remove it.
    Important
    - To learn how to install the KServe component, which is used by the single-model serving platform to serve large models, see Serving large models.
    - If you have not enabled the KServe component (that is, you set the value of the managementState field to Removed), you must also disable the dependent Service Mesh component to avoid errors. See Disabling KServe dependencies.
    - To learn how to configure the distributed workloads feature that uses the CodeFlare, KubeRay, and Kueue components, see Configuring distributed workloads.
  - Click Create.
Verification
Confirm that there is a running pod for each component:
- In the OpenShift web console, click Workloads → Pods.
- In the Project list at the top of the page, select redhat-ods-applications.
- In the applications namespace, confirm that there are running pods for each of the OpenShift AI components that you installed.
Confirm the status of all installed components:
- In the OpenShift web console, click Operators → Installed Operators.
- Click the Red Hat OpenShift AI Operator.
- Click the Data Science Cluster tab and select the DataScienceCluster object called default-dsc.
- Select the YAML tab.
- In the installedComponents section, confirm that the components you installed have a status value of true.
Note: If a component shows with the component-name: {} format in the spec.components section of the CR, the component is not installed.
3.4.3. Updating the installation status of Red Hat OpenShift AI components by using the web console
The following procedure shows how to use the OpenShift web console to update the installation status of components of Red Hat OpenShift AI on your OpenShift cluster.
If you upgraded from version 1 to version 2 of OpenShift AI, the upgrade process automatically created a default DataScienceCluster object and enabled several components of OpenShift AI. If you upgraded from a previous minor version, the upgrade process used the settings from the previous version’s DataScienceCluster object.
The following procedure describes how to edit the DataScienceCluster object to do the following:
- Change the installation status of the existing Red Hat OpenShift AI components.
- Add components to the DataScienceCluster object that were not available in the previous version of OpenShift AI.
Prerequisites
- The Red Hat OpenShift AI Operator is installed on your OpenShift cluster.
- You have cluster administrator privileges for your OpenShift cluster.
Procedure
- Log in to the OpenShift web console as a cluster administrator.
- In the web console, click Operators → Installed Operators and then click the Red Hat OpenShift AI Operator.
- Click the Data Science Cluster tab.
- On the DataScienceClusters page, click the default object.
- Click the YAML tab.
  An embedded YAML editor opens showing the custom resource (CR) file for the DataScienceCluster object.
- In the spec.components section of the CR, for each OpenShift AI component shown, set the value of the managementState field to either Managed or Removed. These values are defined as follows:
  - Managed: The Operator actively manages the component, installs it, and tries to keep it active. The Operator will upgrade the component only if it is safe to do so.
  - Removed: The Operator actively manages the component but does not install it. If the component is already installed, the Operator will try to remove it.
  Note: If a component shows with the component-name: {} format in the spec.components section of the CR, the component is not installed.
  Important
  - To learn how to install the KServe component, which is used by the single-model serving platform to serve large models, see Serving large models.
  - If you have not enabled the KServe component (that is, you set the value of the managementState field to Removed), you must also disable the dependent Service Mesh component to avoid errors. See Disabling KServe dependencies.
  - If they are not already present in the CR file, you can install the CodeFlare, KubeRay, and Kueue components by adding the codeflare, ray, and kueue entries to the spec.components section of the CR and setting the managementState field for the components to Managed.
  - To learn how to configure the distributed workloads feature that uses the CodeFlare, KubeRay, and Kueue components, see Configuring distributed workloads.
- Click Save.
  For any components that you updated, OpenShift AI initiates a rollout so that all affected pods use the updated image.
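As an alternative to editing the YAML in the console, you can apply a JSON merge patch from the CLI. The following Python sketch computes a patch that adds codeflare, ray, and kueue with managementState set to Managed when they are absent or shown as component-name: {}. Components that already have an explicit state are left unchanged. It also builds the matching oc patch command. It assumes the object is named default-dsc, so check the actual name with oc get datasciencecluster first. The helper names are hypothetical.

```python
import json

DISTRIBUTED_WORKLOAD_COMPONENTS = ("codeflare", "ray", "kueue")


def add_missing_components(dsc: dict, components=DISTRIBUTED_WORKLOAD_COMPONENTS) -> dict:
    """Return a JSON merge patch that sets absent or empty components to Managed."""
    existing = dsc.get("spec", {}).get("components", {})
    additions = {
        name: {"managementState": "Managed"}
        for name in components
        # Skip components that already have an explicit Managed or Removed state.
        if not (existing.get(name) or {}).get("managementState")
    }
    return {"spec": {"components": additions}} if additions else {}


def patch_command(patch: dict, name: str = "default-dsc") -> list[str]:
    """Build an oc patch command; DataScienceCluster is cluster-scoped, so no -n flag."""
    return ["oc", "patch", "datasciencecluster", name, "--type", "merge", "-p", json.dumps(patch)]
```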
Verification
Confirm that there is a running pod for each component:
- In the OpenShift web console, click Workloads → Pods.
- In the Project list at the top of the page, select redhat-ods-applications.
- In the applications namespace, confirm that there are running pods for each of the OpenShift AI components that you installed.
Confirm the status of all installed components:
- In the OpenShift web console, click Operators → Installed Operators.
- Click the Red Hat OpenShift AI Operator.
- Click the Data Science Cluster tab and select the DataScienceCluster object called default-dsc.
- Select the YAML tab.
- In the installedComponents section, confirm that the components you installed have a status value of true.
Note: If a component shows with the component-name: {} format in the spec.components section of the CR, the component is not installed.
3.4.4. Disabling KServe dependencies
If you have not enabled the KServe component (that is, you set the value of the managementState field to Removed), you must also disable the dependent Service Mesh component to avoid errors.
Prerequisites
- You have used the OpenShift command-line interface (CLI) or web console to disable the KServe component.
Procedure
- Log in to the OpenShift web console as a cluster administrator.
- In the web console, click Operators → Installed Operators and then click the Red Hat OpenShift AI Operator.
- Disable the OpenShift Service Mesh component as follows:
  - Click the DSC Initialization tab.
  - Click the default-dsci object.
  - Click the YAML tab.
  - In the spec section, add the serviceMesh component (if it is not already present) and configure the managementState field as shown:
    spec:
      serviceMesh:
        managementState: Removed
  - Click Save.
Verification
- In the web console, click Operators → Installed Operators and then click the Red Hat OpenShift AI Operator. The Operator details page opens.
- In the Conditions section, confirm that there is no ReconcileComplete condition with a status value of Unknown.
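You can make the same change and check from the CLI. The following Python sketch builds the JSON merge patch that sets spec.serviceMesh.managementState to Removed on the default-dsci object, together with the matching oc patch command. A second helper scans a list of status conditions for a ReconcileComplete condition with status Unknown. The sketch assumes that the conditions shown in the console are available as status.conditions in the JSON output of the resource you inspect, so confirm which resource reports them in your version. The helper names are hypothetical.

```python
import json


def disable_service_mesh_patch() -> dict:
    """JSON merge patch that sets spec.serviceMesh.managementState to Removed."""
    return {"spec": {"serviceMesh": {"managementState": "Removed"}}}


def patch_dsci_command(name: str = "default-dsci") -> list[str]:
    """oc command that applies the patch; DSCInitialization is cluster-scoped, so no -n flag."""
    return ["oc", "patch", "dscinitialization", name, "--type", "merge",
            "-p", json.dumps(disable_service_mesh_patch())]


def reconcile_unknown(conditions: list[dict]) -> bool:
    """True if any ReconcileComplete condition reports status Unknown."""
    return any(
        c.get("type") == "ReconcileComplete" and c.get("status") == "Unknown"
        for c in conditions
    )
```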