Chapter 1. Managing AI pipelines
1.1. Configuring a pipeline server
Before you can successfully create an AI pipeline in OpenShift AI, you must configure a pipeline server. This task includes configuring where your pipeline artifacts and data are stored.
You are not required to specify any storage directories when configuring a connection for your pipeline server. When you import a pipeline, the /pipelines folder is created in the root folder of the bucket, containing a YAML file for the pipeline. If you upload a new version of the same pipeline, a new YAML file with a different ID is added to the /pipelines folder.
When you run a pipeline, the artifacts are stored in the /pipeline-name folder in the root folder of the bucket.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- You have created a project that you can add a pipeline server to.
- You have an existing S3-compatible object storage bucket and you have configured write access to your S3 bucket on your storage account.
- If you are configuring a pipeline server for production pipeline workloads, you have an existing external MySQL or MariaDB database.
If you are configuring a pipeline server with an external MySQL database, your database must use at least MySQL version 5.x. However, Red Hat recommends that you use MySQL version 8.x.
Note: The mysql_native_password authentication plugin is required for the ML Metadata component to successfully connect to your database. mysql_native_password is disabled by default in MySQL 8.4 and later. If your database uses MySQL 8.4 or later, you must update your MySQL deployment to enable the mysql_native_password plugin. For more information about enabling the mysql_native_password plugin, see Native Pluggable Authentication in the MySQL documentation.
- If you are configuring a pipeline server with a MariaDB database, your database must use MariaDB version 10.3 or later. However, Red Hat recommends that you use at least MariaDB version 10.5.
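If your database uses MySQL 8.4 or later, you can re-enable the plugin in the MySQL server configuration. The following is a minimal sketch for a self-managed deployment; confirm the option name against the MySQL documentation linked above before applying it:

```ini
# my.cnf (sketch): re-enable the native password plugin,
# which MySQL 8.4 and later disable by default.
[mysqld]
mysql_native_password=ON
```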
Procedure
- From the OpenShift AI dashboard, click Projects.
- On the Projects page, click the name of the project that you want to configure a pipeline server for.
  The project details page opens.
- Click the Pipelines tab.
- Click Configure pipeline server.
  The Configure pipeline server dialog opens.
- In the Object storage connection section, provide values for the mandatory fields:
- In the Access key field, enter the access key ID for the S3-compatible object storage provider.
- In the Secret key field, enter the secret access key for the S3-compatible object storage account that you specified.
- In the Endpoint field, enter the endpoint of your S3-compatible object storage bucket.
- In the Region field, enter the default region of your S3-compatible object storage account.
- In the Bucket field, enter the name of your S3-compatible object storage bucket.
Important: If you specify incorrect connection settings, you cannot update these settings on the same pipeline server. Therefore, you must delete the pipeline server and configure another one.
If you want to use an existing artifact that was not generated by a task in a pipeline, you can use the kfp.dsl.importer component to import the artifact from its URI. You can only import these artifacts to the S3-compatible object storage bucket that you define in the Bucket field in your pipeline server configuration. For more information about the kfp.dsl.importer component, see Special Case: Importer Components.
- Click Advanced settings to display the Database, Pipeline definition storage, and Pipeline caching sections.
- In the Database section, choose one of the following options to specify where to store your pipeline metadata and run information:
  - Select Default database on the cluster to deploy a MariaDB database in your project.
    Important: The Default database on the cluster option is intended for development and testing purposes only. For production pipeline workloads, select the External MySQL database option to use an external MySQL or MariaDB database.
  - Select External MySQL database to add a new connection to an external MySQL or MariaDB database that your pipeline server can access.
- In the Host field, enter the database hostname.
- In the Port field, enter the database port.
- In the Username field, enter the default user name that is connected to the database.
- In the Password field, enter the password for the default user account.
- In the Database field, enter the database name.
- Optional: By default, pipeline definitions are stored as Kubernetes resources, enabling version control, GitOps workflows, and integration with OpenShift GitOps or similar tools. To store pipeline definitions in the internal database instead, clear the Store pipeline definitions in Kubernetes checkbox in the Pipeline definition storage section.
- Optional: By default, caching is configurable at both the pipeline and task levels. To disable caching for all pipelines and tasks in the pipeline server and override any pipeline-level and task-level caching settings, clear the Allow caching to be configured per pipeline and task checkbox in the Pipeline caching section.
- Click Configure pipeline server.
Verification
On the Pipelines tab for the project:
- The Import pipeline button is available.
- When you click the action menu (⋮) and then click Manage pipeline server configuration, the pipeline server details are displayed.
1.1.1. Configuring a pipeline server with an external Amazon RDS database
To configure a pipeline server with an external Amazon Relational Database Service (RDS) database, you must configure OpenShift AI to trust the certificates issued by its certificate authorities (CA).
If you are configuring a pipeline server for production pipeline workloads, Red Hat recommends that you use an external MySQL or MariaDB database.
Prerequisites
- You have cluster administrator privileges for your OpenShift cluster.
- You have logged in to Red Hat OpenShift AI.
- You have created a project that you can add a pipeline server to.
- You have an existing S3-compatible object storage bucket, and you have configured your storage account with write access to your S3 bucket.
Procedure
- Before configuring your pipeline server, from Amazon RDS: Certificate bundles by AWS Region, download the PEM certificate bundle for the region that the database was created in.
  For example, if the database was created in the us-east-1 region, download us-east-1-bundle.pem.
- In a terminal window, log in to the OpenShift cluster where OpenShift AI is deployed:

  $ oc login api.<cluster_name>.<cluster_domain>:6443 --web

- Run the following command to fetch the current OpenShift AI trusted CA configuration and store it in a new file:

  $ oc get dscinitializations.dscinitialization.opendatahub.io default-dsci -o json | jq '.spec.trustedCABundle.customCABundle' > /tmp/my-custom-ca-bundles.crt

- Run the following command to append the PEM certificate bundle that you downloaded to the new custom CA configuration file:

  $ cat us-east-1-bundle.pem >> /tmp/my-custom-ca-bundles.crt

- Run the following command to update the OpenShift AI trusted CA configuration to trust certificates issued by the CAs included in the new custom CA configuration file:

  $ oc patch dscinitialization default-dsci --type='json' -p='[{"op":"replace","path":"/spec/trustedCABundle/customCABundle","value":"'"$(awk '{printf "%s\\n", $0}' /tmp/my-custom-ca-bundles.crt)"'"}]'

- Configure a pipeline server, as described in Configuring a pipeline server.
Verification
- The pipeline server starts successfully.
- You can import and run AI pipelines.
1.2. Defining a pipeline
The Kubeflow Pipelines SDK enables you to define end-to-end machine learning and AI pipelines. Use the latest Kubeflow Pipelines 2.0 SDK to build your AI pipeline in Python code. After you have built your pipeline, use the SDK to compile it into an Intermediate Representation (IR) YAML file. For more information about compiling pipelines, see Compiling the pipeline YAML with the Kubeflow Pipelines SDK and Compiling Kubernetes-native manifests with the Kubeflow Pipelines SDK. Compiling to Kubernetes-native manifests is optional and applies only when your pipeline server is configured to use Kubernetes API storage. After defining the pipeline, you can import the YAML file to the OpenShift AI dashboard to enable you to configure its execution settings.
If you are using OpenShift AI on a cluster running in FIPS mode, any custom container images for AI pipelines must be based on UBI 9 or RHEL 9. This ensures compatibility with FIPS-approved pipeline components and prevents errors related to mismatched OpenSSL or GNU C Library (glibc) versions.
You can also use the Elyra JupyterLab extension to create and run AI pipelines within JupyterLab. For more information about creating pipelines in JupyterLab, see Working with pipelines in JupyterLab. For more information about the Elyra JupyterLab extension, see Elyra Documentation.
1.2.1. Compiling the pipeline YAML with the Kubeflow Pipelines SDK
Before you can define your pipeline in the cluster, you must convert your Python-defined pipeline into YAML format. You can use the Kubeflow Pipelines (KFP) Software Development Kit (SDK) to compile your pipeline code into a deployable YAML file for declarative GitOps deployment.
Prerequisites
- You have installed Python 3.11 or later in your local environment.
- You have installed the Kubeflow Pipelines SDK package (kfp) version 2.14.3 or later.
- You have a valid Python pipeline definition file.
Procedure
Compile your pipeline by using the KFP SDK to generate the pipeline YAML file.
In the following example, replace <pipeline_file>.py with the name of your Python pipeline file and specify an output file for the compiled YAML:
$ kfp dsl compile \
    --py <pipeline_file>.py \
    --output <compiled_pipeline_file>.yaml
The generated <compiled_pipeline_file>.yaml file contains the compiled pipeline specification in YAML format. You can use this content as the value of the pipelineSpec field when you create a PipelineVersion custom resource (CR). You can also store the file in Git for declarative or GitOps-based deployment.
Verification
Verify that the generated file includes a pipelineSpec key followed by the compiled pipeline definition:
$ head -n 10 <compiled_pipeline_file>.yaml
1.2.2. Compiling Kubernetes-native manifests with the Kubeflow Pipelines SDK
If your pipeline server uses the Kubernetes native API mode, you can compile your pipeline directly to Kubernetes manifests. The output includes Pipeline and PipelineVersion custom resources with spec.pipelineSpec and, when you use Kubernetes resource configuration, an optional spec.platformSpec.
Prerequisites
- You have installed Python 3.11 or later in your local environment.
- You have installed the Kubeflow Pipelines SDK package (kfp) version 2.14.3 or later.
- You have a valid Python pipeline definition file.
Procedure
- Save the following code as a new file named compile.py in your working directory.
  The example uses the KubernetesManifestOptions class from the kfp.compiler.compiler_utils module to define pipeline metadata such as the name, version, and namespace.
  Example compile script
- Run the script to compile your pipeline and generate the Kubernetes manifests:

  $ python compile.py
Verification
Verify that the compiled output includes the expected resources:
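For example, you can list the resource kinds in the generated manifest file (replace <output_file>.yaml with the file name used in your compile script):

```shell
grep '^kind:' <output_file>.yaml
```

The output should include kind: Pipeline and kind: PipelineVersion entries.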
1.2.3. Authenticating the Kubeflow Pipelines SDK with a pipeline server
You can connect the Kubeflow Pipelines (KFP) SDK to a pipeline server that is exposed by OpenShift AI. The pipeline server route is protected by OpenShift OAuth, so you must provide a valid access token when you create the KFP client.
Prerequisites
- You have logged in to the OpenShift CLI (oc) as a user who can access the project.
- You have created a project and configured a pipeline server for that project.
- You have installed Python and the required packages in your environment.
- Optional: If your cluster uses a custom or self-signed certificate, you know the path to the trusted certificate bundle that your environment uses.
Procedure
- Set environment variables for your project and pipeline server route:

  $ export NAMESPACE=<project_namespace>
  $ export DSPA_NAME=$(oc -n "$NAMESPACE" get dspa -o jsonpath='{.items[0].metadata.name}')
  $ export API_URL="https://$(oc -n "$NAMESPACE" get route "ds-pipeline-${DSPA_NAME}" -o jsonpath='{.spec.host}')"

  Replace <project_namespace> with the name of your project.
- Obtain an OpenShift access token for the current user:

  $ export OCP_TOKEN=$(oc whoami --show-token)

  Note: Avoid pasting the access token directly into commands or scripts. The token can appear in your shell history or in process listings if you pass it as a literal argument. To reduce this risk, store the token in an environment variable and reference it from your code or commands. For example:

  $ ./.venv/bin/python my_script.py --kfp-server-host "$API_URL" --namespace "$NAMESPACE" --token "$OCP_TOKEN"

  Alternatively, use a prompt with read -s to input the token securely at runtime.
- Optional: If you are running outside the cluster or you use a custom or self-signed certificate, set an environment variable for your trusted certificate bundle:

  $ export SSL_CA_CERT=/etc/pki/tls/custom-certs/ca-bundle.crt

  Adjust the path if your environment uses a different certificate location.
- In your Python environment, create a KFP client that uses the pipeline server route and OpenShift access token:
- Verify the connection by calling the API. For example, list experiments or pipelines:

  print(client.list_experiments())  # or print(client.list_pipelines())
Verification
- The Python code runs without authentication errors.
- The command output lists experiments or pipelines that are defined on the pipeline server for the specified project.
Next steps
- Use the KFP SDK to compile and upload pipelines, create pipeline runs, or manage pipeline versions against the authenticated pipeline server.
- If required, integrate this client configuration into your own automation scripts or external applications that orchestrate pipelines on OpenShift AI.
1.2.4. Defining a pipeline by using the Kubernetes API
You can define AI pipelines and pipeline versions by using the Kubernetes API, which stores them as custom resources in the cluster instead of the internal database. This approach makes it easier to use OpenShift GitOps (Argo CD) or similar tools to manage pipelines and pipeline versions, while still allowing you to manage them through the OpenShift AI dashboard, API, and the Kubeflow Pipelines (KFP) Software Development Kit (SDK). You can generate the required manifests by using the Kubeflow Pipelines SDK; see Compiling the pipeline YAML with the Kubeflow Pipelines SDK or Compiling Kubernetes-native manifests with the Kubeflow Pipelines SDK.
If your pipeline server is already configured to use Kubernetes API storage, you can still use the OpenShift AI dashboard and REST API to view pipeline details, run pipelines, and create schedules. In this mode, the Kubernetes API acts as the storage backend, so your existing tools continue to work as expected.
Prerequisites
- You have OpenShift AI administrator privileges or you are the project owner.
- You have a project with a running pipeline server.
- You have installed the OpenShift CLI (oc) as described in the appropriate documentation for your cluster:
  - Installing the OpenShift CLI for OpenShift Container Platform
  - Installing the OpenShift CLI for Red Hat OpenShift Service on AWS
- If you plan to create a PipelineVersion custom resource, you have either:
  - Compiled your Python pipeline to IR YAML by using the KFP SDK. See Compiling the pipeline YAML with the Kubeflow Pipelines SDK.
  - Compiled Kubernetes-native manifests by using the KFP SDK. See Compiling Kubernetes-native manifests with the Kubeflow Pipelines SDK.
Procedure
- In a terminal window, log in to your OpenShift cluster by using the OpenShift CLI (oc):

  $ oc login -u <user_name>

  When prompted, enter the OpenShift server URL, connection type, and your password.
- To configure the pipeline server to use Kubernetes API storage instead of the default database option, set the spec.apiServer.pipelineStore field to kubernetes in your project's DataSciencePipelinesApplication (DSPA) custom resource.
  In the following command, replace <dspa_name> with the name of your DSPA custom resource, and replace <namespace> with the name of your project:

  $ oc patch dspa <dspa_name> -n <namespace> \
      --type=merge \
      -p '{"spec": {"apiServer": {"pipelineStore": "kubernetes"}}}'

  Warning: When you switch the pipeline server from database storage to Kubernetes API storage, existing pipelines that were stored in the internal database are no longer visible in the OpenShift AI dashboard or REST API. To view or manage those pipelines again, change the spec.apiServer.pipelineStore field back to database.
- Define a Pipeline custom resource in a YAML file with the following contents:
  Example pipeline definition
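A hypothetical example follows; the names are placeholders, and the apiVersion is an assumption — verify it against the CRDs installed on your cluster (the group is pipelines.kubeflow.org, as shown in the verification commands):

```yaml
apiVersion: pipelines.kubeflow.org/v2beta1
kind: Pipeline
metadata:
  name: my-pipeline              # immutable Kubernetes resource name
  namespace: <project_namespace>
spec:
  displayName: My Pipeline       # shown in the dashboard and REST API
```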
  - name: The immutable Kubernetes resource name of your pipeline.
  - namespace: The name of your project.
  - displayName: The user-friendly display name of your pipeline, which is shown in the dashboard and REST API.
- Apply the pipeline definition to create the Pipeline custom resource in your cluster.
  In the following command, replace <pipeline_yaml_file> with the name of your YAML file:
  Example command

  $ oc apply -f <pipeline_yaml_file>.yaml

  Alternatively, if you compiled Kubernetes-native manifests with the KFP SDK, you can apply the generated file directly without manually creating separate YAML files:

  $ oc apply -f <output_file>.yaml
$ oc apply -f <output_file>.yamlCopy to Clipboard Copied! Toggle word wrap Toggle overflow The generated file includes both
PipelineandPipelineVersionresources. You can skip the following manual definition steps and proceed to the verification step.Define a
PipelineVersioncustom resource in a YAML file with the following contents:Example pipeline version definition
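A hypothetical example follows; the names are placeholders, and the apiVersion is an assumption — verify it against the CRDs installed on your cluster:

```yaml
apiVersion: pipelines.kubeflow.org/v2beta1
kind: PipelineVersion
metadata:
  name: my-pipeline-v1
  namespace: <project_namespace>
spec:
  displayName: My Pipeline v1
  pipelineName: my-pipeline     # must match metadata.name of the Pipeline
  pipelineSpec:
    # Paste the compiled IR pipeline specification (the content of the
    # YAML file generated by the KFP SDK) under this field.
```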
  - name: The name of your pipeline version.
  - namespace: The name of your project.
  - pipelineName: The immutable Kubernetes resource name of your pipeline. This value must match the metadata.name value in the Pipeline custom resource.
  - displayName: The user-friendly display name of your pipeline version, which is shown in the dashboard and REST API.
  - pipelineSpec: The YAML content that you generated by using the Kubeflow Pipelines (KFP) SDK.
- Apply the pipeline version definition to create the PipelineVersion custom resource in your cluster.
  In the following command, replace <pipeline_version_yaml_file> with the name of your YAML file:
  Example command

  $ oc apply -f <pipeline_version_yaml_file>.yaml

  After creating the pipeline version, the system automatically applies the following labels to the pipeline version for easier filtering:
  Example automatic labels

  pipelines.kubeflow.org/pipeline-id: <metadata.uid of the pipeline>
  pipelines.kubeflow.org/pipeline: <pipeline name>
Verification
- Check that the Pipeline custom resource was successfully created:

  $ oc get pipeline <pipeline_name> -n <namespace>

- Check that the PipelineVersion custom resource was successfully created:

  $ oc get pipelineversion <pipeline_version_name> -n <namespace>
1.2.5. Migrating pipelines from database to Kubernetes API storage
You can migrate existing pipelines and pipeline versions from the internal database to Kubernetes custom resources. This makes it easier to use OpenShift GitOps (Argo CD) or similar tools to manage pipelines and pipeline versions, while still allowing you to manage them through the OpenShift AI dashboard, API, and the Kubeflow Pipelines (KFP) Software Development Kit (SDK).
This procedure uses a community-supported Kubeflow Pipelines migration script to export pipelines from the AI Pipelines API and generate corresponding Pipeline and PipelineVersion custom resources for import into your cluster.
The migration script in this procedure is maintained by the Kubeflow Pipelines community and is not supported by Red Hat. Before you use the script, review the repository and validate it in a non-production environment.
The pipeline and pipeline version IDs change during migration, so existing pipeline runs do not map to the migrated pipeline version. The original ID is stored in the pipelines.kubeflow.org/original-id label.
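For example, you can display the original-id label when listing migrated pipeline versions (replace <namespace> with your project name):

```shell
oc -n <namespace> get pipelineversions.pipelines.kubeflow.org \
    -L pipelines.kubeflow.org/original-id
```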
Prerequisites
- You have OpenShift AI administrator privileges or you are the project owner.
- You have a project with a running pipeline server.
- The pipeline server is configured with spec.apiServer.pipelineStore: database.
- You have Python 3.11 installed in your local environment.
- You have installed the OpenShift CLI (oc) as described in the appropriate documentation for your cluster:
  - Installing the OpenShift CLI for OpenShift Container Platform
  - Installing the OpenShift CLI for Red Hat OpenShift Service on AWS
Procedure
- In a terminal window, log in to your OpenShift cluster by using the OpenShift CLI (oc):

  $ oc login -u <user_name>

  When prompted, enter the OpenShift server URL, connection type, and your password.
- Set environment variables for your project and get the pipeline API route.
  In the export command, replace <namespace> with the name of your project:

  $ echo "Setting the prerequisite variables"
  $ export NAMESPACE=<namespace>
  $ export DSPA_NAME=$(oc -n $NAMESPACE get dspa -o jsonpath={.items[0].metadata.name})
  $ export API_URL="https://$(oc -n $NAMESPACE get route "ds-pipeline-$DSPA_NAME" -o jsonpath={.spec.host})"

- Create a Python virtual environment and install the required dependencies:

  $ echo "Set up the Python prerequisites"
  $ python3.11 -m venv .venv
  $ ./.venv/bin/pip install kfp requests PyYAML

- Download and run the Kubeflow Pipelines community migration script.
  The script connects to the AI Pipelines API, exports all pipelines and versions from the specified project, and generates one YAML file per pipeline in a local kfp-exported-pipelines/ directory. Each file includes a Pipeline resource followed by all associated PipelineVersion resources.
  Run the following commands:

  $ curl -L https://raw.githubusercontent.com/kubeflow/pipelines/refs/heads/master/tools/k8s-native/migration.py -o migration.py
  $ ./.venv/bin/python migration.py --skip-tls-verify --kfp-server-host $API_URL --namespace $NAMESPACE --token "$(oc whoami --show-token)"

  Note: The --skip-tls-verify option disables certificate validation and should be used only in development environments or when connecting to a server with a self-signed certificate. In production environments, provide a valid certificate bundle instead.
  Additionally, passing the access token directly on the command line might expose it in shell history or process lists. To reduce this risk, store the token in an environment variable and reference it in your command:

  $ export KFP_TOKEN=$(oc whoami --show-token)
  $ ./.venv/bin/python migration.py --kfp-server-host $API_URL --namespace $NAMESPACE --token "$KFP_TOKEN"

  Alternatively, use a prompt with read -s to input the token securely at runtime.
- Optional: For more information about the script, run the following command:

  $ ./.venv/bin/python migration.py --help
- If you plan to create new or updated PipelineVersion custom resources after migration, you can compile your pipeline code by using the Kubeflow Pipelines SDK. For more information, see Compiling the pipeline YAML with the Kubeflow Pipelines SDK and Compiling Kubernetes-native manifests with the Kubeflow Pipelines SDK.
- Apply the exported Kubernetes custom resources to your cluster:

  $ oc apply -f ./kfp-exported-pipelines

- Change the pipeline server to use Kubernetes API storage:

  $ oc -n "$NAMESPACE" patch dspa "$DSPA_NAME" --type=merge -p '{"spec":{"apiServer":{"pipelineStore":"kubernetes"}}}'

  Note: To view pipelines that were stored in the internal database and not migrated, you can temporarily change the pipeline server back to database storage:

  $ oc -n $NAMESPACE patch dspa $DSPA_NAME --type=merge -p '{"spec":{"apiServer":{"pipelineStore":"database"}}}'

- Repeat this procedure for each additional project that you want to migrate, changing NAMESPACE to the appropriate project name.
- Optional: Clean up the local environment:

  $ rm -rf .venv migration.py
Verification
- Check that the Pipeline and PipelineVersion custom resources were created in your project:

  $ oc -n <namespace> get pipelines.pipelines.kubeflow.org
  $ oc -n <namespace> get pipelineversions.pipelines.kubeflow.org

- Verify that the pipeline server is using Kubernetes API storage:

  $ oc -n <namespace> get dspa <dspa_name> -o jsonpath='{.spec.apiServer.pipelineStore}{"\n"}'

  The command should return kubernetes.
1.3. Importing a pipeline
To help you begin working with AI pipelines in OpenShift AI, you can import a YAML file containing your pipeline’s code to an active pipeline server, or you can import the YAML file from a URL. This file contains a Kubeflow pipeline compiled by using the Kubeflow compiler. After you have imported the pipeline to a pipeline server, you can execute the pipeline by creating a pipeline run.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- You have previously created a project that is available and contains a configured pipeline server.
- You have compiled your pipeline with the Kubeflow compiler and you have access to the resulting YAML file.
- If you are uploading your pipeline from a URL, the URL is publicly accessible.
If your pipeline is defined in Python code instead of a YAML file, compile it first by using the KFP SDK. For more information, see Compiling the pipeline YAML with the Kubeflow Pipelines SDK.
Procedure
- From the OpenShift AI dashboard, click Develop & train → Pipelines → Pipeline definitions.
- On the Pipeline definitions page, from the Project drop-down list, select the project that you want to import a pipeline to.
- Click Import pipeline.
In the Import pipeline dialog, enter the details for the pipeline that you want to import.
- In the Pipeline name field, enter a name for the pipeline that you want to import.
- In the Pipeline description field, enter a description for the pipeline that you want to import.
Select where you want to import your pipeline from by performing one of the following actions:
- Select Upload a file to upload your pipeline from your local machine’s file system. Import your pipeline by clicking Upload, or by dragging and dropping a file.
- Select Import by url to upload your pipeline from a URL, and then enter the URL into the text box.
- Click Import pipeline.
Verification
- The pipeline that you imported is displayed on the Pipeline definitions page and on the Pipelines tab on the project details page.
1.4. Deleting a pipeline
If you no longer require access to your AI pipeline on the dashboard, you can delete it so that it does not appear on the Pipeline definitions page.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- There are active pipelines available on the Pipeline definitions page.
- The pipeline that you want to delete does not contain any pipeline versions. For more information, see Deleting a pipeline version.
Procedure
- From the OpenShift AI dashboard, click Develop & train → Pipelines → Pipeline definitions.
- On the Pipeline definitions page, from the Project drop-down list, select the project that contains the pipeline that you want to delete.
- Click the action menu (⋮) beside the pipeline that you want to delete, and then click Delete pipeline.
- In the Delete pipeline dialog, enter the pipeline name in the text field to confirm that you intend to delete it.
- Click Delete pipeline.
Verification
- The AI pipeline that you deleted is no longer displayed on the Pipeline definitions page.
1.5. Deleting a pipeline server
After you have finished running your AI pipelines, you can delete the pipeline server. Deleting a pipeline server automatically deletes all of its associated pipelines, pipeline versions, and runs. If your pipeline data is stored in a database, the database and its metadata are also deleted. In addition, after deleting a pipeline server, you cannot create new pipelines or pipeline runs until you create another pipeline server.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- You have previously created a project that is available and contains a pipeline server.
Procedure
- From the OpenShift AI dashboard, click Develop & train → Pipelines → Pipeline definitions.
- On the Pipeline definitions page, from the Project drop-down list, select the project that contains the pipeline server that you want to delete.
- From the Pipeline server actions list, select Delete pipeline server.
- In the Delete pipeline server dialog, enter the name of the pipeline server in the text field to confirm that you intend to delete it.
- Click Delete.
Verification
- Pipelines previously assigned to the deleted pipeline server no longer appear on the Pipeline definitions page for the relevant project.
- Pipeline runs previously assigned to the deleted pipeline server no longer appear on the Runs page for the relevant project.
1.6. Viewing the details of a pipeline server
You can view the details of pipeline servers configured in OpenShift AI, such as the pipeline server’s connection details and where its data is stored.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- You have previously created a project that contains an active and available pipeline server.
Procedure
- From the OpenShift AI dashboard, click Develop & train → Pipelines → Pipeline definitions.
- On the Pipeline definitions page, from the Project drop-down list, select the project that contains the pipeline server that you want to view.
- From the Pipeline server actions list, select Manage pipeline server configuration.
Verification
- You can view the pipeline server details in the Manage pipeline server dialog.
1.7. Viewing existing pipelines
You can view the details of pipelines that you have imported to Red Hat OpenShift AI, such as the pipeline’s last run, when it was created, the pipeline’s executed runs, and details of any associated pipeline versions.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- You have previously created a project that is available and contains a pipeline server.
- You have imported a pipeline to an active pipeline server.
- Existing pipelines are available.
Procedure
- From the OpenShift AI dashboard, click Develop & train → Pipelines → Pipeline definitions.
- On the Pipeline definitions page, from the Project drop-down list, select the project that contains the pipelines that you want to view.
- Optional: Click Expand on the row of a pipeline to view its pipeline versions.
Verification
- A list of pipelines is displayed on the Pipeline definitions page.
1.8. Overview of pipeline versions
You can manage incremental changes to pipelines in OpenShift AI by using versioning. This allows you to develop and deploy pipelines iteratively, preserving a record of your changes. You can track and manage your changes on the OpenShift AI dashboard, allowing you to schedule and execute runs against all available versions of your pipeline.
1.9. Uploading a pipeline version
You can upload a YAML file to an active pipeline server that contains the latest version of your pipeline, or you can upload the YAML file from a URL. The YAML file must consist of a Kubeflow pipeline compiled by using the Kubeflow compiler. After you upload a pipeline version to a pipeline server, you can execute it by creating a pipeline run.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- You have previously created a project that is available and contains a configured pipeline server.
- You have a pipeline version available and ready to upload.
- If you are uploading your pipeline version from a URL, the URL is publicly accessible.
- If your pipeline version is based on Python code, compile it to YAML before uploading. For more information, see Compiling the pipeline YAML with the Kubeflow Pipelines SDK.
Procedure
- From the OpenShift AI dashboard, click Develop & train → Pipelines → Pipeline definitions.
- On the Pipeline definitions page, from the Project drop-down list, select the project that you want to upload a pipeline version to.
- Click the Import pipeline drop-down list, and then select Upload new version.
In the Upload new version dialog, enter the details for the pipeline version that you are uploading.
- From the Pipeline list, select the pipeline that you want to upload your pipeline version to.
- In the Pipeline version name field, confirm the name for the pipeline version, and change it if necessary.
- In the Pipeline version description field, enter a description for the pipeline version.
Select where you want to upload your pipeline version from by performing one of the following actions:
- Select Upload a file to upload your pipeline version from your local machine’s file system. Import your pipeline version by clicking Upload, or by dragging and dropping a file.
- Select Import by url to upload your pipeline version from a URL, and then enter the URL into the text box.
- Click Upload.
Verification
- The pipeline version that you uploaded is displayed on the Pipeline definitions page. Click Expand on the row containing the pipeline to view its versions.
- On the Pipeline definitions page, the Version column on the row containing the pipeline increments by one.
1.10. Deleting a pipeline version
You can delete specific versions of a pipeline when you no longer require them. Deleting a default pipeline version automatically changes the default pipeline version to the next most recent version. If no pipeline versions exist, the pipeline persists without a default version.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- You have previously created a project that is available and contains a pipeline server.
- You have imported a pipeline to an active pipeline server.
Procedure
From the OpenShift AI dashboard, click Develop & train → Pipelines → Pipeline definitions. The Pipeline definitions page opens.
Delete the pipeline versions that you no longer require:
To delete a single pipeline version:
- From the Project list, select the project that contains a version of a pipeline that you want to delete.
- On the row containing the pipeline, click Expand.
- Click the action menu (⋮) beside the version that you want to delete, and then click Delete pipeline version. The Delete pipeline version dialog opens.
- Enter the name of the pipeline version in the text field to confirm that you intend to delete it.
- Click Delete.
To delete multiple pipeline versions:
- On the row containing each pipeline version that you want to delete, select the checkbox.
- Click the action menu (⋮) next to the Import pipeline drop-down list, and then select Delete from the list.
Verification
- The pipeline version that you deleted is no longer displayed on the Pipeline definitions page or on the Pipelines tab for the project.
1.11. Viewing the details of a pipeline version
You can view the details of a pipeline version that you have uploaded to Red Hat OpenShift AI, such as its graph and YAML code.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- You have previously created a project that is available and contains a pipeline server.
- You have a pipeline available on an active and available pipeline server.
Procedure
- From the OpenShift AI dashboard, click Develop & train → Pipelines → Pipeline definitions.
- On the Pipeline definitions page, from the Project drop-down list, select the project that contains the pipeline versions that you want to view details for.
Click the pipeline name to view further details of its most recent version. Alternatively, click Expand on the row containing the pipeline that you want to view versions for, and then click the pipeline version that you want to view the details of. In either case, the pipeline version details page opens, displaying the Graph, Summary, and Pipeline spec tabs.
Verification
- On the pipeline version details page, you can view the pipeline graph, summary details, and YAML code.
1.12. Downloading a pipeline version
To make further changes to an AI pipeline version that you previously uploaded to OpenShift AI, you can download pipeline version code from the user interface.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- You have previously created a project that is available and contains a configured pipeline server.
- You have created and imported a pipeline to an active pipeline server that is available to download.
Procedure
- From the OpenShift AI dashboard, click Develop & train → Pipelines → Pipeline definitions.
- On the Pipeline definitions page, from the Project drop-down list, select the project that contains the version that you want to download.
- Click Expand beside the pipeline that contains the version that you want to download.
- Click the pipeline version that you want to download. The pipeline version details page opens.
- Click the Pipeline spec tab, and then click the Download button to download the YAML file that contains the pipeline version code to your local machine.
Verification
- The pipeline version code downloads to your browser’s default directory for downloaded files.
1.13. Overview of pipelines caching
You can use caching within AI pipelines to optimize execution times and improve resource efficiency. Caching reduces redundant task execution by reusing results from previous runs with identical inputs.
Caching is particularly beneficial for iterative tasks, where intermediate steps might not need to be repeated. Understanding caching can help you design more efficient pipelines and save time in model development.
Caching operates by storing the outputs of successfully completed tasks and comparing the inputs of new tasks against previously cached ones. If a match is found, OpenShift AI reuses the cached results instead of re-executing the task, reducing computation time and resource usage.
1.13.1. Caching criteria
For caching to be effective, the following criteria determine if a task can use previously cached results:
- Input data and parameters: If the input data and parameters for a task are unchanged from a previous run, cached results are eligible for reuse.
- Task code and configuration: Changes to the task code or configurations invalidate the cache to ensure that modifications are always reflected.
- Pipeline environment: Changes to the pipeline environment, such as dependency versions, also affect caching eligibility to maintain consistency.
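The criteria above behave like a composite cache key. The following illustrative, standalone sketch (not OpenShift AI's actual implementation) shows why changing any one criterion forces re-execution:

```python
import hashlib
import json

def cache_key(inputs: dict, parameters: dict, task_code: str, environment: dict) -> str:
    """Derive a deterministic key from every caching criterion.

    A task is a cache hit only if all of these match a previous run.
    """
    payload = json.dumps(
        {"inputs": inputs, "parameters": parameters,
         "code": task_code, "environment": environment},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

# Same inputs, parameters, code, and environment: the cached result is reused.
k1 = cache_key({"data": "v1"}, {"lr": 0.1}, "train()", {"kfp": "2.x"})
k2 = cache_key({"data": "v1"}, {"lr": 0.1}, "train()", {"kfp": "2.x"})
# Changing any single criterion (here, the environment) invalidates the cache.
k3 = cache_key({"data": "v1"}, {"lr": 0.1}, "train()", {"kfp": "other"})
assert k1 == k2 and k1 != k3
```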
1.13.2. Viewing cached steps in the OpenShift AI user interface
Cached steps in pipelines are visually indicated in the user interface (UI):
- Tasks that use cached results display a green icon, helping you quickly identify which steps were cached. The Status field in the side panel displays Cached for cached tasks.
- The UI also includes information about when the task was previously executed, allowing for easy verification of cache usage.
To check the caching status of specific tasks, navigate to the pipeline details view in the UI. Cached and non-cached tasks are clearly indicated. Cached tasks do not display execution logs because they reuse previously generated outputs and are not re-executed.
1.13.3. Controlling caching in pipelines
Caching is enabled by default in OpenShift AI to improve performance. However, there are instances when disabling caching might be necessary for specific tasks, an entire pipeline, or all pipelines. For example, caching might not be beneficial for tasks that rely on frequently updated data or unique computational needs. In other cases, such as debugging, development, or when deterministic re-execution is required, you might want to disable caching for all pipelines.
Disabling caching at the pipeline or pipeline server level causes all tasks to re-run, potentially increasing compute time and resource usage.
You can control caching for AI pipelines in the following ways:
- Individual task: Data scientists can disable caching for specific steps in a pipeline.
- Pipeline (submit time): Data scientists can disable caching when submitting a pipeline run.
- Pipeline (compile time): Data scientists can disable caching when compiling a pipeline.
- All pipelines (pipeline server): You can disable caching for all pipelines in the pipeline server, which overrides all pipeline and task-level caching settings.
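As a rough model of how these levels interact, here is an illustrative sketch based on the precedence described in this section (not product code):

```python
from typing import Optional

def effective_caching(server_allows: bool,
                      pipeline_enabled: Optional[bool],
                      task_enabled: Optional[bool]) -> bool:
    """Resolve whether a task's results may be cached.

    server_allows: the pipeline-server setting, which overrides everything else.
    pipeline_enabled / task_enabled: optional overrides; None means "not set".
    """
    if not server_allows:
        return False              # the server-level switch overrides all other settings
    if task_enabled is not None:
        return task_enabled       # an explicit task-level setting wins next
    if pipeline_enabled is not None:
        return pipeline_enabled   # then the pipeline-level (submit/compile) setting
    return True                   # caching is enabled by default

# A task with caching disabled re-runs even though the pipeline allows caching.
assert effective_caching(True, True, False) is False
# The pipeline-server setting overrides pipeline- and task-level settings.
assert effective_caching(False, True, True) is False
```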
1.13.3.1. Disabling caching for individual tasks
To disable caching for a particular task, apply the set_caching_options method directly to the task in your pipeline code:
task_name.set_caching_options(False)
After applying this setting, OpenShift AI runs the task in future pipeline runs, ignoring any cached results.
You can re-enable caching for individual tasks by calling set_caching_options(True), or by omitting the set_caching_options call.
This setting is ignored if caching is disabled in the pipeline server.
1.13.3.2. Disabling caching for a pipeline at submit time
To disable caching for the entire pipeline during pipeline submission, set the enable_caching parameter to False in your pipeline code. This setting ensures that no steps are cached during pipeline execution. The enable_caching parameter is available only when using the kfp.Client to submit pipelines or start pipeline runs, for example with the run_pipeline method.
Example:
This setting is ignored if caching is disabled during pipeline compilation or in the pipeline server.
1.13.3.3. Disabling caching for a pipeline at compile time
To disable caching for the entire pipeline during compilation, set one of the following options in your local environment or workbench:
Environment variable:
export KFP_DISABLE_EXECUTION_CACHING_BY_DEFAULT=true
CLI flag (when using kfp dsl compile):
kfp dsl compile --disable-execution-caching-by-default
These settings are ignored if caching is disabled in the pipeline server.
1.13.3.4. Disabling caching for all pipelines (pipeline server)
To disable caching for all pipelines in the pipeline server and override all pipeline and task-level caching settings, use either of the following methods:
- Pipeline server configuration
- From the OpenShift AI dashboard, click Develop & train → Pipelines → Pipeline definitions.
- On the Pipeline definitions page, from the Project drop-down list, select the project that contains the pipeline server that you want to configure.
- From the Pipeline server actions list, select Manage pipeline server configuration.
- In the Pipeline caching section, clear the Allow caching to be configured per pipeline and task checkbox.
- Click Save.
- DataSciencePipelinesApplication (cluster administrator)
In the OpenShift console or CLI, set the cacheEnabled field to false in the DataSciencePipelinesApplication (DSPA) custom resource for the project.
To allow caching to be configured at the pipeline and task level, set the cacheEnabled field to true in the DSPA custom resource.
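A sketch of the relevant part of the DSPA custom resource; the apiVersion and the placement of cacheEnabled under spec.apiServer reflect common DSPA deployments and should be verified against the CRD in your cluster:

```yaml
apiVersion: datasciencepipelinesapplications.opendatahub.io/v1alpha1
kind: DataSciencePipelinesApplication
metadata:
  name: dspa
spec:
  apiServer:
    cacheEnabled: false   # ignore all pipeline- and task-level caching settings
```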
After applying this setting, all pipeline and task-level caching settings are ignored.
Changing this setting updates the CACHEENABLED environment variable in the pipeline server deployment.
Verification
After configuring caching settings, you can verify its behavior by using one of the following methods:
- Check the UI: Locate the green icons in the task list to identify cached steps.
- Test task re-runs: Disable caching on specific tasks or the pipeline to confirm that steps re-execute as expected.
- Validate inputs: Ensure the task inputs, parameters, and runtime settings are unchanged when caching is applied.
You can also disable caching for a single node or for your entire pipeline in JupyterLab using Elyra. For more information, see Disabling node caching in Elyra.