Chapter 1. Managing data science pipelines
1.1. Configuring a pipeline server
Before you can successfully create a pipeline in OpenShift AI, you must configure a pipeline server. This task includes configuring where your pipeline artifacts and data are stored.
You are not required to specify any storage directories when configuring a connection for your pipeline server. When you import a pipeline, the /pipelines folder is created in the root folder of the bucket, containing a YAML file for the pipeline. If you upload a new version of the same pipeline, a new YAML file with a different ID is added to the /pipelines folder.
When you run a pipeline, the artifacts are stored in the /pipeline-name folder in the root folder of the bucket.
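As an illustration, the storage layout described above can be sketched in Python. The helper functions and names below are hypothetical, not part of the product:

```python
# Illustrative sketch of the object keys that the pipeline server creates,
# based on the layout described above. Function names are hypothetical.

def pipeline_definition_key(pipeline_version_id: str) -> str:
    """Imported pipeline definitions are stored under /pipelines in the bucket root."""
    return f"pipelines/{pipeline_version_id}.yaml"

def run_artifact_key(pipeline_name: str, artifact_name: str) -> str:
    """Run artifacts are stored under a folder named after the pipeline."""
    return f"{pipeline_name}/{artifact_name}"

print(pipeline_definition_key("a1b2c3"))                  # pipelines/a1b2c3.yaml
print(run_artifact_key("fraud-detection", "model.onnx"))  # fraud-detection/model.onnx
```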
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- You have created a data science project that you can add a pipeline server to.
- You have an existing S3-compatible object storage bucket and you have configured write access to your S3 bucket on your storage account.
- If you are configuring a pipeline server for production pipeline workloads, you have an existing external MySQL or MariaDB database.
If you are configuring a pipeline server with an external MySQL database, your database must use at least MySQL version 5.x. However, Red Hat recommends that you use MySQL version 8.x.
Note: The mysql_native_password authentication plugin is required for the ML Metadata component to successfully connect to your database. mysql_native_password is disabled by default in MySQL 8.4 and later. If your database uses MySQL 8.4 or later, you must update your MySQL deployment to enable the mysql_native_password plugin. For more information about enabling the mysql_native_password plugin, see Native Pluggable Authentication in the MySQL documentation.
- If you are configuring a pipeline server with a MariaDB database, your database must use MariaDB version 10.3 or later. However, Red Hat recommends that you use at least MariaDB version 10.5.
Procedure
- From the OpenShift AI dashboard, click Data science projects. The Data science projects page opens.
- Click the name of the project that you want to configure a pipeline server for. A project details page opens.
- Click the Pipelines tab.
- Click Configure pipeline server. The Configure pipeline server dialog opens.
- In the Object storage connection section, provide values for the mandatory fields:
- In the Access key field, enter the access key ID for the S3-compatible object storage provider.
- In the Secret key field, enter the secret access key for the S3-compatible object storage account that you specified.
- In the Endpoint field, enter the endpoint of your S3-compatible object storage bucket.
- In the Region field, enter the default region of your S3-compatible object storage account.
- In the Bucket field, enter the name of your S3-compatible object storage bucket.
Important: If you specify incorrect connection settings, you cannot update these settings on the same pipeline server. Therefore, you must delete the pipeline server and configure another one.
If you want to use an existing artifact that was not generated by a task in a pipeline, you can use the kfp.dsl.importer component to import the artifact from its URI. You can only import these artifacts to the S3-compatible object storage bucket that you define in the Bucket field in your pipeline server configuration. For more information about the kfp.dsl.importer component, see Special Case: Importer Components.
- Click Advanced settings to display the Database, Pipeline definition storage, and Pipeline caching sections.
In the Database section, choose one of the following options to specify where to store your pipeline metadata and run information:
- Select Default database on the cluster to deploy a MariaDB database in your project.
  Important: The Default database on the cluster option is intended for development and testing purposes only. For production pipeline workloads, select the External MySQL database option to use an external MySQL or MariaDB database.
- Select External MySQL database to add a new connection to an external MySQL or MariaDB database that your pipeline server can access.
- In the Host field, enter the database hostname.
- In the Port field, enter the database port.
- In the Username field, enter the default user name that is connected to the database.
- In the Password field, enter the password for the default user account.
- In the Database field, enter the database name.
- Optional: By default, pipeline definitions are stored as Kubernetes resources, enabling version control, GitOps workflows, and integration with OpenShift GitOps or similar tools. To store pipeline definitions in the internal database instead, clear the Store pipeline definitions in Kubernetes checkbox in the Pipeline definition storage section.
- Optional: By default, caching is configurable at both the pipeline and task levels. To disable caching for all pipelines and tasks in the pipeline server and override any pipeline-level and task-level caching settings, clear the Allow caching to be configured per pipeline and task checkbox in the Pipeline caching section.
- Click Configure pipeline server.
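As a side note, the interaction between the server-level Pipeline caching checkbox and the pipeline- and task-level cache settings can be sketched as follows. This is illustrative logic under the assumed precedence described above, not the actual implementation:

```python
def cache_enabled(server_allows_caching: bool,
                  pipeline_setting: bool = True,
                  task_setting: bool = True) -> bool:
    """Sketch: clearing the server-level checkbox overrides everything;
    otherwise caching applies only if both the pipeline-level and
    task-level settings allow it (assumed precedence)."""
    if not server_allows_caching:
        return False
    return pipeline_setting and task_setting

print(cache_enabled(False, True, True))   # False: server-level override wins
print(cache_enabled(True, True, False))   # False: the task opted out
print(cache_enabled(True, True, True))    # True
```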
Verification
On the Pipelines tab for the project:
- The Import pipeline button is available.
- When you click the action menu (⋮) and then click Manage pipeline server configuration, the pipeline server details are displayed.
1.1.1. Configuring a pipeline server with an external Amazon RDS database
To configure a pipeline server with an external Amazon Relational Database Service (RDS) database, you must configure OpenShift AI to trust the certificates issued by its certificate authorities (CAs).
If you are configuring a pipeline server for production pipeline workloads, Red Hat recommends that you use an external MySQL or MariaDB database.
Prerequisites
- You have cluster administrator privileges for your OpenShift cluster.
- You have logged in to Red Hat OpenShift AI.
- You have created a data science project that you can add a pipeline server to.
- You have an existing S3-compatible object storage bucket, and you have configured your storage account with write access to your S3 bucket.
Procedure
- Before configuring your pipeline server, from Amazon RDS: Certificate bundles by AWS Region, download the PEM certificate bundle for the region that the database was created in. For example, if the database was created in the us-east-1 region, download us-east-1-bundle.pem.
- In a terminal window, log in to the OpenShift cluster where OpenShift AI is deployed:

  oc login api.<cluster_name>.<cluster_domain>:6443 --web

- Run the following command to fetch the current OpenShift AI trusted CA configuration and store it in a new file:

  oc get dscinitializations.dscinitialization.opendatahub.io default-dsci -o json | jq '.spec.trustedCABundle.customCABundle' > /tmp/my-custom-ca-bundles.crt

- Run the following command to append the PEM certificate bundle that you downloaded to the new custom CA configuration file:

  cat us-east-1-bundle.pem >> /tmp/my-custom-ca-bundles.crt

- Run the following command to update the OpenShift AI trusted CA configuration to trust certificates issued by the CAs included in the new custom CA configuration file:

  oc patch dscinitialization default-dsci --type='json' -p='[{"op":"replace","path":"/spec/trustedCABundle/customCABundle","value":"'"$(awk '{printf "%s\\n", $0}' /tmp/my-custom-ca-bundles.crt)"'"}]'

- Configure a pipeline server, as described in Configuring a pipeline server.
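The awk expression in the patch command flattens the multi-line PEM bundle into a single JSON string with literal \n escapes. A minimal Python sketch of the same transformation, for illustration only (the certificate content is a placeholder):

```python
import json

def escape_bundle(pem_text: str) -> str:
    """Mimic awk '{printf "%s\\n", $0}': emit each line followed by a
    literal backslash-n, so the bundle fits in a one-line JSON patch value."""
    return "".join(line + "\\n" for line in pem_text.splitlines())

bundle = "-----BEGIN CERTIFICATE-----\nMIIB...\n-----END CERTIFICATE-----"  # placeholder
patch = [{
    "op": "replace",
    "path": "/spec/trustedCABundle/customCABundle",
    "value": escape_bundle(bundle),
}]
print(json.dumps(patch)[:80])  # the escaped bundle embeds as one JSON string
```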
Verification
- The pipeline server starts successfully.
- You can import and run data science pipelines.
1.2. Defining a pipeline
The Kubeflow Pipelines SDK enables you to define end-to-end machine learning and data pipelines. Use the latest Kubeflow Pipelines 2.0 SDK to build your data science pipeline in Python code. After you have built your pipeline, use the SDK to compile it into an Intermediate Representation (IR) YAML file. For more information about compiling pipelines, see Compiling the pipeline YAML with the Kubeflow Pipelines SDK and Compiling Kubernetes-native manifests with the Kubeflow Pipelines SDK. Compiling to Kubernetes-native manifests is optional and applies only when your pipeline server is configured to use Kubernetes API storage. After defining the pipeline, you can import the YAML file to the OpenShift AI dashboard, where you can configure its execution settings.
If you are using OpenShift AI on a cluster running in FIPS mode, any custom container images for data science pipelines must be based on UBI 9 or RHEL 9. This ensures compatibility with FIPS-approved pipeline components and prevents errors related to mismatched OpenSSL or GNU C Library (glibc) versions.
You can also use the Elyra JupyterLab extension to create and run data science pipelines within JupyterLab. For more information about creating pipelines in JupyterLab, see Working with pipelines in JupyterLab. For more information about the Elyra JupyterLab extension, see Elyra Documentation.
1.2.1. Compiling the pipeline YAML with the Kubeflow Pipelines SDK
Before you can define your pipeline in the cluster, you must convert your Python-defined pipeline into YAML format. You can use the Kubeflow Pipelines (KFP) Software Development Kit (SDK) to compile your pipeline code into a deployable YAML file for declarative GitOps deployment.
Prerequisites
- You have installed Python 3.11 or later in your local environment.
- You have installed the Kubeflow Pipelines SDK package (kfp) version 2.14.3 or later.
- You have a valid Python pipeline definition file.
Procedure
Compile your pipeline by using the KFP SDK to generate the pipeline YAML file.
In the following example, replace <pipeline_file>.py with the name of your Python pipeline file and specify an output file for the compiled YAML:
$ kfp dsl compile \
--py <pipeline_file>.py \
--output <compiled_pipeline_file>.yaml
The generated <compiled_pipeline_file>.yaml file contains the compiled pipeline specification in YAML format. You can use this content as the value of the pipelineSpec field when you create a PipelineVersion custom resource (CR). You can also store the file in Git for declarative or GitOps-based deployment.
Verification
Verify that the generated file includes a pipelineSpec key followed by the compiled pipeline definition:
$ head -n 10 <compiled_pipeline_file>.yaml
Additional resources
1.2.2. Compiling Kubernetes-native manifests with the Kubeflow Pipelines SDK
If your pipeline server uses the Kubernetes native API mode, you can compile your pipeline directly to Kubernetes manifests. The output includes Pipeline and PipelineVersion custom resources with spec.pipelineSpec and, when you use Kubernetes resource configuration, an optional spec.platformSpec.
Prerequisites
- You have installed Python 3.11 or later in your local environment.
- You have installed the Kubeflow Pipelines SDK package (kfp) version 2.14.3 or later.
- You have a valid Python pipeline definition file.
Procedure
- Save the following code as a new file named compile.py in your working directory. The example uses the KubernetesManifestOptions class from the kfp.compiler.compiler_utils module to define pipeline metadata such as the name, version, and namespace.

  Example compile script

  from kfp import dsl, compiler
  from kfp.compiler.compiler_utils import KubernetesManifestOptions

  @dsl.pipeline(name="<pipeline_name>")
  def my_pipeline():
      pass  # define your tasks

  compiler.Compiler().compile(
      pipeline_func=my_pipeline,
      package_path="<output_file>.yaml",
      kubernetes_manifest_format=True,
      kubernetes_manifest_options=KubernetesManifestOptions(
          pipeline_name="<pipeline_name>",
          pipeline_version_name="<version_name>",
          namespace="<namespace>",
          include_pipeline_manifest=True,
      ),
  )

- Run the script to compile your pipeline and generate the Kubernetes manifests:

  $ python compile.py
Verification
Verify that the compiled output includes the expected resources:
apiVersion: pipelines.kubeflow.org/v2beta1
kind: Pipeline
---
apiVersion: pipelines.kubeflow.org/v2beta1
kind: PipelineVersion
spec:
pipelineSpec: ...
platformSpec: ... # present when Kubernetes resource configuration is used
Additional resources
1.2.3. Defining a pipeline by using the Kubernetes API
You can define data science pipelines and pipeline versions by using the Kubernetes API, which stores them as custom resources in the cluster instead of the internal database. This approach makes it easier to use OpenShift GitOps (Argo CD) or similar tools to manage pipelines and pipeline versions, while still allowing you to manage them through the OpenShift AI dashboard, API, and the Kubeflow Pipelines (KFP) Software Development Kit (SDK). You can generate the required manifests by using the Kubeflow Pipelines SDK; see Compiling the pipeline YAML with the Kubeflow Pipelines SDK or Compiling Kubernetes-native manifests with the Kubeflow Pipelines SDK.
If your pipeline server is already configured to use Kubernetes API storage, you can still use the OpenShift AI dashboard and REST API to view pipeline details, run pipelines, and create schedules. In this mode, the Kubernetes API acts as the storage backend, so your existing tools continue to work as expected.
Prerequisites
- You have OpenShift AI administrator privileges or you are the project owner.
- You have a data science project with a running pipeline server.
- You have installed the OpenShift CLI (oc) as described in the appropriate documentation for your cluster:
  - Installing the OpenShift CLI for OpenShift Container Platform
  - Installing the OpenShift CLI for Red Hat OpenShift Service on AWS
- If you plan to create a PipelineVersion custom resource, you have done either of the following:
  - Compiled your Python pipeline to IR YAML by using the KFP SDK. See Compiling the pipeline YAML with the Kubeflow Pipelines SDK.
  - Compiled Kubernetes-native manifests by using the KFP SDK. See Compiling Kubernetes-native manifests with the Kubeflow Pipelines SDK.
Procedure
- In a terminal window, log in to your OpenShift cluster by using the OpenShift CLI (oc):

  $ oc login -u <user_name>

  When prompted, enter the OpenShift server URL, connection type, and your password.

- To configure the pipeline server to use Kubernetes API storage instead of the default database option, set the spec.apiServer.pipelineStore field to kubernetes in your project's DataSciencePipelinesApplication (DSPA) custom resource. In the following command, replace <dspa_name> with the name of your DSPA custom resource, and replace <namespace> with the name of your project:

  $ oc patch dspa <dspa_name> -n <namespace> \
      --type=merge \
      -p '{"spec": {"apiServer": {"pipelineStore": "kubernetes"}}}'

  Warning: When you switch the pipeline server from database storage to Kubernetes API storage, existing pipelines that were stored in the internal database are no longer visible in the OpenShift AI dashboard or REST API. To view or manage those pipelines again, change the spec.apiServer.pipelineStore field back to database.

- Define a Pipeline custom resource in a YAML file with the following contents:

  Example pipeline definition

  apiVersion: pipelines.kubeflow.org/v2beta1
  kind: Pipeline
  metadata:
    name: <name>
    namespace: <namespace>
  spec:
    displayName: <displayName>

  - name: The immutable Kubernetes resource name of your pipeline.
  - namespace: The name of your project.
  - displayName: The user-friendly display name of your pipeline, which is shown in the dashboard and REST API.
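For illustration, a manifest like the one above can also be built programmatically. This stdlib-only Python sketch uses hypothetical placeholder names:

```python
import json

def pipeline_manifest(name: str, namespace: str, display_name: str) -> dict:
    """Build a minimal Pipeline custom resource matching the example above.
    The name, namespace, and display name arguments are placeholders."""
    return {
        "apiVersion": "pipelines.kubeflow.org/v2beta1",
        "kind": "Pipeline",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"displayName": display_name},
    }

manifest = pipeline_manifest("iris-training", "my-ds-project", "Iris training pipeline")
print(json.dumps(manifest, indent=2))
```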
- Apply the pipeline definition to create the Pipeline custom resource in your cluster. In the following command, replace <pipeline_yaml_file> with the name of your YAML file:

  Example command

  $ oc apply -f <pipeline_yaml_file>.yaml

  Alternatively, if you compiled Kubernetes-native manifests with the KFP SDK, you can apply the generated file directly without manually creating separate YAML files:

  $ oc apply -f <output_file>.yaml

  The generated file includes both Pipeline and PipelineVersion resources. If you use this method, you can skip the following manual definition steps and proceed to the verification step.

- Define a PipelineVersion custom resource in a YAML file with the following contents:

  Example pipeline version definition

  apiVersion: pipelines.kubeflow.org/v2beta1
  kind: PipelineVersion
  metadata:
    name: <name>
    namespace: <namespace>
  spec:
    pipelineName: <pipelineName>
    displayName: <displayName>
    description: This is the first version of the pipeline.
    pipelineSpec:
      # ... YAML generated by compiling the Python pipeline with the KFP SDK ...

  - name: The name of your pipeline version.
  - namespace: The name of your project.
  - pipelineName: The immutable Kubernetes resource name of your pipeline. This value must match the metadata.name value in the Pipeline custom resource.
  - displayName: The user-friendly display name of your pipeline version, which is shown in the dashboard and REST API.
  - pipelineSpec: The YAML content that you generated by using the Kubeflow Pipelines (KFP) SDK.
- Apply the pipeline version definition to create the PipelineVersion custom resource in your cluster. In the following command, replace <pipeline_version_yaml_file> with the name of your YAML file:

  Example command

  $ oc apply -f <pipeline_version_yaml_file>.yaml

  After you create the pipeline version, the system automatically applies the following labels to the pipeline version for easier filtering:

  Example automatic labels

  pipelines.kubeflow.org/pipeline-id: <metadata.uid of the pipeline>
  pipelines.kubeflow.org/pipeline: <pipeline name>
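The automatic labels derive directly from the parent Pipeline resource; a small illustrative Python sketch (the UID and pipeline name are placeholders):

```python
def version_labels(pipeline_uid: str, pipeline_name: str) -> dict:
    """Labels that the system applies to a PipelineVersion, per the example
    above: the parent pipeline's metadata.uid and its resource name."""
    return {
        "pipelines.kubeflow.org/pipeline-id": pipeline_uid,
        "pipelines.kubeflow.org/pipeline": pipeline_name,
    }

labels = version_labels("1f2d3c4b-aaaa-bbbb-cccc-000000000000", "iris-training")
print(labels["pipelines.kubeflow.org/pipeline"])  # iris-training
```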
Verification
- Check that the Pipeline custom resource was successfully created:

  $ oc get pipeline <pipeline_name> -n <namespace>

- Check that the PipelineVersion custom resource was successfully created:

  $ oc get pipelineversion <pipeline_version_name> -n <namespace>
1.2.4. Migrating pipelines from database to Kubernetes API storage
You can migrate existing pipelines and pipeline versions from the internal database to Kubernetes custom resources. This makes it easier to use OpenShift GitOps (Argo CD) or similar tools to manage pipelines and pipeline versions, while still allowing you to manage them through the OpenShift AI dashboard, API, and the Kubeflow Pipelines (KFP) Software Development Kit (SDK).
This procedure uses a community-supported Kubeflow Pipelines migration script to export pipelines from the Data Science Pipelines API and generate corresponding Pipeline and PipelineVersion custom resources for import into your cluster.
The migration script in this procedure is maintained by the Kubeflow Pipelines community and is not supported by Red Hat. Before you use the script, review the repository and validate it in a non-production environment.
The pipeline and pipeline version IDs change during migration, so existing pipeline runs do not map to the migrated pipeline version. The original ID is stored in the pipelines.kubeflow.org/original-id label.
Prerequisites
- You have OpenShift AI administrator privileges or you are the project owner.
- You have a data science project with a running pipeline server.
- The pipeline server is configured with spec.apiServer.pipelineStore: database.
- You have Python 3.11 installed in your local environment.
- You have installed the OpenShift CLI (oc) as described in the appropriate documentation for your cluster:
  - Installing the OpenShift CLI for OpenShift Container Platform
  - Installing the OpenShift CLI for Red Hat OpenShift Service on AWS
Procedure
- In a terminal window, log in to your OpenShift cluster by using the OpenShift CLI (oc):

  $ oc login -u <user_name>

  When prompted, enter the OpenShift server URL, connection type, and your password.

- Set environment variables for your data science project and get the pipeline API route. In the export command, replace <namespace> with the name of your project:

  echo "Setting the prerequisite variables"
  export NAMESPACE=<namespace>
  export DSPA_NAME=$(oc -n $NAMESPACE get dspa -o jsonpath={.items[0].metadata.name})
  export API_URL="https://$(oc -n $NAMESPACE get route "ds-pipeline-$DSPA_NAME" -o jsonpath={.spec.host})"

- Create a Python virtual environment and install the required dependencies:

  echo "Set up the Python prerequisites"
  python3.11 -m venv .venv
  ./.venv/bin/pip install kfp requests PyYAML

- Download and run the Kubeflow Pipelines community migration script.
  The script connects to the Data Science Pipelines API, exports all pipelines and versions from the specified data science project, and generates one YAML file per pipeline in a local kfp-exported-pipelines/ directory. Each file includes a Pipeline resource followed by all associated PipelineVersion resources.

  Run the following commands:

  curl -L https://raw.githubusercontent.com/kubeflow/pipelines/refs/heads/master/tools/k8s-native/migration.py -o migration.py
  ./.venv/bin/python migration.py --skip-tls-verify --kfp-server-host $API_URL --namespace $NAMESPACE --token "$(oc whoami --show-token)"

  Note: The --skip-tls-verify option disables certificate validation and should be used only in development environments or when connecting to a server with a self-signed certificate. In production environments, provide a valid certificate bundle instead.

  Additionally, passing the access token directly on the command line might expose it in shell history or process lists. To reduce this risk, store the token in an environment variable and reference it in your command:

  export KFP_TOKEN=$(oc whoami --show-token)
  ./.venv/bin/python migration.py --kfp-server-host $API_URL --namespace $NAMESPACE --token "$KFP_TOKEN"

  Alternatively, use a prompt with read -s to input the token securely at runtime.

- Optional: For more information about the script, run the following command:

  ./.venv/bin/python migration.py --help
- If you plan to create new or updated PipelineVersion custom resources after migration, you can compile your pipeline code by using the Kubeflow Pipelines SDK. For more information, see Compiling the pipeline YAML with the Kubeflow Pipelines SDK and Compiling Kubernetes-native manifests with the Kubeflow Pipelines SDK.
- Apply the exported Kubernetes custom resources to your cluster:

  oc apply -f ./kfp-exported-pipelines

- Change the pipeline server to use Kubernetes API storage:

  oc -n "$NAMESPACE" patch dspa "$DSPA_NAME" --type=merge -p '{"spec":{"apiServer":{"pipelineStore":"kubernetes"}}}'

  Note: To view pipelines that were stored in the internal database and not migrated, you can temporarily change the pipeline server back to database storage:

  oc -n "$NAMESPACE" patch dspa "$DSPA_NAME" --type=merge -p '{"spec":{"apiServer":{"pipelineStore":"database"}}}'
Repeat this procedure for each additional data science project that you want to migrate, changing
NAMESPACEto the appropriate project name. Optional: Clean up the local environment.
rm -rf .venv migration.py
Verification
- Check that the Pipeline and PipelineVersion custom resources were created in your project:

  $ oc -n <namespace> get pipelines.pipelines.kubeflow.org
  $ oc -n <namespace> get pipelineversions.pipelines.kubeflow.org

- Verify that the pipeline server is using Kubernetes API storage:

  $ oc -n <namespace> get dspa <dspa_name> -o jsonpath='{.spec.apiServer.pipelineStore}{"\n"}'

  The command returns kubernetes.
Additional resources
1.3. Importing a data science pipeline
To help you begin working with data science pipelines in OpenShift AI, you can import a YAML file containing your pipeline’s code to an active pipeline server, or you can import the YAML file from a URL. This file contains a Kubeflow pipeline compiled by using the Kubeflow compiler. After you have imported the pipeline to a pipeline server, you can execute the pipeline by creating a pipeline run.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- You have previously created a data science project that is available and contains a configured pipeline server.
- You have compiled your pipeline with the Kubeflow compiler and you have access to the resulting YAML file.
- If you are uploading your pipeline from a URL, the URL is publicly accessible.
- If your pipeline is defined in Python code instead of a YAML file, compile it first by using the KFP SDK. For more information, see Compiling the pipeline YAML with the Kubeflow Pipelines SDK.
Procedure
- From the OpenShift AI dashboard, click Data science pipelines → Pipelines.
- On the Pipelines page, from the Project drop-down list, select the project that you want to import a pipeline to.
- Click Import pipeline.
In the Import pipeline dialog, enter the details for the pipeline that you want to import.
- In the Pipeline name field, enter a name for the pipeline that you want to import.
- In the Pipeline description field, enter a description for the pipeline that you want to import.
Select where you want to import your pipeline from by performing one of the following actions:
- Select Upload a file to upload your pipeline from your local machine’s file system. Import your pipeline by clicking Upload, or by dragging and dropping a file.
- Select Import by url to upload your pipeline from a URL, and then enter the URL into the text box.
- Click Import pipeline.
Verification
- The pipeline that you imported is displayed on the Pipelines page and on the Pipelines tab on the project details page.
1.4. Deleting a data science pipeline
If you no longer require access to your data science pipeline on the dashboard, you can delete it so that it does not appear on the Data science pipelines page.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- There are active pipelines available on the Pipelines page.
- The pipeline that you want to delete does not contain any pipeline versions. For more information, see Deleting a pipeline version.
Procedure
- From the OpenShift AI dashboard, click Data science pipelines → Pipelines.
- On the Pipelines page, from the Project drop-down list, select the project that contains the pipeline that you want to delete.
- Click the action menu (⋮) beside the pipeline that you want to delete, and then click Delete pipeline.
- In the Delete pipeline dialog, enter the pipeline name in the text field to confirm that you intend to delete it.
- Click Delete pipeline.
Verification
- The data science pipeline that you deleted is no longer displayed on the Pipelines page.
1.5. Deleting a pipeline server
After you have finished running your data science pipelines, you can delete the pipeline server. Deleting a pipeline server automatically deletes all of its associated pipelines, pipeline versions, and runs. If your pipeline data is stored in a database, the database is also deleted, along with its metadata. In addition, after deleting a pipeline server, you cannot create new pipelines or pipeline runs until you create another pipeline server.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- You have previously created a data science project that is available and contains a pipeline server.
Procedure
- From the OpenShift AI dashboard, click Data science pipelines → Pipelines.
- On the Pipelines page, from the Project drop-down list, select the project that contains the pipeline server that you want to delete.
- From the Pipeline server actions list, select Delete pipeline server.
- In the Delete pipeline server dialog, enter the name of the pipeline server in the text field to confirm that you intend to delete it.
- Click Delete.
Verification
- Pipelines previously assigned to the deleted pipeline server no longer appear on the Pipelines page for the relevant data science project.
- Pipeline runs previously assigned to the deleted pipeline server no longer appear on the Runs page for the relevant data science project.
1.6. Viewing the details of a pipeline server
You can view the details of pipeline servers configured in OpenShift AI, such as the pipeline server's connection details and where its data is stored.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- You have previously created a data science project that contains an active and available pipeline server.
Procedure
- From the OpenShift AI dashboard, click Data science pipelines → Pipelines.
- On the Pipelines page, from the Project drop-down list, select the project that contains the pipeline server that you want to view.
- From the Pipeline server actions list, select Manage pipeline server configuration.
Verification
- You can view the pipeline server details in the Manage pipeline server dialog.
1.7. Viewing existing pipelines
You can view the details of pipelines that you have imported to Red Hat OpenShift AI, such as the pipeline’s last run, when it was created, the pipeline’s executed runs, and details of any associated pipeline versions.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- You have previously created a data science project that is available and contains a pipeline server.
- You have imported a pipeline to an active pipeline server.
- Existing pipelines are available.
Procedure
- From the OpenShift AI dashboard, click Data science pipelines → Pipelines.
- On the Pipelines page, from the Project drop-down list, select the project that contains the pipelines that you want to view.
- Optional: Click Expand on the row of a pipeline to view its pipeline versions.
Verification
- A list of data science pipelines is displayed on the Pipelines page.
1.8. Overview of pipeline versions
You can manage incremental changes to pipelines in OpenShift AI by using versioning. This allows you to develop and deploy pipelines iteratively, preserving a record of your changes. You can track and manage your changes on the OpenShift AI dashboard, allowing you to schedule and execute runs against all available versions of your pipeline.
1.9. Uploading a pipeline version
You can upload a YAML file to an active pipeline server that contains the latest version of your pipeline, or you can upload the YAML file from a URL. The YAML file must consist of a Kubeflow pipeline compiled by using the Kubeflow compiler. After you upload a pipeline version to a pipeline server, you can execute it by creating a pipeline run.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- You have previously created a data science project that is available and contains a configured pipeline server.
- You have a pipeline version available and ready to upload.
- If you are uploading your pipeline version from a URL, the URL is publicly accessible.
- If your pipeline version is based on Python code, compile it to YAML before uploading. For more information, see Compiling the pipeline YAML with the Kubeflow Pipelines SDK.
Procedure
- From the OpenShift AI dashboard, click Data science pipelines → Pipelines.
- On the Pipelines page, from the Project drop-down list, select the project that you want to upload a pipeline version to.
- Click the Import pipeline drop-down list, and then select Upload new version.
In the Upload new version dialog, enter the details for the pipeline version that you are uploading.
- From the Pipeline list, select the pipeline that you want to upload your pipeline version to.
- In the Pipeline version name field, confirm the name for the pipeline version, and change it if necessary.
- In the Pipeline version description field, enter a description for the pipeline version.
Select where you want to upload your pipeline version from by performing one of the following actions:
- Select Upload a file to upload your pipeline version from your local machine’s file system. Import your pipeline version by clicking Upload, or by dragging and dropping a file.
- Select Import by url to upload your pipeline version from a URL, and then enter the URL into the text box.
- Click Upload.
Verification
- The pipeline version that you uploaded is displayed on the Pipelines page. Click Expand on the row containing the pipeline to view its versions.
- The Version column on the row containing the pipeline version that you uploaded on the Pipelines page increments by one.
1.10. Deleting a pipeline version
You can delete specific versions of a pipeline when you no longer require them. Deleting a default pipeline version automatically changes the default pipeline version to the next most recent version. If no pipeline versions exist, the pipeline persists without a default version.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- You have previously created a data science project that is available and contains a pipeline server.
- You have imported a pipeline to an active pipeline server.
Procedure
- From the OpenShift AI dashboard, click Data science pipelines → Pipelines. The Pipelines page opens.
- Delete the pipeline versions that you no longer require:
To delete a single pipeline version:
- From the Project list, select the project that contains a version of a pipeline that you want to delete.
- On the row containing the pipeline, click Expand.
- Click the action menu (⋮) beside the pipeline version that you want to delete, and then click Delete pipeline version. The Delete pipeline version dialog opens.
- Enter the name of the pipeline version in the text field to confirm that you intend to delete it.
- Click Delete.
To delete multiple pipeline versions:
- On the row containing each pipeline version that you want to delete, select the checkbox.
- Click the action menu (⋮) next to the Import pipeline drop-down list, and then select Delete from the list.
Verification
- The pipeline version that you deleted is no longer displayed on the Pipelines page, or on the Pipelines tab for the data science project.
1.11. Viewing the details of a pipeline version
You can view the details of a pipeline version that you have uploaded to Red Hat OpenShift AI, such as its graph and YAML code.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- You have previously created a data science project that is available and contains a pipeline server.
- You have a pipeline available on an active and available pipeline server.
Procedure
- From the OpenShift AI dashboard, click Data science pipelines → Pipelines. The Pipelines page opens.
- From the Project drop-down list, select the project that contains the pipeline versions that you want to view details for.
- Click the pipeline name to view further details of its most recent version. Alternatively, click Expand on the row containing the pipeline, and then click the pipeline version that you want to view the details of. The pipeline version details page opens, displaying the Graph, Summary, and Pipeline spec tabs.
Verification
- On the pipeline version details page, you can view the pipeline graph, summary details, and YAML code.
1.12. Downloading a data science pipeline version
To make further changes to a data science pipeline version that you previously uploaded to OpenShift AI, you can download pipeline version code from the user interface.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- You have previously created a data science project that is available and contains a configured pipeline server.
- You have created and imported a pipeline to an active pipeline server that is available to download.
Procedure
- From the OpenShift AI dashboard, click Data science pipelines → Pipelines.
- On the Pipelines page, from the Project drop-down list, select the project that contains the version that you want to download.
- Click Expand beside the pipeline that contains the version that you want to download.
- Click the pipeline version that you want to download. The pipeline version details page opens.
- Click the Pipeline spec tab, and then click Download to download the YAML file that contains the pipeline version code to your local machine.
Verification
- The pipeline version code downloads to your browser’s default directory for downloaded files.
1.13. Overview of data science pipelines caching
You can use caching within data science pipelines to optimize execution times and improve resource efficiency. Caching reduces redundant task execution by reusing results from previous runs with identical inputs.
Caching is particularly beneficial for iterative tasks, where intermediate steps might not need to be repeated. Understanding caching can help you design more efficient pipelines and save time in model development.
Caching operates by storing the outputs of successfully completed tasks and comparing the inputs of new tasks against previously cached ones. If a match is found, OpenShift AI reuses the cached results instead of re-executing the task, reducing computation time and resource usage.
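The comparison of a new task against previously cached results can be thought of as a lookup keyed on a fingerprint of the task's definition and inputs. The following is a simplified, hypothetical illustration of that idea in plain Python; it is not the actual OpenShift AI implementation:

```python
import hashlib
import json

cache = {}  # fingerprint -> stored task output


def fingerprint(task_code: str, inputs: dict) -> str:
    # Hash the task definition together with its inputs; any change to
    # either produces a different key, and therefore a cache miss.
    payload = json.dumps({"code": task_code, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def run_task(task_code: str, inputs: dict, execute):
    key = fingerprint(task_code, inputs)
    if key in cache:
        return cache[key], True   # cache hit: reuse the stored output
    output = execute(inputs)      # cache miss: execute the task
    cache[key] = output
    return output, False


# Same code and inputs: the second call is served from the cache.
out1, hit1 = run_task("preprocess-v1", {"rows": 100}, lambda i: i["rows"] * 2)
out2, hit2 = run_task("preprocess-v1", {"rows": 100}, lambda i: i["rows"] * 2)
# Changed input: cache miss, the task runs again.
out3, hit3 = run_task("preprocess-v1", {"rows": 200}, lambda i: i["rows"] * 2)
```

This mirrors the criteria listed below: changing the input data, the task code, or anything else folded into the fingerprint invalidates the cached result.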
1.13.1. Caching criteria
For caching to be effective, the following criteria determine if a task can use previously cached results:
- Input data and parameters: If the input data and parameters for a task are unchanged from a previous run, cached results are eligible for reuse.
- Task code and configuration: Changes to the task code or configurations invalidate the cache to ensure that modifications are always reflected.
- Pipeline environment: Changes to the pipeline environment, such as dependency versions, also affect caching eligibility to maintain consistency.
1.13.2. Viewing cached steps in the OpenShift AI user interface
Cached steps in pipelines are visually indicated in the user interface (UI):
- Tasks that use cached results display a green icon, helping you quickly identify which steps were cached. The Status field in the side panel displays Cached for cached tasks.
- The UI also includes information about when the task was previously executed, allowing for easy verification of cache usage.
To check the caching status of specific tasks, navigate to the pipeline details view in the UI. Cached and non-cached tasks are clearly indicated. Cached tasks do not display execution logs because they reuse previously generated outputs and are not re-executed.
1.13.3. Controlling caching in data science pipelines
Caching is enabled by default in OpenShift AI to improve performance. However, there are instances when disabling caching might be necessary for specific tasks, an entire pipeline, or all pipelines. For example, caching might not be beneficial for tasks that rely on frequently updated data or unique computational needs. In other cases, such as debugging, development, or when deterministic re-execution is required, you might want to disable caching for all pipelines.
Disabling caching at the pipeline or pipeline server level causes all tasks to re-run, potentially increasing compute time and resource usage.
You can control caching for data science pipelines in the following ways:
- Individual task: Data scientists can disable caching for specific steps in a pipeline.
- Pipeline (submit time): Data scientists can disable caching when submitting a pipeline run.
- Pipeline (compile time): Data scientists can disable caching when compiling a pipeline.
- All pipelines (pipeline server): You can disable caching for all pipelines in the pipeline server, which overrides all pipeline and task-level caching settings.
1.13.3.1. Disabling caching for individual tasks
To disable caching for a particular task, call the set_caching_options method directly on the task in your pipeline code:
task_name.set_caching_options(False)
After applying this setting, OpenShift AI runs the task in future pipeline runs, ignoring any cached results.
You can re-enable caching for individual tasks by calling set_caching_options(True), or by omitting the set_caching_options call.
This setting is ignored if caching is disabled in the pipeline server.
1.13.3.2. Disabling caching for a pipeline at submit time
To disable caching for the entire pipeline during pipeline submission, set the enable_caching parameter to False in your pipeline code. This setting ensures that no steps are cached during pipeline execution. The enable_caching parameter is available only when you use the kfp.Client class to submit pipelines or start pipeline runs, for example with the run_pipeline method.
Example:
import kfp
client = kfp.Client()
client.run_pipeline(
experiment_id=experiment.id,
pipeline_id=pipeline.id,
job_name="no-cache-run",
params={}, # optional
enable_caching=False,
)
This setting is ignored if caching is disabled during pipeline compilation or in the pipeline server.
1.13.3.3. Disabling caching for a pipeline at compile time
To disable caching for the entire pipeline during compilation, set one of the following options in your local environment or workbench:
- Environment variable:

  export KFP_DISABLE_EXECUTION_CACHING_BY_DEFAULT=true

- CLI flag (when using kfp dsl compile):

  kfp dsl compile --disable-execution-caching-by-default
These settings are ignored if caching is disabled in the pipeline server.
1.13.3.4. Disabling caching for all pipelines (pipeline server)
To disable caching for all pipelines in the pipeline server and override all pipeline and task-level caching settings, use either of the following methods:
- Pipeline server configuration
  - From the OpenShift AI dashboard, click Data science pipelines → Pipelines.
  - On the Pipelines page, from the Project drop-down list, select the project that contains the pipeline server that you want to configure.
  - From the Pipeline server actions list, select Manage pipeline server configuration.
  - In the Pipeline caching section, clear the Allow caching to be configured per pipeline and task checkbox.
  - Click Save.
- Data Science Pipelines Application (cluster administrator)
  In the OpenShift console or CLI, set the cacheEnabled field to false in the DataSciencePipelinesApplication (DSPA) custom resource for the project. Example:

    apiVersion: datasciencepipelinesapplications.opendatahub.io/v1
    kind: DataSciencePipelinesApplication
    metadata:
      name: my-dspa
      namespace: my-namespace
    spec:
      apiServer:
        cacheEnabled: false

  To allow caching to be configured at the pipeline and task level, set the cacheEnabled field to true in the DSPA custom resource.
After applying this setting, all pipeline and task-level caching settings are ignored.
Changing this setting updates the CACHEENABLED environment variable in the pipeline server deployment.
Verification
After configuring caching settings, you can verify its behavior by using one of the following methods:
- Check the UI: Locate the green icons in the task list to identify cached steps.
- Test task re-runs: Disable caching on specific tasks or the pipeline to confirm that steps re-execute as expected.
- Validate inputs: Ensure the task inputs, parameters, and runtime settings are unchanged when caching is applied.
You can also disable caching for a single node or for your entire pipeline in JupyterLab using Elyra. For more information, see Disabling node caching in Elyra.