Working with data science pipelines
Work with data science pipelines from Red Hat OpenShift AI Self-Managed
Preface
As a data scientist, you can enhance your data science projects on OpenShift AI by building portable machine learning (ML) workflows with data science pipelines, using Docker containers. Data science pipelines enable you to standardize and automate ML workflows so that you can develop and deploy your data science models.
For example, the steps in a machine learning workflow might include items such as data extraction, data processing, feature extraction, model training, model validation, and model serving. Automating these activities enables your organization to develop a continuous process of retraining and updating a model based on newly received data. This can help address challenges related to building an integrated machine learning deployment and continuously operating it in production.
You can also use the Elyra JupyterLab extension to create and run data science pipelines within JupyterLab. For more information, see Working with pipelines in JupyterLab.
From OpenShift AI version 2.9, data science pipelines are based on Kubeflow Pipelines (KFP) version 2.0. For more information, see Migrating to data science pipelines 2.0.
To use a data science pipeline in OpenShift AI, you need the following components:
- Pipeline server: A server that is attached to your data science project and hosts your data science pipeline.
- Pipeline: A pipeline defines the configuration of your machine learning workflow and the relationship between each component in the workflow.
- Pipeline code: A definition of your pipeline in a YAML file.
- Pipeline graph: A graphical illustration of the steps executed in a pipeline run and the relationship between them.
- Pipeline experiment: A workspace where you can try different configurations of your pipelines. You can use experiments to organize your runs into logical groups.
- Archived pipeline experiment: An archived pipeline experiment.
- Pipeline artifact: An output artifact produced by a pipeline component.
- Pipeline execution: The execution of a task in a pipeline.
- Pipeline run: An execution of your pipeline.
- Active run: A pipeline run that is executing, or stopped.
- Scheduled run: A pipeline run that is scheduled to execute at least once.
- Archived run: An archived pipeline run.
This feature is based on Kubeflow Pipelines 2.0. Use the latest Kubeflow Pipelines 2.0 SDK to build your data science pipeline in Python code. After you have built your pipeline, use the SDK to compile it into an Intermediate Representation (IR) YAML file. The OpenShift AI user interface enables you to track and manage pipelines and pipeline runs. You can manage incremental changes to pipelines in OpenShift AI by using versioning. This allows you to develop and deploy pipelines iteratively, preserving a record of your changes.
You can store your pipeline artifacts in an S3-compatible object storage bucket so that you do not consume local storage. To do this, you must first configure write access to your S3 bucket on your storage account.
Chapter 1. Managing data science pipelines
1.1. Configuring a pipeline server
Before you can successfully create a pipeline in OpenShift AI, you must configure a pipeline server. This task includes configuring where your pipeline artifacts and data are stored.
You are not required to specify any storage directories when configuring a connection for your pipeline server. When you import a pipeline, the /pipelines folder is created in the root folder of the bucket, containing a YAML file for the pipeline. If you upload a new version of the same pipeline, a new YAML file with a different ID is added to the /pipelines folder.
When you run a pipeline, the artifacts are stored in the /pipeline-name folder in the root folder of the bucket.
If you use an external MySQL database and upgrade to OpenShift AI 2.9 or later, the database is migrated to data science pipelines 2.0 format, making it incompatible with earlier versions of OpenShift AI.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have created a data science project that you can add a pipeline server to.
- You have an existing S3-compatible object storage bucket and you have configured write access to your S3 bucket on your storage account.
- If you are configuring a pipeline server for production pipeline workloads, you have an existing external MySQL or MariaDB database.
If you are configuring a pipeline server with an external MySQL database, your database must use at least MySQL version 5.x. However, Red Hat recommends that you use MySQL version 8.x.
Note: The mysql_native_password authentication plugin is required for the ML Metadata component to successfully connect to your database. mysql_native_password is disabled by default in MySQL 8.4 and later. If your database uses MySQL 8.4 or later, you must update your MySQL deployment to enable the mysql_native_password plugin. For more information about enabling the mysql_native_password plugin, see Native Pluggable Authentication in the MySQL documentation.
- If you are configuring a pipeline server with a MariaDB database, your database must use MariaDB version 10.3 or later. However, Red Hat recommends that you use at least MariaDB version 10.5.
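For illustration only, a minimal sketch of enabling the plugin in the MySQL server configuration. The file path and deployment method vary by environment, so treat this as an assumption to verify against the MySQL documentation linked above:

```ini
# my.cnf (location varies by deployment) — assumes MySQL 8.4 or later,
# where the mysql_native_password plugin is disabled by default
[mysqld]
mysql_native_password=ON
```

After enabling the plugin, the database user account that the pipeline server connects with must also be configured to authenticate with mysql_native_password.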
Procedure
- From the OpenShift AI dashboard, click Data Science Projects. The Data Science Projects page opens.
- Click the name of the project that you want to configure a pipeline server for. A project details page opens.
- Click the Pipelines tab.
- Click Configure pipeline server. The Configure pipeline server dialog appears.
- In the Object storage connection section, provide values for the mandatory fields:
- In the Access key field, enter the access key ID for the S3-compatible object storage provider.
- In the Secret key field, enter the secret access key for the S3-compatible object storage account that you specified.
- In the Endpoint field, enter the endpoint of your S3-compatible object storage bucket.
- In the Region field, enter the default region of your S3-compatible object storage account.
- In the Bucket field, enter the name of your S3-compatible object storage bucket.
Important: If you specify incorrect connection settings, you cannot update these settings on the same pipeline server. Therefore, you must delete the pipeline server and configure another one.
If you want to use an existing artifact that was not generated by a task in a pipeline, you can use the kfp.dsl.importer component to import the artifact from its URI. You can only import these artifacts to the S3-compatible object storage bucket that you define in the Bucket field in your pipeline server configuration. For more information about the kfp.dsl.importer component, see Special Case: Importer Components.
- In the Database section, click Show advanced database options to specify the database to store your pipeline data, and then select one of the following sets of actions:
- Select Use default database stored on your cluster to deploy a MariaDB database in your project.
Important: The Use default database stored on your cluster option is intended for development and testing purposes only. For production pipeline workloads, select the Connect to external MySQL database option to use an external MySQL or MariaDB database.
- Select Connect to external MySQL database to add a new connection to an external MySQL or MariaDB database that your pipeline server can access.
- In the Host field, enter the database’s host name.
- In the Port field, enter the database’s port.
- In the Username field, enter the default user name that is connected to the database.
- In the Password field, enter the password for the default user account.
- In the Database field, enter the database name.
- Click Configure pipeline server.
Verification
On the Pipelines tab for the project:
- The Import pipeline button is available.
- When you click the action menu (⋮) and then click View pipeline server configuration, the pipeline server details are displayed.
1.1.1. Configuring a pipeline server with an external Amazon RDS database
To configure a pipeline server with an external Amazon Relational Database Service (RDS) database, you must configure OpenShift AI to trust the certificates issued by its certificate authorities (CA).
If you are configuring a pipeline server for production pipeline workloads, Red Hat recommends that you use an external MySQL or MariaDB database.
Prerequisites
- You have cluster administrator privileges for your OpenShift cluster.
- You have logged in to Red Hat OpenShift AI.
- You have created a data science project that you can add a pipeline server to.
- You have an existing S3-compatible object storage bucket, and you have configured your storage account with write access to your S3 bucket.
Procedure
- Before configuring your pipeline server, from Amazon RDS: Certificate bundles by AWS Region, download the PEM certificate bundle for the region that the database was created in. For example, if the database was created in the us-east-1 region, download us-east-1-bundle.pem.
- In a terminal window, log in to the OpenShift cluster where OpenShift AI is deployed:

  oc login api.<cluster_name>.<cluster_domain>:6443 --web

- Run the following command to fetch the current OpenShift AI trusted CA configuration and store it in a new file:

  oc get dscinitializations.dscinitialization.opendatahub.io default-dsci -o json | jq '.spec.trustedCABundle.customCABundle' > /tmp/my-custom-ca-bundles.crt

- Run the following command to append the PEM certificate bundle that you downloaded to the new custom CA configuration file:

  cat us-east-1-bundle.pem >> /tmp/my-custom-ca-bundles.crt

- Run the following command to update the OpenShift AI trusted CA configuration to trust certificates issued by the CAs included in the new custom CA configuration file:

  oc patch dscinitialization default-dsci --type='json' -p='[{"op":"replace","path":"/spec/trustedCABundle/customCABundle","value":"'"$(awk '{printf "%s\\n", $0}' /tmp/my-custom-ca-bundles.crt)"'"}]'

- Configure a pipeline server, as described in Configuring a pipeline server.
Verification
- The pipeline server starts successfully.
- You can import and run data science pipelines.
1.2. Defining a pipeline
The Kubeflow Pipelines SDK enables you to define end-to-end machine learning and data pipelines. Use the latest Kubeflow Pipelines 2.0 SDK to build your data science pipeline in Python code. After you have built your pipeline, use the SDK to compile it into an Intermediate Representation (IR) YAML file. After defining the pipeline, you can import the YAML file to the OpenShift AI dashboard to enable you to configure its execution settings.
You can also use the Elyra JupyterLab extension to create and run data science pipelines within JupyterLab. For more information about creating pipelines in JupyterLab, see Working with pipelines in JupyterLab. For more information about the Elyra JupyterLab extension, see Elyra Documentation.
1.3. Importing a data science pipeline
To help you begin working with data science pipelines in OpenShift AI, you can import a YAML file containing your pipeline’s code to an active pipeline server, or you can import the YAML file from a URL. This file contains a Kubeflow pipeline compiled by using the Kubeflow compiler. After you have imported the pipeline to a pipeline server, you can execute the pipeline by creating a pipeline run.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a configured pipeline server.
- You have compiled your pipeline with the Kubeflow compiler and you have access to the resulting YAML file.
- If you are uploading your pipeline from a URL, the URL is publicly accessible.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines.
- On the Pipelines page, from the Project drop-down list, select the project that you want to import a pipeline to.
- Click Import pipeline.
- In the Import pipeline dialog, enter the details for the pipeline that you want to import.
- In the Pipeline name field, enter a name for the pipeline that you want to import.
- In the Pipeline description field, enter a description for the pipeline that you want to import.
- Select where you want to import your pipeline from by performing one of the following actions:
- Select Upload a file to upload your pipeline from your local machine’s file system. Import your pipeline by clicking Upload, or by dragging and dropping a file.
- Select Import by url to upload your pipeline from a URL, and then enter the URL into the text box.
- Click Import pipeline.
Verification
- The pipeline that you imported appears on the Pipelines page and on the Pipelines tab on the project details page.
1.4. Deleting a data science pipeline
If you no longer require access to your data science pipeline on the dashboard, you can delete it so that it does not appear on the Data Science Pipelines page.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- There are active pipelines available on the Pipelines page.
- The pipeline that you want to delete does not contain any pipeline versions. For more information, see Deleting a pipeline version.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines.
- On the Pipelines page, from the Project drop-down list, select the project that contains the pipeline that you want to delete.
- Click the action menu (⋮) beside the pipeline that you want to delete, and then click Delete pipeline.
- In the Delete pipeline dialog, enter the pipeline name in the text field to confirm that you intend to delete it.
- Click Delete pipeline.
Verification
- The data science pipeline that you deleted no longer appears on the Pipelines page.
1.5. Deleting a pipeline server
After you have finished running your data science pipelines, you can delete the pipeline server. Deleting a pipeline server automatically deletes all of its associated pipelines, pipeline versions, and runs. If your pipeline data is stored in a database, the database is also deleted along with its metadata. In addition, after deleting a pipeline server, you cannot create new pipelines or pipeline runs until you create another pipeline server.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a pipeline server.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines.
- On the Pipelines page, from the Project drop-down list, select the project that contains the pipeline server that you want to delete.
- From the Pipeline server actions list, select Delete pipeline server.
- In the Delete pipeline server dialog, enter the name of the pipeline server in the text field to confirm that you intend to delete it.
- Click Delete.
Verification
- Pipelines previously assigned to the deleted pipeline server no longer appear on the Pipelines page for the relevant data science project.
- Pipeline runs previously assigned to the deleted pipeline server no longer appear on the Runs page for the relevant data science project.
1.6. Viewing the details of a pipeline server
You can view the details of pipeline servers configured in OpenShift AI, such as the pipeline server’s connection details and where its data is stored.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- You have previously created a data science project that contains an active and available pipeline server.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines.
- On the Pipelines page, from the Project drop-down list, select the project that contains the pipeline server that you want to view.
- From the Pipeline server actions list, select View pipeline server configuration.
Verification
- You can view the pipeline server details in the View pipeline server dialog.
1.7. Viewing existing pipelines
You can view the details of pipelines that you have imported to Red Hat OpenShift AI, such as the pipeline’s last run, when it was created, the pipeline’s executed runs, and details of any associated pipeline versions.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a pipeline server.
- You have imported a pipeline to an active pipeline server.
- Existing pipelines are available.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines.
- On the Pipelines page, from the Project drop-down list, select the project that contains the pipelines that you want to view.
- Optional: Click Expand on the row of a pipeline to view its pipeline versions.
Verification
- A list of data science pipelines appears on the Pipelines page.
1.8. Overview of pipeline versions
You can manage incremental changes to pipelines in OpenShift AI by using versioning. This allows you to develop and deploy pipelines iteratively, preserving a record of your changes. You can track and manage your changes on the OpenShift AI dashboard, allowing you to schedule and execute runs against all available versions of your pipeline.
1.9. Uploading a pipeline version
You can upload a YAML file to an active pipeline server that contains the latest version of your pipeline, or you can upload the YAML file from a URL. The YAML file must consist of a Kubeflow pipeline compiled by using the Kubeflow compiler. After you upload a pipeline version to a pipeline server, you can execute it by creating a pipeline run.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a configured pipeline server.
- You have a pipeline version available and ready to upload.
- If you are uploading your pipeline version from a URL, the URL is publicly accessible.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines.
- On the Pipelines page, from the Project drop-down list, select the project that you want to upload a pipeline version to.
- Click the Import pipeline drop-down list, and then select Upload new version.
- In the Upload new version dialog, enter the details for the pipeline version that you are uploading.
- From the Pipeline list, select the pipeline that you want to upload your pipeline version to.
- In the Pipeline version name field, confirm the name for the pipeline version, and change it if necessary.
- In the Pipeline version description field, enter a description for the pipeline version.
- Select where you want to upload your pipeline version from by performing one of the following actions:
- Select Upload a file to upload your pipeline version from your local machine’s file system. Import your pipeline version by clicking Upload, or by dragging and dropping a file.
- Select Import by url to upload your pipeline version from a URL, and then enter the URL into the text box.
- Click Upload.
Verification
- The pipeline version that you uploaded is displayed on the Pipelines page. Click Expand on the row containing the pipeline to view its versions.
- The Version column on the row containing the pipeline that you uploaded a version to increments by one.
1.10. Deleting a pipeline version
You can delete specific versions of a pipeline when you no longer require them. Deleting a default pipeline version automatically changes the default pipeline version to the next most recent version. If no pipeline versions exist, the pipeline persists without a default version.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a pipeline server.
- You have imported a pipeline to an active pipeline server.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines. The Pipelines page opens.
- Delete the pipeline versions that you no longer require:
To delete a single pipeline version:
- From the Project list, select the project that contains a version of a pipeline that you want to delete.
- On the row containing the pipeline, click Expand.
- Click the action menu (⋮) beside the pipeline version that you want to delete, and then click Delete pipeline version. The Delete pipeline version dialog opens.
- Enter the name of the pipeline version in the text field to confirm that you intend to delete it.
- Click Delete.
To delete multiple pipeline versions:
- On the row containing each pipeline version that you want to delete, select the checkbox.
- Click the action menu (⋮) next to the Import pipeline drop-down list, and then select Delete from the list.
Verification
- The pipeline version that you deleted no longer appears on the Pipelines page, or on the Pipelines tab for the data science project.
1.11. Viewing the details of a pipeline version
You can view the details of a pipeline version that you have uploaded to Red Hat OpenShift AI, such as its graph and YAML code.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a pipeline server.
- You have a pipeline available on an active and available pipeline server.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines. The Pipelines page opens.
- From the Project drop-down list, select the project that contains the pipeline versions that you want to view details for.
- Click the pipeline name to view further details of its most recent version. Alternatively, click Expand on the row containing the pipeline that you want to view versions for, and then click the pipeline version that you want to view the details of. The pipeline version details page opens, displaying the Graph, Summary, and Pipeline spec tabs.
Verification
- On the pipeline version details page, you can view the pipeline graph, summary details, and YAML code.
1.12. Downloading a data science pipeline version
To make further changes to a data science pipeline version that you previously uploaded to OpenShift AI, you can download pipeline version code from the user interface.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a configured pipeline server.
- You have imported a pipeline to an active pipeline server, and a pipeline version is available to download.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines.
- On the Pipelines page, from the Project drop-down list, select the project that contains the version that you want to download.
- Click Expand beside the pipeline that contains the version that you want to download.
- Click the pipeline version that you want to download. The pipeline version details page opens.
- On the Pipeline spec tab, click the Download button to download the YAML file that contains the pipeline version code to your local machine.
Verification
- The pipeline version code downloads to your browser’s default directory for downloaded files.
1.13. Overview of data science pipelines caching
OpenShift AI supports caching within data science pipelines to optimize execution times and improve resource efficiency. Using caching reduces redundant task execution by reusing results from previous runs with identical inputs.
Caching is particularly beneficial for iterative tasks, where intermediate steps might not need to be repeated. Understanding caching can help you design more efficient pipelines and save time in model development.
Caching operates by storing the outputs of successfully completed tasks and comparing the inputs of new tasks against previously cached ones. If a match is found, OpenShift AI reuses the cached results instead of re-executing the task, reducing computation time and resource usage.
1.13.1. Caching criteria
For caching to be effective, the following criteria determine whether a task can use previously cached results:
- Input data and parameters: If the input data and parameters for a task are unchanged from a previous run, cached results are eligible for reuse.
- Task code and configuration: Changes to the task code or configurations invalidate the cache to ensure that modifications are always reflected.
- Pipeline environment: Changes to the pipeline environment, such as dependency versions, also affect caching eligibility to maintain consistency.
1.13.2. Viewing cached steps in the OpenShift AI user interface
Cached steps in pipelines are visually indicated in the user interface (UI):
- Tasks that use cached results display a green icon, helping you quickly identify which steps were cached. The Status field in the side panel displays Cached for cached tasks.
- The UI also includes information about when the task was previously executed, allowing for easy verification of cache usage.
To confirm caching status for specific tasks, navigate to the pipeline details view in the UI, where all cached and non-cached tasks are indicated. When a pipeline task is cached, its execution logs are not available. This is because the task uses previously generated outputs, eliminating the need for re-execution.
1.13.3. Disabling caching for specific tasks or pipelines
In OpenShift AI, caching is enabled by default, but there are situations where disabling caching for specific tasks or the entire pipeline is necessary. For example, tasks that rely on frequently updated data or unique computational needs might not benefit from caching.
1.13.3.1. Disabling caching for individual tasks
To disable caching for a particular task, apply the set_caching_options method directly to the task in your pipeline code:
task_name.set_caching_options(False)
After applying this setting, OpenShift AI executes the task in all future pipeline runs, ignoring any cached results.
You can re-enable caching for individual tasks by setting set_caching_options(True).
1.13.3.2. Disabling caching for pipelines
If necessary, you can disable caching for the entire pipeline during pipeline submission by setting the enable_caching parameter to False in your pipeline code. This setting ensures that no steps are cached during pipeline execution. The enable_caching parameter is available only when using the kfp.client to submit pipelines or start pipeline runs, such as the run_pipeline method.
Example:
pipeline_func(enable_caching=False)
When disabling caching at the pipeline level, all tasks are re-executed, potentially increasing compute time and resource usage.
1.13.4. Verification and troubleshooting
After configuring caching settings, you can verify that caching behaves as expected by using one of the following methods:
- Checking the UI: Confirm cached steps by locating the steps with the green icon in the task list.
- Testing task re-runs: Disable caching on individual tasks or the pipeline and check for re-execution to verify cache bypassing.
- Validating inputs: Ensure the task inputs, parameters, and environment remain unchanged if caching applies.
Chapter 2. Managing pipeline experiments
2.1. Overview of pipeline experiments
A pipeline experiment is a workspace where you can try different configurations of your pipelines. You can use experiments to organize your runs into logical groups. As a data scientist, you can use OpenShift AI to define, manage, and track pipeline experiments. You can view a record of previously created and archived experiments from the Experiments page in the OpenShift AI user interface. Pipeline experiments contain pipeline runs, including recurring runs. This allows you to try different configurations of your pipelines.
When you work with data science pipelines, it is important to monitor and record your pipeline experiments to track the performance of your data science pipelines. You can compare the results of up to 10 pipeline runs at one time, and view available parameter, scalar metric, confusion matrix, and receiver operating characteristic (ROC) curve data for all selected runs.
You can view artifacts for an executed pipeline run from the OpenShift AI dashboard. Pipeline artifacts can help you to evaluate the performance of your pipeline runs and make it easier to understand your pipeline components. Pipeline artifacts can range from plain text data to detailed, interactive data visualizations.
2.2. Creating a pipeline experiment
Pipeline experiments are workspaces where you can try different configurations of your pipelines. You can also use experiments to organize your pipeline runs into logical groups. Pipeline experiments contain pipeline runs, including recurring runs.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a configured pipeline server.
- You have imported a pipeline to an active pipeline server.
Procedure
- From the OpenShift AI dashboard, click Experiments → Experiments and runs.
- On the Experiments page, from the Project drop-down list, select the project to create the pipeline experiment in.
- Click Create experiment.
In the Create experiment dialog, configure the pipeline experiment:
- In the Experiment name field, enter a name for the pipeline experiment.
- In the Description field, enter a description for the pipeline experiment.
- Click Create experiment.
Verification
- The pipeline experiment that you created appears on the Experiments tab.
2.3. Archiving a pipeline experiment
You can retain records of your pipeline experiments by archiving them. If required, you can restore pipeline experiments from your archive to reuse, or delete pipeline experiments that are no longer required.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and has a pipeline server.
- You have imported a pipeline to an active pipeline server.
- A pipeline experiment is available to archive.
Procedure
- From the OpenShift AI dashboard, click Experiments → Experiments and runs.
- On the Experiments page, from the Project drop-down list, select the project that contains the pipeline experiment that you want to archive.
- Click the action menu (⋮) beside the pipeline experiment that you want to archive, and then click Archive.
- In the Archiving experiment dialog, enter the pipeline experiment name in the text field to confirm that you intend to archive it.
- Click Archive.
Verification
- The archived pipeline experiment no longer appears on the Experiments tab, and instead appears on the Archive tab on the Experiments page.
2.4. Deleting an archived pipeline experiment
You can delete pipeline experiments from the OpenShift AI experiment archive.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a configured pipeline server.
- You have imported a pipeline to an active pipeline server.
- A pipeline experiment is available in the pipeline archive.
Procedure
- From the OpenShift AI dashboard, click Experiments → Experiments and runs.
- On the Experiments page, from the Project drop-down list, select the project that contains the archived pipeline experiment that you want to delete.
- Click the Archive tab.
- Click the action menu (⋮) beside the pipeline experiment that you want to delete, and then click Delete.
- In the Delete experiment? dialog, enter the pipeline experiment name in the text field to confirm that you intend to delete it.
- Click Delete.
Verification
- The pipeline experiment that you deleted no longer appears on the Archive tab on the Experiments page.
2.5. Restoring an archived pipeline experiment
You can restore an archived pipeline experiment to the active state.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and has a pipeline server.
- An archived pipeline experiment exists in your project.
Procedure
- From the OpenShift AI dashboard, click Experiments → Experiments and runs.
- On the Experiments page, from the Project drop-down list, select the project that contains the archived pipeline experiment that you want to restore.
- Click the Archive tab.
- Click the action menu (⋮) beside the pipeline experiment that you want to restore, and then click Restore.
- In the Restore experiment dialog, click Restore.
Verification
- The restored pipeline experiment appears on the Experiments tab on the Experiments page.
2.6. Viewing pipeline task executions
When a pipeline run executes, you can view details of executed tasks in each step in a pipeline run from the OpenShift AI dashboard. A step forms part of a task in a pipeline.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a pipeline server.
- You have imported a pipeline to an active pipeline server.
- You have previously triggered a pipeline run.
Procedure
- From the OpenShift AI dashboard, click Experiments → Executions.
- On the Executions page, from the Project drop-down list, select the project that contains the experiment for the pipeline task executions that you want to view.
Verification
- On the Executions page, you can view the execution details of each pipeline task execution, such as its name, status, unique ID, and execution type. The execution status indicates whether the pipeline task has successfully executed. For further information about the details of the task execution, click the execution name.
2.7. Viewing pipeline artifacts
After a pipeline run executes, you can view its pipeline artifacts from the OpenShift AI dashboard. Pipeline artifacts can help you to evaluate the performance of your pipeline runs and make it easier to understand your pipeline components. Pipeline artifacts can range from plain text data to detailed, interactive data visualizations.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a pipeline server.
- You have imported a pipeline to an active pipeline server.
- You have previously triggered a pipeline run.
Procedure
- From the OpenShift AI dashboard, click Experiments → Artifacts.
- On the Artifacts page, from the Project drop-down list, select the project that contains the pipeline experiment for the pipeline artifacts that you want to view.
Verification
- On the Artifacts page, you can view the details of each pipeline artifact, such as its name, unique ID, type, and URI.
2.8. Comparing runs
You can compare up to 10 pipeline runs at one time, and view available parameter, scalar metric, confusion matrix, and receiver operating characteristic (ROC) curve data for all selected runs.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and has a pipeline server.
- You have imported a pipeline to an active pipeline server.
- You have created at least 2 pipeline runs.
Procedure
- From the OpenShift AI dashboard, click Experiments → Experiments and runs.
The Experiments page opens.
- From the Project drop-down list, select the project that contains the runs that you want to compare.
On the Experiments tab, in the Experiment column, click the experiment that you want to compare runs for. To select runs that are not part of an experiment, click Default. All runs that are created without specifying an experiment appear in the Default group.
The Runs page opens.
Select the checkbox next to each run that you want to compare, and then click Compare runs. You can compare a maximum of 10 runs at one time.
The Compare runs page opens and displays available parameter, scalar metric, confusion matrix, and receiver operating characteristic (ROC) curve data for the runs that you selected.
- The Run list section displays a list of selected runs. You can filter the list by run name, experiment, pipeline version, start date, duration, and status.
- The Parameters section displays parameter information for each selected run. Set the Hide parameters with no differences switch to On to hide parameters that have the same values.
The Metrics section displays scalar metric, confusion matrix, and ROC curve data for all selected runs.
- On the Scalar metrics tab, set the Hide parameters with no differences switch to On to hide parameters that have the same values.
- On the ROC curve tab, in the artifacts list, adjust the ROC curve chart by deselecting the checkbox next to artifacts that you want to remove from the chart.
To select different runs for comparison, click Manage runs.
The Manage runs dialog opens.
- From the Search filter drop-down list, select Run, Experiment, Pipeline version, Created after, or Status to filter the run list by each value.
- Clear the checkbox next to each run that you want to remove from your comparison.
- Select the checkbox next to each run that you want to add to your comparison.
- Click Update.
Verification
- The Compare runs page opens and displays data for the runs that you selected.
Chapter 3. Managing pipeline runs
3.1. Overview of pipeline runs
A pipeline run is a single execution of a data science pipeline. As a data scientist, you can use OpenShift AI to define, manage, and track executions of a data science pipeline. To view a record of previously executed, scheduled, and archived runs, you must first select the experiment from the Experiments → Experiments and runs page in the OpenShift AI interface. After selecting the experiment, you can access all of its pipeline runs from the Runs page.
You can optimize your use of pipeline runs for portability and repeatability by using pipeline experiments. With experiments, you can logically group pipeline runs and try different configurations of your pipelines. You can also clone your pipeline runs to reproduce and scale them, or archive them when you want to retain a record of their execution, but no longer require them. You can delete archived runs that you no longer want to retain, or you can restore them to their former state.
You can execute a run once, that is, immediately after its creation, or on a recurring basis. Recurring runs consist of a copy of a pipeline with all of its parameter values and a run trigger. A run trigger indicates when a recurring run executes. You can define the following run triggers:
- Periodic: Used for scheduling runs to execute at regular intervals.
- Cron: Used for scheduling runs as a cron job.
You can also configure up to 10 instances of the same run to execute concurrently. You can track the progress of a run from the run details page on the OpenShift AI user interface. From here, you can view the graph and output artifacts for the run.
A pipeline run can be in one of the following states:
- Scheduled: A pipeline run that is scheduled to execute at least once.
- Active: A pipeline run that is executing, or that has stopped.
- Archived: A pipeline run that you have archived.
You can use catch up runs to ensure that your pipeline runs do not permanently fall behind schedule when paused. For example, if you re-enable a paused recurring run, the run scheduler backfills each missed run interval. If you disable catch up runs and a scheduled run interval is ready to execute, the run scheduler schedules the run execution for only the latest run interval. Catch up runs are enabled by default. However, if your pipeline handles backfill internally, Red Hat recommends that you disable catch up runs to avoid duplicate backfill.
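The catch-up behavior can be illustrated with a small model. The following sketch is not OpenShift AI's scheduler code; it only demonstrates, under simplified assumptions, which interval times execute when a paused periodic schedule resumes with catch up enabled or disabled:

```python
from datetime import datetime, timedelta


def intervals_to_schedule(last_run, now, interval, catch_up=True):
    """Illustrative model of catch-up semantics for a periodic recurring run.

    Returns the interval times that would execute after a pause ends.
    With catch_up=True, every missed interval is backfilled; with
    catch_up=False, only the most recent interval executes.
    """
    missed = []
    t = last_run + interval
    while t <= now:
        missed.append(t)
        t += interval
    if catch_up or not missed:
        return missed
    return [missed[-1]]  # only the latest run interval
```

For example, resuming an hourly schedule three hours after its last run yields three backfilled intervals with catch up enabled, but only the latest interval with catch up disabled.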
After a pipeline run executes, you can view details of its executed tasks on the Executions page, along with its artifacts, on the Artifacts page. From the Executions page, you can view the execution status of each task, which indicates whether it completed successfully. You can also view further information about each executed task by clicking the execution name in the list. From the Artifacts page, you can view the details of each pipeline artifact, such as its name, unique ID, type, and URI. Pipeline artifacts can help you to evaluate the performance of your pipeline runs and make it easier to understand your pipeline components. Pipeline artifacts can range from plain text data to detailed, interactive data visualizations.
You can view further information about each artifact by clicking the artifact name in the list. You can also view or download the content of artifacts that are stored in S3-compatible object storage by clicking the active artifact URI link in the list.
Artifacts that are not stored in S3-compatible object storage are not available to download and do not appear with an active URI link.
If your browser can display the artifact content, for example, if the artifact is plain text, HTML, or Markdown, the content does not download, but is displayed automatically in a new browser tab. If your browser cannot display the artifact content, for example, if the artifact is a model, the artifact downloads automatically instead. To download an artifact that is displayed in a browser tab, right-click the content and then click Save as.
You can review and analyze logs for each step in an active pipeline run. With the log viewer, you can search for specific log messages, view the log for each step, and download the step logs to your local machine.
3.2. Storing data with data science pipelines
When you run a data science pipeline, OpenShift AI stores the pipeline YAML configuration file and resulting pipeline run artifacts in the root directory of your storage bucket. The directories that contain pipeline run artifacts can differ depending on where you executed the pipeline run from. See the following table for further information:
| Pipeline run source | Pipeline storage directory | Run artifacts storage directory |
|---|---|---|
| OpenShift AI dashboard | Example: | Example: |
| JupyterLab Elyra extension | Example: | Example: With the JupyterLab Elyra extension, you can also set an object storage path prefix. Example: |
If you want to use an existing artifact that was not generated by a task in a pipeline, you can use the kfp.dsl.importer component to import the artifact from its URI. You can only import these artifacts to the S3-compatible object storage bucket that you define in the Bucket field in your pipeline server configuration. For more information about the kfp.dsl.importer component, see Special Case: Importer Components.
3.3. Viewing active pipeline runs
You can view a list of pipeline runs that were previously executed in a pipeline experiment. From this list, you can view details relating to your pipeline runs, such as the pipeline version that the run belongs to, along with the run status, duration, and execution start time.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and has a pipeline server.
- You have imported a pipeline to an active pipeline server.
- You have previously executed a pipeline run that is available.
Procedure
- From the OpenShift AI dashboard, click Experiments → Experiments and runs.
- On the Experiments page, from the Project drop-down list, select the project that contains the pipeline experiment for the active pipeline runs that you want to view.
From the list of experiments, click the experiment that contains the active pipeline runs that you want to view.
The Runs page opens.
After a run has completed its execution, the run status appears in the Status column of the Runs tab, indicating whether the run succeeded or failed.
Verification
- A list of active runs appears on the Runs tab on the Runs page for the pipeline experiment.
3.4. Executing a pipeline run
By default, a pipeline run executes once immediately after it is created.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a configured pipeline server.
- You have imported a pipeline to an active pipeline server.
Procedure
- From the OpenShift AI dashboard, click Experiments → Experiments and runs.
- On the Experiments page, from the Project drop-down list, select the project that contains the pipeline experiment that you want to create a run for.
- From the list of pipeline experiments, click the experiment that you want to create a run for.
- Click Create run.
On the Create run page, configure the run:
- From the Experiment list, select the pipeline experiment that you want to create a run for. Alternatively, to create a new pipeline experiment, click Create new experiment, and then complete the relevant fields in the Create experiment dialog.
- In the Name field, enter a name for the run, up to 255 characters.
- In the Description field, enter a description for the run, up to 255 characters.
- From the Pipeline list, select the pipeline that you want to create a run for. Alternatively, to create a new pipeline, click Create new pipeline, and then complete the relevant fields in the Import pipeline dialog.
- From the Pipeline version list, select the pipeline version to create a run for. Alternatively, to upload a new version, click Upload new version, and then complete the relevant fields in the Upload new version dialog.
- Configure the input parameters for the run by selecting the parameters from the list.
Click Create run.
The details page for the run opens.
Verification
- The pipeline run that you created appears on the Runs tab on the Runs page for the pipeline experiment.
3.5. Stopping an active pipeline run
If you no longer require an active pipeline run to continue executing in a pipeline experiment, you can stop the run before its defined end date.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a pipeline server.
- You have imported a pipeline to an active pipeline server.
- An active pipeline run is currently executing.
Procedure
- From the OpenShift AI dashboard, click Experiments → Experiments and runs.
- On the Experiments page, from the Project drop-down list, select the project that contains the pipeline experiment for the active run that you want to stop.
- From the list of pipeline experiments, click the pipeline experiment that contains the run that you want to stop.
On the Runs tab, click the action menu (⋮) beside the active run that you want to stop, and then click Stop.
There might be a short delay while the run stops.
Verification
- The Failed status icon appears in the Status column of the stopped run.
3.6. Duplicating an active pipeline run
To make it easier to quickly execute pipeline runs with the same configuration in a pipeline experiment, you can duplicate them.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a configured pipeline server.
- You have imported a pipeline to an active pipeline server.
- An active run is available to duplicate on the Active tab on the Runs page.
Procedure
- From the OpenShift AI dashboard, click Experiments → Experiments and runs.
- On the Experiments page, from the Project drop-down list, select the project that contains the pipeline experiment for the pipeline run that you want to duplicate.
- From the list of pipeline experiments, click the experiment that contains the pipeline run that you want to duplicate.
- Click the action menu (⋮) beside the relevant active run, and then click Duplicate.
On the Duplicate run page, configure the duplicate run:
- From the Experiment list, select a pipeline experiment to contain the duplicate run. Alternatively, to create a new pipeline experiment, click Create new experiment, and then complete the relevant fields in the Create experiment dialog.
- In the Name field, enter a name for the duplicate run.
- In the Description field, enter a description for the duplicate run.
- From the Pipeline list, select a pipeline to contain the duplicate run. Alternatively, to create a new pipeline, click Create new pipeline, and then complete the relevant fields in the Import pipeline dialog.
- From the Pipeline version list, select a pipeline version to contain the duplicate run. Alternatively, to upload a new version, click Upload new version, and then complete the relevant fields in the Upload new version dialog.
- In the Parameters section, configure input parameters for the duplicate run by selecting parameters from the list.
Click Create run.
The details page for the run opens.
Verification
- The duplicate pipeline run appears on the Runs tab on the Runs page for the pipeline experiment.
3.7. Viewing scheduled pipeline runs
You can view a list of pipeline runs that are scheduled for execution in a pipeline experiment. From this list, you can view details relating to your pipeline runs, such as the pipeline version that the run belongs to. You can also view the run status, execution frequency, and schedule.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a pipeline server.
- You have imported a pipeline to an active pipeline server.
- You have scheduled a pipeline run that is available to view.
Procedure
- From the OpenShift AI dashboard, click Experiments → Experiments and runs.
- On the Experiments page, from the Project drop-down list, select the project that contains the pipeline experiment for the scheduled pipeline runs that you want to view.
- From the list of pipeline experiments, click the experiment that contains the pipeline runs that you want to view.
On the Runs page, click the Schedules tab.
After a run is scheduled, the Status column indicates whether the run is ready or unavailable for execution. To change its execution availability, set the Status switch to On or Off. Alternatively, you can change its execution availability from the details page for the scheduled run by clicking the Actions drop-down menu, and then selecting Enable or Disable.
Verification
- A list of scheduled runs appears on the Schedules tab on the Runs page for the pipeline experiment.
3.8. Scheduling a pipeline run using a cron job
You can use a cron job to schedule a pipeline run to execute at a specific time. Cron jobs are useful for creating periodic and recurring tasks, and can also schedule individual tasks for a specific time, for example, if you want to schedule a run during a period of low activity. To successfully execute runs in OpenShift AI, you must use the supported format. For more information, see Cron Expression Format.
The following examples show the correct format:
| Run occurrence | Cron format |
|---|---|
| Every five minutes | @every 5m |
| Every 10 minutes | 0 */10 * * * * |
| Daily at 16:16 UTC | 0 16 16 * * * |
| Daily every quarter of the hour | 0 0,15,30,45 * * * * |
| On Monday and Tuesday at 15:40 UTC | 0 40 15 * * MON,TUE |
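As a rough illustration of the seconds-first format used in the table above, the following sketch checks whether a timestamp matches a simple six-field expression. It is not the scheduler's implementation, and it deliberately omits ranges, step offsets, and the @every shorthand:

```python
from datetime import datetime

# Cron day-of-week names, Sunday-first as in standard cron (SUN = 0).
DAYS = {"SUN": 0, "MON": 1, "TUE": 2, "WED": 3,
        "THU": 4, "FRI": 5, "SAT": 6}


def field_matches(field: str, value: int) -> bool:
    """Match one field: '*', '*/n' steps, or a comma list of values/names."""
    if field == "*":
        return True
    if field.startswith("*/"):
        return value % int(field[2:]) == 0
    for part in field.split(","):
        num = DAYS.get(part)
        if num is None:
            num = int(part)
        if num == value:
            return True
    return False


def cron_matches(expression: str, when: datetime) -> bool:
    """Check a six-field, seconds-first cron expression against a timestamp."""
    second, minute, hour, dom, month, dow = expression.split()
    pairs = [
        (second, when.second), (minute, when.minute), (hour, when.hour),
        (dom, when.day), (month, when.month),
        (dow, (when.weekday() + 1) % 7),  # Monday-first to Sunday-first
    ]
    return all(field_matches(f, v) for f, v in pairs)
```

For example, "0 16 16 * * *" matches any day at exactly 16:16:00, and "0 40 15 * * MON,TUE" matches 15:40:00 only on Mondays and Tuesdays.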
3.9. Scheduling a pipeline run
To repeatedly run a pipeline, you can create a scheduled pipeline run.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a configured pipeline server.
- You have imported a pipeline to an active pipeline server.
Procedure
- From the OpenShift AI dashboard, click Experiments → Experiments and runs.
- On the Experiments page, from the Project drop-down list, select the project that contains the pipeline experiment for the run that you want to schedule.
- From the list of pipeline experiments, click the experiment that contains the run that you want to schedule.
- Click the Schedules tab.
- Click Create schedule.
On the Schedule run page, configure the run that you are scheduling:
- From the Experiment list, select the pipeline experiment that you want to contain the scheduled run. Alternatively, to create a new pipeline experiment, click Create new experiment, and then complete the relevant fields in the Create experiment dialog.
- In the Name field, enter a name for the run.
- In the Description field, enter a description for the run.
From the Trigger type list, select one of the following options:
- Select Periodic to specify an execution frequency. In the Run every field, enter a number and select an execution frequency from the list.
- Select Cron to specify the execution schedule in cron format in the Cron string field. This creates a cron job to execute the run. Click the Copy button to copy the cron job schedule to the clipboard. The field furthest to the left represents seconds. For more information about scheduling tasks using the supported cron format, see Cron Expression Format.
- In the Maximum concurrent runs field, specify the number of runs that can execute concurrently, from a range of one to ten.
- For Start date, specify a start date for the run. Select a start date using the calendar, and the start time from the list of times.
- For End date, specify an end date for the run. Select an end date using the calendar, and the end time from the list of times.
- For Catch up, enable or disable catch up runs. You can use catch up runs to ensure your pipeline runs do not permanently fall behind schedule when they are paused. For example, if you re-enable a paused recurring run, the run scheduler backfills each missed run interval.
- From the Pipeline list, select the pipeline that you want to create a run for. Alternatively, to create a new pipeline, click Create new pipeline, and then complete the relevant fields in the Import pipeline dialog.
- From the Pipeline version list, select the pipeline version to create a run for. Alternatively, to upload a new version, click Upload new version, and then complete the relevant fields in the Upload new version dialog.
- Configure the input parameters for the run by selecting the parameters from the list.
- Click Schedule run.
Verification
- The pipeline run that you scheduled appears on the Schedules tab on the Runs page for the pipeline experiment.
3.10. Duplicating a scheduled pipeline run
To make it easier to schedule runs to execute as part of your pipeline experiment, you can duplicate existing scheduled runs.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a configured pipeline server.
- You have imported a pipeline to an active pipeline server.
- A scheduled run is available to duplicate on the Schedules tab on the Runs page.
Procedure
- From the OpenShift AI dashboard, click Experiments → Experiments and runs.
- On the Experiments page, from the Project drop-down list, select the project that contains the pipeline experiment for the scheduled run that you want to duplicate.
- From the list of pipeline experiments, click the experiment that contains the pipeline run that you want to duplicate.
- On the Runs page, click the Schedules tab.
- Click the action menu (⋮) beside the run that you want to duplicate, and then click Duplicate.
On the Duplicate schedule page, configure the duplicate run:
- From the Experiment list, select a pipeline experiment to contain the duplicate run. Alternatively, to create a new pipeline experiment, click Create new experiment, and then complete the relevant fields in the Create experiment dialog.
- In the Name field, enter a name for the duplicate run.
- In the Description field, enter a description for the duplicate run.
From the Trigger type list, select one of the following options:
- Select Periodic to specify an execution frequency. In the Run every field, enter a number, and select an execution frequency from the list.
- Select Cron to specify the execution schedule in cron format in the Cron string field. This creates a cron job to execute the run. Click the Copy button to copy the cron job schedule to the clipboard. The field furthest to the left represents seconds. For more information about scheduling tasks using the supported cron format, see Cron Expression Format.
- For Maximum concurrent runs, specify the number of runs that can execute concurrently, from a range of one to ten.
- For Start date, specify a start date for the duplicate run. Select a start date using the calendar, and the start time from the list of times.
- For End date, specify an end date for the duplicate run. Select an end date using the calendar, and the end time from the list of times.
- For Catch up, enable or disable catch up runs. You can use catch up runs to ensure your pipeline runs do not permanently fall behind schedule when they are paused. For example, if you re-enable a paused recurring run, the run scheduler backfills each missed run interval.
- From the Pipeline list, select the pipeline that you want to create a duplicate run for. Alternatively, to create a new pipeline, click Create new pipeline, and then complete the relevant fields in the Import pipeline dialog.
- From the Pipeline version list, select the pipeline version to create a duplicate run for. Alternatively, to upload a new version, click Upload new version, and then complete the relevant fields in the Upload new version dialog.
- Configure input parameters for the run by selecting parameters from the list.
- Click Schedule run.
Verification
- The pipeline run that you duplicated appears on the Schedules tab on the Runs page for the pipeline experiment.
3.11. Deleting a scheduled pipeline run
To discard pipeline runs that you previously scheduled, but no longer require, you can delete them so that they do not appear on the Schedules page.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a configured pipeline server.
- You have imported a pipeline to an active pipeline server.
- You have previously scheduled a run that is available to delete.
Procedure
- From the OpenShift AI dashboard, click Experiments → Experiments and runs.
- On the Experiments page, from the Project drop-down list, select the project that contains the pipeline experiment for the scheduled pipeline run that you want to delete.
- From the list of pipeline experiments, click the experiment that contains the scheduled pipeline run that you want to delete.
- On the Runs page, click the Schedules tab.
- Click the action menu (⋮) beside the scheduled pipeline run that you want to delete, and then click Delete.
- In the Delete schedule dialog, enter the run name in the text field to confirm that you intend to delete it.
- Click Delete.
Verification
- The run that you deleted no longer appears on the Schedules tab for the pipeline experiment.
3.12. Viewing the details of a pipeline run
To gain a clearer understanding of your pipeline runs, you can view the details of a previously triggered pipeline run, such as its graph, execution details, and run output.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a pipeline server.
- You have imported a pipeline to an active pipeline server.
- You have previously triggered a pipeline run.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines.
- On the Pipelines page, from the Project drop-down list, select the project that contains the pipeline that you want to view run details for.
- Click Expand beside the pipeline that you want to view run details for.
- Click the action menu (⋮) for the pipeline version that you want to view run details for, and then click View runs.
- On the Runs page, click the name of the run that you want to view the details of. The details page for the run opens.
Verification
- On the run details page, you can view the run graph, execution details, input parameters, step logs, and run output.
3.13. Viewing archived pipeline runs
You can view a list of pipeline runs that you have archived. You can view details for your archived pipeline runs, such as the pipeline version, run status, duration, and execution start date.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and has a pipeline server.
- You have imported a pipeline to an active pipeline server.
- An archived pipeline run exists.
Procedure
- From the OpenShift AI dashboard, click Experiments → Experiments and runs.
- On the Experiments page, from the Project drop-down list, select the project that contains the pipeline experiment for the archived pipeline runs that you want to view.
- From the list of pipeline experiments, click the experiment that contains the archived pipeline runs that you want to view.
- On the Runs page, click the Archive tab.
Verification
- A list of archived runs appears on the Archive tab on the Runs page for the pipeline experiment.
3.14. Archiving a pipeline run
You can retain records of your pipeline runs by archiving them. If required, you can restore runs from your archive to reuse, or delete runs that are no longer required.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and has a pipeline server.
- You have imported a pipeline to an active pipeline server.
- You have previously executed a pipeline run that is available.
Procedure
- From the OpenShift AI dashboard, click Experiments → Experiments and runs.
- On the Experiments page, from the Project drop-down list, select the project that contains the pipeline experiment for the run that you want to archive.
- From the list of pipeline experiments, click the experiment that contains the pipeline run that you want to archive. The Runs page opens.
- On the Runs tab, click the action menu (⋮) beside the pipeline run that you want to archive, and then click Archive.
- In the Archiving run dialog, enter the run name in the text field to confirm that you intend to archive it.
- Click Archive.
Verification
- The archived run does not appear on the Runs tab, and instead appears on the Archive tab on the Runs page for the pipeline experiment.
3.15. Restoring an archived pipeline run
You can restore an archived run to the active state.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and has a pipeline server.
- You have imported a pipeline to an active pipeline server.
- An archived run exists in your project.
Procedure
- From the OpenShift AI dashboard, click Experiments → Experiments and runs.
- On the Experiments page, from the Project drop-down list, select the project that contains the pipeline experiment that you want to restore.
- From the list of pipeline experiments, click the experiment that contains the archived pipeline run that you want to restore.
- On the Runs page, click the Archive tab.
- Click the action menu (⋮) beside the pipeline run that you want to restore, and then click Restore.
- In the Restore run? dialog, click Restore.
Verification
- The restored run appears on the Runs tab on the Runs page for the pipeline experiment.
3.16. Deleting an archived pipeline run
You can delete pipeline runs from the OpenShift AI run archive.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and has a pipeline server.
- You have imported a pipeline to an active pipeline server.
- You have previously archived a pipeline run.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines.
- On the Pipelines page, from the Project drop-down list, select the project that contains the pipeline run that you want to delete.
- Click Expand beside the pipeline that contains the run that you want to delete.
- Click the action menu (⋮) beside the pipeline version that contains the run that you want to delete, and then click View runs.
- On the Runs page, click the Archive tab.
- Click the action menu (⋮) beside the pipeline run that you want to delete, and then click Delete.
- In the Delete run? dialog, enter the run name in the text field to confirm that you intend to delete it.
- Click Delete.
Verification
- The archived run that you deleted no longer appears on the Archive tab on the Runs page.
3.17. Duplicating an archived pipeline run
To make it easier to reproduce runs with the same configuration as runs in your archive, you can duplicate them.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a configured pipeline server.
- You have imported a pipeline to an active pipeline server.
- An archived run is available to duplicate on the Archive tab on the Runs page.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines.
- On the Pipelines page, from the Project drop-down list, select the project that contains the pipeline run that you want to duplicate.
- Click Expand beside the pipeline that contains the run that you want to duplicate.
- Click the action menu (⋮) beside the pipeline version that contains the run that you want to duplicate, and then click View runs.
- On the Runs page, click the Archive tab.
- Click the action menu (⋮) beside the pipeline run that you want to duplicate, and then click Duplicate.
On the Duplicate run page, configure the duplicate run:
- From the Experiment list, select a pipeline experiment to contain the duplicate run. Alternatively, to create a new pipeline experiment, click Create new experiment, and then complete the relevant fields in the Create experiment dialog.
- In the Name field, enter a name for the duplicate run.
- In the Description field, enter a description for the duplicate run.
- From the Pipeline list, select a pipeline to contain the duplicate run. Alternatively, to create a new pipeline, click Create new pipeline, and then complete the relevant fields in the Import pipeline dialog.
- From the Pipeline version list, select a pipeline version to contain the duplicate run. Alternatively, to upload a new version, click Upload new version, and then complete the relevant fields in the Upload new version dialog.
- In the Parameters section, configure input parameters for the duplicate run by selecting parameters from the list.
- Click Create run. The details page for the run opens.
Verification
- The duplicate pipeline run appears on the Runs tab on the Runs page for the pipeline experiment.
Chapter 4. Working with pipeline logs
4.1. About pipeline logs
You can review and analyze step logs for each step in a triggered pipeline run.
To help you troubleshoot and audit your pipelines, you can review and analyze these step logs by using the log viewer in the OpenShift AI dashboard. From here, you can search for specific log messages, view the log for each step, and download the step logs to your local machine.
If a step log exceeds the display capacity of the log viewer, a warning appears above the log viewer stating that the log window displays partial content. Expanding the warning displays further information, such as how the log viewer refreshes every three seconds, and that each step log displays the last 500 lines of log messages received. In addition, you can click download all step logs to download all step logs to your local machine.
Each step has a set of container logs. You can view these container logs by selecting a container from the Steps list in the log viewer. The step-main container log consists of the log output for the step. The step-copy-artifact container log consists of output relating to artifact data sent to S3-compatible storage. If the data transferred between the steps in your pipeline is larger than 3 KB, five container logs are typically available. These logs contain output relating to data transferred between your persistent volume claims (PVCs).
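The "last 500 lines" truncation that the log viewer applies can be sketched in a few lines of plain Python. This is an illustration of the truncation semantics only, not OpenShift AI code:

```python
from collections import deque

MAX_LINES = 500  # the log viewer displays the last 500 lines of each step log

def tail_step_log(lines, max_lines=MAX_LINES):
    """Keep only the most recent max_lines log lines, the way the
    dashboard log viewer truncates a large step log."""
    window = deque(maxlen=max_lines)
    truncated = False
    for line in lines:
        if len(window) == max_lines:
            truncated = True  # an older line falls out of the window
        window.append(line)
    return list(window), truncated

log, truncated = tail_step_log(f"line {i}" for i in range(1200))
# log holds only the most recent 500 lines; truncated is True, which
# corresponds to the partial-content warning shown above the log viewer
```

When `truncated` is set, the full history is only available by downloading the step logs, which mirrors the download option described above.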
4.2. Viewing pipeline step logs
To help you troubleshoot and audit your pipelines, you can review and analyze the log of each pipeline step using the log viewer. From here, you can search for specific log messages and download the logs for each step in your pipeline. If the pipeline is running, you can also pause and resume the log from the log viewer.
Logs are no longer stored in S3-compatible storage for Python scripts that run in Elyra pipelines. From OpenShift AI version 2.11, you can view these logs in the pipeline step log viewer.
For this change to take effect, you must use the Elyra runtime images provided in the 2024.1 or 2024.2 workbench images.
If you have an older workbench image version, update the Version selection field to 2024.1, as described in Updating a project workbench.
Updating your workbench image version clears any existing runtime image selections for your pipeline. After you update your workbench version, open your workbench IDE and update the properties of your pipeline to select a runtime image.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a pipeline server.
- You have imported a pipeline to an active pipeline server.
- You have previously triggered a pipeline run.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines.
- On the Pipelines page, from the Project drop-down list, select the project that you want to view logs for.
- Click Expand beside the pipeline that you want to view logs for.
- Click the action menu (⋮) on the row containing the pipeline version that you want to view pipeline logs for, and then click View runs.
- On the Runs page, click the name of the run that you want to view logs for.
- On the run details page, on the Graph tab, click the pipeline step that you want to view logs for.
- Click the Logs tab.
- To view the logs of another pipeline step, from the Steps list, select the step that you want to view logs for.
Analyze the log using the log viewer.
- To search for a specific log message, enter at least part of the message in the search bar.
- To view the full log in a separate browser window, click the action menu (⋮) and select View raw logs. Alternatively, to expand the size of the log viewer, click the action menu (⋮) and select Expand.
Verification
- You can view the logs for each step in your pipeline.
4.3. Downloading pipeline step logs
Instead of viewing the step logs of a pipeline run using the log viewer on the OpenShift AI dashboard, you can download them for further analysis. You can choose to download the logs belonging to all steps in your pipeline, or you can download the log only for the step log displayed in the log viewer.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a pipeline server.
- You have imported a pipeline to an active pipeline server.
- You have previously triggered a pipeline run.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines.
- On the Pipelines page, from the Project drop-down list, select the project that you want to download logs for.
- Click Expand beside the pipeline that you want to download logs for.
- Click the action menu (⋮) on the row containing the pipeline version that you want to download logs for, and then click View runs.
- On the Runs page, click the name of the run that you want to download logs for.
- On the run details page, on the Graph tab, click the pipeline step that you want to download logs for.
- Click the Logs tab.
In the log viewer, click the Download button.
- Select Download current step log to download the log for the current pipeline step.
- Select Download all step logs to download the logs for all steps in your pipeline run.
Verification
- The step logs download to your browser’s default directory for downloaded files.
Chapter 5. Working with pipelines in JupyterLab
5.1. Overview of pipelines in JupyterLab
You can use Elyra to create visual end-to-end pipeline workflows in JupyterLab. Elyra is an extension for JupyterLab that provides you with a Pipeline Editor to create pipeline workflows that can be executed in OpenShift AI.
You can access the Elyra extension within JupyterLab when you create the most recent version of one of the following notebook images:
- Standard Data Science
- PyTorch
- TensorFlow
- TrustyAI
When you use the Pipeline Editor to visually design your pipelines, minimal coding is required to create and run pipelines. For more information about Elyra, see Elyra Documentation. For more information about the Pipeline Editor, see Visual Pipeline Editor. After you have created your pipeline, you can run it locally in JupyterLab, or remotely using data science pipelines in OpenShift AI.
The pipeline creation process consists of the following tasks:
- Create a data science project that contains a workbench.
- Create a pipeline server.
- Create a new pipeline in the Pipeline Editor in JupyterLab.
- Develop your pipeline by adding Python notebooks or Python scripts and defining their runtime properties.
- Define execution dependencies.
- Run or export your pipeline.
Before you can run a pipeline in JupyterLab, your pipeline instance must contain a runtime configuration. A runtime configuration defines connectivity information for your pipeline instance and S3-compatible cloud storage.
If you create a workbench as part of a data science project, a default runtime configuration is created automatically. However, if you create a notebook from the Jupyter tile in the OpenShift AI dashboard, you must create a runtime configuration before you can run your pipeline in JupyterLab. For more information about runtime configurations, see Runtime Configuration. As a prerequisite, before you create a workbench, ensure that you have created and configured a pipeline server within the same data science project as your workbench.
You can use S3-compatible cloud storage to make data available to your notebooks and scripts while they are executed. Your cloud storage must be accessible from the machine in your deployment that runs JupyterLab and from the cluster that hosts data science pipelines. Before you create and run pipelines in JupyterLab, ensure that you have your S3-compatible storage credentials readily available.
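The connectivity information that a runtime configuration captures, the pipeline API endpoint on one side and the S3-compatible object storage on the other, can be summarized as a simple mapping. Every value below is a hypothetical placeholder, and the key names are illustrative rather than Elyra's internal schema; the individual fields are described in the runtime configuration procedure that follows.

```python
# Illustrative summary of the two groups of connectivity information a
# runtime configuration stores. All values are hypothetical placeholders.
runtime_configuration = {
    "display_name": "My pipeline runtime",
    "pipelines": {
        "api_endpoint": "https://ds-pipeline-example.apps.example.com",  # hypothetical
        "auth_type": "EXISTING_BEARER_TOKEN",  # used for notebooks created from the Jupyter tile
        "user_namespace": "my-data-science-project",  # hypothetical
    },
    "object_storage": {
        "endpoint": "https://s3.example.com",  # hypothetical S3-compatible endpoint
        "bucket": "pipeline-artifacts",  # created automatically if it does not exist
        "auth_type": "KUBERNETES_SECRET",
        "credentials_secret": "aws-connection-pipelines",  # hypothetical secret name
    },
}

def validate(config):
    """Check that both groups of connectivity information are present
    and that the endpoint and bucket are non-empty."""
    return {"pipelines", "object_storage"}.issubset(config) and bool(
        config["pipelines"]["api_endpoint"] and config["object_storage"]["bucket"]
    )
```

A pipeline cannot run without both halves of this information, which is why a default runtime configuration is generated only when the workbench is created inside a project that already has a pipeline server.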
5.2. Accessing the pipeline editor
You can use Elyra to create visual end-to-end pipeline workflows in JupyterLab. Elyra is an extension for JupyterLab that provides you with a Pipeline Editor to create pipeline workflows that can execute in OpenShift AI.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have created a data science project.
- You have created a workbench with the Standard Data Science notebook image.
- You have created and configured a pipeline server within the data science project that contains your workbench.
- You have created and launched a Jupyter server from a notebook image that contains the Elyra extension (Standard data science, TensorFlow, TrustyAI, or PyTorch).
- You have access to S3-compatible storage.
Procedure
- After you open JupyterLab, confirm that the JupyterLab launcher is automatically displayed.
- In the Elyra section of the JupyterLab launcher, click the Pipeline Editor tile. The Pipeline Editor opens.
Verification
- You can view the Pipeline Editor in JupyterLab.
5.3. Creating a runtime configuration
If you create a workbench as part of a data science project, a default runtime configuration is created automatically. However, if you create a notebook from the Jupyter tile in the OpenShift AI dashboard, you must create a runtime configuration before you can run your pipeline in JupyterLab. This enables you to specify connectivity information for your pipeline instance and S3-compatible cloud storage.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have access to S3-compatible cloud storage.
- You have created a data science project that contains a workbench.
- You have created and configured a pipeline server within the data science project that contains your workbench.
- You have created and launched a Jupyter server from a notebook image that contains the Elyra extension (Standard data science, TensorFlow, TrustyAI, or PyTorch).
Procedure
- In the left sidebar of JupyterLab, click Runtimes.
- Click the Create new runtime configuration button. The Add new Data Science Pipelines runtime configuration page opens.
Complete the relevant fields to define your runtime configuration.
- In the Display Name field, enter a name for your runtime configuration.
- Optional: In the Description field, enter a description to define your runtime configuration.
- Optional: In the Tags field, click Add Tag to define a category for your pipeline instance. Enter a name for the tag and press Enter.
Define the credentials of your data science pipeline:
- In the Data Science Pipelines API Endpoint field, enter the API endpoint of your data science pipeline. Do not specify the pipelines namespace in this field.
In the Public Data Science Pipelines API Endpoint field, enter the public API endpoint of your data science pipeline.
Important: You can obtain the data science pipelines API endpoint from the Data Science Pipelines → Runs page in the dashboard. Copy the relevant endpoint and enter it in the Public Data Science Pipelines API Endpoint field.
- Optional: In the Data Science Pipelines User Namespace field, enter the relevant user namespace to run pipelines.
From the Authentication Type list, select the authentication type required to authenticate your pipeline.
Important: If you created a notebook directly from the Jupyter tile on the dashboard, select EXISTING_BEARER_TOKEN from the Authentication Type list.
- In the Data Science Pipelines API Endpoint Username field, enter the user name required for the authentication type.
- In the Data Science Pipelines API Endpoint Password Or Token field, enter the password or token required for the authentication type.
Important: To obtain the data science pipelines API endpoint token, in the upper-right corner of the OpenShift web console, click your user name and select Copy login command. After you have logged in, click Display token and copy the value of --token= from the Log in with this token command.
Define the connectivity information of your S3-compatible storage:
- In the Cloud Object Storage Endpoint field, enter the endpoint of your S3-compatible storage. For more information about Amazon S3 endpoints, see Amazon Simple Storage Service endpoints and quotas.
- Optional: In the Public Cloud Object Storage Endpoint field, enter the URL of your S3-compatible storage.
- In the Cloud Object Storage Bucket Name field, enter the name of the bucket where your pipeline artifacts are stored. If the bucket name does not exist, it is created automatically.
- From the Cloud Object Storage Authentication Type list, select the authentication type required to access your S3-compatible cloud storage. If you use AWS S3 buckets, select KUBERNETES_SECRET from the list.
- In the Cloud Object Storage Credentials Secret field, enter the secret that contains the storage user name and password. This secret is defined in the relevant user namespace, if applicable. In addition, it must be stored on the cluster that hosts your pipeline runtime.
- In the Cloud Object Storage Username field, enter the user name to connect to your S3-compatible cloud storage, if applicable. If you use AWS S3 buckets, enter your AWS Access Key ID.
- In the Cloud Object Storage Password field, enter the password to connect to your S3-compatible cloud storage, if applicable. If you use AWS S3 buckets, enter your AWS Secret Access Key.
- Click Save & Close.
Verification
- The runtime configuration that you created appears on the Runtimes tab in the left sidebar of JupyterLab.
5.4. Updating a runtime configuration
To ensure that your runtime configuration is accurate and updated, you can change the settings of an existing runtime configuration.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have access to S3-compatible storage.
- You have created a data science project that contains a workbench.
- You have created and configured a pipeline server within the data science project that contains your workbench.
- A previously created runtime configuration is available in the JupyterLab interface.
- You have created and launched a Jupyter server from a notebook image that contains the Elyra extension (Standard data science, TensorFlow, TrustyAI, or PyTorch).
Procedure
- In the left sidebar of JupyterLab, click Runtimes.
- Hover the cursor over the runtime configuration that you want to update and click the Edit button. The Data Science Pipelines runtime configuration page opens.
Fill in the relevant fields to update your runtime configuration.
- In the Display Name field, update the name of your runtime configuration, if applicable.
- Optional: In the Description field, update the description of your runtime configuration, if applicable.
- Optional: In the Tags field, click Add Tag to define a category for your pipeline instance. Enter a name for the tag and press Enter.
Define the credentials of your data science pipeline:
- In the Data Science Pipelines API Endpoint field, update the API endpoint of your data science pipeline, if applicable. Do not specify the pipelines namespace in this field.
- In the Public Data Science Pipelines API Endpoint field, update the API endpoint of your data science pipeline, if applicable.
- Optional: In the Data Science Pipelines User Namespace field, update the relevant user namespace to run pipelines, if applicable.
From the Authentication Type list, select a new authentication type required to authenticate your pipeline, if applicable.
Important: If you created a notebook directly from the Jupyter tile on the dashboard, select EXISTING_BEARER_TOKEN from the Authentication Type list.
- In the Data Science Pipelines API Endpoint Username field, update the user name required for the authentication type, if applicable.
- In the Data Science Pipelines API Endpoint Password Or Token field, update the password or token required for the authentication type, if applicable.
Important: To obtain the data science pipelines API endpoint token, in the upper-right corner of the OpenShift web console, click your user name and select Copy login command. After you have logged in, click Display token and copy the value of --token= from the Log in with this token command.
Define the connectivity information of your S3-compatible storage:
- In the Cloud Object Storage Endpoint field, update the endpoint of your S3-compatible storage, if applicable. For more information about Amazon S3 endpoints, see Amazon Simple Storage Service endpoints and quotas.
- Optional: In the Public Cloud Object Storage Endpoint field, update the URL of your S3-compatible storage, if applicable.
- In the Cloud Object Storage Bucket Name field, update the name of the bucket where your pipeline artifacts are stored, if applicable. If the bucket name does not exist, it is created automatically.
- From the Cloud Object Storage Authentication Type list, update the authentication type required to access your S3-compatible cloud storage, if applicable. If you use AWS S3 buckets, you must select USER_CREDENTIALS from the list.
- Optional: In the Cloud Object Storage Credentials Secret field, update the secret that contains the storage user name and password, if applicable. This secret is defined in the relevant user namespace. You must save the secret on the cluster that hosts your pipeline runtime.
- Optional: In the Cloud Object Storage Username field, update the user name to connect to your S3-compatible cloud storage, if applicable. If you use AWS S3 buckets, update your AWS Access Key ID.
- Optional: In the Cloud Object Storage Password field, update the password to connect to your S3-compatible cloud storage, if applicable. If you use AWS S3 buckets, update your AWS Secret Access Key.
- Click Save & Close.
Verification
- The runtime configuration that you updated is shown on the Runtimes tab in the left sidebar of JupyterLab.
5.5. Deleting a runtime configuration
After you have finished using your runtime configuration, you can delete it from the JupyterLab interface. After deleting a runtime configuration, you cannot run pipelines in JupyterLab until you create another runtime configuration.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have created a data science project that contains a workbench.
- You have created and configured a pipeline server within the data science project that contains your workbench.
- A previously created runtime configuration is visible in the JupyterLab interface.
- You have created and launched a Jupyter server from a notebook image that contains the Elyra extension (Standard data science, TensorFlow, TrustyAI, or PyTorch).
Procedure
- In the left sidebar of JupyterLab, click Runtimes.
- Hover the cursor over the runtime configuration that you want to delete and click the Delete Item button. A dialog box appears, prompting you to confirm the deletion of your runtime configuration.
- Click OK.
Verification
- The runtime configuration that you deleted is no longer shown on the Runtimes tab in the left sidebar of JupyterLab.
5.6. Duplicating a runtime configuration
To avoid re-creating a runtime configuration with similar values in its entirety, you can duplicate an existing runtime configuration in the JupyterLab interface.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have created a data science project that contains a workbench.
- You have created and configured a pipeline server within the data science project that contains your workbench.
- A previously created runtime configuration is visible in the JupyterLab interface.
- You have created and launched a Jupyter server from a notebook image that contains the Elyra extension (Standard data science, TensorFlow, TrustyAI, or PyTorch).
Procedure
- In the left sidebar of JupyterLab, click Runtimes.
- Hover the cursor over the runtime configuration that you want to duplicate and click the Duplicate button.
Verification
- The runtime configuration that you duplicated is shown on the Runtimes tab in the left sidebar of JupyterLab.
5.7. Running a pipeline in JupyterLab
You can run pipelines that you have created in JupyterLab from the Pipeline Editor user interface. Before you can run a pipeline, you must create a data science project and a pipeline server. After you create a pipeline server, you must create a workbench within the same project as your pipeline server. Your pipeline instance in JupyterLab must contain a runtime configuration. If you create a workbench as part of a data science project, a default runtime configuration is created automatically. However, if you create a notebook from the Jupyter tile in the OpenShift AI dashboard, you must create a runtime configuration before you can run your pipeline in JupyterLab. A runtime configuration defines connectivity information for your pipeline instance and S3-compatible cloud storage.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have access to S3-compatible storage.
- You have created a pipeline in JupyterLab.
- You have opened your pipeline in the Pipeline Editor in JupyterLab.
- Your pipeline instance contains a runtime configuration.
- You have created and configured a pipeline server within the data science project that contains your workbench.
- You have created and launched a Jupyter server from a notebook image that contains the Elyra extension (Standard data science, TensorFlow, TrustyAI, or PyTorch).
Procedure
In the Pipeline Editor user interface, click Run Pipeline.
The Run Pipeline dialog appears. The Pipeline Name field is automatically populated with the pipeline file name.
Note: After you run your pipeline, a pipeline experiment containing your pipeline run is automatically created on the Experiments → Experiments and runs page in the OpenShift AI dashboard. The experiment name matches the name that you assigned to the pipeline.
Define the settings for your pipeline run.
- From the Runtime Configuration list, select the relevant runtime configuration to run your pipeline.
- Optional: Configure your pipeline parameters, if applicable. If your pipeline contains nodes that reference pipeline parameters, you can change the default parameter values. If a parameter is required and has no default value, you must enter a value.
- Click OK.
Verification
- You can view the details of your pipeline run on the Experiments → Experiments and runs page in the OpenShift AI dashboard.
- You can view the output artifacts of your pipeline run. The artifacts are stored in your designated object storage bucket.
5.8. Exporting a pipeline in JupyterLab
You can export pipelines that you have created in JupyterLab. When you export a pipeline, the pipeline is prepared for later execution, but is not uploaded or executed immediately. During the export process, any package dependencies are uploaded to S3-compatible storage. Also, pipeline code is generated for the target runtime.
Before you can export a pipeline, you must create a data science project and a pipeline server. After you create a pipeline server, you must create a workbench within the same project as your pipeline server. In addition, your pipeline instance in JupyterLab must contain a runtime configuration. If you create a workbench as part of a data science project, a default runtime configuration is created automatically. However, if you create a notebook from the Jupyter tile in the OpenShift AI dashboard, you must create a runtime configuration before you can export your pipeline in JupyterLab. A runtime configuration defines connectivity information for your pipeline instance and S3-compatible cloud storage.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have created a data science project that contains a workbench.
- You have created and configured a pipeline server within the data science project that contains your workbench.
- You have access to S3-compatible storage.
- You have created a pipeline in JupyterLab.
- You have opened your pipeline in the Pipeline Editor in JupyterLab.
- Your pipeline instance contains a runtime configuration.
- You have created and launched a Jupyter server from a notebook image that contains the Elyra extension (Standard data science, TensorFlow, TrustyAI, or PyTorch).
Procedure
In the Pipeline Editor user interface, click Export Pipeline.
The Export Pipeline dialog appears. The Pipeline Name field is automatically populated with the pipeline file name.
Define the settings to export your pipeline.
- From the Runtime Configuration list, select the relevant runtime configuration to export your pipeline.
- From the Export Pipeline as list, select an appropriate file format.
- In the Export Filename field, enter a file name for the exported pipeline.
- Select the Replace if file already exists check box to replace an existing file of the same name as the pipeline you are exporting.
- Optional: Configure your pipeline parameters, if applicable. If your pipeline contains nodes that reference pipeline parameters, you can change the default parameter values. If a parameter is required and has no default value, you must enter a value.
- Click OK.
Verification
- You can view the file containing the pipeline that you exported in your designated object storage bucket.
Chapter 6. Troubleshooting DSPA component errors
This table displays common errors found in DataSciencePipelinesApplication (DSPA) components, along with their statuses, messages, and proposed solutions. The Ready condition type accumulates errors from various DSPA components, providing a status view of the DSPA deployment.
| Type | Status | Error message and solution |
|---|---|---|
| Ready | False | Error message: Solution: This issue occurs in clusters that use self-signed certificates with OpenShift AI version 2.9 or later. The data science pipelines manager cannot connect to the object storage as it does not trust the object storage SSL certificate. Therefore, the pipeline server cannot be created. Contact your IT operations administrator to add the relevant Certificate Authority bundle. For more information, see Working with certificates. |
| Ready | False | Error message: Solution: In clusters running OpenShift AI 2.8.x, the data science pipelines manager might fail to connect to the object storage, and the pipeline server might not be created. Ensure that your object store credentials and connection information are accurate, and verify that the object store is accessible from within the data science project’s associated OpenShift namespace. One common issue is that the object storage SSL certificate is not trusted, particularly if self-signed certificates are used. Verify and update your object storage credentials, then retry the operation. |
| Ready | False | Error message: Solution: Provide the correct credentials for your object storage and retry the operation. |
| Ready | False | Error message: Solution: If the issue persists beyond startup, check for network issues or misconfigurations in the database connection settings. |
| Ready | False | Error message: Solution: This issue can occur when you use any external database, such as Amazon RDS. The data science pipelines manager cannot connect to the database because it does not trust the database SSL certificate, preventing the creation of the pipeline server. Contact your IT operations administrator to add the relevant certificates. For more information, see Working with certificates. |
| Ready | False | Error message: Solution: This issue might occur when using an external database, such as Amazon RDS. Initially, the pipeline server is created successfully. However, after some time, the OpenShift AI dashboard displays an "Error displaying pipelines" message, and the DSPA conditions indicate that the host is blocked due to multiple connection errors. For more information on how to resolve this issue for an external Amazon RDS database, see Resolving "Host is blocked because of many connection errors" error in Amazon RDS for MySQL. |
| Ready | False | Error message: Solution: Ensure that the project name in OpenShift is less than 40 characters. |
| Ready | False | Error message: Solution: If the failure persists for more than 25 seconds during DSPA startup, recreate the missing service account. |
| Ready | False | Error message: Solution: If the failure persists for more than 25 seconds during DSPA startup, recreate the missing service account. |
| Ready | False | Error message: Solution: If the failure persists for more than 25 seconds during DSPA startup, recreate the missing service account. |
| Ready | False | Error message: Solution: Wait for DSPA startup to complete. If deployment fails after 25 seconds, check the logs for further information. |
6.1. Common errors across DSP components
The following table lists errors that might occur across multiple DSPA components:
| Deployment condition and condition type | Status | Error message and solution |
|---|---|---|
| Condition: Component Deployment Not Found. Condition type: | False | Error message: Solution: The deployment for the component does not exist. Typically, this is due to missing deployments or issues that occurred during creation. |
| Condition: Deployment Scaled Down. Condition type: | False | Error message: Solution: The component is unavailable as the deployment replica count is set to zero. |
| Condition: Component Failing to Progress. Condition type: | False | Error message: Solution: The deployment has stalled due to |
| Condition: Replica Creation Failure. Condition type: | False | Error message: Solution: Replica creation has failed, typically due to an error in the replica set or with the service accounts. |
| Condition: Pod-Level Failures. Condition type: | False | Error message: Solution: Deployment pods are in a failed state. Check the pod logs for further information. |
| Condition: Pod in CrashLoopBackOff. Condition type: | False | Error message: Solution: Pod containers are failing repeatedly, often due to incorrect environment variables or missing service accounts. |
| Condition: Component Deploying (No Errors). Condition type: | False | Error message: Solution: The component deployment process is ongoing with no errors detected. |
| Condition: Component Minimally Available. Condition type: | True | Error message: Solution: The component is available, but only with the minimum number of replicas running. |
Chapter 7. Migrating to data science pipelines 2.0
From OpenShift AI version 2.9, data science pipelines are based on KubeFlow Pipelines (KFP) version 2.0. Data science pipelines 2.0 is enabled and deployed by default in OpenShift AI.
Data science pipelines 2.0 contains an installation of Argo Workflows. OpenShift AI does not support direct customer usage of this installation of Argo Workflows.
To install or upgrade to OpenShift AI 2.9 or later with data science pipelines, ensure that your cluster does not have an existing installation of Argo Workflows that is not installed by OpenShift AI. If there is an existing installation of Argo Workflows that is not installed by data science pipelines on your cluster, data science pipelines will be disabled after you install or upgrade OpenShift AI. To enable data science pipelines, remove the separate installation of Argo Workflows from your cluster. Data science pipelines will be enabled automatically.
Argo Workflows resources that are created by OpenShift AI have the following labels in the OpenShift Console under Administration > CustomResourceDefinitions, in the argoproj.io group:
labels:
app.kubernetes.io/part-of: data-science-pipelines-operator
app.opendatahub.io/data-science-pipelines-operator: 'true'
7.1. Upgrading to data science pipelines 2.0
Starting with OpenShift AI 2.16, data science pipelines 1.0 resources are no longer supported or managed by OpenShift AI. It is no longer possible to deploy, view, or edit the details of pipelines that are based on data science pipelines 1.0 from either the dashboard or the KFP API server.
OpenShift AI does not automatically migrate existing data science pipelines 1.0 instances to 2.0. If you are upgrading to OpenShift AI 2.16, you must manually migrate your existing data science pipelines 1.0 instances and update your workbenches.
To upgrade to OpenShift AI 2.16 with data science pipelines 2.0, follow these steps:
If you are using GitOps to manage your data science pipelines 1.0 pipeline runs, pause any sync operations related to data science pipelines including PipelineRuns or DataSciencePipelinesApplications (DSPAs) management. After migrating to data science pipelines 2.0, your PipelineRuns will be managed independently of data science pipelines, similar to any other Tekton resources.
- Back up your pipelines data.
- Deploy a new cluster (or use a different existing cluster) with Red Hat OpenShift AI 2.16 to use as an intermediate cluster. You will use this intermediate cluster to upload, test, and verify your new pipelines.
In OpenShift AI 2.16 on the intermediate cluster, do the following tasks:
- Create a new data science project.
Configure a new pipeline server.
Important: If you use an external database, you must use a different external database than the one you use for data science pipelines 1.0, as the database is migrated to data science pipelines 2.0 format.
Update and recompile your data science pipelines 1.0 pipelines as described in Migrate to Kubeflow Pipelines v2.
Note: Data science pipelines 2.0 does not use the kfp-tekton library. In most cases, you can replace usage of kfp-tekton with the kfp library. For data science pipelines 2.0, use the latest version of the KFP SDK. For more information, see the Kubeflow Pipelines SDK API Reference.
Tip: You can view historical data science pipelines 1.0 pipeline run information on your primary cluster in the OpenShift Console Developer perspective under Pipelines → Project → PipelineRuns.
- Import your updated pipelines to the new data science project.
- Test and verify your new pipelines.
On your primary cluster, do the following tasks:
- Remove your data science pipelines 1.0 pipeline servers.
- Optional: Remove your data science pipelines 1.0 resources. For more information, see Removing data science pipelines 1.0 resources.
- Upgrade to Red Hat OpenShift AI 2.16. For more information, see Upgrading OpenShift AI Self-Managed, or for disconnected environments, Upgrading Red Hat OpenShift AI in a disconnected environment.
In the upgraded instance of Red Hat OpenShift AI 2.16 on your primary cluster, do the following tasks:
Recreate the pipeline servers for each data science project where the data science pipelines 1.0 pipeline servers existed.
Note: If you are using GitOps to manage your DSPAs, do the following tasks in your DSPAs before performing sync operations:
- Set spec.dspVersion to v2.
- Verify that the apiVersion is using v1 instead of v1alpha1.
Import your updated data science pipelines to the applicable pipeline servers.
Tip: You can perform a batch upload by creating a script that uses the KFP SDK Client and the .upload_pipeline and .get_pipeline methods.
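A hypothetical sketch of such a batch-upload script follows. It uses the client's .get_pipeline_id lookup to decide between .upload_pipeline and .upload_pipeline_version; the route URL, token, version suffix, and pipelines directory are assumptions to replace with values from your environment.

```python
# Hypothetical batch-upload script for migrated pipeline definitions.
# The client argument is expected to behave like kfp.Client; the route,
# token, and directory below are placeholders for your own environment.
from pathlib import Path


def batch_upload(client, pipeline_dir: str) -> None:
    """Upload every compiled pipeline YAML file in pipeline_dir."""
    for path in sorted(Path(pipeline_dir).glob("*.yaml")):
        name = path.stem
        pipeline_id = client.get_pipeline_id(name)
        if pipeline_id is None:
            # First upload of this pipeline name.
            client.upload_pipeline(pipeline_package_path=str(path), pipeline_name=name)
        else:
            # A pipeline with this name already exists; add a new version.
            client.upload_pipeline_version(
                pipeline_package_path=str(path),
                pipeline_version_name=f"{name}-migrated",
                pipeline_id=pipeline_id,
            )


# Example usage (requires a reachable pipeline server):
# import kfp
# client = kfp.Client(host="https://<ds-pipeline-route>", existing_token="<token>")
# batch_upload(client, "pipelines")
```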
For any workbenches that communicate with data science pipelines 1.0, do the following tasks in the upgraded instance of Red Hat OpenShift AI:
- Delete the existing workbench. For more information, see Deleting a workbench from a data science project.
- If you want to use the notebook image version 2024.2, upgrade to Python 3.11 before creating a new workbench.
- Create a new workbench that uses the existing persistent storage of the deleted workbench. For more information, see Creating a workbench.
- Run the pipeline so that the data science pipelines 2.0 pipeline server schedules it.
7.2. Removing data science pipelines 1.0 resources
When your migration to data science pipelines 2.0 is complete on the intermediate cluster, you can clean up the data science pipelines 1.0 resources in your cluster.
Before removing data science pipelines 1.0 resources, ensure that migration of your data science pipelines 1.0 pipelines to 2.0 is complete.
- Identify the DataSciencePipelinesApplication (DSPA) resource that corresponds to the data science pipelines 1.0 pipeline server:
  oc get dspa -n <YOUR_DS_PROJECT>
- Delete the cluster role binding associated with this DSPA:
  oc delete clusterrolebinding ds-pipeline-ui-auth-delegator-<YOUR_DS_PROJECT>-dspa
- Delete the DSPA:
  oc delete dspa dspa -n <YOUR_DS_PROJECT>
- If necessary, delete the DataSciencePipelinesApplication finalizer to complete the removal of the resource:
  oc patch dspa dspa -n <YOUR_DS_PROJECT> --type=merge -p "{\"metadata\":{\"finalizers\":null}}"
- If you are not using OpenShift Pipelines for any purpose other than data science pipelines 1.0, you can remove the OpenShift Pipelines Operator.
- Data science pipelines 1.0 used the kfp-tekton Python library. Data science pipelines 2.0 does not use kfp-tekton. You can uninstall kfp-tekton when there are no remaining data science pipelines 1.0 pipeline servers in use on your cluster.