Chapter 3. Working with data science pipelines
As a data scientist, you can enhance your data science projects on OpenShift AI by building portable machine learning (ML) workflows with data science pipelines, using Docker containers. Pipelines let you standardize and automate the workflows that you use to develop and deploy your data science models.
For example, the steps in a machine learning workflow might include items such as data extraction, data processing, feature extraction, model training, model validation, and model serving. Automating these activities enables your organization to develop a continuous process of retraining and updating a model based on newly received data. This can help address challenges related to building an integrated machine learning deployment and continuously operating it in production.
You can also use the Elyra JupyterLab extension to create and run data science pipelines within JupyterLab. For more information, see Working with pipelines in JupyterLab.
From OpenShift AI version 2.9, data science pipelines are based on Kubeflow Pipelines (KFP) version 2.0. For more information, see Enabling Data Science Pipelines 2.0.
A data science pipeline in OpenShift AI consists of the following components:
- Pipeline server: A server that is attached to your data science project and hosts your data science pipeline.
- Pipeline: A pipeline defines the configuration of your machine learning workflow and the relationship between each component in the workflow.
  - Pipeline code: A definition of your pipeline in a YAML file.
  - Pipeline graph: A graphical illustration of the steps executed in a pipeline run and the relationship between them.
- Pipeline run: An execution of your pipeline.
  - Active run: A pipeline run that is in its execution phase, or is stopped.
  - Scheduled run: A pipeline run scheduled to execute at least once.
  - Archived run: A pipeline run that resides in the run archive and is no longer required.
This feature is based on Kubeflow Pipelines 2.0. Use the latest Kubeflow Pipelines 2.0 SDK to build your data science pipeline in Python code. After you have built your pipeline, use the SDK to compile it into an Intermediate Representation (IR) YAML file. The OpenShift AI user interface enables you to track and manage pipelines and pipeline runs. You can manage incremental changes to pipelines in OpenShift AI by using versioning. This allows you to develop and deploy pipelines iteratively, preserving a record of your changes.
You can store your pipeline artifacts in an S3-compatible object storage bucket so that you do not consume local storage. To do this, you must first configure write access to your S3 bucket on your storage account.
3.1. Enabling Data Science Pipelines 2.0
From OpenShift AI version 2.9, data science pipelines are based on Kubeflow Pipelines (KFP) version 2.0. DSP 2.0 is enabled and deployed by default in OpenShift AI.
The PipelineConf class is deprecated, and there is no KFP 2.0 equivalent.
Data Science Pipelines (DSP) 2.0 contains an installation of Argo Workflows. OpenShift AI does not support direct customer usage of this installation of Argo Workflows.
To install or upgrade to OpenShift AI 2.9 with DSP, ensure that your cluster does not have an existing installation of Argo Workflows that is not installed by OpenShift AI.
Argo Workflows resources that are created by OpenShift AI have the following labels in the OpenShift Console under Administration > CustomResourceDefinitions, in the argoproj.io group:
labels:
  app.kubernetes.io/part-of: data-science-pipelines-operator
  app.opendatahub.io/data-science-pipelines-operator: 'true'
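The label check described above can be expressed as a small helper. This is a minimal sketch, not part of OpenShift AI: the function name is hypothetical, and it assumes you have already read the labels of a resource in the argoproj.io group (for example, from its metadata.labels field) into a Python dictionary.

```python
from typing import Mapping

# Labels quoted in this section for Argo Workflows resources created by OpenShift AI.
DSP_LABELS = {
    "app.kubernetes.io/part-of": "data-science-pipelines-operator",
    "app.opendatahub.io/data-science-pipelines-operator": "true",
}


def is_created_by_openshift_ai(labels: Mapping[str, str]) -> bool:
    """Return True only if every expected label is present with the expected value."""
    return all(labels.get(key) == value for key, value in DSP_LABELS.items())


# Hypothetical label sets, for illustration only.
managed = dict(DSP_LABELS, **{"example.com/team": "ml-platform"})
unmanaged = {"app.kubernetes.io/part-of": "argo-workflows"}

print(is_created_by_openshift_ai(managed))    # True
print(is_created_by_openshift_ai(unmanaged))  # False
```

A resource that fails this check would indicate a separate Argo Workflows installation, which is the situation that the installation and upgrade sections ask you to resolve first.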
3.1.1. Installing OpenShift AI with DSP 2.0
To install OpenShift AI 2.9, ensure that there is no installation of Argo Workflows that is not installed by DSP on your cluster, and follow the installation steps described in Installing and uninstalling OpenShift AI Self-Managed, or for disconnected environments, see Installing and uninstalling Red Hat OpenShift AI in a disconnected environment.
If there is an existing installation of Argo Workflows that is not installed by DSP on your cluster, DSP will be disabled after you install OpenShift AI 2.9 or later.
To enable data science pipelines, remove the separate installation of Argo Workflows from your cluster. Data Science Pipelines will be enabled automatically.
3.1.2. Upgrading to DSP 2.0
After you upgrade to OpenShift AI 2.9 or later, pipelines created with DSP 1.0 continue to run, but are inaccessible from the OpenShift AI dashboard. If you are a current DSP user, do not upgrade to OpenShift AI with DSP 2.0 until you are ready to migrate to the new pipelines solution.
To upgrade to DSP 2.0, follow these steps:
- Ensure that your cluster does not have an existing installation of Argo Workflows that is not installed by OpenShift AI, and then follow the upgrade steps described in Upgrading OpenShift AI Self-Managed, or for disconnected environments, Upgrading Red Hat OpenShift AI in a disconnected environment.
  If you upgrade to OpenShift AI 2.9 or later with DSP enabled, and there is an existing installation of Argo Workflows that is not installed by DSP on your cluster, OpenShift AI components will not be upgraded. To complete the component upgrade, disable DSP or remove the separate installation of Argo Workflows from your cluster. The component upgrade will then complete automatically.
- Update your workbenches to use the notebook image version 2024.1 or later. For more information, see Updating a project workbench.
- Manually migrate your pipelines from DSP 1.0 to 2.0. For more information, see Migrating pipelines from DSP 1.0 to 2.0.
3.1.3. Migrating pipelines from DSP 1.0 to 2.0
OpenShift AI does not automatically migrate existing DSP 1.0 instances to 2.0. To use existing pipelines with DSP 2.0, you must manually migrate them.
- On OpenShift AI 2.9, create a new data science project.
- Configure a new pipeline server.
- Update and recompile your DSP 1.0 pipelines as described in Migrate from KFP SDK v1: v1 to v2 migration instructions and breaking changes.
  DSP 2.0 does not use the `kfp-tekton` library. In most cases, you can replace usage of `kfp-tekton` with the `kfp` library.
- Import your updated pipelines to your new DSP 2.0-based data science project.
- (Optional) Remove your DSP 1.0 pipeline server.
Data Science Pipelines 1.0 used the kfp-tekton Python library. Data Science Pipelines 2.0 does not use kfp-tekton. You can uninstall kfp-tekton when there are no remaining DSP 1.0 pipeline servers in use on your cluster.
For Data Science Pipelines 2.0, use the latest version of the KFP SDK. For more information, see the Kubeflow Pipelines SDK API Reference.
3.1.4. Accessing DSP 1.0 pipelines and history
You can view historical DSP 1.0 pipeline run information in the OpenShift Container Platform Console under Pipelines > Project > PipelineRuns.
You can still connect to the KFP API server by using the kfp-tekton SDK for programmatic access to your pipelines and pipeline run history. For more information, see Kubeflow Pipelines SDK for Tekton.
3.1.5. Uninstalling the OpenShift Pipelines Operator
When your migration to DSP 2.0 is complete, and if you are not using OpenShift Pipelines for any purpose other than Data Science Pipelines 1.0, you can remove the OpenShift Pipelines Operator.
Before removing the OpenShift Pipelines Operator, ensure that migration of your DSP 1.0 pipelines to 2.0 is complete, and that there are no remaining DSP 1.0 pipeline servers in use on your cluster.
3.2. Managing data science pipelines
3.2.1. Configuring a pipeline server
Before you can successfully create a pipeline in OpenShift AI, you must configure a pipeline server. This task includes configuring where your pipeline artifacts and data are stored.
You are not required to specify any storage directories when configuring a data connection for your pipeline server. When you import a pipeline, the /pipelines folder is created in the root folder of the bucket, containing a YAML file for the pipeline. If you upload a new version of the same pipeline, a new YAML file with a different ID is added to the /pipelines folder.
When you run a pipeline, the artifacts are stored in the /pipeline-name folder in the root folder of the bucket.
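To make the bucket layout above concrete, the following sketch sorts a list of object keys into pipeline definitions (under /pipelines) and run artifacts (under a folder named after the pipeline). The function name and the sample keys are hypothetical; how you obtain the key list from your S3-compatible storage is outside the scope of this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def split_bucket_keys(keys: Iterable[str]) -> Tuple[List[str], Dict[str, List[str]]]:
    """Separate pipeline YAML files from run artifacts, based on the top-level folder."""
    definitions: List[str] = []
    artifacts: Dict[str, List[str]] = defaultdict(list)
    for key in keys:
        top_level, _, rest = key.lstrip("/").partition("/")
        if not rest:
            continue  # object stored directly in the bucket root, not in a folder
        if top_level == "pipelines":
            definitions.append(key)
        else:
            # Any other top-level folder is treated as a pipeline-name artifact folder.
            artifacts[top_level].append(key)
    return definitions, dict(artifacts)


# Hypothetical keys, for illustration only.
sample_keys = [
    "pipelines/0a1b2c",
    "pipelines/3d4e5f",
    "iris-training/run-1/model.pkl",
    "iris-training/run-2/metrics.json",
    "notes.txt",
]
definitions, artifacts = split_bucket_keys(sample_keys)
print(definitions)
print(sorted(artifacts))
```

The sketch treats every folder other than pipelines as an artifact folder, so it would misclassify unrelated folders that share the bucket.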
If you use an external MySQL database and upgrade to OpenShift AI 2.9, the database is migrated to DSP 2.0 format, making it incompatible with earlier versions of OpenShift AI.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have created a data science project that you can add a pipeline server to.
- You have an existing S3-compatible object storage bucket and you have configured write access to your S3 bucket on your storage account.
- If you are configuring a pipeline server with an external MySQL database, your database must use at least MySQL version 5.x. However, Red Hat recommends that you use MySQL version 8.x.
- If you are configuring a pipeline server with a MariaDB database, your database must use MariaDB version 10.3 or later. However, Red Hat recommends that you use at least MariaDB version 10.5.
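The database version prerequisites above can be checked before you configure the pipeline server. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, and the version string is assumed to be one that you obtained yourself, for example from a `SELECT VERSION();` query against the database.

```python
import re
from typing import Tuple

# Minimum versions stated in the prerequisites above (not the recommended versions).
MINIMUM_VERSIONS = {"mysql": (5,), "mariadb": (10, 3)}


def parse_version(reported: str) -> Tuple[int, ...]:
    """Extract the leading dotted number from a string such as '10.5.22-MariaDB'."""
    match = re.match(r"\d+(?:\.\d+)*", reported.strip())
    if match is None:
        raise ValueError(f"no version number found in {reported!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(engine: str, reported: str) -> bool:
    """Compare the reported version with the minimum for the given engine."""
    return parse_version(reported) >= MINIMUM_VERSIONS[engine.lower()]


print(meets_minimum("mariadb", "10.5.22-MariaDB"))  # True
print(meets_minimum("mariadb", "10.2.44"))          # False
print(meets_minimum("mysql", "8.0.36"))             # True
```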
Procedure
- From the OpenShift AI dashboard, click Data Science Projects.
  The Data Science Projects page opens.
- Click the name of the project that you want to configure a pipeline server for.
  A project details page opens.
- Click the Pipelines tab.
- Click Configure pipeline server.
  The Configure pipeline server dialog appears.
- In the Object storage connection section, provide values for the mandatory fields:
  - In the Access key field, enter the access key ID for the S3-compatible object storage provider.
  - In the Secret key field, enter the secret access key for the S3-compatible object storage account that you specified.
  - In the Endpoint field, enter the endpoint of your S3-compatible object storage bucket.
  - In the Region field, enter the default region of your S3-compatible object storage account.
  - In the Bucket field, enter the name of your S3-compatible object storage bucket.
  Important: If you specify incorrect data connection settings, you cannot update these settings on the same pipeline server. Therefore, you must delete the pipeline server and configure another one.
- In the Database section, click Show advanced database options to specify the database to store your pipeline data and select one of the following sets of actions:
  - Select Use default database stored on your cluster to deploy a MariaDB database in your project.
  - Select Connect to external MySQL database to add a new connection to an external database that your pipeline server can access.
    - In the Host field, enter the database’s host name.
    - In the Port field, enter the database’s port.
    - In the Username field, enter the default user name that is connected to the database.
    - In the Password field, enter the password for the default user account.
    - In the Database field, enter the database name.
- Click Configure pipeline server.
Verification
In the Pipelines tab for the project:
- The Import pipeline button is available.
- When you click the action menu (⋮) and then click View pipeline server configuration, the pipeline server details are displayed.
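Because incorrect data connection settings cannot be changed on an existing pipeline server, it can help to sanity-check the five mandatory object storage values before you enter them in the dialog. The following sketch is illustrative only: the class and its checks are hypothetical, and the rule that the endpoint starts with http:// or https:// is an assumption rather than a documented requirement.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class ObjectStorageConnection:
    """The mandatory fields of the Object storage connection section."""
    access_key: str
    secret_key: str
    endpoint: str
    region: str
    bucket: str

    def problems(self) -> List[str]:
        """Return a list of human-readable issues; an empty list means no issue was found."""
        issues = [
            f"{field.name} is empty"
            for field in fields(self)
            if not getattr(self, field.name).strip()
        ]
        # Assumption: the endpoint is given as a full URL including the scheme.
        if self.endpoint.strip() and not self.endpoint.startswith(("http://", "https://")):
            issues.append("endpoint should start with http:// or https://")
        return issues


# Hypothetical values, for illustration only.
good = ObjectStorageConnection("my-access-key", "my-secret", "https://s3.example.com", "us-east-1", "pipelines")
bad = ObjectStorageConnection("my-access-key", "my-secret", "s3.example.com", "us-east-1", "")

print(good.problems())
print(bad.problems())
```

These checks catch only empty or malformed values. They do not confirm that the credentials are valid or that the bucket is writable.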
3.2.2. Defining a pipeline
The Kubeflow Pipelines SDK enables you to define end-to-end machine learning and data pipelines. Use the latest Kubeflow Pipelines 2.0 SDK to build your data science pipeline in Python code. After you have built your pipeline, use the SDK to compile it into an Intermediate Representation (IR) YAML file. After defining the pipeline, you can import the YAML file to the OpenShift AI dashboard to enable you to configure its execution settings.
You can also use the Elyra JupyterLab extension to create and run data science pipelines within JupyterLab. For more information about creating pipelines in JupyterLab, see Working with pipelines in JupyterLab. For more information about the Elyra JupyterLab extension, see Elyra Documentation.
3.2.3. Importing a data science pipeline
To help you begin working with data science pipelines in OpenShift AI, you can import a YAML file containing your pipeline’s code to an active pipeline server, or you can import the YAML file from a URL. This file contains a Kubeflow pipeline compiled by using the Kubeflow compiler. After you have imported the pipeline to a pipeline server, you can execute the pipeline by creating a pipeline run.
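Before importing a file, a quick check can help you avoid uploading a file that is not in the KFP 2.0 IR format, such as one compiled for DSP 1.0. The following sketch is a heuristic, not a validator: the function name is hypothetical, and it assumes that a compiled IR YAML file contains top-level `pipelineInfo` and `root` keys. It deliberately avoids a YAML parser so that it runs with the standard library only.

```python
import re
from typing import List

# Assumption: a KFP 2.0 IR YAML file has these top-level keys.
EXPECTED_TOP_LEVEL_KEYS = ("pipelineInfo", "root")


def missing_ir_keys(yaml_text: str) -> List[str]:
    """Return the expected top-level keys that do not appear in the text."""
    # Only unindented "key:" lines count as top-level keys; comments are ignored.
    top_level = set(re.findall(r"^([A-Za-z_][\w-]*):", yaml_text, flags=re.MULTILINE))
    return [key for key in EXPECTED_TOP_LEVEL_KEYS if key not in top_level]


# Abbreviated, hypothetical file contents, for illustration only.
ir_like = "components: {}\npipelineInfo:\n  name: demo\nroot:\n  dag: {}\nschemaVersion: 2.1.0\n"
tekton_like = "apiVersion: tekton.dev/v1beta1\nkind: PipelineRun\n"

print(missing_ir_keys(ir_like))      # []
print(missing_ir_keys(tekton_like))  # ['pipelineInfo', 'root']
```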
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a configured pipeline server.
- You have compiled your pipeline with the Kubeflow compiler and you have access to the resulting YAML file.
- If you are uploading your pipeline from a URL, the URL is publicly accessible.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines > Pipelines.
- On the Pipelines page, select the project that you want to import a pipeline to.
- Click Import pipeline.
In the Import pipeline dialog, enter the details for the pipeline that you are importing.
- In the Pipeline name field, enter a name for the pipeline that you are importing.
- In the Pipeline description field, enter a description for the pipeline that you are importing.
Select where you want to import your pipeline from by performing one of the following actions:
- Select Upload a file to upload your pipeline from your local machine’s file system. Import your pipeline by clicking upload or by dragging and dropping a file.
- Select Import by url to upload your pipeline from a URL and then enter the URL into the text box.
- Click Import pipeline.
Verification
- The pipeline that you imported appears on the Pipelines page and on the Pipelines tab on the project details page.
3.2.4. Downloading a data science pipeline
To make further changes to a data science pipeline that you previously uploaded to OpenShift AI, you can download the pipeline’s code from the user interface.
Prerequisites
- You have installed the OpenShift Pipelines operator.
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a configured pipeline server.
- You have created and imported a pipeline to an active pipeline server that is available to download.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines > Pipelines.
  The Pipelines page opens.
- From the Project list, select the project whose pipeline you want to download.
- In the Pipeline name column, click the name of the pipeline that you want to download.
  The Pipeline details page opens displaying the Graph tab.
- Click the YAML tab.
  The page reloads to display an embedded YAML editor showing the pipeline code.
- Click the Download button to download the YAML file containing your pipeline’s code to your local machine.
Verification
- The pipeline code is downloaded to your browser’s default directory for downloaded files.
3.2.5. Deleting a data science pipeline
If you no longer require access to your data science pipeline on the dashboard, you can delete it so that it does not appear on the Data Science Pipelines page.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- There are active pipelines available on the Pipelines page.
- The pipeline that you want to delete does not contain any pipeline versions. For more information, see Deleting a pipeline version.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines > Pipelines.
- On the Pipelines page, from the Project list, select the project that contains the pipeline that you want to delete.
- Click the action menu (⋮) beside the pipeline that you want to delete and click Delete pipeline.
- In the Delete pipeline dialog, enter the pipeline name in the text field to confirm that you intend to delete it.
- Click Delete pipeline.
Verification
- The data science pipeline that you deleted no longer appears on the Pipelines page.
3.2.6. Deleting a pipeline server
After you have finished running your data science pipelines, you can delete the pipeline server. Deleting a pipeline server automatically deletes all of its associated pipelines, pipeline versions, and runs. If your pipeline data is stored in a database, the database is also deleted along with its metadata. In addition, after deleting a pipeline server, you cannot create new pipelines or pipeline runs until you create another pipeline server.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a pipeline server.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines > Pipelines.
- On the Pipelines page, select the project for the pipeline server that you want to delete.
- From the Pipeline server actions list, select Delete pipeline server.
- In the Delete pipeline server dialog, enter the pipeline server’s name in the text field to confirm that you intend to delete it.
- Click Delete.
Verification
- Pipelines previously assigned to the deleted pipeline server no longer appear on the Pipelines page for the relevant data science project.
- Pipeline runs previously assigned to the deleted pipeline server no longer appear on the Runs page for the relevant data science project.
3.2.7. Viewing the details of a pipeline server
You can view the details of pipeline servers configured in OpenShift AI, such as the pipeline server’s data connection details and where its data is stored.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- You have previously created a data science project that contains an active and available pipeline server.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines > Pipelines.
- On the Pipelines page, select the project whose pipeline server you want to view.
- From the Pipeline server actions list, select View pipeline server configuration.
Verification
- You can view the relevant pipeline server details in the View pipeline server dialog.
3.2.8. Viewing existing pipelines
You can view the details of pipelines that you have imported to Red Hat OpenShift AI, such as the pipeline’s last run, when it was created, the pipeline’s executed runs, and details of any associated pipeline versions.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a pipeline server.
- You have imported a pipeline to an active pipeline server.
- Existing pipelines are available.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines > Pipelines.
- On the Pipelines page, select the relevant project for the pipelines you want to view.
- Study the pipelines on the list.
- Optional: Click Expand on the relevant row to view details of any pipeline versions associated with the pipeline.
Verification
- A list of previously created data science pipelines appears on the Pipelines page.
3.2.9. Overview of pipeline versions
You can manage incremental changes to pipelines in OpenShift AI by using versioning. This allows you to develop and deploy pipelines iteratively, preserving a record of your changes. You can track and manage your changes on the OpenShift AI dashboard, allowing you to schedule and execute runs against all available versions of your pipeline.
3.2.10. Uploading a pipeline version
You can upload a YAML file to an active pipeline server that contains the latest version of your pipeline, or you can upload the YAML file from a URL. The YAML file must consist of a Kubeflow pipeline compiled by using the Kubeflow compiler. After you upload a pipeline version to a pipeline server, you can execute it by creating a pipeline run.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a configured pipeline server.
- You have a pipeline version available and ready to upload.
- If you are uploading your pipeline version from a URL, the URL is publicly accessible.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines > Pipelines.
- On the Pipelines page, select the project that you want to upload a pipeline version to.
- Click the Import pipeline dropdown list and select Upload new version.
In the Upload new version dialog, enter the details for the pipeline version that you are uploading.
- From the Pipeline list, select the pipeline that you want to upload your pipeline version to.
- In the Pipeline version name field, confirm the name for the pipeline version, and change it if necessary.
- In the Pipeline version description field, enter a description for the pipeline version.
Select where you want to upload your pipeline version from by performing one of the following actions:
- Select Upload a file to upload your pipeline version from your local machine’s file system. Import your pipeline version by clicking upload or by dragging and dropping a file.
- Select Import by url to upload your pipeline version from a URL and then enter the URL into the text box.
- Click Upload.
Verification
- The pipeline version that you uploaded is displayed on the Pipelines page. Click Expand on the row containing the pipeline to view its versions.
- The Version column on the row containing the pipeline version that you uploaded on the Pipelines page increments by one.
3.2.11. Deleting a pipeline version
You can delete specific versions of a pipeline when you no longer require them. Deleting a default pipeline version automatically changes the default pipeline version to the next most recent version. If no pipeline versions exist, the pipeline persists without a default version.
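The default-version behavior described above can be sketched as follows. This models the stated rule only and is not the OpenShift AI implementation: the class and function names are hypothetical, and "most recent" is assumed to mean the latest creation time among the remaining versions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class PipelineVersion:
    name: str
    created_at: datetime


def default_after_delete(versions: List[PipelineVersion], deleted: str) -> Optional[PipelineVersion]:
    """Return the version that becomes the default after the named version is deleted."""
    remaining = [version for version in versions if version.name != deleted]
    # With no versions left, the pipeline persists without a default version (None).
    return max(remaining, key=lambda version: version.created_at, default=None)


# Hypothetical versions, for illustration only.
versions = [
    PipelineVersion("v1", datetime(2024, 1, 1)),
    PipelineVersion("v2", datetime(2024, 2, 1)),
    PipelineVersion("v3", datetime(2024, 3, 1)),
]

print(default_after_delete(versions, "v3").name)   # v2
print(default_after_delete(versions[:1], "v1"))    # None
```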
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a pipeline server.
- You have imported a pipeline to an active pipeline server.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines > Pipelines.
  The Pipelines page opens.
- Delete the pipeline versions that you no longer require:
  - To delete a single pipeline version:
    - From the Project list, select the project that contains a version of a pipeline that you want to delete.
    - On the row containing the pipeline, click Expand.
    - Click the action menu (⋮) beside the pipeline version that you want to delete and click Delete pipeline version.
      The Delete pipeline version dialog opens.
    - Enter the name of the pipeline version in the text field to confirm that you intend to delete it.
    - Click Delete.
  - To delete multiple pipeline versions:
    - On the row containing each pipeline version that you want to delete, select the checkbox.
    - Click the action menu (⋮) next to the Import pipeline dropdown, and select Delete from the list.
Verification
- The pipeline version that you deleted no longer appears on the Pipelines page or on the Pipelines tab for the data science project.
3.2.12. Viewing pipeline versions
You can view all versions for a pipeline on the Pipelines page.
Prerequisites
- You have installed the OpenShift Pipelines operator.
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a pipeline server.
- You have a pipeline available on an active and available pipeline server.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines > Pipelines.
  The Pipelines page opens.
- From the Project list, select the project containing the pipeline versions that you want to view.
- Click Expand on the row containing the pipeline that you want to view versions for.
Verification
- You can view the versions of the pipeline on the Pipelines page.
3.2.13. Viewing the details of a pipeline version
You can view the details of a pipeline version that you have uploaded to Red Hat OpenShift AI, such as its graph and YAML code.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a pipeline server.
- You have a pipeline available on an active and available pipeline server.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines > Pipelines.
  The Pipelines page opens.
- From the Project list, select the project containing the pipeline versions that you want to view details for.
- Click Expand on the row containing the pipeline that you want to view versions for.
- Click the pipeline version that you want to view the details of.
  The Pipeline details page opens, displaying the Graph and YAML tabs.
Verification
- On the Pipeline details page, you can view the pipeline graph and YAML code.
3.3. Managing pipeline runs
3.3.1. Overview of pipeline runs
A pipeline run is a single execution of a data science pipeline. As a data scientist, you can use OpenShift AI to define, manage, and track executions of a data science pipeline. You can view a record of previously executed, scheduled, and archived runs from the Runs page in the OpenShift AI user interface.
You can optimize your use of pipeline runs for portability. You can clone your pipeline runs to reproduce and scale them accordingly, or archive them when you want to retain a record of their execution, but no longer require them. You can delete archived runs that you no longer want to retain, or you can restore them to their former state.
You can execute a run once, that is, immediately after its creation, or on a recurring basis. Recurring runs consist of a copy of a pipeline with all of its parameter values and a run trigger. A run trigger indicates when a recurring run executes. You can define the following run triggers:
- Periodic: used for scheduling runs to execute in intervals.
- Cron: used for scheduling runs as a cron job.
You can also configure multiple instances of the same run to execute concurrently, from a range of one to ten. When a run executes, you can track its progress from the run Details page on the OpenShift AI user interface. From here, you can view the run’s graph and output artifacts. A pipeline run can be in one of the following states:
- Scheduled run: A pipeline run scheduled to execute at least once.
- Active run: A pipeline run that is in its execution phase, or is stopped.
- Archived run: A pipeline run that resides in the run archive and is no longer required.
You can use catch up runs to ensure your pipeline runs do not permanently fall behind schedule when paused. For example, if you re-enable a paused recurring run, the run scheduler backfills each missed run interval. If you disable catch up runs, and you have a scheduled run interval ready to execute, the run scheduler only schedules the run execution for the latest run interval. Catch up runs are enabled by default. However, if your pipeline handles backfill internally, Red Hat recommends that you disable catch up runs to avoid duplicate backfill.
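The difference between enabling and disabling catch up runs can be illustrated with a short simulation. This is a sketch of the behavior described above for a periodic trigger, not the scheduler's actual code; the function name and the example times are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def intervals_to_run(
    last_run: datetime, now: datetime, every: timedelta, catch_up: bool = True
) -> List[datetime]:
    """Return the run intervals to schedule after a paused recurring run is re-enabled."""
    if every <= timedelta(0):
        raise ValueError("interval must be positive")
    missed = []
    due = last_run + every
    while due <= now:
        missed.append(due)
        due += every
    # With catch up enabled, every missed interval is backfilled.
    # With catch up disabled, only the latest interval is scheduled.
    return missed if catch_up else missed[-1:]


# Hypothetical hourly run that last executed at 00:00 and was re-enabled at 03:30.
last_run = datetime(2024, 1, 1, 0, 0)
resumed = datetime(2024, 1, 1, 3, 30)
hourly = timedelta(hours=1)

print(len(intervals_to_run(last_run, resumed, hourly)))                  # 3
print(len(intervals_to_run(last_run, resumed, hourly, catch_up=False)))  # 1
```

In this example, a pipeline that already reprocesses missed data itself would handle the 01:00 and 02:00 intervals twice if catch up runs stayed enabled, which is why disabling them is recommended in that case.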
You can review and analyze logs for each step in an active pipeline run. With the log viewer, you can search for specific log messages, view the log for each step, and download the step logs to your local machine.
3.3.2. Storing data with data science pipelines
When you run a data science pipeline, OpenShift AI stores the pipeline YAML configuration file and resulting pipeline run artifacts in the root directory of your storage bucket. The directories that contain pipeline run artifacts can differ depending on where you executed the pipeline run from. See the following table for further information:
Pipeline run source | Pipeline storage directory | Run artifacts storage directory |
---|---|---|
OpenShift AI dashboard | | |
JupyterLab Elyra extension | | With the JupyterLab Elyra extension, you can also set an object storage path prefix. |
3.3.3. Viewing active pipeline runs
You can view a list of pipeline runs that were previously executed in OpenShift AI. From this list, you can view details relating to your pipeline runs, such as the pipeline version that the run belongs to, along with the run status, duration, and execution start time.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and has a pipeline server.
- You have imported a pipeline to an active pipeline server.
- You have previously executed a pipeline run.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines > Runs.
- On the Runs page, select the project for the active pipeline runs that you want to view.
- Click the Active tab.
  After a run has completed its execution, the run’s status appears in the Status column in the table, indicating whether the run has succeeded or failed.
Verification
- A list of active runs appears in the Active tab on the Runs page.
3.3.4. Executing a pipeline run
You can instantiate a single execution of a pipeline by creating an active pipeline run that executes immediately after creation.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a configured pipeline server.
- You have imported a pipeline to an active pipeline server.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines > Runs.
- On the Runs page, select the project that you want to create a run for.
- Click Create run.
On the Create run page, configure the run:
- In the Name field, enter a name for the run.
- In the Description field, enter a description for the run.
- From the Pipeline list, select the pipeline that you want to create a run for. Alternatively, to create a new pipeline, click Create new pipeline and complete the relevant fields in the Import pipeline dialog.
- From the Pipeline version list, select the pipeline version to create a run for. Alternatively, to upload a new version, click Upload new version and complete the relevant fields in the Upload new version dialog.
- Configure the input parameters for the run by selecting the parameters from the list.
- Click Create run.
Verification
- The pipeline run that you created appears in the Active tab on the Runs page.
3.3.5. Stopping an active pipeline run
If you no longer require an active pipeline run to continue executing, you can stop the run before its defined end date.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a pipeline server.
- You have imported a pipeline to an active pipeline server.
- An active pipeline run is currently executing.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines → Runs.
- On the Runs page, select the project that contains the pipeline whose active run you want to stop.
In the Active tab, click the action menu (⋮) beside the active run that you want to stop and click Stop.
There might be a short delay while the run stops.
Verification
- In the list of active runs, the status of the run is "stopped".
3.3.6. Duplicating an active pipeline run
To make it easier to quickly execute pipeline runs with the same configuration, you can duplicate them.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a configured pipeline server.
- You have imported a pipeline to an active pipeline server.
- An active run is available to duplicate in the Active tab on the Runs page.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines → Runs.
- On the Runs page, select the project that has the pipeline run that you want to duplicate.
- Click the action menu (⋮) beside the relevant active run and click Duplicate.
On the Duplicate run page, configure the duplicate run:
- In the Name field, enter a name for the duplicate run.
- In the Description field, enter a description for the duplicate run.
- From the Pipeline list, select the pipeline to contain the duplicate run.
- From the Pipeline version list, select the pipeline version to contain the duplicate run.
- In the Parameters section, configure the input parameters for the run that you are duplicating by selecting the appropriate parameters from the list.
- Click Create run.
Verification
- The duplicate pipeline run appears in the Active tab on the Runs page.
3.3.7. Viewing scheduled pipeline runs
You can view a list of pipeline runs that are scheduled for execution in OpenShift AI. From this list, you can view details relating to your pipeline runs, such as the pipeline version that the run belongs to. You can also view the run status, execution frequency, and schedule.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a pipeline server.
- You have imported a pipeline to an active pipeline server.
- You have scheduled a pipeline run that is available to view.
Procedure
From the OpenShift AI dashboard, click Data Science Pipelines → Runs.
The Runs page opens.
- From the Project list, select the project whose scheduled pipeline runs you want to view.
- Click the Schedules tab.
Study the table showing a list of scheduled runs.
After a run has been scheduled, the run’s status indicates whether the run is ready for execution or unavailable for execution. To change its execution availability, click the run’s Status icon.
Verification
- A list of scheduled runs appears in the Schedules tab on the Runs page.
3.3.8. Scheduling a pipeline run using a cron job
You can use a cron job to schedule a pipeline run to execute at a specific time. Cron jobs are useful for creating periodic and recurring tasks, and can also schedule an individual task for a specific time, for example, a run during a low-activity period. To successfully execute runs in OpenShift AI, you must use the supported format. See Cron Expression Format for more information.
The following examples show the correct format:
Run occurrence | Cron format |
---|---|
Every five minutes | @every 5m |
Every 10 minutes | 0 */10 * * * * |
Daily at 16:16 UTC | 0 16 16 * * * |
Every 15 minutes, on the quarter hour | 0 0,15,30,45 * * * * |
On Monday and Tuesday at 15:40 UTC | 0 40 15 * * MON,TUE |
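The expressions in the table use six fields, with seconds in the leftmost position. As a minimal sketch for reading them, the following Python helper labels each field of an expression. The function name describe_cron and the field labels are illustrative assumptions, not part of OpenShift AI or Kubeflow Pipelines, and the helper only checks the field count; it does not validate the values.

```python
# Field labels for a six-field cron expression (seconds first).
# These labels are illustrative names, not identifiers from any API.
FIELDS = ("second", "minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expr: str) -> dict:
    """Map each field of a six-field cron expression to a label."""
    expr = expr.strip()
    if expr.startswith("@"):
        # Descriptors such as "@every 5m" are not split into fields.
        return {"descriptor": expr}
    parts = expr.split()
    if len(parts) != len(FIELDS):
        raise ValueError(
            f"expected {len(FIELDS)} fields, got {len(parts)}: {expr!r}"
        )
    return dict(zip(FIELDS, parts))


# "On Monday and Tuesday at 15:40 UTC" from the table above.
for name, value in describe_cron("0 40 15 * * MON,TUE").items():
    print(f"{name:>13}: {value}")
```

A five-field expression such as `*/10 * * * *` is rejected by this sketch because it lacks the leading seconds field.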
3.3.9. Scheduling a pipeline run
To repeatedly run a pipeline, you can create a scheduled pipeline run.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a configured pipeline server.
- You have imported a pipeline to an active pipeline server.
Procedure
From the OpenShift AI dashboard, click Data Science Pipelines → Runs.
The Runs page opens.
- From the Project list, select the project that you want to schedule a run for.
- Click the Schedules tab.
- Click Schedule run.
On the Schedule run page, configure the run that you are scheduling:
- In the Name field, enter a name for the run.
- In the Description field, enter a description for the run.
From the Trigger type list, select one of the following options:
- Select Periodic to specify an execution frequency. In the Run every field, enter a numerical value and select an execution frequency from the list.
- Select Cron to specify the execution schedule in cron format. This creates a cron job to execute the run. Click the Copy button to copy the cron job schedule to the clipboard. The field furthest to the left represents seconds. For more information about scheduling tasks using the supported cron format, see Cron Expression Format.
- In the Maximum concurrent runs field, specify the number of runs that can execute concurrently, from a range of one to ten.
- For Start date, specify a start date for the run. Select a start date using the Calendar and the start time from the list of times.
- For End date, specify an end date for the run. Select an end date using the Calendar and the end time from the list of times.
- For Catch up, enable or disable catch up runs. You can use catch up runs to ensure your pipeline runs do not permanently fall behind schedule when paused. For example, if you re-enable a paused recurring run, the run scheduler backfills each missed run interval.
- From the Pipeline list, select the pipeline that you want to create a run for. Alternatively, to create a new pipeline, click Create new pipeline and complete the relevant fields in the Import pipeline dialog.
- From the Pipeline version list, select the pipeline version to create a run for. Alternatively, to upload a new version, click Upload new version and complete the relevant fields in the Upload new version dialog.
- Configure the input parameters for the run by selecting the parameters from the list.
- Click Schedule run.
Verification
- The pipeline run that you created appears in the Schedules tab on the Runs page.
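The Catch up option in the procedure above backfills each run interval that was missed while a recurring run was paused. The following Python sketch is an illustrative model of that idea only: it lists the scheduled times that fall between the last executed run and the moment the schedule is re-enabled. The function name missed_runs and the sample dates are assumptions made for this example; this is not the scheduler's actual implementation.

```python
from datetime import datetime, timedelta


def missed_runs(last_run, resumed_at, every):
    """Return the scheduled times between last_run and resumed_at.

    Illustrative model: a periodic schedule fires every `every` after
    `last_run`; any firing time up to `resumed_at` counts as missed.
    """
    missed = []
    t = last_run + every
    while t <= resumed_at:
        missed.append(t)
        t += every
    return missed


# Hypothetical hourly schedule paused after the 12:00 run and
# re-enabled at 15:30 on the same day.
last = datetime(2024, 1, 1, 12, 0)
resumed = datetime(2024, 1, 1, 15, 30)
for t in missed_runs(last, resumed, timedelta(hours=1)):
    print(t.isoformat())
```

In this model, enabling Catch up corresponds to executing one run for each listed time, and disabling it corresponds to skipping them and continuing from the next interval.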
3.3.10. Duplicating a scheduled pipeline run
To make it easier to schedule runs to execute as part of your pipeline configuration, you can duplicate existing scheduled runs.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a configured pipeline server.
- You have imported a pipeline to an active pipeline server.
- A scheduled run is available to duplicate in the Schedules tab on the Runs page.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines → Runs.
- On the Runs page, select the project that has the pipeline run that you want to duplicate.
- Click the Schedules tab.
- Click the action menu (⋮) beside the run that you want to duplicate and click Duplicate.
On the Duplicate schedule page, configure the duplicate run:
- In the Name field, enter a name for the duplicate run.
- In the Description field, enter a description for the duplicate run.
From the Trigger type list, select one of the following options:
- Select Periodic to specify an execution frequency. In the Run every field, enter a numerical value and select an execution frequency from the list.
- Select Cron to specify the execution schedule in cron format. This creates a cron job to execute the run. Click the Copy button to copy the cron job schedule to the clipboard. The field furthest to the left represents seconds. For more information about scheduling tasks using the supported cron format, see Cron Expression Format.
- For Maximum concurrent runs, specify the number of runs that can execute concurrently, from a range of one to ten.
- For Start date, specify a start date for the duplicate run. Select a start date using the Calendar and the start time from the list of times.
- For End date, specify an end date for the duplicate run. Select an end date using the Calendar and the end time from the list of times.
- For Catch up, enable or disable catch up runs. You can use catch up runs to ensure your pipeline runs do not permanently fall behind schedule when paused. For example, if you re-enable a paused recurring run, the run scheduler backfills each missed run interval.
- From the Pipeline list, select the pipeline that you want to create a duplicate run for. Alternatively, to create a new pipeline, click Create new pipeline and complete the relevant fields in the Import pipeline dialog.
- From the Pipeline version list, select the pipeline version to create a duplicate run for. Alternatively, to upload a new version, click Upload new version and complete the relevant fields in the Upload new version dialog.
- Configure the input parameters for the run by selecting the parameters from the list.
- Click Schedule run.
Verification
- The pipeline run that you duplicated appears in the Schedules tab on the Runs page.
3.3.11. Deleting a scheduled pipeline run
To discard pipeline runs that you previously scheduled, but no longer require, you can delete them so that they do not appear on the Runs page.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a configured pipeline server.
- You have imported a pipeline to an active pipeline server.
- You have previously scheduled a run that is available to delete.
Procedure
From the OpenShift AI dashboard, click Data Science Pipelines → Runs.
The Runs page opens.
From the Project list, select the project that contains the pipeline whose scheduled run you want to delete.
The page refreshes to show the pipeline’s scheduled runs on the Schedules tab.
Click the action menu (⋮) beside the scheduled run that you want to delete and click Delete.
The Delete schedule dialog opens.
- Enter the run’s name in the text field to confirm that you intend to delete it.
- Click Delete.
Verification
- The run that you deleted no longer appears on the Schedules tab.
3.3.12. Viewing the details of a pipeline run
To gain a clearer understanding of your pipeline runs, you can view the details of a previously triggered pipeline run, such as its graph, execution details, and run output.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a pipeline server.
- You have imported a pipeline to an active pipeline server.
- You have previously triggered a pipeline run.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines → Pipelines.
- On the Pipelines page, select the project that you want to view run details for.
- For a pipeline that you want to view run details for, click Expand.
- Click the action menu (⋮) for the pipeline version and then click View runs.
- On the Runs page, click the name of the run that you want to view the details of.
Verification
- On the Run details page, you can view the run’s graph, execution details, input parameters, step logs, and run output.
3.3.13. Viewing archived pipeline runs
You can view a list of pipeline runs that you have archived. You can view details for your archived pipeline runs, such as the pipeline version, run status, duration, and execution start date.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and has a pipeline server.
- You have imported a pipeline to an active pipeline server.
- An archived pipeline run exists.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines → Runs.
- On the Runs page, select the project for the archived pipeline runs that you want to view.
- Click the Archived tab.
Verification
- A list of archived runs appears in the Archived tab on the Runs page.
3.3.14. Archiving a pipeline run
You can retain records of your pipeline runs by archiving them. If required, you can restore runs from your archive to reuse, or delete runs that are no longer required.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and has a pipeline server.
- You have imported a pipeline to an active pipeline server.
- You have previously executed a pipeline run that is available.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines → Runs.
- On the Runs page, from the Project list, select the project that contains the pipeline run that you want to archive.
- On the Runs page, click the action menu (⋮) beside the run that you want to archive and then click Archive.
- In the Archiving run dialog, enter the run name in the text field to confirm that you intend to archive it.
- Click Archive.
Verification
- The archived run does not appear in the Active tab and instead appears in the Archived tab on the Runs page.
3.3.15. Restoring an archived pipeline run
You can restore an archived run to the active state.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and has a pipeline server.
- You have imported a pipeline to an active pipeline server.
- An archived run exists in your project.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines → Runs.
- On the Runs page, select the project for the archived pipeline run that you want to restore.
- On the Runs page, click the Archived tab.
- Click the action menu (⋮) beside the run that you want to restore and click Restore.
- In the Restore run dialog, enter the run name in the text field to confirm that you intend to restore it.
- Click Restore.
Verification
- The restored run appears in the Active tab on the Runs page.
3.3.16. Deleting an archived pipeline run
You can delete pipeline runs from the OpenShift AI run archive.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and has a pipeline server.
- You have imported a pipeline to an active pipeline server.
- You have previously archived a pipeline run.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines → Runs.
- On the Runs page, select the project for the archived pipeline run that you want to delete.
- On the Runs page, click the Archived tab.
- Click the action menu (⋮) beside the run that you want to delete and click Delete.
- In the Delete run dialog, enter the run name in the text field to confirm that you intend to delete it.
- Click Delete.
Verification
- The archived run that you deleted no longer appears in the Archived tab on the Runs page.
3.3.17. Duplicating an archived pipeline run
To make it easier to reproduce runs with the same configuration as runs in your archive, you can duplicate them.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a configured pipeline server.
- You have imported a pipeline to an active pipeline server.
- An archived run is available to duplicate in the Archived tab on the Runs page.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines → Runs.
- On the Runs page, select the project that has the pipeline run that you want to duplicate.
- Click the Archived tab.
- Click the action menu (⋮) beside the relevant archived run and click Duplicate.
On the Duplicate run page, configure the duplicate run:
- In the Name field, enter a name for the duplicate run.
- In the Description field, enter a description for the duplicate run.
- From the Pipeline list, select the pipeline to contain the duplicate run.
- From the Pipeline version list, select the pipeline version to contain the duplicate run.
- In the Parameters section, configure the input parameters for the run that you are duplicating by selecting the appropriate parameters from the list.
- Click Create run.
Verification
- The duplicate pipeline run appears in the Active tab on the Runs page.
3.4. Working with pipeline logs
3.4.1. About pipeline logs
You can review and analyze step logs for each step in a triggered pipeline run.
To help you troubleshoot and audit your pipelines, you can review and analyze these step logs by using the log viewer in the OpenShift AI dashboard. From here, you can search for specific log messages, view the log for each step, and download the step logs to your local machine.
If the step log file exceeds its capacity, a warning appears above the log viewer stating that the log window displays partial content. Expanding the warning displays further information: the log viewer refreshes every three seconds, and each step log displays the last 500 lines of log messages received. In addition, you can click Download all step logs to download all step logs to your local machine.
Each step has a set of container logs. You can view these container logs by selecting a container from the Steps list in the log viewer. The step-main container log consists of the log output for the step. The step-copy-artifact container log consists of output relating to artifact data sent to S3-compatible storage. If the data transferred between the steps in your pipeline is larger than 3 KB, five container logs are typically available. These logs contain output relating to data transferred between your persistent volume claims (PVCs).
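The "last 500 lines" behavior described above can be pictured as a bounded buffer that discards the oldest lines as new ones arrive. The following Python sketch illustrates that idea with a fixed-length deque. The function name tail and the generated sample log are assumptions for this example; the log viewer's actual implementation is not described in this document.

```python
from collections import deque

MAX_LINES = 500  # the log viewer displays the last 500 lines per step


def tail(lines, limit=MAX_LINES):
    """Keep only the most recent `limit` lines, discarding older ones."""
    return list(deque(lines, maxlen=limit))


# Hypothetical step log with 1200 lines.
log = [f"line {i}" for i in range(1, 1201)]
visible = tail(log)
print(len(visible), visible[0], visible[-1])
```

In this sketch the first 700 lines are not visible, which is why downloading the step logs is the way to inspect a long log in full.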
3.4.2. Viewing pipeline step logs
To help you troubleshoot and audit your pipelines, you can review and analyze the log of each pipeline step using the log viewer. From here, you can search for specific log messages and download the logs for each step in your pipeline. If the pipeline is running, you can also pause and resume the log from the log viewer.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a pipeline server.
- You have imported a pipeline to an active pipeline server.
- You have previously triggered a pipeline run.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines → Pipelines.
- On the Pipelines page, select the project that you want to view logs for.
- For the pipeline that you want to view logs for, click Expand.
- Click the action menu (⋮) on the row containing the pipeline version that you want to view pipeline logs for and click View runs.
- On the Runs page, click the name of the run that you want to view logs for.
- On the graph on the Run details page, click the pipeline step that you want to view logs for.
- Click the Logs tab.
- To view the logs of another pipeline step, from the Steps list, select the step that you want to view logs for.
Analyze the log using the log viewer.
- To search for a specific log message, enter at least part of the message in the search bar.
- To view the full log in a separate browser window, click the action menu (⋮) and select View raw logs. Alternatively, to expand the size of the log viewer, click the action menu (⋮) and select Expand.
Verification
- You can view the logs for each step in your pipeline.
3.4.3. Downloading pipeline step logs
Instead of viewing the step logs of a pipeline run using the log viewer on the OpenShift AI dashboard, you can download them for further analysis. You can download the logs for all steps in your pipeline, or only the log for the step displayed in the log viewer.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a pipeline server.
- You have imported a pipeline to an active pipeline server.
- You have previously triggered a pipeline run.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines → Pipelines.
- On the Pipelines page, select the project that you want to download logs for.
- For the pipeline that you want to download logs for, click Expand.
- Click View runs on the row containing the pipeline version that you want to download logs for.
- On the Runs page, click the name of the run that you want to download logs for.
- On the graph on the Run details page, click the pipeline step that you want to download logs for.
- Click the Logs tab.
In the log viewer, click the Download button.
- Select Download current step log to download the log for the current pipeline step.
- Select Download all step logs to download the logs for all steps in your pipeline run.
Verification
- The step logs download to your browser’s default directory for downloaded files.
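After the step logs are downloaded, the kind of message search that the log viewer offers can be approximated offline. The following Python sketch performs a case-insensitive substring search and reports matching line numbers. The function name search_log, the sample log text, and the file name in the comment are hypothetical; the format of downloaded log files is not specified in this document.

```python
def search_log(text, needle):
    """Return (line_number, line) pairs whose line contains `needle`.

    The match is a case-insensitive substring search, so entering part
    of a message is enough to find it.
    """
    needle = needle.lower()
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if needle in line.lower()
    ]


# Hypothetical log content. For a downloaded file, the text could be read
# with open("step-main.log", encoding="utf-8").read() instead (assumed name).
sample = """Loading dataset
WARNING: 3 rows dropped
Training started
Error: out of memory
"""

for number, line in search_log(sample, "error"):
    print(f"{number}: {line}")
```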
3.5. Working with pipelines in JupyterLab
3.5.1. Overview of pipelines in JupyterLab
You can use Elyra to create visual end-to-end pipeline workflows in JupyterLab. Elyra is an extension for JupyterLab that provides you with a Pipeline Editor to create pipeline workflows that can be executed in OpenShift AI.
You can access the Elyra extension within JupyterLab when you create a workbench with the most recent version of one of the following notebook images:
- Standard Data Science
- PyTorch
- TensorFlow
- TrustyAI
- HabanaAI
When you use the Pipeline Editor to visually design your pipelines, minimal coding is required to create and run pipelines. For more information about Elyra, see Elyra Documentation. For more information about the Pipeline Editor, see Visual Pipeline Editor. After you have created your pipeline, you can run it locally in JupyterLab, or remotely using data science pipelines in OpenShift AI.
The pipeline creation process consists of the following tasks:
- Create a data science project that contains a workbench.
- Create a pipeline server.
- Create a new pipeline in the Pipeline Editor in JupyterLab.
- Develop your pipeline by adding Python notebooks or Python scripts and defining their runtime properties.
- Define execution dependencies.
- Run or export your pipeline.
Before you can run a pipeline in JupyterLab, your pipeline instance must contain a runtime configuration. A runtime configuration defines connectivity information for your pipeline instance and S3-compatible cloud storage.
If you create a workbench as part of a data science project, a default runtime configuration is created automatically. However, if you create a notebook from the Jupyter tile in the OpenShift AI dashboard, you must create a runtime configuration before you can run your pipeline in JupyterLab. For more information about runtime configurations, see Runtime Configuration. As a prerequisite, before you create a workbench, ensure that you have created and configured a pipeline server within the same data science project as your workbench.
You can use S3-compatible cloud storage to make data available to your notebooks and scripts while they are executed. Your cloud storage must be accessible from the machine in your deployment that runs JupyterLab and from the cluster that hosts Data Science Pipelines. Before you create and run pipelines in JupyterLab, ensure that you have your S3-compatible storage credentials readily available.
3.5.2. Accessing the pipeline editor
You can use Elyra to create visual end-to-end pipeline workflows in JupyterLab. Elyra is an extension for JupyterLab that provides you with a Pipeline Editor to create pipeline workflows that can execute in OpenShift AI.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have created a data science project.
- You have created a workbench with the Standard Data Science notebook image.
- You have created and configured a pipeline server within the data science project that contains your workbench.
- You have created and launched a Jupyter server from a notebook image that contains the Elyra extension (Standard Data Science, TensorFlow, TrustyAI, PyTorch, or HabanaAI).
- You have access to S3-compatible storage.
Procedure
- After you open JupyterLab, confirm that the JupyterLab launcher is automatically displayed.
In the Elyra section of the JupyterLab launcher, click the Pipeline Editor tile.
The Pipeline Editor opens.
Verification
- You can view the Pipeline Editor in JupyterLab.
3.5.3. Creating a runtime configuration
If you create a workbench as part of a data science project, a default runtime configuration is created automatically. However, if you create a notebook from the Jupyter tile in the OpenShift AI dashboard, you must create a runtime configuration before you can run your pipeline in JupyterLab. This enables you to specify connectivity information for your pipeline instance and S3-compatible cloud storage.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have access to S3-compatible cloud storage.
- You have created a data science project that contains a workbench.
- You have created and configured a pipeline server within the data science project that contains your workbench.
- You have created and launched a Jupyter server from a notebook image that contains the Elyra extension (Standard Data Science, TensorFlow, TrustyAI, PyTorch, or HabanaAI).
Procedure
- In the left sidebar of JupyterLab, click Runtimes.
Click the Create new runtime configuration button.
The Add new Data Science Pipelines runtime configuration page opens.
Complete the relevant fields to define your runtime configuration.
- In the Display Name field, enter a name for your runtime configuration.
- Optional: In the Description field, enter a description to define your runtime configuration.
- Optional: In the Tags field, click Add Tag to define a category for your pipeline instance. Enter a name for the tag and press Enter.
Define the credentials of your data science pipeline:
- In the Data Science Pipelines API Endpoint field, enter the API endpoint of your data science pipeline. Do not specify the pipelines namespace in this field.
In the Public Data Science Pipelines API Endpoint field, enter the public API endpoint of your data science pipeline.
Important: You can obtain the Data Science Pipelines API endpoint from the Data Science Pipelines → Runs page in the dashboard. Copy the relevant endpoint and enter it in the Public Data Science Pipelines API Endpoint field.
- Optional: In the Data Science Pipelines User Namespace field, enter the relevant user namespace to run pipelines.
From the Authentication Type list, select the authentication type required to authenticate your pipeline.
Important: If you created a notebook directly from the Jupyter tile on the dashboard, select EXISTING_BEARER_TOKEN from the Authentication Type list.
- In the Data Science Pipelines API Endpoint Username field, enter the user name required for the authentication type.
In the Data Science Pipelines API Endpoint Password Or Token field, enter the password or token required for the authentication type.
Important: To obtain the Data Science Pipelines API endpoint token, in the upper-right corner of the OpenShift web console, click your user name and select Copy login command. After you have logged in, click Display token and copy the value of --token= from the Log in with this token command.
Define the connectivity information of your S3-compatible storage:
- In the Cloud Object Storage Endpoint field, enter the endpoint of your S3-compatible storage. For more information about Amazon S3 endpoints, see Amazon Simple Storage Service endpoints and quotas.
- Optional: In the Public Cloud Object Storage Endpoint field, enter the URL of your S3-compatible storage.
- In the Cloud Object Storage Bucket Name field, enter the name of the bucket where your pipeline artifacts are stored. If the bucket does not exist, it is created automatically.
- From the Cloud Object Storage Authentication Type list, select the authentication type required to access your S3-compatible cloud storage. If you use AWS S3 buckets, select KUBERNETES_SECRET from the list.
- In the Cloud Object Storage Credentials Secret field, enter the secret that contains the storage user name and password. This secret is defined in the relevant user namespace, if applicable. In addition, it must be stored on the cluster that hosts your pipeline runtime.
- In the Cloud Object Storage Username field, enter the user name to connect to your S3-compatible cloud storage, if applicable. If you use AWS S3 buckets, enter your AWS Access Key ID.
- In the Cloud Object Storage Password field, enter the password to connect to your S3-compatible cloud storage, if applicable. If you use AWS S3 buckets, enter your AWS Secret Access Key.
- Click Save & Close.
Verification
- The runtime configuration that you created is shown in the Runtimes tab in the left sidebar of JupyterLab.
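The procedure above asks you to copy the value of --token= from the Log in with this token command. As a minimal sketch of that step, the following Python helper pulls the token value out of a pasted command string. The function name extract_token is hypothetical, and the sample command uses a placeholder token and server URL rather than real values.

```python
import shlex


def extract_token(login_command):
    """Return the value of the --token= argument in a login command."""
    prefix = "--token="
    for arg in shlex.split(login_command):
        if arg.startswith(prefix):
            return arg[len(prefix):]
    raise ValueError("no --token= argument found")


# Placeholder command: the token and server below are not real values.
command = "oc login --token=sha256~EXAMPLE_TOKEN --server=https://api.example.com:6443"
print(extract_token(command))
```

The extracted value is what goes into the Data Science Pipelines API Endpoint Password Or Token field. Treat it as a credential and avoid storing it in notebooks or scripts.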
3.5.4. Updating a runtime configuration
To ensure that your runtime configuration is accurate and updated, you can change the settings of an existing runtime configuration.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have access to S3-compatible storage.
- You have created a data science project that contains a workbench.
- You have created and configured a pipeline server within the data science project that contains your workbench.
- A previously created runtime configuration is available in the JupyterLab interface.
- You have created and launched a Jupyter server from a notebook image that contains the Elyra extension (Standard data science, TensorFlow, TrustyAI, PyTorch, or HabanaAI).
Procedure
- In the left sidebar of JupyterLab, click Runtimes.
Hover the cursor over the runtime configuration that you want to update and click the Edit button.
The Data Science Pipelines runtime configuration page opens.
Fill in the relevant fields to update your runtime configuration.
- In the Display Name field, update the name of your runtime configuration, if applicable.
- Optional: In the Description field, update the description of your runtime configuration, if applicable.
- Optional: In the Tags field, click Add Tag to define a category for your pipeline instance. Enter a name for the tag and press Enter.
Define the credentials of your data science pipeline:
- In the Data Science Pipelines API Endpoint field, update the API endpoint of your data science pipeline, if applicable. Do not specify the pipelines namespace in this field.
- In the Public Data Science Pipelines API Endpoint field, update the API endpoint of your data science pipeline, if applicable.
- Optional: In the Data Science Pipelines User Namespace field, update the relevant user namespace to run pipelines, if applicable.
From the Authentication Type list, select a new authentication type required to authenticate your pipeline, if applicable.
Important: If you created a notebook directly from the Jupyter tile on the dashboard, select EXISTING_BEARER_TOKEN from the Authentication Type list.
- In the Data Science Pipelines API Endpoint Username field, update the user name required for the authentication type, if applicable.
In the Data Science Pipelines API Endpoint Password Or Token field, update the password or token required for the authentication type, if applicable.
Important: To obtain the Data Science Pipelines API endpoint token, in the upper-right corner of the OpenShift web console, click your user name and select Copy login command. After you have logged in, click Display token and copy the value of --token= from the Log in with this token command.
Define the connectivity information of your S3-compatible storage:
- In the Cloud Object Storage Endpoint field, update the endpoint of your S3-compatible storage, if applicable. For more information about Amazon S3 endpoints, see Amazon Simple Storage Service endpoints and quotas.
- Optional: In the Public Cloud Object Storage Endpoint field, update the URL of your S3-compatible storage, if applicable.
- In the Cloud Object Storage Bucket Name field, update the name of the bucket where your pipeline artifacts are stored, if applicable. If the bucket does not exist, it is created automatically.
- From the Cloud Object Storage Authentication Type list, update the authentication type required to access your S3-compatible cloud storage, if applicable. If you use AWS S3 buckets, you must select USER_CREDENTIALS from the list.
- Optional: In the Cloud Object Storage Credentials Secret field, update the secret that contains the storage user name and password, if applicable. This secret is defined in the relevant user namespace. You must save the secret on the cluster that hosts your pipeline runtime.
- Optional: In the Cloud Object Storage Username field, update the user name to connect to your S3-compatible cloud storage, if applicable. If you use AWS S3 buckets, update your AWS Access Key ID.
- Optional: In the Cloud Object Storage Password field, update the password to connect to your S3-compatible cloud storage, if applicable. If you use AWS S3 buckets, update your AWS Secret Access Key.
- Click Save & Close.
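The token step in this procedure asks you to copy the value of --token= out of a longer login command. The following Python sketch shows how that value could be isolated if you paste the whole command into a script. The function name is hypothetical, and the command in the usage note is a made-up placeholder, not output from a real cluster.

```python
import shlex
from typing import Optional


def extract_token(login_command: str) -> Optional[str]:
    """Return the value of --token from a copied login command, or None."""
    # shlex.split handles quoting the same way a POSIX shell would.
    parts = shlex.split(login_command)
    for index, part in enumerate(parts):
        if part.startswith("--token="):
            # Treat an empty value after the equals sign as "no token".
            return part[len("--token="):] or None
        if part == "--token" and index + 1 < len(parts):
            # Also accept the space-separated form of the flag.
            return parts[index + 1]
    return None
```

Calling `extract_token("oc login --token=sha256~EXAMPLE --server=https://api.cluster.example.com:6443")` returns the string `sha256~EXAMPLE`, which is the value to paste into the Data Science Pipelines API Endpoint Password Or Token field. Treat the token as a credential and avoid saving it in notebooks or shell history.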
Verification
- The runtime configuration that you updated is shown in the Runtimes tab in the left sidebar of JupyterLab.
3.5.5. Deleting a runtime configuration
After you have finished using your runtime configuration, you can delete it from the JupyterLab interface. After deleting a runtime configuration, you cannot run pipelines in JupyterLab until you create another runtime configuration.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have created a data science project that contains a workbench.
- You have created and configured a pipeline server within the data science project that contains your workbench.
- A previously created runtime configuration is visible in the JupyterLab interface.
- You have created and launched a Jupyter server from a notebook image that contains the Elyra extension (Standard data science, TensorFlow, TrustyAI, PyTorch, or HabanaAI).
Procedure
- In the left sidebar of JupyterLab, click Runtimes.
Hover the cursor over the runtime configuration that you want to delete and click the Delete Item button.
A dialog box appears prompting you to confirm the deletion of your runtime configuration.
- Click OK.
Verification
- The runtime configuration that you deleted is no longer shown in the Runtimes tab in the left sidebar of JupyterLab.
3.5.6. Duplicating a runtime configuration
To avoid re-creating an entire runtime configuration when you need one with similar values, you can duplicate an existing runtime configuration in the JupyterLab interface.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have created a data science project that contains a workbench.
- You have created and configured a pipeline server within the data science project that contains your workbench.
- A previously created runtime configuration is visible in the JupyterLab interface.
- You have created and launched a Jupyter server from a notebook image that contains the Elyra extension (Standard data science, TensorFlow, TrustyAI, PyTorch, or HabanaAI).
Procedure
- In the left sidebar of JupyterLab, click Runtimes.
- Hover the cursor over the runtime configuration that you want to duplicate and click the Duplicate button.
Verification
- The runtime configuration that you duplicated is shown in the Runtimes tab in the left sidebar of JupyterLab.
3.5.7. Running a pipeline in JupyterLab
You can run pipelines that you have created in JupyterLab from the Pipeline Editor user interface. Before you can run a pipeline, you must create a data science project and a pipeline server. After you create a pipeline server, you must create a workbench within the same project as your pipeline server. Your pipeline instance in JupyterLab must contain a runtime configuration. If you create a workbench as part of a data science project, a default runtime configuration is created automatically. However, if you create a notebook from the Jupyter tile in the OpenShift AI dashboard, you must create a runtime configuration before you can run your pipeline in JupyterLab. A runtime configuration defines connectivity information for your pipeline instance and S3-compatible cloud storage.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have access to S3-compatible storage.
- You have created a pipeline in JupyterLab.
- You have opened your pipeline in the Pipeline Editor in JupyterLab.
- Your pipeline instance contains a runtime configuration.
- You have created and configured a pipeline server within the data science project that contains your workbench.
- You have created and launched a Jupyter server from a notebook image that contains the Elyra extension (Standard data science, TensorFlow, TrustyAI, PyTorch, or HabanaAI).
Procedure
In the Pipeline Editor user interface, click Run Pipeline.
The Run Pipeline dialog appears. The Pipeline Name field is automatically populated with the pipeline file name.
Define the settings for your pipeline run.
- From the Runtime Configuration list, select the relevant runtime configuration to run your pipeline.
- Optional: Configure your pipeline parameters, if applicable. If your pipeline contains nodes that reference pipeline parameters, you can change the default parameter values. If a parameter is required and has no default value, you must enter a value.
- Click OK.
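The parameter rule in the procedure above (a default can be overridden, and a required parameter without a default must be given a value) can be expressed as a short Python sketch. The `PipelineParameter` class and `resolve_parameters` function are hypothetical names used for illustration. They do not represent how the Pipeline Editor stores or resolves parameters internally.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class PipelineParameter:
    """Hypothetical description of one pipeline parameter."""
    name: str
    required: bool = False
    default: Optional[Any] = None


def resolve_parameters(
    declared: List[PipelineParameter], entered: Dict[str, Any]
) -> Dict[str, Any]:
    """Combine entered values with defaults and reject missing required ones."""
    resolved: Dict[str, Any] = {}
    missing: List[str] = []
    for parameter in declared:
        if parameter.name in entered:
            # A value entered in the dialog overrides the default.
            resolved[parameter.name] = entered[parameter.name]
        elif parameter.default is not None:
            resolved[parameter.name] = parameter.default
        elif parameter.required:
            missing.append(parameter.name)
        # Optional parameters with no default and no entry are left out.
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    return resolved
```

With a declared list of `PipelineParameter("epochs", default=10)` and `PipelineParameter("dataset_url", required=True)`, entering only `dataset_url` keeps the default for `epochs`, while entering nothing raises a `ValueError` that names `dataset_url`. Both parameter names are invented for this example.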
Verification
- You can view the output artifacts of your pipeline run. The artifacts are stored in your designated object storage bucket.
3.5.8. Exporting a pipeline in JupyterLab
You can export pipelines that you have created in JupyterLab. When you export a pipeline, the pipeline is prepared for later execution, but is not uploaded or executed immediately. During the export process, any package dependencies are uploaded to S3-compatible storage. Also, pipeline code is generated for the target runtime.
Before you can export a pipeline, you must create a data science project and a pipeline server. After you create a pipeline server, you must create a workbench within the same project as your pipeline server. In addition, your pipeline instance in JupyterLab must contain a runtime configuration. If you create a workbench as part of a data science project, a default runtime configuration is created automatically. However, if you create a notebook from the Jupyter tile in the OpenShift AI dashboard, you must create a runtime configuration before you can export your pipeline in JupyterLab. A runtime configuration defines connectivity information for your pipeline instance and S3-compatible cloud storage.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have created a data science project that contains a workbench.
- You have created and configured a pipeline server within the data science project that contains your workbench.
- You have access to S3-compatible storage.
- You have created a pipeline in JupyterLab.
- You have opened your pipeline in the Pipeline Editor in JupyterLab.
- Your pipeline instance contains a runtime configuration.
- You have created and launched a Jupyter server from a notebook image that contains the Elyra extension (Standard data science, TensorFlow, TrustyAI, PyTorch, or HabanaAI).
Procedure
In the Pipeline Editor user interface, click Export Pipeline.
The Export Pipeline dialog appears. The Pipeline Name field is automatically populated with the pipeline file name.
Define the settings to export your pipeline.
- From the Runtime Configuration list, select the relevant runtime configuration to export your pipeline.
- From the Export Pipeline as list, select an appropriate file format.
- In the Export Filename field, enter a file name for the exported pipeline.
- Select the Replace if file already exists check box to replace an existing file of the same name as the pipeline you are exporting.
- Optional: Configure your pipeline parameters, if applicable. If your pipeline contains nodes that reference pipeline parameters, you can change the default parameter values. If a parameter is required and has no default value, you must enter a value.
- Click OK.
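The interaction between the export file name, the selected file format, and the Replace if file already exists check box can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, the set of existing names stands in for wherever the export is written, and appending the format as a file extension is an assumption about the dialog rather than documented behavior.

```python
from typing import Set


def choose_export_name(
    existing: Set[str], filename: str, file_format: str, replace: bool
) -> str:
    """Return the name to export to, or fail if it would overwrite a file."""
    # Assumption: the chosen format becomes the file extension.
    extension = "." + file_format.lstrip(".").lower()
    if filename.lower().endswith(extension):
        name = filename
    else:
        name = filename + extension
    # Mirrors the check box: overwrite only when replace is selected.
    if name in existing and not replace:
        raise FileExistsError(
            f"{name} already exists; select the replace option to overwrite it"
        )
    return name
```

For example, `choose_export_name({"train.yaml"}, "train", "yaml", replace=True)` returns `"train.yaml"`, while the same call with `replace=False` raises `FileExistsError`. The file name and format in this call are placeholders.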
Verification
- You can view the file containing the pipeline that you exported in your designated object storage bucket.