Chapter 2. Managing data science pipelines


2.1. Configuring a pipeline server

Before you can successfully create a pipeline in OpenShift AI, you must configure a pipeline server. This task includes configuring where your pipeline artifacts and data are stored.

Note

You are not required to specify any storage directories when configuring a data connection for your pipeline server. When you import a pipeline, the /pipelines folder is created in the root folder of the bucket, containing a YAML file for the pipeline. If you upload a new version of the same pipeline, a new YAML file with a different ID is added to the /pipelines folder.

When you run a pipeline, the artifacts are stored in the /pipeline-name folder in the root folder of the bucket.
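
For example, after you import a pipeline named my-pipeline and run it once, the bucket might be laid out as follows. This is an illustrative sketch based on the behavior described above; the YAML file names are IDs generated by the pipeline server:

  <bucket-root>/
    pipelines/
      <pipeline-id>.yaml      # YAML file for the imported pipeline
    my-pipeline/              # artifacts produced by runs of my-pipeline
      ...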

Important

If you use an external MySQL database and upgrade to OpenShift AI 2.9 or later, the database is migrated to data science pipelines 2.0 format, making it incompatible with earlier versions of OpenShift AI.

Prerequisites

  • You have logged in to Red Hat OpenShift AI.
  • If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
  • You have created a data science project that you can add a pipeline server to.
  • You have an existing S3-compatible object storage bucket, and you have configured write access to your S3 bucket on your storage account (one way to confirm this is shown in the sketch after this list).
  • If you are configuring a pipeline server with an external MySQL database, your database must use at least MySQL version 5.x. However, Red Hat recommends that you use MySQL version 8.x.
  • If you are configuring a pipeline server with a MariaDB database, your database must use MariaDB version 10.3 or later. However, Red Hat recommends that you use at least MariaDB version 10.5.
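
The following Python sketch shows one way to confirm write access to your bucket before you configure the pipeline server. It uses the boto3 library; the endpoint, region, bucket name, and credentials shown are placeholders that you must replace with your own values.

  # pip install boto3
  import boto3

  # Placeholder connection details for your S3-compatible storage provider.
  s3 = boto3.client(
      "s3",
      endpoint_url="https://s3.example.com",
      aws_access_key_id="<access-key>",
      aws_secret_access_key="<secret-key>",
      region_name="us-east-1",
  )

  bucket = "<bucket-name>"

  # Write and then delete a small test object to confirm write access.
  s3.put_object(Bucket=bucket, Key="write-access-check", Body=b"ok")
  s3.delete_object(Bucket=bucket, Key="write-access-check")
  print(f"Write access to bucket '{bucket}' confirmed.")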

Procedure

  1. From the OpenShift AI dashboard, click Data Science Projects.

    The Data Science Projects page opens.

  2. Click the name of the project that you want to configure a pipeline server for.

    A project details page opens.

  3. Click the Pipelines tab.
  4. Click Configure pipeline server.

    The Configure pipeline server dialog appears.

  5. In the Object storage connection section, provide values for the mandatory fields:

    1. In the Access key field, enter the access key ID for the S3-compatible object storage provider.
    2. In the Secret key field, enter the secret access key for the S3-compatible object storage account that you specified.
    3. In the Endpoint field, enter the endpoint of your S3-compatible object storage bucket.
    4. In the Region field, enter the default region of your S3-compatible object storage account.
    5. In the Bucket field, enter the name of your S3-compatible object storage bucket.

      Important

      If you specify incorrect data connection settings, you cannot update these settings on the same pipeline server. Therefore, you must delete the pipeline server and configure another one.

  6. In the Database section, click Show advanced database options to specify the database where your pipeline data is stored, and then select one of the following options:

    • Select Use default database stored on your cluster to deploy a MariaDB database in your project.
    • Select Connect to external MySQL database to add a new connection to an external database that your pipeline server can access.

      1. In the Host field, enter the database’s host name.
      2. In the Port field, enter the database’s port.
      3. In the Username field, enter the default user name that is connected to the database.
      4. In the Password field, enter the password for the default user account.
      5. In the Database field, enter the database name.
  7. Click Configure pipeline server.

Verification

In the Pipelines tab for the project:

  • The Import pipeline button is available.
  • When you click the action menu (⋮) and then click View pipeline server configuration, the pipeline server details are displayed.

2.2. Defining a pipeline

The Kubeflow Pipelines SDK enables you to define end-to-end machine learning and data pipelines. Use the latest Kubeflow Pipelines 2.0 SDK to build your data science pipeline in Python code. After you have built your pipeline, use the SDK to compile it into an Intermediate Representation (IR) YAML file. You can then import the YAML file to the OpenShift AI dashboard, where you can configure the pipeline’s execution settings.
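
For example, a minimal pipeline defined with the Kubeflow Pipelines 2.0 SDK (the kfp Python package) and compiled to an IR YAML file might look like the following sketch; the component, pipeline, and file names are illustrative:

  # pip install kfp
  from kfp import dsl, compiler

  @dsl.component(base_image="python:3.9")
  def add(a: float, b: float) -> float:
      # A lightweight Python component that adds two numbers.
      return a + b

  @dsl.pipeline(name="add-example", description="Minimal example pipeline")
  def add_pipeline(a: float = 1.0, b: float = 2.0):
      add(a=a, b=b)

  # Compile the pipeline into an Intermediate Representation (IR) YAML file
  # that you can import to the OpenShift AI dashboard.
  compiler.Compiler().compile(
      pipeline_func=add_pipeline,
      package_path="add_pipeline.yaml",
  )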

You can also use the Elyra JupyterLab extension to create and run data science pipelines within JupyterLab. For more information about creating pipelines in JupyterLab, see Working with pipelines in JupyterLab. For more information about the Elyra JupyterLab extension, see Elyra Documentation.

2.3. Importing a data science pipeline

To help you begin working with data science pipelines in OpenShift AI, you can import a YAML file containing your pipeline’s code to an active pipeline server, or you can import the YAML file from a URL. This file contains a Kubeflow pipeline compiled by using the Kubeflow compiler. After you have imported the pipeline to a pipeline server, you can execute the pipeline by creating a pipeline run.

Prerequisites

  • You have logged in to Red Hat OpenShift AI.
  • If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
  • You have previously created a data science project that is available and contains a configured pipeline server.
  • You have compiled your pipeline with the Kubeflow compiler and you have access to the resulting YAML file.
  • If you are uploading your pipeline from a URL, the URL is publicly accessible.

Procedure

  1. From the OpenShift AI dashboard, click Data Science Pipelines → Pipelines.
  2. On the Pipelines page, select the project that you want to import a pipeline to.
  3. Click Import pipeline.
  4. In the Import pipeline dialog, enter the details for the pipeline that you are importing.

    1. In the Pipeline name field, enter a name for the pipeline that you are importing.
    2. In the Pipeline description field, enter a description for the pipeline that you are importing.
    3. Select where you want to import your pipeline from by performing one of the following actions:

      • Select Upload a file to upload your pipeline from your local machine’s file system. Import your pipeline by clicking Upload or by dragging and dropping a file.
      • Select Import by url to upload your pipeline from a URL and then enter the URL into the text box.
    4. Click Import pipeline.

Verification

  • The pipeline that you imported appears on the Pipelines page and on the Pipelines tab on the project details page.
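
As an alternative to the dashboard, you can import a compiled pipeline with the kfp SDK client. The following sketch assumes that you know your pipeline server’s API route and have a valid OpenShift bearer token (for example, from oc whoami -t); both values are placeholders:

  import kfp

  # Placeholder values: the pipeline server's API route and an OpenShift token.
  client = kfp.Client(
      host="https://<ds-pipeline-route>",
      existing_token="<bearer-token>",
  )

  pipeline = client.upload_pipeline(
      pipeline_package_path="add_pipeline.yaml",
      pipeline_name="add-example",
  )
  print(pipeline.pipeline_id)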

2.4. Deleting a data science pipeline

If you no longer require access to your data science pipeline on the dashboard, you can delete it so that it does not appear on the Data Science Pipelines page.

Prerequisites

  • You have logged in to Red Hat OpenShift AI.
  • If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
  • There are active pipelines available on the Pipelines page.
  • The pipeline that you want to delete does not contain any pipeline versions. For more information, see Deleting a pipeline version.

Procedure

  1. From the OpenShift AI dashboard, click Data Science Pipelines → Pipelines.
  2. On the Pipelines page, select the project that contains the pipeline that you want to delete from the Project list.
  3. Click the action menu (⋮) beside the pipeline that you want to delete and click Delete pipeline.
  4. In the Delete pipeline dialog, enter the pipeline name in the text field to confirm that you intend to delete it.
  5. Click Delete pipeline.

Verification

  • The data science pipeline that you deleted no longer appears on the Pipelines page.
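
If you prefer to script the deletion, the kfp SDK client exposes an equivalent call. As in the earlier sketch, the host, token, and pipeline ID are placeholders; you can find a pipeline’s ID with client.list_pipelines():

  import kfp

  client = kfp.Client(host="https://<ds-pipeline-route>", existing_token="<bearer-token>")

  # Delete the pipeline by its ID. Delete all of its pipeline versions first.
  client.delete_pipeline(pipeline_id="<pipeline-id>")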

2.5. Deleting a pipeline server

After you have finished running your data science pipelines, you can delete the pipeline server. Deleting a pipeline server automatically deletes all of its associated pipelines, pipeline versions, and runs. If your pipeline data is stored in a database, the database and its metadata are also deleted. After you delete a pipeline server, you cannot create new pipelines or pipeline runs until you create another pipeline server.

Prerequisites

  • You have logged in to Red Hat OpenShift AI.
  • If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
  • You have previously created a data science project that is available and contains a pipeline server.

Procedure

  1. From the OpenShift AI dashboard, click Data Science Pipelines → Pipelines.
  2. On the Pipelines page, select the project for the pipeline server that you want to delete.
  3. From the Pipeline server actions list, select Delete pipeline server.
  4. In the Delete pipeline server dialog, enter the pipeline server’s name in the text field to confirm that you intend to delete it.
  5. Click Delete.

Verification

  • Pipelines previously assigned to the deleted pipeline server no longer appear on the Pipelines page for the relevant data science project.
  • Pipeline runs previously assigned to the deleted pipeline server no longer appear on the Runs page for the relevant data science project.

2.6. Viewing the details of a pipeline server

You can view the details of pipeline servers configured in OpenShift AI, such as the pipeline server’s data connection details and where its data is stored.

Prerequisites

  • You have logged in to Red Hat OpenShift AI.
  • You have previously created a data science project that contains an active and available pipeline server.
  • If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.

Procedure

  1. From the OpenShift AI dashboard, click Data Science Pipelines → Pipelines.
  2. On the Pipelines page, select the project whose pipeline server you want to view.
  3. From the Pipeline server actions list, select View pipeline server configuration.

Verification

  • You can view the relevant pipeline server details in the View pipeline server dialog.

2.7. Viewing existing pipelines

You can view the details of pipelines that you have imported to Red Hat OpenShift AI, such as the pipeline’s last run, when it was created, the pipeline’s executed runs, and details of any associated pipeline versions.

Prerequisites

  • You have logged in to Red Hat OpenShift AI.
  • If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
  • You have previously created a data science project that is available and contains a pipeline server.
  • You have imported a pipeline to an active pipeline server.
  • Existing pipelines are available.

Procedure

  1. From the OpenShift AI dashboard, click Data Science Pipelines → Pipelines.
  2. On the Pipelines page, select the relevant project for the pipelines you want to view.
  3. Review the pipelines in the list.
  4. Optional: Click Expand on the relevant row to view details of any pipeline versions associated with the pipeline.

Verification

  • A list of previously created data science pipelines appears on the Pipelines page.
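
You can also list the pipelines on a pipeline server programmatically with the kfp SDK client; the host and token are placeholders as in the earlier sketches:

  import kfp

  client = kfp.Client(host="https://<ds-pipeline-route>", existing_token="<bearer-token>")

  # Print the display name and ID of each pipeline on the pipeline server.
  for p in client.list_pipelines().pipelines or []:
      print(p.display_name, p.pipeline_id)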

2.8. Overview of pipeline versions

You can manage incremental changes to pipelines in OpenShift AI by using versioning. This allows you to develop and deploy pipelines iteratively, preserving a record of your changes. You can track and manage your changes on the OpenShift AI dashboard, allowing you to schedule and execute runs against all available versions of your pipeline.

2.9. Uploading a pipeline version

You can upload a YAML file to an active pipeline server that contains the latest version of your pipeline, or you can upload the YAML file from a URL. The YAML file must consist of a Kubeflow pipeline compiled by using the Kubeflow compiler. After you upload a pipeline version to a pipeline server, you can execute it by creating a pipeline run.

Prerequisites

  • You have logged in to Red Hat OpenShift AI.
  • If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
  • You have previously created a data science project that is available and contains a configured pipeline server.
  • You have a pipeline version available and ready to upload.
  • If you are uploading your pipeline version from a URL, the URL is publicly accessible.

Procedure

  1. From the OpenShift AI dashboard, click Data Science Pipelines → Pipelines.
  2. On the Pipelines page, select the project that you want to upload a pipeline version to.
  3. Click the Import pipeline dropdown list and select Upload new version.
  4. In the Upload new version dialog, enter the details for the pipeline version that you are uploading.

    1. From the Pipeline list, select the pipeline that you want to upload your pipeline version to.
    2. In the Pipeline version name field, confirm the name for the pipeline version, and change it if necessary.
    3. In the Pipeline version description field, enter a description for the pipeline version.
    4. Select where you want to upload your pipeline version from by performing one of the following actions:

      • Select Upload a file to upload your pipeline version from your local machine’s file system. Import your pipeline version by clicking Upload or by dragging and dropping a file.
      • Select Import by url to upload your pipeline version from a URL and then enter the URL into the text box.
    5. Click Upload.

Verification

  • The pipeline version that you uploaded is displayed on the Pipelines page. Click Expand on the row containing the pipeline to view its versions.
  • On the Pipelines page, the value in the Version column for the row containing the pipeline increments by one.
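
You can also upload a new pipeline version with the kfp SDK client; the host, token, and names below are placeholders:

  import kfp

  client = kfp.Client(host="https://<ds-pipeline-route>", existing_token="<bearer-token>")

  # Upload a new version of an existing pipeline, identified here by name.
  version = client.upload_pipeline_version(
      pipeline_package_path="add_pipeline.yaml",
      pipeline_version_name="add-example-v2",
      pipeline_name="add-example",
  )
  print(version.pipeline_version_id)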

2.10. Deleting a pipeline version

You can delete specific versions of a pipeline when you no longer require them. Deleting a default pipeline version automatically changes the default pipeline version to the next most recent version. If no other pipeline versions exist, the pipeline persists without a default version.

Prerequisites

  • You have logged in to Red Hat OpenShift AI.
  • If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
  • You have previously created a data science project that is available and contains a pipeline server.
  • You have imported a pipeline to an active pipeline server.

Procedure

  1. From the OpenShift AI dashboard, click Data Science Pipelines → Pipelines.

    The Pipelines page opens.

  2. Delete the pipeline versions that you no longer require:

    • To delete a single pipeline version:

      1. From the Project list, select the project that contains a version of a pipeline that you want to delete.
      2. On the row containing the pipeline, click Expand.
      3. Click the action menu (⋮) beside the pipeline version that you want to delete and click Delete pipeline version.

        The Delete pipeline version dialog opens.

      4. Enter the name of the pipeline version in the text field to confirm that you intend to delete it.
      5. Click Delete.
    • To delete multiple pipeline versions:

      1. On the row containing each pipeline version that you want to delete, select the checkbox.
      2. Click the action menu (⋮) next to the Import pipeline dropdown, and select Delete from the list.

Verification

  • The pipeline version that you deleted no longer appears on the Pipelines page or on the Pipelines tab for the data science project.
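
You can also delete a pipeline version with the kfp SDK client; the host, token, and IDs below are placeholders:

  import kfp

  client = kfp.Client(host="https://<ds-pipeline-route>", existing_token="<bearer-token>")

  # Delete a single version of a pipeline.
  client.delete_pipeline_version(
      pipeline_id="<pipeline-id>",
      pipeline_version_id="<pipeline-version-id>",
  )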

2.11. Viewing the details of a pipeline version

You can view the details of a pipeline version that you have uploaded to Red Hat OpenShift AI, such as its graph and YAML code.

Prerequisites

  • You have logged in to Red Hat OpenShift AI.
  • If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
  • You have previously created a data science project that is available and contains a pipeline server.
  • You have a pipeline available on an active and available pipeline server.

Procedure

  1. From the OpenShift AI dashboard, click Data Science Pipelines → Pipelines.

    The Pipelines page opens.

  2. From the Project list, select the project containing the pipeline versions that you want to view details for.
  3. Click Expand on the row containing the pipeline that you want to view versions for.
  4. Click the pipeline version that you want to view the details of.

    The Pipeline details page opens, displaying the Graph and YAML tabs.

Verification

  • On the Pipeline details page, you can view the pipeline graph and YAML code.

2.12. Downloading a data science pipeline version

To make further changes to a data science pipeline version that you previously uploaded to OpenShift AI, you can download the pipeline version code from the user interface.

Prerequisites

  • You have logged in to Red Hat OpenShift AI.
  • If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
  • You have previously created a data science project that is available and contains a configured pipeline server.
  • You have imported a pipeline to an active pipeline server, and the pipeline version that you want to download is available.

Procedure

  1. From the OpenShift AI dashboard, click Data Science Pipelines → Pipelines.
  2. On the Pipelines page, select the project that contains the version that you want to download.
  3. For the pipeline that contains the version that you want to download, click Expand.
  4. Click the pipeline version that you want to download.
  5. On the Pipeline details page, click the YAML tab.
  6. Click the Download button to download the YAML file containing your pipeline version code to your local machine.

Verification

  • The pipeline version code downloads to your browser’s default directory for downloaded files.
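
If you prefer to script the download, the kfp SDK client can retrieve a pipeline version. The sketch below assumes that the returned pipeline_spec field deserializes to a plain dictionary; the host, token, and IDs are placeholders:

  # pip install kfp pyyaml
  import kfp
  import yaml

  client = kfp.Client(host="https://<ds-pipeline-route>", existing_token="<bearer-token>")

  version = client.get_pipeline_version(
      pipeline_id="<pipeline-id>",
      pipeline_version_id="<pipeline-version-id>",
  )

  # pipeline_spec holds the IR pipeline definition; write it out as YAML.
  with open("pipeline_version.yaml", "w") as f:
      yaml.safe_dump(version.pipeline_spec, f)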