Chapter 7. Migrating to data science pipelines 2.0


Starting with OpenShift AI version 2.9, data science pipelines are based on Kubeflow Pipelines (KFP) version 2.0. Data science pipelines 2.0 is enabled and deployed by default in OpenShift AI.

Important

Data science pipelines 2.0 contains an installation of Argo Workflows. Red Hat does not support direct customer usage of this installation of Argo Workflows.

To install or upgrade to OpenShift AI 2.9 or later with data science pipelines, ensure that your cluster does not contain an installation of Argo Workflows that was not installed by OpenShift AI. If such a separate installation exists, data science pipelines is disabled after you install or upgrade OpenShift AI. To enable data science pipelines, remove the separate installation of Argo Workflows from your cluster; data science pipelines is then enabled automatically.

Argo Workflows resources created by OpenShift AI carry the following labels, which you can view in the OpenShift Console under Administration > CustomResourceDefinitions, in the argoproj.io group:

labels:
  app.kubernetes.io/part-of: data-science-pipelines-operator
  app.opendatahub.io/data-science-pipelines-operator: 'true'
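One way to check whether a cluster already contains a separate Argo Workflows installation is to inspect the workflows.argoproj.io CRD and its labels. The following is a sketch, not an exhaustive check; it assumes the oc CLI is logged in to the cluster and inspects only one of the Argo Workflows CRDs:

```shell
# Sketch: detect an Argo Workflows installation that was not created by
# OpenShift AI. CRDs created by data science pipelines carry the
# app.opendatahub.io/data-science-pipelines-operator: 'true' label.
if oc get crd workflows.argoproj.io >/dev/null 2>&1; then
  owner=$(oc get crd workflows.argoproj.io \
    -o jsonpath='{.metadata.labels.app\.opendatahub\.io/data-science-pipelines-operator}')
  if [ "$owner" = "true" ]; then
    echo "Argo Workflows CRD was installed by data science pipelines."
  else
    echo "Separate Argo Workflows installation detected; remove it before installing or upgrading."
  fi
else
  echo "No Argo Workflows CRD found."
fi
```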

7.1. Upgrading to data science pipelines 2.0

Starting with OpenShift AI 2.16, data science pipelines 1.0 resources are no longer supported or managed by OpenShift AI. It is no longer possible to deploy, view, or edit the details of pipelines that are based on data science pipelines 1.0 from either the dashboard or the KFP API server.

OpenShift AI does not automatically migrate existing data science pipelines 1.0 instances to 2.0. If you are upgrading to OpenShift AI 2.16 or later, you must manually migrate your existing data science pipelines 1.0 instances and update your workbenches.

To upgrade to OpenShift AI 2.16 or later with data science pipelines 2.0, follow these steps:

Note

If you are using GitOps to manage your data science pipelines 1.0 pipeline runs, pause any sync operations related to data science pipelines, including management of PipelineRuns or DataSciencePipelinesApplications (DSPAs). After migrating to data science pipelines 2.0, your PipelineRuns are managed independently of data science pipelines, similar to any other Tekton resources.

  1. Back up your pipelines data.
  2. Deploy a new cluster (or use a different existing cluster) with Red Hat OpenShift AI 2.21 to use as an intermediate cluster. You will use this intermediate cluster to upload, test, and verify your new pipelines.
  3. In OpenShift AI 2.21 on the intermediate cluster, do the following tasks:

    1. Create a new data science project.
    2. Configure a new pipeline server.

      Important

If you use an external database, you must use a different external database than the one used for data science pipelines 1.0, because the database is migrated to the data science pipelines 2.0 format.

    3. Update and recompile your data science pipelines 1.0 pipelines as described in Migrate to Kubeflow Pipelines v2.

      Note

      Data science pipelines 2.0 does not use the kfp-tekton library. In most cases, you can replace usage of kfp-tekton with the kfp library. For data science pipelines 2.0, use the latest version of the KFP SDK. For more information, see the Kubeflow Pipelines SDK API Reference.

      Tip

      You can view historical data science pipelines 1.0 pipeline run information on your primary cluster in the OpenShift Console Developer perspective under Pipelines > Project > PipelineRuns.

    4. Import your updated pipelines to the new data science project.
    5. Test and verify your new pipelines.
  4. On your primary cluster, do the following tasks:

    1. Remove your data science pipelines 1.0 pipeline servers.
    2. Optional: Remove your data science pipelines 1.0 resources. For more information, see Removing data science pipelines 1.0 resources.
    3. Upgrade to Red Hat OpenShift AI 2.21. For more information, see Upgrading OpenShift AI Self-Managed, or for disconnected environments, Upgrading Red Hat OpenShift AI in a disconnected environment.
  5. In the upgraded instance of Red Hat OpenShift AI 2.21 on your primary cluster, do the following tasks:

    1. Recreate the pipeline servers for each data science project where the data science pipelines 1.0 pipeline servers existed.

      Note

      If you are using GitOps to manage your DSPAs, do the following tasks in your DSPAs before performing sync operations:

      • Set spec.dspVersion to v2.
      • Verify that the apiVersion is using v1 instead of v1alpha1.
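      For example, a DSPA manifest prepared for data science pipelines 2.0 might begin as follows. This is a sketch: the API group shown is the one used by the data science pipelines operator, but verify it against the CRDs on your cluster, and keep the remaining spec fields from your existing DSPA definition:

```yaml
apiVersion: datasciencepipelinesapplications.opendatahub.io/v1
kind: DataSciencePipelinesApplication
metadata:
  name: dspa
  namespace: <YOUR_DS_PROJECT>
spec:
  dspVersion: v2
  # ...remaining fields (object storage, database, and so on) as in your
  # existing DSPA definition
```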
    2. Import your updated data science pipelines to the applicable pipeline servers.

      Tip

      You can perform a batch upload by creating a script that uses the KFP SDK Client and the .upload_pipeline and .get_pipeline methods.
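      Such a batch upload script might look like the following sketch. The endpoint, token, directory layout, and naming convention are assumptions; adjust them for your environment:

```python
# Sketch: batch-upload compiled pipeline packages with the KFP SDK Client.
# Assumes each pipeline was recompiled to a KFP v2 YAML package.
from pathlib import Path


def pipeline_name_from_path(path: str) -> str:
    """Derive a display name from a compiled pipeline file name
    (illustrative convention: underscores become hyphens)."""
    return Path(path).stem.replace("_", "-")


def upload_all(pipeline_dir: str, host: str, token: str) -> None:
    # Imported here so the helper above stays usable without kfp installed.
    from kfp.client import Client

    client = Client(host=host, existing_token=token)
    for package in sorted(Path(pipeline_dir).glob("*.yaml")):
        name = pipeline_name_from_path(str(package))
        # Skip pipelines that already exist on the server.
        if client.get_pipeline_id(name) is None:
            client.upload_pipeline(str(package), pipeline_name=name)
            print(f"Uploaded {name}")


# Example usage (hypothetical route and token):
# upload_all("compiled_pipelines", host="https://<route>", token="<token>")
```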

  6. For any workbenches that communicate with data science pipelines 1.0, do the following tasks in the upgraded instance of Red Hat OpenShift AI:

    1. Delete the existing workbench. For more information, see Deleting a workbench from a data science project.
    2. For workbenches at version 2024.2 or later, upgrade to Python 3.11 before creating a new workbench. For information about the workbench images provided with OpenShift AI and their preinstalled packages, see the Red Hat OpenShift AI: Supported Configurations Knowledgebase article.
    3. Create a new workbench that uses the existing persistent storage of the deleted workbench. For more information, see Creating a workbench.
    4. Run the pipeline so that the data science pipelines 2.0 pipeline server schedules it.
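As a sketch of the recompile step above (step 3.c), a simple pipeline rewritten for the KFP v2 SDK might look like the following. The component and pipeline names are illustrative, and the script requires the kfp package:

```python
# Sketch: a minimal pipeline written against the KFP v2 SDK, replacing
# kfp-tekton usage. Compile it with kfp.compiler instead of TektonCompiler.
from kfp import compiler, dsl


@dsl.component(base_image="python:3.11")
def train(epochs: int) -> str:
    # Placeholder training step; your real component logic goes here.
    return f"trained for {epochs} epochs"


@dsl.pipeline(name="example-pipeline")
def example_pipeline(epochs: int = 10):
    train(epochs=epochs)


if __name__ == "__main__":
    # Produces an IR YAML package that you can import on the 2.0 pipeline server.
    compiler.Compiler().compile(example_pipeline, package_path="example_pipeline.yaml")
```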

7.2. Removing data science pipelines 1.0 resources

When your migration to data science pipelines 2.0 is complete on the intermediate cluster, you can clean up the data science pipelines 1.0 resources on your primary cluster.

Important

Before removing data science pipelines 1.0 resources, ensure that migration of your data science pipelines 1.0 pipelines to 2.0 is complete.

  1. Identify the DataSciencePipelinesApplication (DSPA) resource that corresponds to the data science pipelines 1.0 pipeline server:

    oc get dspa -n <YOUR_DS_PROJECT>
  2. Delete the cluster role binding associated with this DSPA:

    oc delete clusterrolebinding ds-pipeline-ui-auth-delegator-<YOUR_DS_PROJECT>-dspa
  3. Delete the DSPA:

    oc delete dspa dspa -n <YOUR_DS_PROJECT>
  4. If necessary, delete the DataSciencePipelinesApplication finalizer to complete the removal of the resource:

    oc patch dspa dspa -n <YOUR_DS_PROJECT> --type=merge -p '{"metadata":{"finalizers":null}}'
  5. If you are not using OpenShift Pipelines for any purpose other than data science pipelines 1.0, you can remove the OpenShift Pipelines Operator.
  6. Data science pipelines 1.0 used the kfp-tekton Python library; data science pipelines 2.0 does not. You can uninstall kfp-tekton when no data science pipelines 1.0 pipeline servers remain in use on your cluster.
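The removal steps above can be combined into a short script, for example as follows. This is a sketch; it assumes the default DSPA resource name dspa used in the commands above, and that the oc CLI is logged in to the primary cluster:

```shell
#!/bin/sh
# Sketch: remove a data science pipelines 1.0 DSPA and its cluster role binding.
PROJECT="${1:?usage: cleanup-dspa1.sh <YOUR_DS_PROJECT>}"

oc delete clusterrolebinding "ds-pipeline-ui-auth-delegator-${PROJECT}-dspa"
oc delete dspa dspa -n "$PROJECT" --wait=false

# Clear the finalizer if deletion hangs.
oc patch dspa dspa -n "$PROJECT" --type=merge -p '{"metadata":{"finalizers":null}}'
```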