Chapter 7. Migrating to data science pipelines 2.0
Starting with OpenShift AI version 2.9, data science pipelines are based on Kubeflow Pipelines (KFP) version 2.0. Data science pipelines 2.0 is enabled and deployed by default in OpenShift AI.
Data science pipelines 2.0 contains an installation of Argo Workflows. Red Hat does not support direct customer usage of this installation of Argo Workflows.
To install or upgrade to OpenShift AI 2.9 or later with data science pipelines, ensure that your cluster does not have an installation of Argo Workflows that was not installed by OpenShift AI. If such a separate installation exists on your cluster, data science pipelines is disabled after you install or upgrade OpenShift AI. To enable data science pipelines, remove the separate installation of Argo Workflows from your cluster; data science pipelines is then enabled automatically.
Argo Workflows resources that are created by OpenShift AI have the following labels, visible in the OpenShift Console under Administration > CustomResourceDefinitions, in the argoproj.io group:

  labels:
    app.kubernetes.io/part-of: data-science-pipelines-operator
    app.opendatahub.io/data-science-pipelines-operator: 'true'
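One way to check for a conflicting Argo Workflows installation is to inspect the argoproj.io CRDs for the labels above. The following is a sketch only; the Kubernetes Python client usage in the comment is an assumption about how you reach your cluster:

```python
# Sketch: flag argoproj.io CRDs that were NOT created by OpenShift AI,
# based on the labels documented above. Cluster access (commented usage
# below) is assumed, not shown.
DSP_LABELS = {
    "app.kubernetes.io/part-of": "data-science-pipelines-operator",
    "app.opendatahub.io/data-science-pipelines-operator": "true",
}

def is_dsp_managed(labels):
    """True if the object carries the data science pipelines operator labels."""
    labels = labels or {}
    return all(labels.get(k) == v for k, v in DSP_LABELS.items())

def foreign_argo_crds(crds):
    """Return names of argoproj.io CRDs that OpenShift AI did not create."""
    return [
        crd.metadata.name
        for crd in crds
        if crd.metadata.name.endswith("argoproj.io")
        and not is_dsp_managed(crd.metadata.labels)
    ]

# Usage sketch (requires cluster credentials):
#   from kubernetes import client, config
#   config.load_kube_config()
#   crds = client.ApiextensionsV1Api().list_custom_resource_definition().items
#   print(foreign_argo_crds(crds))
```

An empty result from `foreign_argo_crds` suggests that any Argo Workflows CRDs on the cluster belong to data science pipelines.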
7.1. Upgrading to data science pipelines 2.0
Starting with OpenShift AI 2.16, data science pipelines 1.0 resources are no longer supported or managed by OpenShift AI. It is no longer possible to deploy, view, or edit the details of pipelines that are based on data science pipelines 1.0 from either the dashboard or the KFP API server.
OpenShift AI does not automatically migrate existing data science pipelines 1.0 instances to 2.0. If you are upgrading to OpenShift AI 2.16 or later, you must manually migrate your existing data science pipelines 1.0 instances and update your workbenches.
To upgrade to OpenShift AI 2.16 or later with data science pipelines 2.0, follow these steps:
- If you are using GitOps to manage your data science pipelines 1.0 pipeline runs, pause any sync operations related to data science pipelines, including PipelineRuns or DataSciencePipelinesApplications (DSPAs) management. After migrating to data science pipelines 2.0, your PipelineRuns will be managed independently of data science pipelines, similar to any other Tekton resources.
- Back up your pipelines data.
- Deploy a new cluster (or use a different existing cluster) with Red Hat OpenShift AI 2.21 to use as an intermediate cluster. You will use this intermediate cluster to upload, test, and verify your new pipelines.
In OpenShift AI 2.21 on the intermediate cluster, do the following tasks:
- Create a new data science project.
- Configure a new pipeline server.
Important: If you use an external database, you must use a different external database than the one you use for data science pipelines 1.0, because the database is migrated to the data science pipelines 2.0 format.
- Update and recompile your data science pipelines 1.0 pipelines as described in Migrate to Kubeflow Pipelines v2.
Note: Data science pipelines 2.0 does not use the kfp-tekton library. In most cases, you can replace usage of kfp-tekton with the kfp library. For data science pipelines 2.0, use the latest version of the KFP SDK. For more information, see the Kubeflow Pipelines SDK API Reference.

Tip: You can view historical data science pipelines 1.0 pipeline run information on your primary cluster in the OpenShift Console Developer perspective under Pipelines > Project > PipelineRuns.

- Import your updated pipelines to the new data science project.
- Test and verify your new pipelines.
On your primary cluster, do the following tasks:
- Remove your data science pipelines 1.0 pipeline servers.
- Optional: Remove your data science pipelines 1.0 resources. For more information, see Removing data science pipelines 1.0 resources.
- Upgrade to Red Hat OpenShift AI 2.21. For more information, see Upgrading OpenShift AI Self-Managed, or for disconnected environments, Upgrading Red Hat OpenShift AI in a disconnected environment.
In the upgraded instance of Red Hat OpenShift AI 2.21 on your primary cluster, do the following tasks:
- Recreate the pipeline servers for each data science project where the data science pipelines 1.0 pipeline servers existed.
Note: If you are using GitOps to manage your DSPAs, do the following tasks in your DSPAs before performing sync operations:

- Set spec.dspVersion to v2.
- Verify that the apiVersion is using v1 instead of v1alpha1.
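For example, a DSPA prepared for data science pipelines 2.0 might look like this minimal sketch; the name and namespace are placeholders for your own project:

```yaml
# Minimal DSPA sketch after migration; name and namespace are placeholders.
apiVersion: datasciencepipelinesapplications.opendatahub.io/v1   # v1, not v1alpha1
kind: DataSciencePipelinesApplication
metadata:
  name: dspa
  namespace: my-ds-project
spec:
  dspVersion: v2
```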
- Import your updated data science pipelines to the applicable pipeline servers.
Tip: You can perform a batch upload by creating a script that uses the KFP SDK Client and the .upload_pipeline and .get_pipeline methods.
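Such a script might look like the following sketch; the directory name, route URL, and token in the usage comment are placeholders:

```python
# Sketch: batch-upload recompiled pipelines with the KFP SDK Client.
# The directory name, route URL, and token below are placeholders.
from pathlib import Path

def find_pipeline_packages(pipeline_dir):
    """Collect compiled pipeline packages (*.yaml) in a stable upload order."""
    return sorted(Path(pipeline_dir).glob("*.yaml"))

def upload_all(client, pipeline_dir):
    """Upload each package, then read it back to confirm registration."""
    for package in find_pipeline_packages(pipeline_dir):
        uploaded = client.upload_pipeline(
            pipeline_package_path=str(package),
            pipeline_name=package.stem,
        )
        # .get_pipeline verifies the server registered the pipeline.
        print(client.get_pipeline(uploaded.pipeline_id).display_name)

# Usage sketch (requires a reachable pipeline server):
#   import kfp
#   client = kfp.Client(host="https://<dspa-route>", existing_token="<token>")
#   upload_all(client, "compiled_pipelines")
```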
For any workbenches that communicate with data science pipelines 1.0, do the following tasks in the upgraded instance of Red Hat OpenShift AI:
- Delete the existing workbench. For more information, see Deleting a workbench from a data science project.
- For workbenches at version 2024.2 or later, upgrade to Python 3.11 before creating a new workbench. For information about the workbench images provided with OpenShift AI and their preinstalled packages, see the Red Hat OpenShift AI: Supported Configurations Knowledgebase article.
- Create a new workbench that uses the existing persistent storage of the deleted workbench. For more information, see Creating a workbench.
- Run the pipeline so that the data science pipelines 2.0 pipeline server schedules it.
7.2. Removing data science pipelines 1.0 resources
When your migration to data science pipelines 2.0 is complete on the intermediate cluster, you can clean up the data science pipelines 1.0 resources in your cluster.
Before removing data science pipelines 1.0 resources, ensure that migration of your data science pipelines 1.0 pipelines to 2.0 is complete.
- Identify the DataSciencePipelinesApplication (DSPA) resource that corresponds to the data science pipelines 1.0 pipeline server:

  oc get dspa -n <YOUR_DS_PROJECT>

- Delete the cluster role binding associated with this DSPA:
  oc delete clusterrolebinding ds-pipeline-ui-auth-delegator-<YOUR_DS_PROJECT>-dspa

- Delete the DSPA:
  oc delete dspa dspa -n <YOUR_DS_PROJECT>

- If necessary, delete the DataSciencePipelinesApplication finalizer to complete the removal of the resource:

  oc patch dspa dspa -n <YOUR_DS_PROJECT> --type=merge -p "{\"metadata\":{\"finalizers\":null}}"

- If you are not using OpenShift Pipelines for any purpose other than data science pipelines 1.0, you can remove the OpenShift Pipelines Operator.
- Data science pipelines 1.0 used the kfp-tekton Python library. Data science pipelines 2.0 does not use kfp-tekton. You can uninstall kfp-tekton when there are no remaining data science pipelines 1.0 pipeline servers in use on your cluster.