Chapter 6. Working with pipelines in JupyterLab
6.1. Overview of pipelines in JupyterLab
You can use Elyra to create visual end-to-end pipeline workflows in JupyterLab. Elyra is an extension for JupyterLab that provides you with a Pipeline Editor to create pipeline workflows that can be executed in OpenShift AI.
You can access the Elyra extension within JupyterLab when you create the most recent version of one of the following notebook images:
- Standard Data Science
- PyTorch
- TensorFlow
- TrustyAI
- HabanaAI
When you use the Pipeline Editor to visually design your pipelines, minimal coding is required to create and run pipelines. For more information about Elyra, see Elyra Documentation. For more information about the Pipeline Editor, see Visual Pipeline Editor. After you have created your pipeline, you can run it locally in JupyterLab, or remotely using data science pipelines in OpenShift AI.
The pipeline creation process consists of the following tasks:
- Create a data science project that contains a workbench.
- Create a pipeline server.
- Create a new pipeline in the Pipeline Editor in JupyterLab.
- Develop your pipeline by adding Python notebooks or Python scripts and defining their runtime properties.
- Define execution dependencies.
- Run or export your pipeline.
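When you define execution dependencies in the Pipeline Editor, you connect nodes so that each node runs only after the nodes it depends on have finished. The following minimal sketch shows that ordering rule with the topological sorter in Python's standard graphlib module. The node names are hypothetical, and this illustrates the dependency concept only; it is not how Elyra or data science pipelines schedule nodes internally.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key is a node (a notebook or script) and
# each value is the set of nodes that must finish before it can run.
dependencies = {
    "load_data.ipynb": set(),
    "clean_data.py": {"load_data.ipynb"},
    "train_model.ipynb": {"clean_data.py"},
    "evaluate_model.py": {"train_model.ipynb", "clean_data.py"},
}


def execution_order(deps: dict[str, set[str]]) -> list[str]:
    """Return one valid run order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
# ['load_data.ipynb', 'clean_data.py', 'train_model.ipynb', 'evaluate_model.py']
```

A cycle between nodes (for example, two scripts that each depend on the other) cannot be ordered, which is why dependencies must form a directed acyclic graph.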
Before you can run a pipeline in JupyterLab, your pipeline instance must contain a runtime configuration. A runtime configuration defines connectivity information for your pipeline instance and S3-compatible cloud storage.
If you create a workbench as part of a data science project, a default runtime configuration is created automatically. However, if you create a notebook from the Jupyter tile in the OpenShift AI dashboard, you must create a runtime configuration before you can run your pipeline in JupyterLab. For more information about runtime configurations, see Runtime Configuration. As a prerequisite, before you create a workbench, ensure that you have created and configured a pipeline server within the same data science project as your workbench.
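A runtime configuration groups two kinds of settings: connectivity to the data science pipelines API, and the S3-compatible storage that holds pipeline artifacts. The following sketch models that grouping with Python dataclasses. The class and field names are hypothetical labels that mirror the form fields described later in this chapter; they are not Elyra's on-disk schema, and the endpoint and bucket values are placeholders.

```python
from dataclasses import dataclass, field


# Hypothetical model of a runtime configuration (not Elyra's real schema).
@dataclass
class PipelineConnection:
    api_endpoint: str              # Data Science Pipelines API Endpoint
    public_api_endpoint: str = ""  # Public Data Science Pipelines API Endpoint
    user_namespace: str = ""       # Optional user namespace


@dataclass
class ObjectStorage:
    endpoint: str  # Cloud Object Storage Endpoint
    bucket: str    # Cloud Object Storage Bucket Name


@dataclass
class RuntimeConfiguration:
    display_name: str
    pipelines: PipelineConnection
    storage: ObjectStorage
    tags: list[str] = field(default_factory=list)


config = RuntimeConfiguration(
    display_name="my-runtime",
    pipelines=PipelineConnection(api_endpoint="https://ds-pipeline.example.com"),
    storage=ObjectStorage(endpoint="https://s3.example.com", bucket="pipeline-artifacts"),
)
print(config.display_name, config.storage.bucket)  # my-runtime pipeline-artifacts
```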
You can use S3-compatible cloud storage to make data available to your notebooks and scripts while they are executed. Your cloud storage must be accessible from the machine in your deployment that runs JupyterLab and from the cluster that hosts data science pipelines. Before you create and run pipelines in JupyterLab, ensure that you have your S3-compatible storage credentials readily available.
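One way to confirm that your storage credentials are available inside a workbench is to check for them in the environment. The following sketch assumes the settings are exposed as the environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_S3_ENDPOINT, and AWS_S3_BUCKET, for example by a data connection attached to the workbench. These variable names are an assumption; adjust them to match your environment.

```python
import os
from collections.abc import Mapping

# Assumed variable names; adjust them to match your workbench environment.
REQUIRED_S3_SETTINGS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_S3_ENDPOINT",
    "AWS_S3_BUCKET",
)


def missing_s3_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required settings that are unset or empty."""
    return [name for name in REQUIRED_S3_SETTINGS if not env.get(name)]


missing = missing_s3_settings()
if missing:
    print("Missing S3 settings:", ", ".join(missing))
else:
    print("All S3 settings are available.")
```

The function accepts any mapping, so you can also pass a dictionary of values that you intend to enter in a runtime configuration form.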
6.2. Accessing the pipeline editor
You can use Elyra to create visual end-to-end pipeline workflows in JupyterLab. Elyra is an extension for JupyterLab that provides you with a Pipeline Editor to create pipeline workflows that can execute in OpenShift AI.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have created a data science project.
- You have created a workbench with the Standard Data Science notebook image.
- You have created and configured a pipeline server within the data science project that contains your workbench.
- You have created and launched a Jupyter server from a notebook image that contains the Elyra extension (Standard Data Science, TensorFlow, TrustyAI, PyTorch, or HabanaAI).
- You have access to S3-compatible storage.
Procedure
- After you open JupyterLab, confirm that the JupyterLab launcher is automatically displayed.
In the Elyra section of the JupyterLab launcher, click the Pipeline Editor tile.
The Pipeline Editor opens.
Verification
- You can view the Pipeline Editor in JupyterLab.
6.3. Creating a runtime configuration
If you create a workbench as part of a data science project, a default runtime configuration is created automatically. However, if you create a notebook from the Jupyter tile in the OpenShift AI dashboard, you must create a runtime configuration before you can run your pipeline in JupyterLab. This enables you to specify connectivity information for your pipeline instance and S3-compatible cloud storage.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have access to S3-compatible cloud storage.
- You have created a data science project that contains a workbench.
- You have created and configured a pipeline server within the data science project that contains your workbench.
- You have created and launched a Jupyter server from a notebook image that contains the Elyra extension (Standard Data Science, TensorFlow, TrustyAI, PyTorch, or HabanaAI).
Procedure
- In the left sidebar of JupyterLab, click Runtimes.
Click the Create new runtime configuration button.
The Add new Data Science Pipelines runtime configuration page opens.
Complete the relevant fields to define your runtime configuration.
- In the Display Name field, enter a name for your runtime configuration.
- Optional: In the Description field, enter a description of your runtime configuration.
- Optional: In the Tags field, click Add Tag to define a category for your pipeline instance. Enter a name for the tag and press Enter.
Define the credentials of your data science pipeline:
- In the Data Science Pipelines API Endpoint field, enter the API endpoint of your data science pipeline. Do not specify the pipelines namespace in this field.
- In the Public Data Science Pipelines API Endpoint field, enter the public API endpoint of your data science pipeline.
Important: You can obtain the data science pipelines API endpoint from the Data Science Pipelines > Runs page in the dashboard. Copy the relevant endpoint and enter it in the Public Data Science Pipelines API Endpoint field.
- Optional: In the Data Science Pipelines User Namespace field, enter the relevant user namespace to run pipelines.
- From the Authentication Type list, select the authentication type required to authenticate your pipeline.
Important: If you created a notebook directly from the Jupyter tile on the dashboard, select EXISTING_BEARER_TOKEN from the Authentication Type list.
- In the Data Science Pipelines API Endpoint Username field, enter the user name required for the authentication type.
- In the Data Science Pipelines API Endpoint Password Or Token field, enter the password or token required for the authentication type.
Important: To obtain the data science pipelines API endpoint token, in the upper-right corner of the OpenShift web console, click your user name and select Copy login command. After you have logged in, click Display token and copy the value of --token= from the Log in with this token command.
Define the connectivity information of your S3-compatible storage:
- In the Cloud Object Storage Endpoint field, enter the endpoint of your S3-compatible storage. For more information about Amazon S3 endpoints, see Amazon Simple Storage Service endpoints and quotas.
- Optional: In the Public Cloud Object Storage Endpoint field, enter the URL of your S3-compatible storage.
- In the Cloud Object Storage Bucket Name field, enter the name of the bucket where your pipeline artifacts are stored. If the bucket does not exist, it is created automatically.
- From the Cloud Object Storage Authentication Type list, select the authentication type required to access your S3-compatible cloud storage. If you use AWS S3 buckets, select KUBERNETES_SECRET from the list.
- In the Cloud Object Storage Credentials Secret field, enter the secret that contains the storage user name and password. This secret is defined in the relevant user namespace, if applicable. In addition, it must be stored on the cluster that hosts your pipeline runtime.
- In the Cloud Object Storage Username field, enter the user name to connect to your S3-compatible cloud storage, if applicable. If you use AWS S3 buckets, enter your AWS Access Key ID.
- In the Cloud Object Storage Password field, enter the password to connect to your S3-compatible cloud storage, if applicable. If you use AWS S3 buckets, enter your AWS Secret Access Key.
- Click Save & Close.
Verification
- The runtime configuration that you created is shown in the Runtimes tab in the left sidebar of JupyterLab.
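The value that you enter in the Data Science Pipelines API Endpoint Password Or Token field is the value of the --token= option in the oc login command shown after you click Display token. If you copied the whole command, a small helper such as the following can extract the token. This is a minimal sketch; the function name, the sample command, and the token value are hypothetical placeholders.

```python
import shlex


def extract_token(login_command: str) -> str:
    """Return the value of the --token option from a copied `oc login` command."""
    args = shlex.split(login_command)
    for i, arg in enumerate(args):
        # Form shown in the web console: --token=<value>
        if arg.startswith("--token="):
            return arg.split("=", 1)[1]
        # Also accept the space-separated form: --token <value>
        if arg == "--token" and i + 1 < len(args):
            return args[i + 1]
    raise ValueError("No --token option found in the login command")


command = "oc login --token=sha256~EXAMPLE-ONLY --server=https://api.example.com:6443"
print(extract_token(command))  # sha256~EXAMPLE-ONLY
```

Treat the token as a secret: paste it only into the runtime configuration form, and do not store it in notebooks or scripts that you share.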
6.4. Updating a runtime configuration
To keep your runtime configuration accurate and up to date, you can change the settings of an existing runtime configuration.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have access to S3-compatible storage.
- You have created a data science project that contains a workbench.
- You have created and configured a pipeline server within the data science project that contains your workbench.
- A previously created runtime configuration is available in the JupyterLab interface.
- You have created and launched a Jupyter server from a notebook image that contains the Elyra extension (Standard Data Science, TensorFlow, TrustyAI, PyTorch, or HabanaAI).
Procedure
- In the left sidebar of JupyterLab, click Runtimes.
Hover the cursor over the runtime configuration that you want to update and click the Edit button.
The Data Science Pipelines runtime configuration page opens.
Fill in the relevant fields to update your runtime configuration.
- In the Display Name field, update the name of your runtime configuration, if applicable.
- Optional: In the Description field, update the description of your runtime configuration, if applicable.
- Optional: In the Tags field, click Add Tag to define a category for your pipeline instance. Enter a name for the tag and press Enter.
Define the credentials of your data science pipeline:
- In the Data Science Pipelines API Endpoint field, update the API endpoint of your data science pipeline, if applicable. Do not specify the pipelines namespace in this field.
- In the Public Data Science Pipelines API Endpoint field, update the public API endpoint of your data science pipeline, if applicable.
- Optional: In the Data Science Pipelines User Namespace field, update the relevant user namespace to run pipelines, if applicable.
- From the Authentication Type list, select a new authentication type required to authenticate your pipeline, if applicable.
Important: If you created a notebook directly from the Jupyter tile on the dashboard, select EXISTING_BEARER_TOKEN from the Authentication Type list.
- In the Data Science Pipelines API Endpoint Username field, update the user name required for the authentication type, if applicable.
- In the Data Science Pipelines API Endpoint Password Or Token field, update the password or token required for the authentication type, if applicable.
Important: To obtain the data science pipelines API endpoint token, in the upper-right corner of the OpenShift web console, click your user name and select Copy login command. After you have logged in, click Display token and copy the value of --token= from the Log in with this token command.
Define the connectivity information of your S3-compatible storage:
- In the Cloud Object Storage Endpoint field, update the endpoint of your S3-compatible storage, if applicable. For more information about Amazon S3 endpoints, see Amazon Simple Storage Service endpoints and quotas.
- Optional: In the Public Cloud Object Storage Endpoint field, update the URL of your S3-compatible storage, if applicable.
- In the Cloud Object Storage Bucket Name field, update the name of the bucket where your pipeline artifacts are stored, if applicable. If the bucket does not exist, it is created automatically.
- From the Cloud Object Storage Authentication Type list, update the authentication type required to access your S3-compatible cloud storage, if applicable. If you use AWS S3 buckets, you must select USER_CREDENTIALS from the list.
- Optional: In the Cloud Object Storage Credentials Secret field, update the secret that contains the storage user name and password, if applicable. This secret is defined in the relevant user namespace. You must save the secret on the cluster that hosts your pipeline runtime.
- Optional: In the Cloud Object Storage Username field, update the user name to connect to your S3-compatible cloud storage, if applicable. If you use AWS S3 buckets, update your AWS Access Key ID.
- Optional: In the Cloud Object Storage Password field, update the password to connect to your S3-compatible cloud storage, if applicable. If you use AWS S3 buckets, update your AWS Secret Access Key.
- Click Save & Close.
Verification
- The runtime configuration that you updated is shown in the Runtimes tab in the left sidebar of JupyterLab.
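Each field in the update procedure changes only "if applicable"; fields that you leave alone keep their existing values. The following sketch models that behavior as merging a partial update onto the existing settings. The dictionary keys and the convention that None means "leave unchanged" are hypothetical and chosen for illustration only.

```python
from typing import Any


def apply_update(current: dict[str, Any], changes: dict[str, Any]) -> dict[str, Any]:
    """Return a new configuration with only the provided fields changed.

    A value of None is treated as "not applicable", so the existing value
    is kept. The original dictionary is not modified.
    """
    updated = dict(current)
    for key, value in changes.items():
        if value is not None:
            updated[key] = value
    return updated


existing = {
    "display_name": "my-runtime",
    "api_endpoint": "https://ds-pipeline.example.com",
    "cos_bucket": "pipeline-artifacts",
}
updated = apply_update(existing, {"cos_bucket": "new-artifacts", "display_name": None})
print(updated["display_name"], updated["cos_bucket"])  # my-runtime new-artifacts
```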
6.5. Deleting a runtime configuration
After you have finished using your runtime configuration, you can delete it from the JupyterLab interface. After deleting a runtime configuration, you cannot run pipelines in JupyterLab until you create another runtime configuration.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have created a data science project that contains a workbench.
- You have created and configured a pipeline server within the data science project that contains your workbench.
- A previously created runtime configuration is visible in the JupyterLab interface.
- You have created and launched a Jupyter server from a notebook image that contains the Elyra extension (Standard Data Science, TensorFlow, TrustyAI, PyTorch, or HabanaAI).
Procedure
- In the left sidebar of JupyterLab, click Runtimes.
Hover the cursor over the runtime configuration that you want to delete and click the Delete Item button.
A dialog box appears prompting you to confirm the deletion of your runtime configuration.
- Click OK.
Verification
- The runtime configuration that you deleted is no longer shown in the Runtimes tab in the left sidebar of JupyterLab.
6.6. Duplicating a runtime configuration
To avoid re-creating an entire runtime configuration when you need one with similar values, you can duplicate an existing runtime configuration in the JupyterLab interface.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have created a data science project that contains a workbench.
- You have created and configured a pipeline server within the data science project that contains your workbench.
- A previously created runtime configuration is visible in the JupyterLab interface.
- You have created and launched a Jupyter server from a notebook image that contains the Elyra extension (Standard Data Science, TensorFlow, TrustyAI, PyTorch, or HabanaAI).
Procedure
- In the left sidebar of JupyterLab, click Runtimes.
- Hover the cursor over the runtime configuration that you want to duplicate and click the Duplicate button.
Verification
- The runtime configuration that you duplicated is shown in the Runtimes tab in the left sidebar of JupyterLab.
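A duplicate is useful only if it is independent of the original, so that editing one does not change the other. The following sketch illustrates this with a deep copy of a configuration stored as a dictionary. The dictionary keys and the " (copy)" name suffix are hypothetical; the name that JupyterLab assigns to a duplicated runtime configuration may differ.

```python
import copy
from typing import Any


def duplicate_runtime(config: dict[str, Any], suffix: str = " (copy)") -> dict[str, Any]:
    """Return an independent copy of a runtime configuration.

    A deep copy is used so that nested values, such as the tag list,
    are not shared between the original and the duplicate.
    """
    duplicate = copy.deepcopy(config)
    duplicate["display_name"] = config["display_name"] + suffix
    return duplicate


original = {"display_name": "dev-runtime", "cos_bucket": "dev-artifacts", "tags": ["dev"]}
clone = duplicate_runtime(original)
clone["tags"].append("experiment")  # Changes only the duplicate's tag list.
print(original["tags"], clone["tags"])  # ['dev'] ['dev', 'experiment']
```

With a shallow copy (dict(original)), both configurations would share the same tag list, and appending a tag to one would change the other.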
6.7. Running a pipeline in JupyterLab
You can run pipelines that you have created in JupyterLab from the Pipeline Editor user interface. Before you can run a pipeline, you must create a data science project and a pipeline server. After you create a pipeline server, you must create a workbench within the same project as your pipeline server. Your pipeline instance in JupyterLab must contain a runtime configuration. If you create a workbench as part of a data science project, a default runtime configuration is created automatically. However, if you create a notebook from the Jupyter tile in the OpenShift AI dashboard, you must create a runtime configuration before you can run your pipeline in JupyterLab. A runtime configuration defines connectivity information for your pipeline instance and S3-compatible cloud storage.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have access to S3-compatible storage.
- You have created a pipeline in JupyterLab.
- You have opened your pipeline in the Pipeline Editor in JupyterLab.
- Your pipeline instance contains a runtime configuration.
- You have created and configured a pipeline server within the data science project that contains your workbench.
- You have created and launched a Jupyter server from a notebook image that contains the Elyra extension (Standard Data Science, TensorFlow, TrustyAI, PyTorch, or HabanaAI).
Procedure
In the Pipeline Editor user interface, click Run Pipeline.
The Run Pipeline dialog appears. The Pipeline Name field is automatically populated with the pipeline file name.
Define the settings for your pipeline run.
- From the Runtime Configuration list, select the relevant runtime configuration to run your pipeline.
- Optional: Configure your pipeline parameters, if applicable. If your pipeline contains nodes that reference pipeline parameters, you can change the default parameter values. If a parameter is required and has no default value, you must enter a value.
- Click OK.
Verification
- You can view the output artifacts of your pipeline run. The artifacts are stored in your designated object storage bucket.
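The rule for pipeline parameters in the Run Pipeline dialog is: a value that you enter replaces the default, the default is used otherwise, and a required parameter with no default must be given a value. The following sketch expresses that rule. The PipelineParameter class, its fields, and the parameter names are hypothetical and are not part of the Elyra API.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class PipelineParameter:
    name: str
    default: Optional[Any] = None
    required: bool = False


def resolve_parameters(params: list[PipelineParameter], entered: dict[str, Any]) -> dict[str, Any]:
    """Combine default values with user-entered values for a pipeline run.

    Raises ValueError if a required parameter has neither a default value
    nor a user-entered value. Optional parameters with no value are omitted.
    """
    resolved: dict[str, Any] = {}
    for p in params:
        if p.name in entered:
            resolved[p.name] = entered[p.name]   # User value overrides default.
        elif p.default is not None:
            resolved[p.name] = p.default         # Fall back to the default.
        elif p.required:
            raise ValueError(f"Parameter '{p.name}' is required and has no default value")
    return resolved


params = [
    PipelineParameter("learning_rate", default=0.01),
    PipelineParameter("dataset_path", required=True),
    PipelineParameter("epochs", default=10),
]
values = resolve_parameters(params, {"dataset_path": "s3://bucket/data.csv", "epochs": 20})
print(values)
# {'learning_rate': 0.01, 'dataset_path': 's3://bucket/data.csv', 'epochs': 20}
```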
6.8. Exporting a pipeline in JupyterLab
You can export pipelines that you have created in JupyterLab. When you export a pipeline, the pipeline is prepared for later execution, but is not uploaded or executed immediately. During the export process, any package dependencies are uploaded to S3-compatible storage. Also, pipeline code is generated for the target runtime.
Before you can export a pipeline, you must create a data science project and a pipeline server. After you create a pipeline server, you must create a workbench within the same project as your pipeline server. In addition, your pipeline instance in JupyterLab must contain a runtime configuration. If you create a workbench as part of a data science project, a default runtime configuration is created automatically. However, if you create a notebook from the Jupyter tile in the OpenShift AI dashboard, you must create a runtime configuration before you can export your pipeline in JupyterLab. A runtime configuration defines connectivity information for your pipeline instance and S3-compatible cloud storage.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have created a data science project that contains a workbench.
- You have created and configured a pipeline server within the data science project that contains your workbench.
- You have access to S3-compatible storage.
- You have created a pipeline in JupyterLab.
- You have opened your pipeline in the Pipeline Editor in JupyterLab.
- Your pipeline instance contains a runtime configuration.
- You have created and launched a Jupyter server from a notebook image that contains the Elyra extension (Standard Data Science, TensorFlow, TrustyAI, PyTorch, or HabanaAI).
Procedure
In the Pipeline Editor user interface, click Export Pipeline.
The Export Pipeline dialog appears. The Pipeline Name field is automatically populated with the pipeline file name.
Define the settings to export your pipeline.
- From the Runtime Configuration list, select the relevant runtime configuration to export your pipeline.
- From the Export Pipeline as list, select an appropriate file format.
- In the Export Filename field, enter a file name for the exported pipeline.
- Select the Replace if file already exists check box to replace an existing file of the same name as the pipeline you are exporting.
- Optional: Configure your pipeline parameters, if applicable. If your pipeline contains nodes that reference pipeline parameters, you can change the default parameter values. If a parameter is required and has no default value, you must enter a value.
- Click OK.
Verification
- You can view the file containing the pipeline that you exported in your designated object storage bucket.
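The Replace if file already exists check box controls whether an export overwrites an existing file with the same name. The following sketch illustrates that rule, using a local temporary directory as a stand-in for the export destination. The write_export function and the my_pipeline.yaml file name are hypothetical and do not represent how Elyra performs the export.

```python
import tempfile
from pathlib import Path


def write_export(path: Path, content: str, replace_if_exists: bool) -> bool:
    """Write exported pipeline content and return True if the file was written.

    An existing file is overwritten only when replace_if_exists is True,
    mirroring the "Replace if file already exists" check box.
    """
    if path.exists() and not replace_if_exists:
        return False
    path.write_text(content)
    return True


# Usage: export the same pipeline three times to the same file name.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "my_pipeline.yaml"
    results = [
        write_export(target, "version 1", replace_if_exists=False),  # New file: written.
        write_export(target, "version 2", replace_if_exists=False),  # Exists: skipped.
        write_export(target, "version 2", replace_if_exists=True),   # Exists: replaced.
    ]
    final_content = target.read_text()

print(results, final_content)  # [True, False, True] version 2
```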