Chapter 4. Managing pipeline runs
4.1. Overview of pipeline runs
A pipeline run is a single execution of a data science pipeline. As a data scientist, you can use OpenShift AI to define, manage, and track executions of a data science pipeline. You can view a record of previously executed, scheduled, and archived runs from the Runs page in the OpenShift AI user interface.
You can optimize your use of pipeline runs for portability and repeatability by using pipeline experiments. With experiments, you can logically group pipeline runs and try different configurations of your pipelines. You can also clone your pipeline runs to reproduce and scale them, or archive them when you want to retain a record of their execution, but no longer require them. You can delete archived runs that you no longer want to retain, or you can restore them to their former state.
You can execute a run once, that is, immediately after its creation, or on a recurring basis. Recurring runs consist of a copy of a pipeline with all of its parameter values and a run trigger. A run trigger indicates when a recurring run executes. You can define the following run triggers:
- Periodic: used for scheduling runs to execute at regular intervals.
- Cron: used for scheduling runs as a cron job.
You can also configure multiple instances of the same run to execute concurrently, from a range of one to ten. When executed, you can track the run’s progress from the run Details page on the OpenShift AI user interface. From here, you can view the run’s graph, and output artifacts. A pipeline run can be in one of the following states:
- Scheduled: A pipeline run that is scheduled to execute at least once.
- Active: A pipeline run that is currently executing or has been stopped.
- Archived: A pipeline run that has been archived.
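The maximum concurrent runs setting described above behaves like a counting semaphore: no more than the configured number of runs (between one and ten) execute at the same time. The following is a minimal sketch of that behavior in Python's asyncio, for illustration only; the Data Science Pipelines scheduler is not implemented this way.

```python
import asyncio

async def launch_runs(run_ids, max_concurrent=10):
    """Execute runs with a cap on concurrency, mirroring the
    'Maximum concurrent runs' setting (allowed range: 1 to 10)."""
    if not 1 <= max_concurrent <= 10:
        raise ValueError("max_concurrent must be between 1 and 10")
    sem = asyncio.Semaphore(max_concurrent)
    active = 0   # runs currently executing
    peak = 0     # highest concurrency observed

    async def one_run(run_id):
        nonlocal active, peak
        async with sem:              # wait for a free slot
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0)   # stand-in for the actual pipeline work
            active -= 1

    await asyncio.gather(*(one_run(r) for r in run_ids))
    return peak

peak = asyncio.run(launch_runs(range(25), max_concurrent=10))
print("peak concurrency:", peak)  # never exceeds max_concurrent
```

Runs beyond the limit simply wait for a slot rather than failing, which matches the queue-and-execute behavior you observe in the UI.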
You can use catch up runs to ensure your pipeline runs do not permanently fall behind schedule when paused. For example, if you re-enable a paused recurring run, the run scheduler backfills each missed run interval. If you disable catch up runs, and you have a scheduled run interval ready to execute, the run scheduler only schedules the run execution for the latest run interval. Catch up runs are enabled by default. However, if your pipeline handles backfill internally, Red Hat recommends that you disable catch up runs to avoid duplicate backfill.
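The catch up semantics can be sketched as follows. This is an illustrative model of the behavior described above, not the scheduler's implementation; the function name and signature are invented for the example.

```python
from datetime import datetime, timedelta

def intervals_to_run(last_run, now, interval, catch_up=True):
    """Return the interval start times the scheduler would execute.

    Illustrative model only: with catch up enabled, every missed
    interval is backfilled; with it disabled, only the most recent
    missed interval runs.
    """
    missed = []
    t = last_run + interval
    while t <= now:
        missed.append(t)
        t += interval
    if not missed:
        return []
    return missed if catch_up else [missed[-1]]

# A recurring run with a 1-hour interval, paused from 09:00 to 12:00:
last = datetime(2024, 1, 1, 9, 0)
now = datetime(2024, 1, 1, 12, 0)
print(intervals_to_run(last, now, timedelta(hours=1)))         # backfills 10:00, 11:00, 12:00
print(intervals_to_run(last, now, timedelta(hours=1), False))  # only 12:00
```

The second call shows why disabling catch up is recommended when your pipeline backfills internally: the scheduler then triggers a single execution instead of one per missed interval.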
After a pipeline run executes, you can view details of its executed tasks on the Executions page, along with its artifacts, on the Artifacts page. From the Executions page, you can view the execution status of each task, which indicates whether it completed successfully. You can also view further information about each executed task by clicking the execution name in the list. From the Artifacts page, you can view the details of each pipeline artifact, such as its name, unique ID, type, and URI. Pipeline artifacts can help you to evaluate the performance of your pipeline runs and make it easier to understand your pipeline components. Pipeline artifacts can range from plain text data to detailed, interactive data visualizations.
You can review and analyze logs for each step in an active pipeline run. With the log viewer, you can search for specific log messages, view the log for each step, and download the step logs to your local machine.
4.2. Storing data with data science pipelines
When you run a data science pipeline, OpenShift AI stores the pipeline YAML configuration file and resulting pipeline run artifacts in the root directory of your storage bucket. The directories that contain pipeline run artifacts can differ depending on where you executed the pipeline run from. See the following table for further information:
Pipeline run source | Pipeline storage directory | Run artifacts storage directory
---|---|---
OpenShift AI dashboard | |
JupyterLab Elyra extension | |

With the JupyterLab Elyra extension, you can also set an object storage path prefix.
4.3. Viewing active pipeline runs
You can view a list of pipeline runs that were previously executed in a pipeline experiment. From this list, you can view details relating to your pipeline runs, such as the pipeline version that the run belongs to, along with the run status, duration, and execution start time.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and has a pipeline server.
- You have imported a pipeline to an active pipeline server.
- You have previously executed a pipeline run that is available.
Procedure
- From the OpenShift AI dashboard, click Experiments → Experiments and runs.
- On the Experiments page, from the Project drop-down list, select the project that contains the pipeline experiment for the active pipeline runs that you want to view.
- From the list of experiments, click the experiment that contains the active pipeline runs that you want to view.
- On the Runs page, click the Runs tab.

After a run has completed its execution, the run’s status appears in the Status column in the table, indicating whether the run has succeeded or failed.
Verification
- A list of active runs appears in the Runs tab on the Runs page for the pipeline experiment.
4.4. Executing a pipeline run
By default, a pipeline run executes once immediately after it is created.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a configured pipeline server.
- You have imported a pipeline to an active pipeline server.
Procedure
- From the OpenShift AI dashboard, click Experiments → Experiments and runs.
- On the Experiments page, from the Project drop-down list, select the project that contains the pipeline experiment that you want to create a run for.
- From the list of pipeline experiments, click the experiment that you want to create a run for.
- Click Create run.
- On the Create run page, configure the run:
- From the Experiment list, select the pipeline experiment that you want to create a run for. Alternatively, to create a new pipeline experiment, click Create new experiment, and then complete the relevant fields in the Create experiment dialog.
- In the Name field, enter a name for the run.
- In the Description field, enter a description for the run.
- From the Pipeline list, select the pipeline that you want to create a run for. Alternatively, to create a new pipeline, click Create new pipeline, and then complete the relevant fields in the Import pipeline dialog.
- From the Pipeline version list, select the pipeline version to create a run for. Alternatively, to upload a new version, click Upload new version, and then complete the relevant fields in the Upload new version dialog.
- Configure the input parameters for the run by selecting the parameters from the list.
- Click Create run.
Verification
- The pipeline run that you created appears in the Runs tab on the Runs page for the pipeline experiment.
4.5. Stopping an active pipeline run
If you no longer require an active pipeline run to continue executing in a pipeline experiment, you can stop the run before its defined end date.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a pipeline server.
- You have imported a pipeline to an active pipeline server.
- An active pipeline run is currently executing.
Procedure
- From the OpenShift AI dashboard, click Experiments → Experiments and runs.
- On the Experiments page, from the Project drop-down list, select the project that contains the pipeline experiment for the active run that you want to stop.
- From the list of pipeline experiments, click the pipeline experiment that contains the run that you want to stop.
- In the Runs tab, click the action menu (⋮) beside the active run that you want to stop, and then click Stop.

There might be a short delay while the run stops.
Verification
- In the list of active runs, the status of the run is "stopped".
4.6. Duplicating an active pipeline run
To make it easier to quickly execute pipeline runs with the same configuration in a pipeline experiment, you can duplicate them.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a configured pipeline server.
- You have imported a pipeline to an active pipeline server.
- An active run is available to duplicate in the Active tab on the Runs page.
Procedure
- From the OpenShift AI dashboard, click Experiments → Experiments and runs.
- On the Experiments page, from the Project drop-down list, select the project that contains the pipeline experiment for the pipeline run that you want to duplicate.
- From the list of pipeline experiments, click the experiment that contains the pipeline run that you want to duplicate.
- Click the action menu (⋮) beside the relevant active run and click Duplicate.
- On the Duplicate run page, configure the duplicate run:
- From the Experiment list, select the pipeline experiment that contains the pipeline run that you want to duplicate. Alternatively, to create a new pipeline experiment, click Create new experiment, and then complete the relevant fields in the Create experiment dialog.
- In the Name field, enter a name for the duplicate run.
- In the Description field, enter a description for the duplicate run.
- From the Pipeline list, select the pipeline to contain the duplicate run. Alternatively, to create a new pipeline, click Create new pipeline, and then complete the relevant fields in the Import pipeline dialog.
- From the Pipeline version list, select the pipeline version to contain the duplicate run. Alternatively, to upload a new version, click Upload new version, and then complete the relevant fields in the Upload new version dialog.
- In the Parameters section, configure the input parameters for the run that you are duplicating by selecting the appropriate parameters from the list.
- Click Create run.
Verification
- The duplicate pipeline run appears in the Runs tab on the Runs page for the pipeline experiment.
4.7. Viewing scheduled pipeline runs
You can view a list of pipeline runs that are scheduled for execution in a pipeline experiment. From this list, you can view details relating to your pipeline runs, such as the pipeline version that the run belongs to. You can also view the run status, execution frequency, and schedule.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a pipeline server.
- You have imported a pipeline to an active pipeline server.
- You have scheduled a pipeline run that is available to view.
Procedure
- From the OpenShift AI dashboard, click Experiments → Experiments and runs.
- On the Experiments page, from the Project drop-down list, select the project that contains the pipeline experiment for the scheduled pipeline runs that you want to view.
- From the list of pipeline experiments, click the experiment that contains the pipeline runs that you want to view.
- On the Runs page, click the Schedules tab.
- Study the table showing a list of scheduled runs.

After a run has been scheduled, the run’s status indicates whether the run is ready for execution or unavailable for execution. To change its execution availability, click the run’s Status icon.
Verification
- A list of scheduled runs appears in the Schedules tab on the Runs page for the pipeline experiment.
4.8. Scheduling a pipeline run using a cron job
You can use a cron job to schedule a pipeline run to execute at a specific time. Cron jobs are useful for creating periodic and recurring tasks, and can also schedule an individual task for a specific time, such as a low-activity period. To successfully execute runs in OpenShift AI, you must use the supported cron format. See Cron Expression Format for more information.
The following examples show the correct format:
Run occurrence | Cron format |
---|---|
Every five minutes | @every 5m |
Every 10 minutes | 0 */10 * * * * |
Daily at 16:16 UTC | 0 16 16 * * * |
Daily every quarter of the hour | 0 0,15,30,45 * * * * |
On Monday and Tuesday at 15:40 UTC | 0 40 15 * * MON,TUE |
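Before scheduling a run, it can help to test a six-field expression against known timestamps. The matcher below is a simplified sketch written for illustration: it handles only '*', '*/n' steps, comma lists, and three-letter day names, while the scheduler's parser also supports ranges and macros such as @every.

```python
from datetime import datetime

DAY_NAMES = {"SUN": 0, "MON": 1, "TUE": 2, "WED": 3,
             "THU": 4, "FRI": 5, "SAT": 6}

def _part_matches(part, value):
    if part == "*":
        return True
    if part.startswith("*/"):          # step values, e.g. */10
        return value % int(part[2:]) == 0
    return int(part) == value

def cron_matches(expr, dt):
    """Return True if dt matches a 6-field (seconds-first) cron expression."""
    fields = expr.split()
    if len(fields) != 6:
        raise ValueError("expected 6 fields: sec min hour dom month dow")
    sec, minute, hour, dom, month, dow = fields
    # Translate day names to cron numbers (0 = Sunday).
    dow = ",".join(str(DAY_NAMES.get(p.upper(), p)) for p in dow.split(","))
    dow_value = (dt.weekday() + 1) % 7
    checks = [(sec, dt.second), (minute, dt.minute), (hour, dt.hour),
              (dom, dt.day), (month, dt.month), (dow, dow_value)]
    return all(any(_part_matches(p, v) for p in f.split(","))
               for f, v in checks)

# "Daily at 16:16 UTC" from the table above:
print(cron_matches("0 16 16 * * *", datetime(2024, 1, 5, 16, 16, 0)))     # True
# The MON,TUE schedule matches a Monday but not a Wednesday:
print(cron_matches("0 40 15 * * MON,TUE", datetime(2024, 1, 1, 15, 40)))  # True (a Monday)
print(cron_matches("0 40 15 * * MON,TUE", datetime(2024, 1, 3, 15, 40)))  # False (a Wednesday)
```

Note the leading field is seconds, so a standard five-field crontab entry must be prefixed with a seconds value (usually 0) before OpenShift AI accepts it.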
4.9. Scheduling a pipeline run
To repeatedly run a pipeline, you can create a scheduled pipeline run.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a configured pipeline server.
- You have imported a pipeline to an active pipeline server.
Procedure
- From the OpenShift AI dashboard, click Experiments → Experiments and runs.
- On the Experiments page, from the Project drop-down list, select the project that contains the pipeline experiment for the run that you want to schedule.
- From the list of pipeline experiments, click the experiment that contains the run that you want to schedule.
- Click the Schedules tab.
- Click Create schedule.
- On the Schedule run page, configure the run that you are scheduling:
- From the Experiment list, select the pipeline experiment that you want to contain the scheduled run. Alternatively, to create a new pipeline experiment, click Create new experiment, and then complete the relevant fields in the Create experiment dialog.
- In the Name field, enter a name for the run.
- In the Description field, enter a description for the run.
- From the Trigger type list, select one of the following options:
- Select Periodic to specify an execution frequency. In the Run every field, enter a numerical value and select an execution frequency from the list.
- Select Cron to specify the execution schedule in cron format in the Cron string field. This creates a cron job to execute the run. Click the Copy button to copy the cron job schedule to the clipboard. The field furthest to the left represents seconds. For more information about scheduling tasks using the supported cron format, see Cron Expression Format.
- In the Maximum concurrent runs field, specify the number of runs that can execute concurrently, from a range of one to ten.
- For Start date, specify a start date for the run. Select a start date using the Calendar and the start time from the list of times.
- For End date, specify an end date for the run. Select an end date using the Calendar and the end time from the list of times.
- For Catch up, enable or disable catch up runs. You can use catch up runs to ensure your pipeline runs do not permanently fall behind schedule when paused. For example, if you re-enable a paused recurring run, the run scheduler backfills each missed run interval.
- From the Pipeline list, select the pipeline that you want to create a run for. Alternatively, to create a new pipeline, click Create new pipeline, and then complete the relevant fields in the Import pipeline dialog.
- From the Pipeline version list, select the pipeline version to create a run for. Alternatively, to upload a new version, click Upload new version, and then complete the relevant fields in the Upload new version dialog.
- Configure the input parameters for the run by selecting the parameters from the list.
- Click Schedule run.
Verification
- The pipeline run that you scheduled appears in the Schedules tab on the Runs page for the pipeline experiment.
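The fields collected on the Schedule run page map onto a small set of schedule properties. The helper below is purely illustrative: the function and field names are invented for this sketch and are not part of the OpenShift AI or Kubeflow Pipelines SDKs. It shows the same validation the form applies, such as the one-to-ten concurrency range and the trigger-specific required fields.

```python
def build_schedule_config(name, trigger_type, *, cron=None, interval=None,
                          max_concurrent=1, catch_up=True,
                          start_date=None, end_date=None):
    """Collect the Schedule run form fields into one configuration dict.

    Hypothetical helper for illustration; not part of any SDK.
    """
    if not 1 <= max_concurrent <= 10:
        raise ValueError("maximum concurrent runs must be between 1 and 10")
    if trigger_type == "cron" and not cron:
        raise ValueError("a cron string is required for a cron trigger")
    if trigger_type == "periodic" and not interval:
        raise ValueError("an interval is required for a periodic trigger")
    return {
        "name": name,
        "trigger": {"type": trigger_type, "cron": cron, "interval": interval},
        "max_concurrent_runs": max_concurrent,
        "catch_up": catch_up,
        "start_date": start_date,
        "end_date": end_date,
    }

# A nightly run at 02:00 UTC, limited to two concurrent executions,
# with catch up disabled because the pipeline backfills internally:
cfg = build_schedule_config("nightly-train", "cron", cron="0 0 2 * * *",
                            max_concurrent=2, catch_up=False)
print(cfg["trigger"]["type"])  # cron
```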
4.10. Duplicating a scheduled pipeline run
To make it easier to schedule runs to execute as part of your pipeline experiment, you can duplicate existing scheduled runs.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a configured pipeline server.
- You have imported a pipeline to an active pipeline server.
- A scheduled run is available to duplicate in the Schedules tab on the Runs page.
Procedure
- From the OpenShift AI dashboard, click Experiments → Experiments and runs.
- On the Experiments page, from the Project drop-down list, select the project that contains the pipeline experiment for the scheduled run that you want to duplicate.
- From the list of pipeline experiments, click the experiment that contains the pipeline run that you want to duplicate.
- On the Runs page, click the Schedules tab.
- Click the action menu (⋮) beside the run that you want to duplicate and click Duplicate.
- On the Duplicate schedule page, configure the duplicate run:
- From the Experiment list, select the pipeline experiment that contains the scheduled pipeline run that you want to duplicate. Alternatively, to create a new pipeline experiment, click Create new experiment, and then complete the relevant fields in the Create experiment dialog.
- In the Name field, enter a name for the duplicate run.
- In the Description field, enter a description for the duplicate run.
- From the Trigger type list, select one of the following options:
- Select Periodic to specify an execution frequency. In the Run every field, enter a numerical value and select an execution frequency from the list.
- Select Cron to specify the execution schedule in cron format in the Cron string field. This creates a cron job to execute the run. Click the Copy button to copy the cron job schedule to the clipboard. The field furthest to the left represents seconds. For more information about scheduling tasks using the supported cron format, see Cron Expression Format.
- For Maximum concurrent runs, specify the number of runs that can execute concurrently, from a range of one to ten.
- For Start date, specify a start date for the duplicate run. Select a start date using the Calendar and the start time from the list of times.
- For End date, specify an end date for the duplicate run. Select an end date using the Calendar and the end time from the list of times.
- For Catch up, enable or disable catch up runs. You can use catch up runs to ensure your pipeline runs do not permanently fall behind schedule when paused. For example, if you re-enable a paused recurring run, the run scheduler backfills each missed run interval.
- From the Pipeline list, select the pipeline that you want to create a duplicate run for. Alternatively, to create a new pipeline, click Create new pipeline, and then complete the relevant fields in the Import pipeline dialog.
- From the Pipeline version list, select the pipeline version to create a duplicate run for. Alternatively, to upload a new version, click Upload new version, and then complete the relevant fields in the Upload new version dialog.
- Configure the input parameters for the run by selecting the parameters from the list.
- Click Schedule run.
Verification
- The pipeline run that you duplicated appears in the Schedules tab on the Runs page for the pipeline experiment.
4.11. Deleting a scheduled pipeline run
To discard pipeline runs that you previously scheduled, but no longer require, you can delete them so that they do not appear on the Schedules page.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a configured pipeline server.
- You have imported a pipeline to an active pipeline server.
- You have previously scheduled a run that is available to delete.
Procedure
- From the OpenShift AI dashboard, click Experiments → Experiments and runs.
- On the Experiments page, from the Project drop-down list, select the project that contains the pipeline experiment for the scheduled pipeline run that you want to delete.
- From the list of pipeline experiments, click the experiment that contains the scheduled pipeline run that you want to delete.
- On the Runs page, click the Schedules tab.
- Click the action menu (⋮) beside the scheduled pipeline run that you want to delete, and then click Delete.
- In the Delete schedule dialog, enter the run name in the text field to confirm that you intend to delete it.
- Click Delete.
Verification
- The run that you deleted no longer appears on the Schedules tab for the pipeline experiment.
4.12. Viewing the details of a pipeline run
To gain a clearer understanding of your pipeline runs, you can view the details of a previously triggered pipeline run, such as its graph, execution details, and run output.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a pipeline server.
- You have imported a pipeline to an active pipeline server.
- You have previously triggered a pipeline run.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines → Pipelines.
- On the Pipelines page, select the project that you want to view run details for.
- For a pipeline that you want to view run details for, click Expand.
- Click the action menu (⋮) for the pipeline version and then click View runs.
- On the Runs page, click the name of the run that you want to view the details of.
Verification
- On the Run details page, you can view the run’s graph, execution details, input parameters, step logs, and run output.
4.13. Viewing archived pipeline runs
You can view a list of pipeline runs that you have archived. You can view details for your archived pipeline runs, such as the pipeline version, run status, duration, and execution start date.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and has a pipeline server.
- You have imported a pipeline to an active pipeline server.
- An archived pipeline run exists.
Procedure
- From the OpenShift AI dashboard, click Experiments → Experiments and runs.
- On the Experiments page, from the Project drop-down list, select the project that contains the pipeline experiment for the archived pipeline runs that you want to view.
- From the list of pipeline experiments, click the experiment that contains the archived pipeline runs that you want to view.
- On the Runs page, click the Archive tab.
Verification
- A list of archived runs appears in the Archive tab on the Runs page for the pipeline experiment.
4.14. Archiving a pipeline run
You can retain records of your pipeline runs by archiving them. If required, you can restore runs from your archive to reuse, or delete runs that are no longer required.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and has a pipeline server.
- You have imported a pipeline to an active pipeline server.
- You have previously executed a pipeline run that is available.
Procedure
- From the OpenShift AI dashboard, click Experiments → Experiments and runs.
- On the Experiments page, from the Project drop-down list, select the project that contains the pipeline experiment for the run that you want to archive.
- From the list of pipeline experiments, click the experiment that contains the pipeline run that you want to archive.
- On the Runs page, click the Runs tab.
- Click the action menu (⋮) beside the pipeline run that you want to archive, and then click Archive.
- In the Archiving run dialog, enter the run name in the text field to confirm that you intend to archive it.
- Click Archive.
Verification
- The archived run does not appear in the Runs tab, and instead appears in the Archive tab on the Runs page for the pipeline experiment.
4.15. Restoring an archived pipeline run
You can restore an archived run to the active state.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and has a pipeline server.
- You have imported a pipeline to an active pipeline server.
- An archived run exists in your project.
Procedure
- From the OpenShift AI dashboard, click Experiments → Experiments and runs.
- On the Experiments page, from the Project drop-down list, select the project that contains the pipeline experiment that you want to restore.
- From the list of pipeline experiments, click the experiment that contains the archived pipeline run that you want to restore.
- On the Runs page, click the Archive tab.
- Click the action menu (⋮) beside the pipeline run that you want to restore, and then click Restore.
- In the Restore run dialog, click Restore.
Verification
- The restored run appears in the Runs tab on the Runs page for the pipeline experiment.
4.16. Deleting an archived pipeline run
You can delete pipeline runs from the OpenShift AI run archive.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and has a pipeline server.
- You have imported a pipeline to an active pipeline server.
- You have previously archived a pipeline run.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines → Runs.
- On the Runs page, select the project for the archived pipeline run that you want to delete.
- On the Runs page, click the Archived tab.
- Click the action menu (⋮) beside the run that you want to delete and click Delete.
- In the Delete run dialog, enter the run name in the text field to confirm that you intend to delete it.
- Click Delete.
Verification
- The archived run that you deleted no longer appears in the Archived tab on the Runs page.
4.17. Duplicating an archived pipeline run
To make it easier to reproduce runs with the same configuration as runs in your archive, you can duplicate them.
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
- You have previously created a data science project that is available and contains a configured pipeline server.
- You have imported a pipeline to an active pipeline server.
- An archived run is available to duplicate in the Archived tab on the Runs page.
Procedure
- From the OpenShift AI dashboard, click Data Science Pipelines → Runs.
- On the Runs page, select the project that has the pipeline run that you want to duplicate.
- Click the Archived tab.
- Click the action menu (⋮) beside the relevant archived run and click Duplicate.
- On the Duplicate run page, configure the duplicate run:
- In the Name field, enter a name for the duplicate run.
- In the Description field, enter a description for the duplicate run.
- From the Pipeline list, select the pipeline to contain the duplicate run.
- From the Pipeline version list, select the pipeline version to contain the duplicate run.
- In the Parameters section, configure the input parameters for the run that you are duplicating by selecting the appropriate parameters from the list.
- Click Create run.
Verification
- The duplicate pipeline run appears in the Active tab on the Runs page.