Chapter 5. Implementing pipelines


5.1. Automating workflows with AI pipelines

Earlier, you used a notebook to train and save your model. Optionally, you can automate these tasks by using Red Hat OpenShift AI pipelines. Pipelines automate the execution of multiple notebooks and Python code. By using pipelines, you can run long training jobs or retrain your models on a schedule without having to manually run them in a notebook.

To explore the pipeline editor, complete the steps in the following procedures to create your own pipeline.

Alternatively, you can skip the following procedures and instead run the provided 4 Train Save.pipeline file.

5.1.1. Creating a pipeline

You can create a pipeline by using the GUI pipeline editor. This pipeline automates the notebook workflow that you used earlier to train a model and save it to S3 storage.

Prerequisites

  • You configured a pipeline server as described in Enabling AI pipelines.
  • If you configured the pipeline server after you created your workbench, you stopped and then started your workbench.

Procedure

  1. Open your workbench’s JupyterLab environment. If the launcher is not visible, click + to open it.

    Pipeline buttons
  2. Click Pipeline Editor.

    Pipeline Editor button

    You have created a blank pipeline.

  3. Set the default runtime image for when you run your notebook or Python code.

    1. In the pipeline editor, click Open Panel.

      Open Panel
    2. Select the Pipeline Properties tab.

      Pipeline Properties Tab
    3. In the Pipeline Properties panel, scroll down to Generic Node Defaults and Runtime Image. Set the value to Runtime | Tensorflow | Cuda | Python 3.12.

      Pipeline Runtime Image
  4. Select File → Save Pipeline.

Verification

  • In the JupyterLab file browser, the pipeline file (for example, untitled.pipeline) appears in your working directory.

5.1.2. Adding nodes to your pipeline

Add steps, or nodes, to your pipeline for the 1_experiment_train.ipynb and 2_save_model.ipynb notebooks.

Prerequisites

  • You created a pipeline file as described in Creating a pipeline.

Procedure

  1. From the JupyterLab file-browser panel, drag the 1_experiment_train.ipynb and 2_save_model.ipynb notebooks onto the pipeline canvas.

    Drag and Drop Notebooks
  2. Click the output port of 1_experiment_train.ipynb and drag a connecting line to the input port of 2_save_model.ipynb.

    Connect Nodes
  3. Save the pipeline.

Verification

  • Your pipeline has two nodes.

5.1.3. Specifying the training file as a dependency

Set node properties in your pipeline to specify the training file as a dependency.

Note

If you do not set this file dependency, the file is not included in the node when it runs and the training job fails.

Procedure

  1. Click the 1_experiment_train.ipynb node.

    Select Node 1
  2. In the Properties panel, click the Node Properties tab.
  3. Scroll down to the File Dependencies section and then click Add.

    Add File Dependency
  4. Set the value to data/*.csv, which matches the CSV files that contain the data for training your model.
  5. Select the Include Subdirectories option.

    Set File Dependency Value
  6. Save the pipeline.

Verification

  • The training file is a dependency of the first node in your pipeline.
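To see why this dependency matters, consider how the training notebook is likely to read its input. The following sketch assumes (hypothetically; it is not the contents of 1_experiment_train.ipynb) that the notebook loads its training data with a glob pattern, using only the Python standard library:

```python
# Sketch: loading every CSV file that matches the data/*.csv pattern.
# File names and column names here are illustrative assumptions.
import csv
import glob

def load_training_rows(pattern="data/*.csv"):
    """Read all CSV files matching the pattern into one list of row dicts."""
    rows = []
    for path in sorted(glob.glob(pattern, recursive=True)):
        with open(path, newline="") as f:
            rows.extend(csv.DictReader(f))
    return rows

# If the data/*.csv file dependency is not set on the node, these files
# are absent from the node's working directory at run time, the glob
# matches nothing, and the training step fails with no input data.
```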

5.1.4. Specifying the model output file

You must set the models/fraud/1/model.onnx file as the output file for both nodes in your pipeline.

Procedure

  1. Select node 1.
  2. Select the Node Properties tab.
  3. Scroll down to the Output Files section, and then click Add.
  4. Set the value to models/fraud/1/model.onnx.

    Set file dependency value
  5. Repeat steps 2-4 for node 2.
  6. Click Save Pipeline.

Verification

  • In node 1, the notebook creates the models/fraud/1/model.onnx file.
  • In node 2, the notebook uploads the models/fraud/1/model.onnx file to the S3 storage bucket.

5.1.5. Configuring the connection to storage

In node 2, the notebook uploads the model to the S3 storage bucket. You must set the S3 storage bucket keys by using the secret created by the My Storage connection that you set up in Storing data with connections.

You can use this secret in your pipeline nodes without having to save the information in your pipeline code. This is important if, for example, you want to save your pipelines to source control without including any secret keys.

The name of the secret is my-storage.

Note

If you named your connection something other than My Storage, you can obtain the secret name in the OpenShift AI dashboard by hovering over the help (?) icon in the Connections tab.

My Storage Secret Name

The my-storage secret includes the following fields:

  • AWS_ACCESS_KEY_ID
  • AWS_DEFAULT_REGION
  • AWS_S3_BUCKET
  • AWS_S3_ENDPOINT
  • AWS_SECRET_ACCESS_KEY

You must set the secret name and key for each of these fields.
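Each secret key that you configure in the following procedure reaches node 2 as an ordinary environment variable, so the notebook can read the credentials at run time without them ever appearing in the pipeline file. A minimal sketch of how the notebook code can consume these variables (the function name is illustrative, not from the tutorial's notebooks):

```python
# Sketch: reading the S3 settings that the Kubernetes secret injects as
# environment variables into the pipeline node's container.
import os

REQUIRED_VARS = [
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_S3_ENDPOINT",
    "AWS_DEFAULT_REGION",
    "AWS_S3_BUCKET",
]

def read_s3_config():
    """Collect the S3 settings, failing fast if any variable is missing."""
    missing = [name for name in REQUIRED_VARS if name not in os.environ]
    if missing:
        raise RuntimeError(f"Missing environment variables: {missing}")
    return {name: os.environ[name] for name in REQUIRED_VARS}
```

Failing fast with a clear message is useful here because a missing secret otherwise surfaces only later, as an opaque S3 connection error.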

Procedure

  1. Remove any pre-filled environment variables.

    1. Select node 2, and then select the Node Properties tab.

      Under Additional Properties, note that some environment variables have been pre-filled. The pipeline editor inferred that you need them from the notebook code.

      Because you do not want to save the value in your pipelines, remove all of these environment variables.

    2. Click Remove for each of the pre-filled environment variables.

      Remove Env Var
  2. Add the S3 bucket and keys by using the Kubernetes secret.

    1. Under Kubernetes Secrets, click Add.

      Add Kubernetes Secret
    2. Enter the following values and then click Add.

      • Environment Variable: AWS_ACCESS_KEY_ID

        • Secret Name: my-storage
        • Secret Key: AWS_ACCESS_KEY_ID

          Secret Form
  3. Repeat Step 2 for each of the following Kubernetes secrets:

    • Environment Variable: AWS_SECRET_ACCESS_KEY

      • Secret Name: my-storage
      • Secret Key: AWS_SECRET_ACCESS_KEY
    • Environment Variable: AWS_S3_ENDPOINT

      • Secret Name: my-storage
      • Secret Key: AWS_S3_ENDPOINT
    • Environment Variable: AWS_DEFAULT_REGION

      • Secret Name: my-storage
      • Secret Key: AWS_DEFAULT_REGION
    • Environment Variable: AWS_S3_BUCKET

      • Secret Name: my-storage
      • Secret Key: AWS_S3_BUCKET
  4. Select File → Save Pipeline As to save and rename the pipeline. For example, rename it to My Train Save.pipeline.

Verification

  • In the Node Properties tab for node 2, the Kubernetes Secrets section lists all five environment variables configured with the my-storage secret.

5.1.6. Running your pipeline

You can upload and run your pipeline directly from the pipeline editor. Use either your newly created pipeline or the provided 4 Train Save.pipeline file.

Procedure

  1. In the JupyterLab pipeline editor toolbar, click Run Pipeline.

    Pipeline Run Button
  2. Enter a name for your pipeline.
  3. Verify that Runtime Configuration is set to Pipeline.
  4. Click OK.

    NOTE: If you see an error message stating that "no runtime configuration for Pipelines is defined", you might have created your workbench before the pipeline server was available. To address this error, you must verify that you configured the pipeline server and then restart the workbench.

  5. In the OpenShift AI dashboard, open your project and expand the newly created pipeline.

    New pipeline expanded
  6. Click View runs.

    View runs for selected pipeline
  7. Click your run and then view the pipeline run in progress.

    Pipeline run progress

Verification

  • The models/fraud/1/model.onnx file is in your S3 bucket.
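One way to verify the upload programmatically is to check for the object with the same AWS_* environment variables that the pipeline nodes use. This is a hypothetical helper, assuming the boto3 package is available; it is not part of the tutorial's own verification steps:

```python
# Sketch: confirm that the model file exists in the S3 bucket, using the
# AWS_* environment variables from the my-storage secret. boto3 is
# imported inside the function so this module loads even where boto3 is
# not installed.
import os

def model_exists(key="models/fraud/1/model.onnx"):
    import boto3  # deferred import: only needed when actually checking S3

    s3 = boto3.client(
        "s3",
        endpoint_url=os.environ["AWS_S3_ENDPOINT"],
        aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
        aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
        region_name=os.environ["AWS_DEFAULT_REGION"],
    )
    try:
        s3.head_object(Bucket=os.environ["AWS_S3_BUCKET"], Key=key)
        return True
    except s3.exceptions.ClientError:
        return False

# model_exists() returns True once the pipeline run completes successfully.
```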

Next step

  • Running a pipeline generated from Python code

5.2. Running a pipeline generated from Python code

Earlier, you created a simple pipeline by using the GUI pipeline editor. You might want to create pipelines by using code that can be version-controlled and shared with others. The Kubeflow Pipelines (kfp) SDK provides a Python API for creating pipelines. The SDK is available as a Python package; with it, you can define a pipeline in Python code, compile it to YAML format, and then import the YAML file into OpenShift AI.

This tutorial does not describe the details of how to use the SDK. Instead, it provides the files for you to view and upload.

Procedure

  1. Optionally, view the provided Python code in your JupyterLab environment by navigating to the fraud-detection-notebooks project’s pipeline directory. It contains the following files:

    • 5_get_data_train_upload.py is the main pipeline code.
    • build.sh is a script that builds the pipeline and creates the YAML file.

      For your convenience, the output of the build.sh script is provided in the 5_get_data_train_upload.yaml file. The 5_get_data_train_upload.yaml output file is located in the top-level fraud-detection directory.

  2. Right-click the 5_get_data_train_upload.yaml file and then click Download.
  3. Upload the 5_get_data_train_upload.yaml file to OpenShift AI.

    1. In the OpenShift AI dashboard, navigate to your project page. Click the Pipelines tab and then click Import pipeline.
    2. Enter values for Pipeline name and Pipeline description.
    3. Click Upload and then select 5_get_data_train_upload.yaml from your local files to upload the pipeline.

      Import pipeline
    4. Click Import pipeline to import and save the pipeline.

      The pipeline appears in the graphical view.

      Graphic view of pipeline
  4. Select Actions → Create run.
  5. On the Create run page, provide the following values:

    1. For Experiment, leave the value as Default.
    2. For Name, type any name, for example Run 1.
    3. For Pipeline, select the pipeline that you uploaded.

      You can leave the other fields with their default values.

      Create Pipeline Run form
  6. Click Create run to create the run.

Verification

  • A new run starts immediately.

    New pipeline run