Chapter 2. Setting up a project and storage


2.2. Setting up your project

To implement a data science workflow, you must create a project as described in the following procedure. Projects help your team to organize and work together on resources within separate namespaces. From a project, you can create many workbenches, each with its own IDE environment (for example, JupyterLab), and each with its own connections and cluster storage. In addition, the workbenches can share models and data with pipelines and model servers.

Prerequisites

  • You have logged in to Red Hat OpenShift AI.

Procedure

  1. From the left navigation menu, select Projects. This page lists any existing projects that you have access to. You can select an existing project (if any) or create a new one.
  2. Click Create project.

    Note

    You can start a Jupyter notebook by clicking the Start basic workbench button. However, in that case, the notebook runs as a one-off instance in isolation.

  3. In the Create project modal, enter a display name and description.

    New project form
  4. Click Create.

Verification

Your project opens in the dashboard.

You can click the tabs to view more information about the project components and project access permissions:

New project

  • Workbenches are instances of your development and experimentation environment. They typically contain integrated development environments (IDEs), such as JupyterLab, RStudio, and code-server.
  • Pipelines are a structured series of processes that collect, process, analyze, and visualize data. With AI pipelines, you can automate the execution of notebooks and Python code. By using pipelines, you can run long training jobs or retrain your models on a schedule without having to manually run them in a notebook.
  • Deployments allow you to quickly serve a trained model. A model server is a container image for a machine learning model. It exposes APIs to receive data, run the data through the trained model, and deliver a result (for example, a fraud alert).
  • Cluster storage is a persistent volume that retains the files and data you’re working on within a workbench. A workbench has access to one or more cluster storage instances.
  • Connections contain the configuration parameters required to connect to a data source or data sink, such as an S3 object storage bucket where you can store models, data, or artifacts.
  • Permissions define which users and groups can access the project.

2.3. Storing data with connections

Add connections to workbenches to connect your project to data inputs and object storage buckets. A connection is a resource that has the configuration parameters needed to connect to a data source or data sink, such as an AWS S3 object storage bucket.
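When a connection is attached to a workbench, its fields are typically surfaced to your code as environment variables. The following is a minimal sketch of how you might read and test those values from a workbench terminal; the variable names are the common S3 connection fields and are an assumption about your setup, not a step in this tutorial:

# Hedged sketch: the environment variable names below are assumptions based on
# the usual S3 connection fields in OpenShift AI.
echo "Endpoint: $AWS_S3_ENDPOINT  Bucket: $AWS_S3_BUCKET  Region: $AWS_DEFAULT_REGION"

# If the AWS CLI is available in your workbench image, you can list the bucket:
aws --endpoint-url "$AWS_S3_ENDPOINT" s3 ls "s3://$AWS_S3_BUCKET/"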

For this tutorial, you run a provided script that creates the following local MinIO storage buckets for you:

  • My Storage - Use this bucket for storing your models and data. You can reuse this bucket and its connection for your notebooks and model servers.
  • Pipeline Artifacts - Use this bucket as storage for your pipeline artifacts. When you create a pipeline server, you need a pipeline artifacts bucket. For clarity in this tutorial, this bucket is kept separate from the first storage bucket.
Note

Although you can use one storage bucket for both storing models and data and for storing pipeline artifacts, this tutorial follows the recommended practice of using separate storage buckets for each purpose.

The provided script also creates a connection to each storage bucket.

To run the script that installs local MinIO storage buckets and creates connections to them, follow the steps in Running a script to install local object storage buckets and create connections.

Note

If you want to use your own S3-compatible object storage buckets (instead of using the provided script), follow the steps in Creating connections to your own S3-compatible object storage.

Running a script to install local object storage buckets and create connections

If you do not have your own S3-compatible storage, or if you want to use a disposable local MinIO instance instead, run a script (provided in the following procedure) that automatically completes these tasks:

  • Creates a MinIO instance in your project.
  • Creates two storage buckets in that MinIO instance.
  • Generates a random user ID and password for your MinIO instance.
  • Creates two connections in your project, one for each bucket and both using the same credentials.
  • Installs required network policies for service mesh functionality.

This script is based on the guide for deploying MinIO.
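If you are curious about what the script deploys, the commands below are a hedged sketch of how you might inspect the MinIO instance after the script completes. The service name and console port are assumptions about the provided setup, so confirm them with the first command before port-forwarding:

# List the MinIO resources that the setup script created in your project
oc -n <your-project-name> get deployments,services,secrets | grep -i minio

# Optionally, port-forward to the MinIO console to browse the buckets
# (the service name and port below are assumptions; use the values shown above)
oc -n <your-project-name> port-forward svc/minio-service 9090:9090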

Important

The MinIO-based Object Storage that the script creates is not meant for production usage.

Note

If you want to connect to your own storage, see Creating connections to your own S3-compatible object storage.

Prerequisites

  • You must know the OpenShift resource name for your project so that you run the provided script in the correct project. To get the project’s resource name:

    In the OpenShift AI dashboard, select Projects and then click the ? icon next to the project name. A text box opens with information about the project, including its resource name:

    Project list resource name
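If you have command-line access to the cluster, you can also list the projects with the oc CLI; the NAME column shows each project's resource name and the DISPLAY NAME column shows the name used in the dashboard:

# List the projects that you can access; the NAME column is the resource name
oc get projects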
Note

The following procedure describes how to run the script from the OpenShift console. If you are familiar with OpenShift and can access the cluster from the command line, you can run the script with the following command instead of following the steps in this procedure:

oc apply -n <your-project-name> -f https://github.com/rh-aiservices-bu/fraud-detection/raw/v3.2/setup/setup-s3-no-sa.yaml

Procedure

  1. In the OpenShift AI dashboard, click the application launcher icon and then select the OpenShift Console option.

    OpenShift Console Link
  2. In the OpenShift console, click + in the top navigation bar, and then click Import YAML.

    Add resources Icon
  3. Click the down arrow next to the project name, and then select your project from the list of projects. If needed, type the name of your project in the Select project search field.

    Select a project
  4. Verify that you selected the correct project.

    Selected project
  5. Copy the following code and paste it into the Import YAML editor.

    Note

    This code gets and applies the setup-s3-no-sa.yaml file.

    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: demo-setup
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: demo-setup-edit
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: edit
    subjects:
      - kind: ServiceAccount
        name: demo-setup
    ---
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: create-s3-storage
    spec:
      selector: {}
      template:
        spec:
          containers:
            - args:
                - -ec
                - |-
                  echo -n 'Setting up MinIO instance and connections'
                  oc apply -f https://github.com/rh-aiservices-bu/fraud-detection/raw/v3.2/setup/setup-s3-no-sa.yaml
              command:
                - /bin/bash
              image: image-registry.openshift-image-registry.svc:5000/openshift/tools:latest
              imagePullPolicy: IfNotPresent
              name: create-s3-storage
          restartPolicy: Never
          serviceAccount: demo-setup
          serviceAccountName: demo-setup
  6. Click Create.

Verification

  1. In the OpenShift console, there is a "Resources successfully created" message and a list of the following resources:

    • demo-setup
    • demo-setup-edit
    • create-s3-storage
  2. In the OpenShift AI dashboard:

    1. Select Projects and then click the name of your project, Fraud Detection.
    2. Click Connections. There are two connections listed: My Storage and Pipeline Artifacts.

      Connections for Fraud Detection
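Optionally, you can confirm the same result from the command line. This is a hedged sketch: the job name comes from the YAML that you imported, but the label used to find the connection secrets is an assumption about how the dashboard stores them:

# Check that the setup job completed and review its log output
oc -n <your-project-name> get job create-s3-storage
oc -n <your-project-name> logs job/create-s3-storage

# Connections are stored as secrets in the project (label selector is an assumption)
oc -n <your-project-name> get secrets -l opendatahub.io/dashboard=true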
Important

If your cluster uses self-signed certificates, your OpenShift AI administrator might need to configure a certificate authority (CA) to securely connect to the S3 object storage, as described in Accessing S3-compatible object storage with self-signed certificates (Self-Managed).

Next steps

  • Decide whether you want to complete the pipelines section of this tutorial. With OpenShift AI pipelines, you can automate the execution of your notebooks and Python code. You can run long training jobs or retrain your models on a schedule without having to manually run them in a notebook.

    If you want to complete the pipelines section of this tutorial, go to Enabling AI pipelines.

  • Decide whether you want to complete the Distributing training jobs with the Training Operator section of this tutorial. In that section, you implement a distributed training job by using Kueue for managing job resources.

    If you want to complete the Distributing training jobs with the Training Operator section of this tutorial, go to Setting up Kueue resources.

  • Otherwise, skip to Creating a workbench.
Creating connections to your own S3-compatible object storage

Note

Skip this procedure if you completed the steps in Running a script to install local object storage buckets and create connections.

If you have existing S3-compatible storage buckets that you want to use for this tutorial, you must create a connection to one storage bucket for saving your data and models. If you want to complete the pipelines section of this tutorial, create another connection to a different storage bucket for saving pipeline artifacts.

Prerequisites

To create connections to your existing S3-compatible storage buckets, you need the following credential information for the storage buckets:

  • Endpoint URL
  • Access key
  • Secret key
  • Region
  • Bucket name

If you do not have this information, contact your storage administrator.
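Before you fill in the connection form, you can optionally sanity-check the credentials from a terminal. The following is a minimal sketch that assumes the AWS CLI is installed; replace the placeholders with your own values:

# Export the credentials for the AWS CLI (placeholders; replace with your values)
export AWS_ACCESS_KEY_ID=<access-key>
export AWS_SECRET_ACCESS_KEY=<secret-key>

# Confirm that the endpoint, region, and bucket are reachable with these credentials
aws --endpoint-url <endpoint-url> --region <region> s3 ls s3://<bucket-name>/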

Procedure

  1. Create a connection for saving your data and models:

    1. In the OpenShift AI dashboard, navigate to the page for your project.
    2. Click the Connections tab, and then click Create connection.

      Add connection
    3. In the Add connection modal, for the Connection type select S3 compatible object storage - v1.
    4. Complete the Add connection form and name your connection My Storage. This connection is for saving your personal work, including data and models.

      Add my storage form
    5. Click Create.
  2. Create a connection for saving pipeline artifacts:

    Note

    If you do not intend to complete the pipelines section of the tutorial, you can skip this step.

    1. Click Add connection.
    2. Complete the form and name your connection Pipeline Artifacts.

      Add pipeline artifacts form
    3. Click Create.

Verification

In the Connections tab for the project, verify that your connections are listed.

List of project connections
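If you prefer the command line, connections appear as secrets in your project. This is a hedged sketch; the label selector is an assumption about how the dashboard marks the secrets it manages:

# List the secrets that back the dashboard connections (label selector is an assumption)
oc -n <your-project-name> get secrets -l opendatahub.io/dashboard=true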
Important

If your cluster uses self-signed certificates, your OpenShift AI administrator might need to provide a certificate authority (CA) to securely connect to the S3 object storage, as described in Accessing S3-compatible object storage with self-signed certificates (Self-Managed).

Next steps

  • Decide whether you want to complete the pipelines section of this tutorial. With OpenShift AI pipelines, you can automate the execution of your notebooks and Python code. You can run long training jobs or retrain your models on a schedule without having to manually run them in a notebook.

    If you want to complete the pipelines section of this tutorial, go to Enabling AI pipelines.

  • Decide whether you want to complete the Distributing training jobs with the Training Operator section of this tutorial. In that section, you implement a distributed training job by using Kueue for managing job resources.

    If you want to complete the Distributing training jobs with the Training Operator section of this tutorial, go to Setting up Kueue resources.

  • Otherwise, skip to Creating a workbench.

2.4. (Optional) Enabling AI pipelines

You must prepare your tutorial environment to use AI pipelines.

Note

If you do not intend to complete the pipelines section of this tutorial, you can skip to the next section, Setting up Kueue resources.

Later in this tutorial, you implement an example pipeline by using the JupyterLab Elyra extension. Elyra enables you to create a visual, end-to-end pipeline workflow that runs in OpenShift AI.

Prerequisites

  • You have created the Pipeline Artifacts connection, as described in Storing data with connections.

Procedure

  1. In the OpenShift AI dashboard, on the Fraud Detection page, click the Pipelines tab.
  2. Click Configure pipeline server.

    Create pipeline server button
  3. In the Configure pipeline server form, click Autofill from connection and then click Pipeline Artifacts.

    The Configure pipeline server form fills with credentials for the connection.

    Selecting the Pipeline Artifacts connection
  4. In the Advanced Settings section, leave the default values.
  5. Click Configure pipeline server.
  6. Wait until the loading spinner disappears and Start by importing a pipeline is displayed.

    Important

    You must wait until the pipeline configuration is complete before you continue and create your workbench. If you create your workbench before the pipeline server is ready, your workbench cannot submit pipelines to it.

    • You can click View progress to monitor the configuration and view the event logs.

      If you have waited more than 5 minutes and the pipeline server configuration does not complete, you can click Cancel pipeline server setup and create it again.

      You can also ask your OpenShift AI administrator to verify that they applied self-signed certificates on your cluster as described in Working with certificates (Self-Managed).

Verification

  1. Navigate to the Pipelines tab for the project.
  2. Next to Import pipeline, click the action menu (⋮) and then select Manage pipeline server configuration.

    Manage pipeline server configuration

    An information box opens and displays the object storage connection information for the pipeline server.
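You can also check the pipeline server from the command line. This is a hedged sketch: the pipeline server is backed by a DataSciencePipelinesApplication resource, but the exact resource and pod names in your project are assumptions:

# Confirm that the pipeline server resource exists and that its pods are running
oc -n <your-project-name> get datasciencepipelinesapplications
oc -n <your-project-name> get pods | grep ds-pipeline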

Next step

If you want to complete the Distributing training jobs with the Training Operator section of this tutorial, go to Setting up Kueue resources.

Otherwise, skip to Creating a workbench.

2.5. (Optional) Setting up Kueue resources

Distributed training in OpenShift AI uses the Red Hat build of Kueue for admission and scheduling. Before you run the Ray or Training Operator examples in this tutorial, a cluster administrator must install and configure the Red Hat build of Kueue Operator as described in Kueue workflow.

Also, you must prepare your tutorial environment so that you can use Kueue when distributing training jobs with the Training Operator.

In the Distributing training jobs with the Training Operator section of this tutorial, you implement a distributed training job by using Kueue for managing job resources. With Kueue, you can manage cluster resource quotas and how different workloads consume them.

Note

If you do not intend to use Kueue to schedule your training jobs in the Distributing training jobs with the Training Operator section of this tutorial, skip this procedure and continue to the next section, Creating a workbench and selecting a workbench image.

Procedure

  1. In the OpenShift AI dashboard, click the application launcher icon and then select the OpenShift Console option.

    OpenShift Console Link
  2. In the OpenShift console, click + in the top navigation bar, and then click Import YAML.

    Add resources Icon
  3. Click the down arrow next to the project name, and then select your project from the list of projects. If needed, type the name of your project in the Select project search field.

    Select a project
  4. Verify that you selected the correct project.

    Selected project
  5. Copy the following code and paste it into the Import YAML editor.

    ---
    apiVersion: kueue.x-k8s.io/v1beta1
    kind: ResourceFlavor
    metadata:
      name: default-flavor
    ---
    apiVersion: kueue.x-k8s.io/v1beta1
    kind: ClusterQueue
    metadata:
      name: cluster-queue
    spec:
      namespaceSelector: {}
      resourceGroups:
      - coveredResources: ["cpu", "memory"]
        flavors:
        - name: "default-flavor"
          resources:
          - name: "cpu"
            nominalQuota: 4
          - name: "memory"
            nominalQuota: 8Gi
    ---
    apiVersion: kueue.x-k8s.io/v1beta1
    kind: LocalQueue
    metadata:
      name: local-queue
    spec:
      clusterQueue: cluster-queue
  6. Click Create.

Verification

  • The OpenShift console displays a "Resources successfully created" message with a list of the following resources:

    • default-flavor
    • cluster-queue
    • local-queue
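Workloads consume the queues that you created by carrying a queue label. The following is a minimal sketch, not part of this tutorial, of a small batch Job that targets local-queue so that Kueue admits it against the cluster-queue quota; the container image and resource requests are assumptions chosen only for illustration:

# Hedged example: submit a small suspended Job to local-queue; Kueue admits it
# when quota from cluster-queue is available (image and requests are assumptions)
oc -n <your-project-name> create -f - <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  generateName: kueue-sample-
  labels:
    kueue.x-k8s.io/queue-name: local-queue
spec:
  suspend: true
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: sample
          image: registry.access.redhat.com/ubi9/ubi-minimal
          command: ["sleep", "30"]
          resources:
            requests:
              cpu: "1"
              memory: 256Mi
EOF

# Watch Kueue admit and run the workload
oc -n <your-project-name> get workloads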