Chapter 1. Working on data science projects

As a data scientist, you can organize your data science work into a single project. A data science project in OpenShift AI can consist of the following components:

Workbenches
Creating a workbench allows you to add a Jupyter notebook to your project.
Cluster storage
For data science projects that require data to be retained, you can add cluster storage to the project.
Data connections
Adding a data connection to your project allows you to connect data inputs to your workbenches.
Pipelines
Standardize and automate machine learning workflows to enable you to further enhance and deploy your data science models.
Models and model servers
Deploy a trained data science model to serve intelligent applications. Your model is deployed with an endpoint that allows applications to send requests to the model.
Important

If you create an OpenShift project outside of the OpenShift AI user interface, the project is not shown on the Data Science Projects page. In addition, you cannot use features exclusive to OpenShift AI, such as workbenches and model serving, with a standard OpenShift project.

To classify your OpenShift project as a data science project, and to make available features exclusive to OpenShift AI, you must add the label opendatahub.io/dashboard: 'true' to the project namespace. After you add this label, your project is subsequently shown on the Data Science Projects page.
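
As a minimal sketch of the labeling step, the following Python snippet builds an oc label command for a project namespace without running it. The helper name dashboard_label_command and the project name my-project are hypothetical, and the --overwrite flag is an added assumption so that the command also succeeds when the label already exists with a different value.

```python
import shlex

# Label that classifies a namespace as a data science project (from the text above).
DASHBOARD_LABEL = "opendatahub.io/dashboard"


def dashboard_label_command(namespace: str) -> list:
    """Return the argv for labeling a namespace; the command is not executed."""
    if not namespace:
        raise ValueError("namespace must not be empty")
    # --overwrite is an assumption: it lets the command succeed when the
    # label is already set to a different value.
    return ["oc", "label", "namespace", namespace,
            f"{DASHBOARD_LABEL}=true", "--overwrite"]


if __name__ == "__main__":
    # "my-project" is a hypothetical project name.
    print(shlex.join(dashboard_label_command("my-project")))
```

Review the printed command before you run it in a terminal where you are logged in to the cluster with sufficient permissions to label namespaces.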

1.1. Using data science projects

1.1.1. Creating a data science project

To implement a data science workflow, you must create a project. In OpenShift, a project is a Kubernetes namespace with additional annotations, and is the main way that you can manage user access to resources. A project organizes your data science work in one place and also allows you to collaborate with other developers and data scientists in your organization.

Within a project, you can add the following functionality:

  • Data connections so that you can access data without having to hardcode information like endpoints or credentials.
  • Workbenches for working with and processing data, and for developing models.
  • Deployed models so that you can test them and then integrate them into intelligent applications. Deploying a model makes it available as a service that you can access by using an API.
  • Pipelines for automating your ML workflow.

Prerequisites

  • You have logged in to Red Hat OpenShift AI.
  • If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.

Procedure

  1. From the OpenShift AI dashboard, select Data Science Projects.
  2. Click Create data science project.
  3. In the Create a data science project dialog, enter a display Name for your project.
  4. Optional: Edit the Resource name for your data science project. The resource name must consist of lowercase alphanumeric characters and hyphens (-), and must start and end with an alphanumeric character.

    Note: After you create a project, you can change the project display name but you cannot change the resource name.

  5. Enter a description for your data science project.
  6. Click Create.

Verification

  • A project details page opens. From this page, you can add data connections, create workbenches, configure pipelines, and deploy models.
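
The resource name rule in step 4 of the procedure can be checked before you open the dialog. The following is a minimal sketch, assuming that only the character rule stated in that step applies. The helper name is_valid_resource_name is hypothetical, and any length limit that the dashboard or Kubernetes enforces is outside this sketch.

```python
import re

# Lowercase alphanumerics and hyphens; first and last characters alphanumeric.
RESOURCE_NAME_PATTERN = re.compile(r"[a-z0-9]([a-z0-9-]*[a-z0-9])?")


def is_valid_resource_name(name: str) -> bool:
    """Check a candidate resource name against the rule stated in step 4."""
    return RESOURCE_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    # Hypothetical candidate names.
    for candidate in ("fraud-detection", "Fraud-Detection", "-demo", "demo-"):
        print(candidate, is_valid_resource_name(candidate))
```

Because the resource name cannot be changed after the project is created, a check like this is cheapest to run before you click Create.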

1.1.2. Updating a data science project

You can update your data science project’s details by changing your project’s name and description text.

Prerequisites

  • You have logged in to Red Hat OpenShift AI.
  • If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
  • You have created a data science project.

Procedure

  1. From the OpenShift AI dashboard, click Data Science Projects.

    The Data Science Projects page opens.

  2. Click the action menu (⋮) beside the project whose details you want to update and click Edit project.

    The Edit data science project dialog opens.

  3. Optional: Update the name for your data science project.
  4. Optional: Update the description for your data science project.
  5. Click Update.

Verification

  • The data science project that you updated is displayed on the Data Science Projects page.

1.1.3. Deleting a data science project

You can delete data science projects so that they do not appear on the OpenShift AI Data Science Projects page when you no longer want to use them.

Prerequisites

  • You have logged in to Red Hat OpenShift AI.
  • If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
  • You have created a data science project.

Procedure

  1. From the OpenShift AI dashboard, click Data Science Projects.

    The Data Science Projects page opens.

  2. Click the action menu (⋮) beside the project that you want to delete and then click Delete project.

    The Delete project dialog opens.

  3. Enter the project name in the text field to confirm that you intend to delete it.
  4. Click Delete project.

Verification

  • The data science project that you deleted is no longer displayed on the Data Science Projects page.
  • Deleting a data science project deletes any associated workbenches, data science pipelines, cluster storage, and data connections. This data is permanently deleted and is not recoverable.

1.2. Using project workbenches

1.2.1. Creating a workbench and selecting an IDE

A workbench is an isolated area where you can examine and work with ML models. You can also work with data and run programs, for example to prepare and clean data. While a workbench is not required if, for example, you only want to serve an existing model, one is needed for most data science workflow tasks, such as writing code to process data or training a model.

When you create a workbench, you specify an image (an IDE, packages, and other dependencies). Supported IDEs include JupyterLab, code-server (Technology Preview), and RStudio (Technology Preview).

The IDEs are based on a server-client architecture. Each IDE provides a server that runs in a container on the OpenShift cluster, while the user interface (the client) is displayed in your web browser. For example, the Jupyter notebook server runs in a container on the Red Hat OpenShift cluster. The client is the JupyterLab interface that opens in your web browser on your local computer. All of the commands that you enter in JupyterLab are executed by the notebook server. Similarly, other IDEs like code-server or RStudio Server provide a server that runs in a container on the OpenShift cluster, while the user interface is displayed in your web browser. This architecture allows you to interact through your local computer in a browser environment, while all processing occurs on the cluster. The cluster provides the benefits of larger available resources and security because the data being processed never leaves the cluster.

In a workbench, you also configure data connections (to access external data for training models and to save models so that you can deploy them) and cluster storage (for persisting data). Workbenches within the same project can share models and data through object storage with the data science pipelines and model servers.

For data science projects that require data retention, you can add container storage to the workbench you are creating.

Within a project, you can create multiple workbenches. When to create a new workbench depends on considerations, such as the following:

  • The workbench configuration (for example, CPU, RAM, or IDE). If you want to avoid editing the configuration of an existing workbench to accommodate a new task, you can create a new workbench instead.
  • Separation of tasks or activities. For example, you might want to use one workbench for your large language model (LLM) experimentation activities, another workbench dedicated to a demo, and another workbench for testing.

1.2.1.1. About workbench images

A workbench image (sometimes referred to as a notebook image) is optimized with the tools and libraries that you need for model development. You can use the provided workbench images, or an OpenShift AI admin user can create custom workbench images adapted to your needs.

To provide a consistent, stable platform for your model development, many provided workbench images contain the same version of Python. Most workbench images available on OpenShift AI are pre-built and ready for you to use immediately after OpenShift AI is installed or upgraded.

For information about Red Hat support of workbench images and packages, see Red Hat OpenShift AI: Supported Configurations.

Red Hat OpenShift AI contains the following notebook images that are available by default.

Important

Notebook images denoted with (Technology Preview) in this table are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using Technology Preview features in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process. For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

Table 1.1. Default notebook images
Image name | Description

CUDA

If you are working with compute-intensive data science models that require GPU support, use the Compute Unified Device Architecture (CUDA) notebook image to gain access to the NVIDIA CUDA Toolkit. Using this toolkit, you can optimize your work by using GPU-accelerated libraries and optimization tools.

Standard Data Science

Use the Standard Data Science notebook image for models that do not require TensorFlow or PyTorch. This image contains commonly-used libraries to assist you in developing your machine learning models.

TensorFlow

TensorFlow is an open source platform for machine learning. With TensorFlow, you can build, train and deploy your machine learning models. TensorFlow contains advanced data visualization features, such as computational graph visualizations. It also allows you to easily monitor and track the progress of your models.

PyTorch

PyTorch is an open source machine learning library optimized for deep learning. If you are working with computer vision or natural language processing models, use the PyTorch notebook image.

Minimal Python

If you do not require advanced machine learning features, or additional resources for compute-intensive data science work, you can use the Minimal Python image to develop your models.

TrustyAI

Use the TrustyAI notebook image to support your data science work with model explainability, tracing, accountability, and runtime monitoring.

HabanaAI

The HabanaAI notebook image optimizes high-performance deep learning (DL) with Habana Gaudi devices. Habana Gaudi devices accelerate DL training workloads and maximize training throughput and efficiency.

code-server (Technology Preview)

With the code-server notebook image, you can customize your notebook environment to meet your needs by using a variety of extensions that add new languages, themes, and debuggers, and that connect to additional services. Enhance the efficiency of your data science work with syntax highlighting, auto-indentation, and bracket matching, as well as an automatic task runner for seamless automation. For more information, see code-server in GitHub.

Note

Elyra-based pipelines are not available with the code-server notebook image.

RStudio Server (Technology Preview)

Use the RStudio Server notebook image to access the RStudio IDE, an integrated development environment for R, a programming language for statistical computing and graphics. For more information, see the RStudio Server site.

To use the RStudio Server notebook image, you must first build it by creating a secret and triggering the BuildConfig, and then enable it in the OpenShift AI UI by editing the rstudio-rhel9 image stream. For more information, see Building the RStudio Server workbench images.

Important

Disclaimer:
Red Hat supports managing workbenches in OpenShift AI. However, Red Hat does not provide support for the RStudio software. RStudio Server is available through https://rstudio.org/ and is subject to RStudio licensing terms. Review the licensing terms before you use this sample workbench.

CUDA - RStudio Server (Technology Preview)

Use the CUDA - RStudio Server notebook image to access the RStudio IDE and NVIDIA CUDA Toolkit. RStudio is an integrated development environment for R, a programming language for statistical computing and graphics. With the NVIDIA CUDA toolkit, you can optimize your work using GPU-accelerated libraries and optimization tools. For more information, see the RStudio Server site.

To use the CUDA - RStudio Server notebook image, you must first build it by creating a secret and triggering the BuildConfig, and then enable it in the OpenShift AI UI by editing the cuda-rstudio-rhel9 image stream. For more information, see Building the RStudio Server workbench images.

Important

Disclaimer:
Red Hat supports managing workbenches in OpenShift AI. However, Red Hat does not provide support for the RStudio software. RStudio Server is available through https://rstudio.org/ and is subject to RStudio licensing terms. Review the licensing terms before you use this sample workbench.

The CUDA - RStudio Server notebook image contains NVIDIA CUDA technology. CUDA licensing information is available at https://docs.nvidia.com/cuda/. Review the licensing terms before you use this sample workbench.

1.2.1.2. Building the RStudio Server workbench images

Important

The RStudio Server and CUDA - RStudio Server workbench images are currently available in Red Hat OpenShift AI as Technology Preview features.

Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

Note

The RStudio Server workbench images are currently unavailable for disconnected environments.

Red Hat OpenShift AI includes the following RStudio Server workbench images:

  • RStudio Server workbench image

    With the RStudio Server workbench image, you can access the RStudio IDE, an integrated development environment for the R programming language. R is used for statistical computing and graphics to support data analysis and predictions.

    Important

    Disclaimer: Red Hat supports managing workbenches in OpenShift AI. However, Red Hat does not provide support for the RStudio software. RStudio Server is available through rstudio.org and is subject to their licensing terms. You should review their licensing terms before you use this sample workbench.

  • CUDA - RStudio Server workbench image

    With the CUDA - RStudio Server workbench image, you can access the RStudio IDE and NVIDIA CUDA Toolkit. The RStudio IDE is an integrated development environment for the R programming language for statistical computing and graphics. With the NVIDIA CUDA toolkit, you can enhance your work by using GPU-accelerated libraries and optimization tools.

    Important

    Disclaimer: Red Hat supports managing workbenches in OpenShift AI. However, Red Hat does not provide support for the RStudio software. RStudio Server is available through rstudio.org and is subject to their licensing terms. You should review their licensing terms before you use this sample workbench.

    The CUDA - RStudio Server workbench image contains NVIDIA CUDA technology. CUDA licensing information is available in the CUDA Toolkit documentation. You should review their licensing terms before you use this sample workbench.

To use the RStudio Server and CUDA - RStudio Server workbench images, you must first build them by creating a secret and triggering the BuildConfig, and then enable them in the OpenShift AI UI by editing the rstudio-rhel9 and cuda-rstudio-rhel9 image streams.

Prerequisites

  • Before starting the RStudio Server build process, you have at least 1 CPU and 2Gi memory available for rstudio-server-rhel9, and 1.5 CPUs and 8Gi memory available for cuda-rstudio-server-rhel9 on your cluster.
  • You are logged in to your OpenShift cluster.
  • You have the cluster-admin role in OpenShift Container Platform.
  • You have an active Red Hat Enterprise Linux (RHEL) subscription.

Procedure

  1. Create a secret with Subscription Manager credentials. These are usually your Red Hat Customer Portal username and password.

    Note: The secret must be named rhel-subscription-secret, and its USERNAME and PASSWORD keys must be in capital letters.

    oc create secret generic rhel-subscription-secret --from-literal=USERNAME=<username> --from-literal=PASSWORD=<password> -n redhat-ods-applications
  2. Start the build:

    1. To start the lightweight RStudio Server build:

      oc start-build rstudio-server-rhel9 -n redhat-ods-applications --follow
    2. To start the CUDA-enabled RStudio Server build, trigger the cuda-rhel9 BuildConfig:

      oc start-build cuda-rhel9 -n redhat-ods-applications --follow

      The cuda-rhel9 build is a prerequisite for cuda-rstudio-rhel9. The cuda-rstudio-rhel9 build starts automatically.

  3. Confirm that the build process has completed successfully using the following command. Successful builds appear as Complete.

    oc get builds -n redhat-ods-applications
  4. After the builds complete successfully, use the following commands to make the workbench images available in the OpenShift AI UI.

    1. To enable the RStudio Server workbench image:

      oc label -n redhat-ods-applications imagestream rstudio-rhel9 opendatahub.io/notebook-image='true'
    2. To enable the CUDA - RStudio Server workbench image:

      oc label -n redhat-ods-applications imagestream cuda-rstudio-rhel9 opendatahub.io/notebook-image='true'

Verification

  • You can see RStudio Server and CUDA - RStudio Server images on the Applications → Enabled menu in the Red Hat OpenShift AI dashboard.
  • You can see RStudio Server or CUDA - RStudio Server in the Data Science Projects → Workbenches → Create workbench → Notebook image → Image selection dropdown list.
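
As a rough sketch of the build check in step 3 of the procedure, the following snippet inspects JSON in the shape that oc get builds -n redhat-ods-applications -o json returns and lists the builds whose phase is not Complete. The use of -o json, the helper name incomplete_builds, and the sample build names are assumptions for illustration; the sample data does not come from a real cluster.

```python
import json


def incomplete_builds(build_list: dict) -> list:
    """Return (name, phase) pairs for builds that are not in the Complete phase."""
    pending = []
    for item in build_list.get("items", []):
        name = item.get("metadata", {}).get("name", "<unnamed>")
        phase = item.get("status", {}).get("phase", "Unknown")
        if phase != "Complete":
            pending.append((name, phase))
    return pending


# Hypothetical sample, trimmed to the fields that the function reads.
SAMPLE = json.loads("""
{
  "items": [
    {"metadata": {"name": "rstudio-server-rhel9-1"}, "status": {"phase": "Complete"}},
    {"metadata": {"name": "cuda-rhel9-1"}, "status": {"phase": "Running"}}
  ]
}
""")

if __name__ == "__main__":
    for name, phase in incomplete_builds(SAMPLE):
        print(f"{name}: {phase}")
```

An empty result means that every listed build reports Complete, which is the condition to meet before you label the image streams.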

1.2.1.3. Creating a workbench

When you create a workbench, you specify an image (an IDE, packages, and other dependencies). You can also configure data connections and cluster storage, and add container storage.

Prerequisites

  • You have logged in to Red Hat OpenShift AI.
  • If you use specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
  • You created a project.
  • If you created a Simple Storage Service (S3) account outside of Red Hat OpenShift AI and you want to create data connections to your existing S3 storage buckets, you have the following credential information for the storage buckets:

    • Endpoint URL
    • Access key
    • Secret key
    • Region
    • Bucket name

    For information about working with data stored in AWS S3, see Integrating data from Amazon S3.

Procedure

  1. From the OpenShift AI dashboard, click Data Science Projects.
  2. Click the name of the project that you want to add the workbench to.

    A Details page for the project opens.

  3. In the Workbenches section, click Create a workbench.
  4. In the Create workbench page, configure the properties of the workbench that you are creating.

    1. In the Name field, enter a name for your workbench.
    2. Optional: In the Description field, enter a description to define your workbench.
    3. In the Notebook image section, complete the fields to specify the workbench image to use with your workbench.

      From the Image selection list, select a workbench image that suits your use case. A workbench image includes an IDE and Python packages (reusable code). Optionally, click the View package information option to view a list of packages that are included in the image that you selected.

      If the workbench image has multiple versions available, select the workbench image version to use from the Versions section. To use the latest package versions, Red Hat recommends that you use the most recently added image.

      Note

      You can change the workbench image after you create the workbench.

    4. In the Deployment size section, from the Container size list, select a container size for your server. The container size controls the number of CPUs, the amount of memory, and the minimum and maximum request capacity of the container.
    5. Optional: Select and specify values for any environment variables.

      Setting environment variables during the workbench configuration helps you save time later because you do not need to define them in the body of your notebooks, or with the IDE command line interface.

      If you are using S3-compatible storage, add these recommended environment variables:

      • AWS_ACCESS_KEY_ID specifies your Access Key ID for Amazon Web Services.
      • AWS_SECRET_ACCESS_KEY specifies your Secret access key for the account specified in AWS_ACCESS_KEY_ID.

      OpenShift AI stores the credentials as Kubernetes secrets in a protected namespace if you select Secret when you add the variable.

    6. Configure the storage for your workbench. Select one of the following options:

      • Create new persistent storage to create storage that is retained after you shut down your workbench. Complete the relevant fields to define the storage.
      • Use existing persistent storage to reuse existing storage and select the storage from the Persistent storage list.
    7. Optional: Add a data connection to your workbench. A data connection is a resource that contains the configuration parameters needed to connect to a data source or an object storage bucket. Currently, only S3-compatible data connections are supported. You can use storage buckets for storing data, models, and pipeline artifacts. You can also use a data connection to specify the location of a model that you want to deploy.

      In the Data connections section, select the Use a data connection checkbox.

      • Create a new data connection as follows:

        1. Select Create new data connection.
        2. In the Name field, enter a unique name for the data connection.
        3. In the Access key field, enter the access key ID for the S3-compatible object storage provider.
        4. In the Secret key field, enter the secret access key for the S3-compatible object storage account that you specified.
        5. In the Endpoint field, enter the endpoint of your S3-compatible object storage bucket.
        6. In the Region field, enter the default region of your S3-compatible object storage account.
        7. In the Bucket field, enter the name of your S3-compatible object storage bucket.
      • Use an existing data connection as follows:

        1. Select Use existing data connection.
        2. From the Data connection list, select a data connection that you previously defined.
  5. Click Create workbench.

Verification

  • The workbench that you created appears on the Workbenches tab for the project.
  • Any cluster storage that you associated with the workbench during the creation process appears on the Cluster storage tab for the project.
  • The Status column on the Workbenches tab displays a status of Starting when the workbench server is starting, and Running when the workbench has successfully started.
  • Optionally, click the Open link to open the IDE in a new window.
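
As a minimal sketch of how notebook code might consume the environment variables recommended in the procedure, the following snippet reads AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from a mapping and fails early if either is missing or empty. The helper name load_s3_credentials and the placeholder values are hypothetical.

```python
import os

# Variable names recommended in the procedure above.
REQUIRED_VARIABLES = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def load_s3_credentials(environ=None) -> dict:
    """Read the S3 credentials from environment variables.

    Raises KeyError naming every variable that is missing or empty.
    """
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARIABLES if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARIABLES}


if __name__ == "__main__":
    # Placeholder values for illustration only; never hardcode real credentials.
    demo_env = {"AWS_ACCESS_KEY_ID": "example-id",
                "AWS_SECRET_ACCESS_KEY": "example-secret"}
    print(sorted(load_s3_credentials(demo_env)))
```

In a running workbench, call load_s3_credentials() without arguments so that the values come from the environment that you configured when you created the workbench.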

1.2.2. Starting a workbench

You can manually start a data science project’s workbench from the Workbenches tab on the project details page. By default, workbenches start immediately after you create them.

Prerequisites

  • You have logged in to Red Hat OpenShift AI.
  • If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
  • You have created a data science project that contains a workbench.

Procedure

  1. From the OpenShift AI dashboard, click Data Science Projects.

    The Data Science Projects page opens.

  2. Click the name of the project whose workbench you want to start.

    A project details page opens.

  3. Click the Workbenches tab.
  4. Click the toggle in the Status column for the relevant workbench to start a workbench that is not running.

    The status of the workbench that you started changes from Stopped to Running. After the workbench has started, click Open to open the workbench’s notebook.

Verification

  • The workbench that you started appears on the Workbenches tab for the project, with the status of Running.

1.2.3. Updating a project workbench

If your data science work requires you to change your workbench’s notebook image, container size, or identifying information, you can update the properties of your project’s workbench. If you require extra power for use with large datasets, you can assign accelerators to your workbench to optimize performance.

Prerequisites

  • You have logged in to Red Hat OpenShift AI.
  • If you use specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
  • You have created a data science project that has a workbench.

Procedure

  1. From the OpenShift AI dashboard, click Data Science Projects.

    The Data Science Projects page opens.

  2. Click the name of the project whose workbench you want to update.

    A project details page opens.

  3. Click the Workbenches tab.
  4. Click the action menu (⋮) beside the workbench that you want to update and then click Edit workbench.

    The Edit workbench page opens.

  5. Update any of the workbench properties and then click Update workbench.

Verification

  • The workbench that you updated appears on the Workbenches tab for the project.

1.2.4. Deleting a workbench from a data science project

You can delete workbenches from your data science projects to help you remove Jupyter notebooks that are no longer relevant to your work.

Prerequisites

  • You have logged in to Red Hat OpenShift AI.
  • If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
  • You have created a data science project with a workbench.

Procedure

  1. From the OpenShift AI dashboard, click Data Science Projects.

    The Data Science Projects page opens.

  2. Click the name of the project that you want to delete the workbench from.

    A project details page opens.

  3. Click the Workbenches tab.
  4. Click the action menu (⋮) beside the workbench that you want to delete and then click Delete workbench.

    The Delete workbench dialog opens.

  5. Enter the name of the workbench in the text field to confirm that you intend to delete it.
  6. Click Delete workbench.

Verification

  • The workbench that you deleted is no longer displayed in the Workbenches tab for the project.
  • The custom resource (CR) associated with the workbench’s Jupyter notebook is deleted.

1.3. Using data connections

1.3.1. Adding a data connection to your data science project

You can enhance your data science project by adding a connection to a data source. When you want to work with very large data sets, you can store your data in an S3-compatible object storage bucket, so that you do not fill up your local storage. You also have the option of associating the data connection with an existing workbench that does not already have a connection.

Prerequisites

  • You have logged in to Red Hat OpenShift AI.
  • If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
  • You have created a data science project that you can add a data connection to.
  • You have access to S3-compatible object storage.
  • If you intend to add the data connection to an existing workbench, you have saved any data in the workbench to avoid losing work.

Procedure

  1. From the OpenShift AI dashboard, click Data Science Projects.

    The Data Science Projects page opens.

  2. Click the name of the project that you want to add a data connection to.

    A project details page opens.

  3. Click the Data connections tab.
  4. Click Add data connection.

    The Add data connection dialog opens.

  5. Enter a name for the data connection.
  6. In the Access key field, enter the access key ID for your S3-compatible object storage provider.
  7. In the Secret key field, enter the secret access key for the S3-compatible object storage account you specified.
  8. In the Endpoint field, enter the endpoint of your S3-compatible object storage bucket.
  9. In the Region field, enter the default region of your S3-compatible object storage account.
  10. In the Bucket field, enter the name of your S3-compatible object storage bucket.
  11. Optional: From the Connected workbench list, select a workbench to connect.
  12. Click Add data connection.

Verification

  • The data connection that you added appears in the Data connections tab for the project.
  • If you selected a workbench, the workbench is visible in the Connected workbenches column in the Data connections tab for the project.

1.3.2. Deleting a data connection

You can delete data connections from your data science projects to help you remove connections that are no longer relevant to your work.

Prerequisites

  • You have logged in to Red Hat OpenShift AI.
  • If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
  • You have created a data science project with a data connection.

Procedure

  1. From the OpenShift AI dashboard, click Data Science Projects.

    The Data Science Projects page opens.

  2. Click the name of the project that you want to delete the data connection from.

    A project details page opens.

  3. Click the Data connections tab.
  4. Click the action menu (⋮) beside the data connection that you want to delete and then click Delete data connection.

    The Delete data connection dialog opens.

  5. Enter the name of the data connection in the text field to confirm that you intend to delete it.
  6. Click Delete data connection.

Verification

  • The data connection that you deleted is no longer displayed in the Data connections tab for the project.

1.3.3. Updating a connected data source

To use an existing data source with a different workbench, you can change the data source that is connected to your project’s workbench.

Prerequisites

  • You have logged in to Red Hat OpenShift AI.
  • If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
  • You have created a data science project, created a workbench, and you have defined a data connection.

Procedure

  1. From the OpenShift AI dashboard, click Data Science Projects.

    The Data Science Projects page opens.

  2. Click the name of the project whose data source you want to change.

    A project details page opens.

  3. Click the Data connections tab.
  4. Click the action menu (⋮) beside the data source that you want to change and then click Edit data connection.

    The Edit data connection dialog opens.

  5. In the Connected workbench section, select an existing workbench from the list.
  6. Click Update data connection.

Verification

  • The updated data connection is displayed in the Data connections tab for the project.
  • You can access your S3 data source using environment variables in the connected workbench.
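
As a minimal sketch of the second verification step, the following Python snippet reads S3 settings from environment variables inside a workbench. The variable names used here (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_S3_ENDPOINT, AWS_DEFAULT_REGION, and AWS_S3_BUCKET) are assumptions based on common S3 conventions and are not stated in this section. Check which variables are actually present in your workbench before relying on them.

```python
import os

# Assumed variable names; confirm them in your own workbench environment.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_S3_ENDPOINT")
OPTIONAL_VARS = ("AWS_DEFAULT_REGION", "AWS_S3_BUCKET")


def read_s3_settings(environ=os.environ):
    """Return the S3 settings found in the environment as a dict.

    Raises KeyError naming every required variable that is missing or empty.
    """
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    settings = {name: environ[name] for name in REQUIRED_VARS}
    # Optional values are included only when they are set.
    for name in OPTIONAL_VARS:
        if environ.get(name):
            settings[name] = environ[name]
    return settings
```

If the function returns without raising an error, the data connection values are visible to the workbench. You can then pass the values to the S3 client library of your choice.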

1.4. Configuring cluster storage

1.4.1. Adding cluster storage to your data science project

For data science projects that require data to be retained, you can add cluster storage to the project. You can also connect the cluster storage to a specific workbench in the project.

Prerequisites

  • You have logged in to Red Hat OpenShift AI.
  • If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
  • You have created a data science project that you can add cluster storage to.

Procedure

  1. From the OpenShift AI dashboard, click Data Science Projects.

    The Data Science Projects page opens.

  2. Click the name of the project that you want to add the cluster storage to.

    A project details page opens.

  3. Click the Cluster storage tab.
  4. Click Add cluster storage.

    The Add storage dialog opens.

  5. Enter a name for the cluster storage.
  6. Enter a description for the cluster storage.
  7. Under Persistent storage size, enter a new size in gibibytes. The minimum size is 1 GiB, and the maximum size is 16384 GiB.
  8. Optional: Select a workbench from the list to connect the cluster storage to an existing workbench.
  9. If you selected a workbench to connect the storage to, enter the storage directory in the Mount folder field.
  10. Click Add storage.

Verification

  • The cluster storage that you added appears in the Cluster storage tab for the project.
  • A new persistent volume claim (PVC) is created with the storage size that you defined.
  • The persistent volume claim (PVC) is visible as an attached storage in the Workbenches tab for the project.
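
To illustrate what a persistent volume claim of the requested size might look like, the following Python sketch builds a PVC manifest as a plain dictionary and enforces the 1 GiB to 16384 GiB range from the procedure. The function name, the annotation key, and the ReadWriteOnce access mode are hypothetical choices for illustration. The manifest that the dashboard actually creates might differ.

```python
MIN_SIZE_GIB = 1       # minimum size stated in the procedure
MAX_SIZE_GIB = 16384   # maximum size stated in the procedure


def build_pvc_manifest(name, size_gib, description=""):
    """Build a PersistentVolumeClaim manifest as a plain dict.

    Raises ValueError if size_gib is outside the documented range.
    """
    if not MIN_SIZE_GIB <= size_gib <= MAX_SIZE_GIB:
        raise ValueError(
            f"size must be between {MIN_SIZE_GIB} and {MAX_SIZE_GIB} GiB"
        )
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {
            "name": name,
            # Hypothetical annotation key; the dashboard may store the
            # description under a different key.
            "annotations": {"example.com/description": description},
        },
        "spec": {
            "accessModes": ["ReadWriteOnce"],  # assumed access mode
            "resources": {"requests": {"storage": f"{size_gib}Gi"}},
        },
    }
```

For example, build_pvc_manifest("my-storage", 20) requests 20Gi of storage, and a size of 0 or 16385 is rejected before any manifest is produced.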

1.4.2. Updating cluster storage

If your data science work requires you to change the identifying information of a project’s cluster storage or the workbench that the storage is connected to, you can update your project’s cluster storage to change these properties.

Prerequisites

  • You have logged in to Red Hat OpenShift AI.
  • If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
  • You have created a data science project that contains cluster storage.

Procedure

  1. From the OpenShift AI dashboard, click Data Science Projects.

    The Data Science Projects page opens.

  2. Click the name of the project whose storage you want to update.

    A project details page opens.

  3. Click the Cluster storage tab.
  4. Click the action menu (⋮) beside the storage that you want to update and then click Edit storage.

    The Edit storage page opens.

  5. Update the storage’s properties.

    1. Update the name for the storage, if applicable.
    2. Update the description for the storage, if applicable.
    3. Increase the Persistent storage size for the storage, if applicable.

      Note that you can only increase the storage size. Updating the storage size restarts the workbench and makes it unavailable for a period of time that is usually proportional to the size change.

    4. Update the workbench that the storage is connected to, if applicable.
    5. If you selected a new workbench to connect the storage to, enter the storage directory in the Mount folder field.
  6. Click Update storage.

Verification

  • The storage that you updated appears in the Cluster storage tab for the project.
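
The increase-only rule for the storage size can be sketched as a small check. This is an illustrative helper, not part of OpenShift AI. The function name is hypothetical, and the sketch assumes that the 16384 GiB maximum from the procedure for adding cluster storage also applies when you update the size.

```python
def validate_resize(current_gib, requested_gib, max_gib=16384):
    """Check a requested storage size against the increase-only rule.

    Returns True if the size grows (the workbench would restart) and
    False if the size is unchanged. Raises ValueError if the request
    would shrink the storage or exceed max_gib (an assumed limit).
    """
    if requested_gib < current_gib:
        raise ValueError("storage size can only be increased")
    if requested_gib > max_gib:
        raise ValueError(f"storage size cannot exceed {max_gib} GiB")
    return requested_gib > current_gib
```

A caller could use the return value to warn the user that the connected workbench will be temporarily unavailable before submitting the change.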

1.4.3. Deleting cluster storage from a data science project

You can delete cluster storage from your data science projects to help you free up resources and delete unwanted storage space.

Prerequisites

  • You have logged in to Red Hat OpenShift AI.
  • If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
  • You have created a data science project with cluster storage.

Procedure

  1. From the OpenShift AI dashboard, click Data Science Projects.

    The Data Science Projects page opens.

  2. Click the name of the project that you want to delete the storage from.

    A project details page opens.

  3. Click the Cluster storage tab.
  4. Click the action menu (⋮) beside the storage that you want to delete and then click Delete storage.

    The Delete storage dialog opens.

  5. Enter the name of the storage in the text field to confirm that you intend to delete it.
  6. Click Delete storage.

Verification

  • The storage that you deleted is no longer displayed in the Cluster storage tab for the project.
  • The persistent volume (PV) and persistent volume claim (PVC) associated with the cluster storage are both permanently deleted. This data is not recoverable.

1.5. Configuring access to data science projects

1.5.1. Configuring access to data science projects

To work collaboratively on your data science projects with other users, you can share access to your project. After you create your project, you can set the appropriate access permissions from the OpenShift AI user interface.

You can assign the following access permission levels to your data science projects:

  • Admin - Users can modify all areas of a project, including its details (project name and description), components, and access permissions.
  • Edit - Users can modify a project’s components, such as its workbench, but they cannot edit a project’s access permissions or its details (project name and description).
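
The difference between the two permission levels can be summarized in a small lookup table. The following Python sketch encodes only what the descriptions above state. The action labels (edit_details, edit_components, manage_permissions) are hypothetical names chosen for illustration and do not correspond to any OpenShift AI API.

```python
from enum import Enum


class Permission(Enum):
    ADMIN = "Admin"
    EDIT = "Edit"


# Capabilities taken from the permission level descriptions;
# the action labels themselves are hypothetical.
ALLOWED_ACTIONS = {
    Permission.ADMIN: {"edit_details", "edit_components", "manage_permissions"},
    Permission.EDIT: {"edit_components"},
}


def can(permission, action):
    """Return True if the permission level allows the given action."""
    return action in ALLOWED_ACTIONS[permission]
```

With this table, can(Permission.EDIT, "edit_components") is true, while can(Permission.EDIT, "manage_permissions") is false, which matches the restriction that Edit users cannot change access permissions or project details.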

1.5.2. Sharing access to a data science project

To enable your organization to work collaboratively, you can share access to your data science project with other users and groups.

Prerequisites

  • You have logged in to Red Hat OpenShift AI.
  • If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
  • You have created a data science project.

Procedure

  1. From the OpenShift AI dashboard, click Data Science Projects.

    The Data Science Projects page opens.

  2. From the list of data science projects, click the name of the data science project that you want to share access to.

    A project details page opens.

  3. Click the Permissions tab.

    The Permissions page for the project opens.

  4. Provide one or more users with access to the project.

    1. In the Users section, click Add user.
    2. In the Name field, enter the user name of the user whom you want to provide access to the project.
    3. From the Permissions list, select one of the following access permission levels:

      • Admin: Users with this access level can edit project details and manage access to the project.
      • Edit: Users with this access level can view and edit project components, such as its workbenches, data connections, and storage.
    4. To confirm your entry, click the Confirm icon.
    5. Optional: To add an additional user, click Add user and repeat the process.
  5. Provide one or more OpenShift groups with access to the project.

    1. In the Groups section, click Add group.
    2. From the Name list, select a group to provide access to the project.

      Note

      If you do not have cluster-admin permissions, the Name list is not visible. Instead, an input field is displayed enabling you to configure group permissions.

    3. From the Permissions list, select one of the following access permission levels:

      • Admin: Groups with this access permission level can edit project details and manage access to the project.
      • Edit: Groups with this access permission level can view and edit project components, such as its workbenches, data connections, and storage.
    4. To confirm your entry, click the Confirm icon.
    5. Optional: To add an additional group, click Add group and repeat the process.

Verification

  • Users to whom you provided access to the project can perform only the actions permitted by their access permission level.
  • The Users and Groups sections on the Permissions tab show the respective users and groups that you provided with access to the project.
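
Because a project is a Kubernetes namespace, sharing access can be thought of as binding a user or group to a role in that namespace. The following Python sketch builds a Kubernetes RoleBinding manifest as a plain dictionary. It assumes that the Admin and Edit levels correspond to the default admin and edit cluster roles. That mapping, the function name, and the generated binding name are assumptions for illustration and are not confirmed by this section.

```python
RBAC_API_GROUP = "rbac.authorization.k8s.io"

# Assumed mapping from dashboard permission levels to cluster roles.
ROLE_FOR_PERMISSION = {"Admin": "admin", "Edit": "edit"}


def build_role_binding(project, subject_kind, subject_name, permission):
    """Build a RoleBinding manifest that grants a user or group access."""
    if subject_kind not in ("User", "Group"):
        raise ValueError("subject_kind must be 'User' or 'Group'")
    if permission not in ROLE_FOR_PERMISSION:
        raise ValueError("permission must be 'Admin' or 'Edit'")
    role = ROLE_FOR_PERMISSION[permission]
    return {
        "apiVersion": f"{RBAC_API_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {
            # Hypothetical naming scheme for the binding.
            "name": f"{role}-{subject_kind.lower()}-{subject_name}",
            "namespace": project,
        },
        "subjects": [
            {"kind": subject_kind, "apiGroup": RBAC_API_GROUP, "name": subject_name}
        ],
        "roleRef": {"apiGroup": RBAC_API_GROUP, "kind": "ClusterRole", "name": role},
    }
```

Inspecting the role bindings in the project namespace is one way to cross-check the entries shown in the Users and Groups sections of the Permissions tab.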

1.5.3. Updating access to a data science project

To change the level of collaboration on your data science project, you can update the access permissions of users and groups who have access to your project.

Prerequisites

  • You have logged in to Red Hat OpenShift AI.
  • If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
  • You have created a data science project.
  • You have previously shared access to your project with other users or groups.
  • You have administrator permissions or you are the project owner.

Procedure

  1. From the OpenShift AI dashboard, click Data Science Projects.

    The Data Science Projects page opens.

  2. Click the name of the project whose access permissions you want to change.

    A project details page opens.

  3. Click the Permissions tab.

    The Permissions page for the project opens.

  4. Update the user access permissions to the project.

    1. In the Name field, update the user name of the user whom you want to provide access to the project.
    2. From the Permissions list, update the user access permissions by selecting one of the following:

      • Admin: Users with this access level can edit project details and manage access to the project.
      • Edit: Users with this access level can view and edit project components, such as its workbenches, data connections, and storage.
    3. To confirm the update to the entry, click the Confirm icon.
  5. Update the OpenShift group access permissions for the project.

    1. From the Name list, update the group that has access to the project by selecting another group from the list.

      Note

      If you do not have cluster-admin permissions, the Name list is not visible. Instead, you can configure group permissions in the input field that appears.

    2. From the Permissions list, update the group access permissions by selecting one of the following:

      • Admin: Groups with this access permission level can edit project details and manage access to the project.
      • Edit: Groups with this access permission level can view and edit project components, such as its workbenches, data connections, and storage.
    3. To confirm the update to the entry, click the Confirm icon.

Verification

  • The Users and Groups sections on the Permissions tab show the respective users and groups whose project access permissions you changed.

1.5.4. Removing access to a data science project

If you no longer want to work collaboratively on your data science project, you can restrict access to your project by removing users and groups to whom you previously granted access.

Prerequisites

  • You have logged in to Red Hat OpenShift AI.
  • If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, rhoai-users or rhoai-admins) in OpenShift.
  • You have created a data science project.
  • You have previously shared access to your project with other users or groups.
  • You have administrator permissions or you are the project owner.

Procedure

  1. From the OpenShift AI dashboard, click Data Science Projects.

    The Data Science Projects page opens.

  2. Click the name of the project whose access permissions you want to change.

    A project details page opens.

  3. Click the Permissions tab.

    The Permissions page for the project opens.

  4. Click the action menu (⋮) beside the user or group whose access permissions you want to revoke and click Delete.

Verification

  • Users whose access you have revoked can no longer perform the actions that were permitted by their access permission level.

© 2024 Red Hat, Inc.