Chapter 5. Creating a project workbench


To examine and work with data models in an isolated area, you can create a workbench. This workbench enables you to create a new Jupyter notebook from an existing notebook container image to access its resources and properties. For data science projects that require data to be retained, you can add container storage to the workbench you are creating.

Prerequisites

  • You have logged in to Red Hat OpenShift Data Science.
  • If you are using specialized OpenShift Data Science groups, you are part of the user group or admin group (for example, rhods-users or rhods-admins) in OpenShift.
  • You have created a data science project that you can add a workbench to.

Procedure

  1. From the OpenShift Data Science dashboard, click Data Science Projects.

    The Data science projects page opens.

  2. Click the name of the project that you want to add the workbench to.

    The Details page for the project opens.

  3. Click Create workbench in the Workbenches section.

    The Create workbench page opens.

  4. Configure the properties of the workbench you are creating.

    1. Enter a name for your workbench.
    2. Enter a description for your workbench.
    3. Select the notebook image to use for your workbench server.
    4. Select the container size for your server.
    5. Optional: Select and specify values for any new environment variables.

      Note

      To enable data science pipelines in JupyterLab in self-managed deployments, create the following environment variable: PIPELINES_SSL_SA_CERTS=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt

    6. Configure the storage for your OpenShift Data Science cluster.

      1. Select Create new persistent storage to create storage that is retained after you log out of OpenShift Data Science. Fill in the relevant fields to define the storage.
      2. Select Use existing persistent storage to reuse existing storage, and then select the storage from the Persistent storage list.
  5. Click Create workbench.
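If you set the PIPELINES_SSL_SA_CERTS variable described in the note in step 4, you can sanity-check it from a notebook cell inside the running workbench. The following sketch is illustrative only; the helper name is not part of OpenShift Data Science:

```python
import os


def pipelines_cert_ok(var: str = "PIPELINES_SSL_SA_CERTS") -> bool:
    """Return True if the CA-bundle variable is set and points to a readable file.

    Illustrative helper: the variable name comes from the procedure above;
    the function itself is not part of OpenShift Data Science.
    """
    path = os.environ.get(var)
    return bool(path) and os.path.isfile(path)
```

In a correctly configured self-managed workbench, the variable points to the service account CA certificate mounted into the container, so the check returns True.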

Verification

  • The workbench that you created appears on the Details page for the project.
  • Any cluster storage that you associated with the workbench during the creation process appears on the Details page for the project.
  • The Status column, located in the Workbenches section of the Details page, displays a status of Starting when the workbench server is starting, and Running when the workbench has successfully started.

5.1. Launching Jupyter and starting a notebook server

Launch Jupyter and start a notebook server to start working with your notebooks.

Prerequisites

  • You have logged in to Red Hat OpenShift Data Science.
  • You know the names and values you want to use for any environment variables in your notebook server environment, for example, AWS_SECRET_ACCESS_KEY.
  • If you want to work with a very large data set, work with your administrator to proactively increase the storage capacity of your notebook server.

Procedure

  1. Locate the Jupyter card on the Enabled applications page.
  2. Click Launch application.

    If you see an Access permission needed message, you are not in the default user group or the default administrator group for OpenShift Data Science. Contact your administrator so that they can add you to the correct group using Adding users for OpenShift Data Science.

    If you have not previously authorized the jupyter-nb-<username> service account to access your account, the Authorize Access page appears prompting you to provide authorization. Inspect the permissions selected by default, and click the Allow selected permissions button.

    If your credentials are accepted, the Notebook server control panel opens, displaying the Start a notebook server page.

  3. Start a notebook server.

    This step is not required if you have previously opened Jupyter.

    1. Select the Notebook image to use for your server.
    2. If the notebook image contains multiple versions, select the version of the notebook image from the Versions section.

      Note

      When a new version of a notebook image is released, the previous version remains available and supported on the cluster. This gives you time to migrate your work to the latest version of the notebook image.

    3. Select the Container size for your server.
    4. Optional: Select the Number of GPUs (graphics processing units) for your server.

      Important

      Using GPUs to accelerate workloads is only supported with the PyTorch, TensorFlow, and CUDA notebook server images. In addition, you can specify the number of GPUs required for your notebook server only if GPUs are enabled on your cluster. To learn how to enable GPU support, see Enabling GPU support in OpenShift Data Science.

    5. Optional: Select and specify values for any new Environment variables.

      The interface stores these variables so that you only need to enter them once. Example variable names for common environment variables are automatically provided for frequently integrated environments and frameworks, such as Amazon Web Services (AWS).

      Important

      Ensure that you select the Secret checkbox for any variables with sensitive values that must be kept private, such as passwords.

    6. Optional: Select the Start server in current tab checkbox if necessary.
    7. Click Start server.

      The Starting server progress indicator appears. Click Expand event log to view additional information about the server creation process. Depending on the deployment size and resources you requested, starting the server can take up to several minutes. Click Cancel to cancel the server creation.

      After the server starts, you see one of the following behaviors:

      • If you previously selected the Start server in current tab checkbox, the JupyterLab interface opens in the current tab of your web browser.
      • If you did not previously select the Start server in current tab checkbox, the Starting server dialog box prompts you to open the server in a new browser tab or in the current tab.

        The JupyterLab interface opens according to your selection.

Verification

  • The JupyterLab interface opens.

Troubleshooting

  • If you see the "Unable to load notebook server configuration options" error message, contact your administrator so that they can review the logs associated with your Jupyter pod and determine further details about the problem.

5.2. Options for notebook server environments

When you start Jupyter for the first time, or after stopping your notebook server, you must select server options on the Start a notebook server page so that the software and variables that you expect are available on your server. This section explains these options in detail.

The Start a notebook server page is divided into several sections:

Notebook image

Specifies the container image that your notebook server is based on. Different notebook images have different packages installed by default. If the notebook image contains multiple versions, you can select the notebook image version to use from the Versions section.

Note

Notebook images are supported for a minimum of one year. Major updates to pre-configured notebook images occur approximately every six months. Therefore, two supported notebook images are typically available at any given time. To use the latest package versions, Red Hat recommends that you use the most recently added notebook image.

After you start a notebook image, you can check which Python packages are installed on your notebook server and which version of the package you have by running the pip tool in a notebook cell.
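For example, in a notebook cell you can run `!pip list` to see every installed package, or `!pip show <package>` for one package's details. The same information is available programmatically through the standard library; the helper name below is illustrative:

```python
from importlib import metadata


def installed_packages() -> dict:
    """Map each installed distribution name to its version string.

    Equivalent to skimming the output of `pip list`, but usable from code.
    """
    return {dist.metadata["Name"]: dist.version for dist in metadata.distributions()}


# In a notebook cell you could instead run the pip tool directly:
#   !pip list          # all installed packages and versions
#   !pip show numpy    # details for a single package
```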

The following table shows the package versions used in the available notebook images:

Table 5.1. Notebook image options
Image name | Image version | Preinstalled packages

CUDA

2 (Recommended)

  • Python 3.9
  • CUDA 11.8
  • JupyterLab 3.5
  • Notebook 6.5

1

  • Python 3.8
  • CUDA 11.4
  • JupyterLab 3.2
  • Notebook 6.4

Minimal Python (default)

2 (Recommended)

  • Python 3.9
  • JupyterLab 3.5
  • Notebook 6.5

1

  • Python 3.8
  • JupyterLab 3.2
  • Notebook 6.4

PyTorch

2 (Recommended)

  • Python 3.9
  • JupyterLab 3.5
  • Notebook 6.5
  • PyTorch 1.13
  • CUDA 11.7
  • TensorBoard 2.11
  • Boto3 1.26
  • Kafka-Python 2.0
  • Matplotlib 3.6
  • Numpy 1.24
  • Pandas 1.5
  • Scikit-learn 1.2
  • SciPy 1.10

1

  • Python 3.8
  • JupyterLab 3.2
  • Notebook 6.4
  • PyTorch 1.8
  • CUDA 10.2
  • TensorBoard 2.6
  • Boto3 1.17
  • Kafka-Python 2.0
  • Matplotlib 3.4
  • Numpy 1.19
  • Pandas 1.2
  • Scikit-learn 0.24
  • SciPy 1.6

Standard Data Science

2 (Recommended)

  • Python 3.9
  • JupyterLab 3.5
  • Notebook 6.5
  • Boto3 1.26
  • Kafka-Python 2.0
  • Matplotlib 3.6
  • Pandas 1.5
  • Numpy 1.24
  • Scikit-learn 1.2
  • SciPy 1.10

1

  • Python 3.8
  • JupyterLab 3.2
  • Notebook 6.4
  • Boto3 1.17
  • Kafka-Python 2.0
  • Matplotlib 3.4
  • Pandas 1.2
  • Numpy 1.19
  • Scikit-learn 0.24
  • SciPy 1.6

TensorFlow

2 (Recommended)

  • Python 3.9
  • JupyterLab 3.5
  • Notebook 6.5
  • TensorFlow 2.11
  • TensorBoard 2.11
  • CUDA 11.8
  • Boto3 1.26
  • Kafka-Python 2.0
  • Matplotlib 3.6
  • Numpy 1.24
  • Pandas 1.5
  • Scikit-learn 1.2
  • SciPy 1.10

1

  • Python 3.8
  • JupyterLab 3.2
  • Notebook 6.4
  • TensorFlow 2.7
  • TensorBoard 2.6
  • CUDA 11.4
  • Boto3 1.17
  • Kafka-Python 2.0
  • Matplotlib 3.4
  • Numpy 1.19
  • Pandas 1.2
  • Scikit-learn 0.24
  • SciPy 1.6

TrustyAI

1

  • Python 3.9
  • JupyterLab 3.5
  • Notebook 6.5
  • TrustyAI 0.2
  • Boto3 1.26
  • Kafka-Python 2.0
  • Matplotlib 3.6
  • Numpy 1.24
  • Pandas 1.5
  • Scikit-learn 1.2
  • SciPy 1.10
Deployment size

Specifies the compute resources available on your notebook server.

Container size controls the number of CPUs, the amount of memory, and the minimum and maximum request capacity of the container.

Number of GPUs specifies the number of graphics processing units attached to the container.

Important

Using GPUs to accelerate workloads is only supported with the PyTorch, TensorFlow, and CUDA notebook server images. In addition, you can specify the number of GPUs required for your notebook server only if GPUs are enabled on your cluster. To learn how to enable GPU support, see Enabling GPU support in OpenShift Data Science.
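From inside a running notebook you can confirm how many GPUs were actually attached to the container. The sketch below assumes NVIDIA tooling (as used by the CUDA, PyTorch, and TensorFlow images); the function name is illustrative, not part of OpenShift Data Science:

```python
import shutil
import subprocess


def gpu_count() -> int:
    """Best-effort count of NVIDIA GPUs visible to this container.

    Queries nvidia-smi if it is on the PATH; returns 0 when no NVIDIA
    tooling is present (for example, on a CPU-only container size).
    """
    if shutil.which("nvidia-smi") is None:
        return 0  # no NVIDIA driver tooling in this container
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    return len(result.stdout.splitlines()) if result.returncode == 0 else 0
```

If this returns 0 on a server where you requested GPUs, check with your administrator that GPU support is enabled on the cluster.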

Environment variables

Specifies the name and value of variables to be set on the notebook server. Setting environment variables during server startup means that you do not need to define them in the body of your notebooks, or with the Jupyter command line interface. Some recommended environment variables are shown in the table.

Table 5.2. Recommended environment variables
Environment variable option | Recommended variable names

AWS

  • AWS_ACCESS_KEY_ID specifies your Access Key ID for Amazon Web Services.
  • AWS_SECRET_ACCESS_KEY specifies your Secret access key for the account specified in AWS_ACCESS_KEY_ID.
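Inside a running notebook, these variables are ordinary environment variables, so your code can read them with the standard library. The helper below is a sketch; its name is illustrative, and the boto3 usage in the comment assumes you have installed that package:

```python
import os


def aws_credentials() -> tuple:
    """Read the AWS credentials injected as notebook-server environment variables.

    Illustrative helper: the variable names match Table 5.2; the function
    itself is not part of OpenShift Data Science.
    """
    key_id = os.environ.get("AWS_ACCESS_KEY_ID")
    secret = os.environ.get("AWS_SECRET_ACCESS_KEY")
    if not (key_id and secret):
        raise RuntimeError("AWS credentials were not set on the notebook server")
    return key_id, secret


# Libraries such as boto3 read the AWS_* variables automatically, e.g.:
#   import boto3
#   s3 = boto3.client("s3")  # picks up AWS_* values from the environment
```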