Chapter 3. Working on data science projects


As a data scientist, you can organize your data science work into a single project. A data science project in OpenShift Data Science can consist of the following components:

Workbenches
Creating a workbench allows you to add a Jupyter notebook to your project.
Cluster storage
For data science projects that require data to be retained, you can add cluster storage to the project.
Data connections
Adding a data connection to your project allows you to connect data inputs to your workbenches.
Pipelines
Standardize and automate machine learning workflows so that you can further enhance and deploy your data science models.
Models and model servers
Deploy a trained data science model to serve intelligent applications. Your model is deployed with an endpoint that allows applications to send requests to the model.
Important

If you create an OpenShift project outside of the OpenShift Data Science user interface, the project is not shown on the Data science projects page. In addition, you cannot use features exclusive to OpenShift Data Science, such as workbenches and model serving, with a standard OpenShift project.

To classify your OpenShift project as a data science project, and to make features exclusive to OpenShift Data Science available to it, you must add the label opendatahub.io/dashboard: 'true' to the project namespace. After you add this label, your project is shown on the Data science projects page.
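If you prefer the command line, you can apply the label with the oc CLI, for example: oc label namespace <project> opendatahub.io/dashboard=true.

The following Python sketch builds that command and the equivalent JSON merge patch. It does not run either one. The sketch makes these assumptions:

  • The oc CLI is installed and you are logged in.
  • You have permission to modify the namespace object. Labeling a namespace often needs elevated rights that a regular project user might not have.
  • The project name my-project is hypothetical.

```python
"""Sketch: build the command and patch that mark an OpenShift project as a
data science project. Assumes the oc CLI is installed and you are logged in."""
import json
import shlex

DASHBOARD_LABEL = "opendatahub.io/dashboard"


def dashboard_label_patch() -> dict:
    """Return a JSON merge patch that sets the dashboard label on a namespace."""
    # Kubernetes label values are strings, so use "true", not the boolean True.
    return {"metadata": {"labels": {DASHBOARD_LABEL: "true"}}}


def oc_label_command(namespace: str) -> list[str]:
    """Return the argv for: oc label namespace <namespace> opendatahub.io/dashboard=true"""
    return ["oc", "label", "namespace", namespace, f"{DASHBOARD_LABEL}=true"]


if __name__ == "__main__":
    project = "my-project"  # hypothetical project name
    print(shlex.join(oc_label_command(project)))
    # The same change as a merge patch, e.g. for: oc patch namespace <name> --type=merge -p '<patch>'
    print(json.dumps(dashboard_label_patch()))
```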

3.1. Using data science projects

3.1.1. Creating a data science project

To start your data science work, create a data science project. Creating a project helps you organize your work in one place. You can also enhance the capabilities of your data science project by adding workbenches, adding storage to your project’s cluster, adding data connections, and adding model servers.

Prerequisites

  • You have logged in to Red Hat OpenShift Data Science.
  • If you are using specialized OpenShift Data Science groups, you are part of the user group or admin group (for example, rhods-users or rhods-admins) in OpenShift.

Procedure

  1. From the OpenShift Data Science dashboard, click Data Science Projects.

    The Data science projects page opens.

  2. Click Create data science project.

    The Create a data science project dialog opens.

  3. Enter a name for your data science project.
  4. Optional: Edit the resource name for your data science project. The resource name must consist of lowercase alphanumeric characters or hyphens (-), and must start and end with an alphanumeric character.
  5. Enter a description for your data science project.
  6. Click Create.

    A project details page opens. From here, you can create workbenches, add cluster storage, and add data connections to your project.
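You can express the resource-name rule in step 4 as a regular expression. The following Python sketch checks a candidate name and derives one from a display name.

Note the following limitations of the sketch:

  • suggest_resource_name is a hypothetical helper. It is not the algorithm that the dashboard uses.
  • Kubernetes also limits namespace names to 63 characters. This sketch does not check that limit.

```python
import re

# The rule from step 4: lowercase alphanumerics or hyphens,
# starting and ending with an alphanumeric character.
RESOURCE_NAME = re.compile(r"[a-z0-9]([a-z0-9-]*[a-z0-9])?")


def is_valid_resource_name(name: str) -> bool:
    """Return True if the whole string satisfies the resource-name rule."""
    return RESOURCE_NAME.fullmatch(name) is not None


def suggest_resource_name(display_name: str) -> str:
    """Derive a candidate resource name from a display name (hypothetical helper)."""
    lowered = display_name.lower()
    # Replace every run of disallowed characters with a single hyphen.
    hyphenated = re.sub(r"[^a-z0-9]+", "-", lowered)
    # Drop leading/trailing hyphens so the name starts and ends with an alphanumeric.
    return hyphenated.strip("-")


if __name__ == "__main__":
    candidate = suggest_resource_name("Fraud Detection #2")
    print(candidate, is_valid_resource_name(candidate))
```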

Verification

  • The data science project that you created is displayed on the Data science projects page.

3.1.2. Updating a data science project

You can update your data science project’s details by changing your project’s name and description text.

Prerequisites

  • You have logged in to Red Hat OpenShift Data Science.
  • If you are using specialized OpenShift Data Science groups, you are part of the user group or admin group (for example, rhods-users or rhods-admins) in OpenShift.
  • You have created a data science project.

Procedure

  1. From the OpenShift Data Science dashboard, click Data Science Projects.

    The Data science projects page opens.

  2. Click the action menu (⋮) beside the project whose details you want to update and click Edit project.

    The Edit data science project dialog opens.

  3. Optional: Update the name for your data science project.
  4. Optional: Update the description for your data science project.
  5. Click Update.

Verification

  • The data science project that you updated is displayed on the Data science projects page.

3.1.3. Deleting a data science project

When you no longer want to use a data science project, you can delete it so that it no longer appears on the Data science projects page in OpenShift Data Science.

Prerequisites

  • You have logged in to Red Hat OpenShift Data Science.
  • If you are using specialized OpenShift Data Science groups, you are part of the user group or admin group (for example, rhods-users or rhods-admins) in OpenShift.
  • You have created a data science project.

Procedure

  1. From the OpenShift Data Science dashboard, click Data Science Projects.

    The Data science projects page opens.

  2. Click the action menu (⋮) beside the project that you want to delete and click Delete project.

    The Delete project dialog opens.

  3. Enter the project name in the text field to confirm that you intend to delete it.
  4. Click Delete project.

Verification

  • The data science project that you deleted is no longer displayed on the Data science projects page.
  • Deleting a data science project deletes any associated workbenches, cluster storage, and data connections. This data is permanently deleted and is not recoverable.

3.2. Using project workbenches

3.2.1. Creating a project workbench

To examine and work with data models in an isolated area, you can create a workbench. A workbench lets you create a new Jupyter notebook from an existing notebook container image and use that image's resources and properties. For data science projects that require data to be retained, you can add cluster storage to the workbench that you are creating.

Prerequisites

  • You have logged in to Red Hat OpenShift Data Science.
  • If you are using specialized OpenShift Data Science groups, you are part of the user group or admin group (for example, rhods-users or rhods-admins) in OpenShift.
  • You have created a data science project that you can add a workbench to.

Procedure

  1. From the OpenShift Data Science dashboard, click Data Science Projects.

    The Data science projects page opens.

  2. Click the name of the project that you want to add the workbench to.

    The Details page for the project opens.

  3. Click Create workbench in the Workbenches section.

    The Create workbench page opens.

  4. Configure the properties of the workbench you are creating.

    1. Enter a name for your workbench.
    2. Enter a description for your workbench.
    3. Select the notebook image to use for your workbench server.
    4. Select the container size for your server.
    5. Optional: Select and specify values for any new environment variables.

      Note

      To enable data science pipelines in JupyterLab in self-managed deployments, create the following environment variable: PIPELINES_SSL_SA_CERTS=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt

    6. Configure the storage for your OpenShift Data Science cluster.

      1. Select Create new persistent storage to create storage that is retained after you log out of OpenShift Data Science. Fill in the relevant fields to define the storage.
      2. Select Use existing persistent storage to reuse existing storage, and then select the storage from the Persistent storage list.
  5. Click Create workbench.
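Inside a running workbench, you can check that the PIPELINES_SSL_SA_CERTS variable from the note in step 4 is set and points to an existing file. The following Python sketch reads the variable from the notebook's environment. The function name pipelines_ca_bundle is hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def pipelines_ca_bundle(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the CA certificate path named by PIPELINES_SSL_SA_CERTS, or None.

    None means the variable is unset or empty, or the file it names does not exist.
    """
    value = env.get("PIPELINES_SSL_SA_CERTS")
    if not value:
        return None  # variable not defined for this workbench
    path = Path(value)
    return path if path.is_file() else None


if __name__ == "__main__":
    ca = pipelines_ca_bundle()
    print(f"Pipelines CA bundle: {ca}" if ca else "PIPELINES_SSL_SA_CERTS is unset or invalid")
```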

Verification

  • The workbench that you created appears on the Details page for the project.
  • Any cluster storage that you associated with the workbench during the creation process appears on the Details page for the project.
  • The Status column, located in the Workbenches section of the Details page, displays a status of Starting when the workbench server is starting, and Running when the workbench has successfully started.

3.2.2. Starting a workbench

You can manually start a data science project’s workbench from the Details page for the project. By default, workbenches start immediately after you create them.

Prerequisites

  • You have logged in to Red Hat OpenShift Data Science.
  • If you are using specialized OpenShift Data Science groups, you are part of the user group or admin group (for example, rhods-users or rhods-admins) in OpenShift.
  • You have created a data science project that contains a workbench.

Procedure

  1. From the OpenShift Data Science dashboard, click Data Science Projects.

    The Data science projects page opens.

  2. Click the name of the project whose workbench you want to start.

    The Details page for the project opens.

  3. Click the toggle in the Status column for the relevant workbench to start a workbench that is not running.

    The status of the workbench that you started changes from Stopped to Running. After the workbench has started, click Open to open the workbench’s notebook.

Verification

  • The workbench that you started appears on the Details page for the project with the status of Running.

3.2.3. Updating a project workbench

If your data science work requires you to change your workbench’s notebook image, container size, or identifying information, you can modify the properties of your project’s workbench.

Prerequisites

  • You have logged in to Red Hat OpenShift Data Science.
  • If you are using specialized OpenShift Data Science groups, you are part of the user group or admin group (for example, rhods-users or rhods-admins) in OpenShift.
  • You have created a data science project that contains a workbench.

Procedure

  1. From the OpenShift Data Science dashboard, click Data Science Projects.

    The Data science projects page opens.

  2. Click the name of the project whose workbench you want to update.

    The Details page for the project opens.

  3. Click the action menu (⋮) beside the workbench that you want to update in the Workbenches section and click Edit workbench.

    The Edit workbench page opens.

  4. Update the workbench’s properties.

    1. Update the name for your workbench, if applicable.
    2. Update the description for your workbench, if applicable.
    3. Select a new notebook image to use for your workbench server, if applicable.
    4. Select a new container size for your server, if applicable.
  5. Click Update workbench.

Verification

  • The workbench that you updated appears on the Details page for the project.

3.2.4. Deleting a workbench from a data science project

You can delete workbenches from your data science projects to help you remove Jupyter notebooks that are no longer relevant to your work.

Prerequisites

  • You have logged in to Red Hat OpenShift Data Science.
  • If you are using specialized OpenShift Data Science groups, you are part of the user group or admin group (for example, rhods-users or rhods-admins) in OpenShift.
  • You have created a data science project with a workbench.

Procedure

  1. From the OpenShift Data Science dashboard, click Data Science Projects.

    The Data science projects page opens.

  2. Click the name of the project that you want to delete the workbench from.

    The Details page for the project opens.

  3. Click the action menu (⋮) beside the workbench that you want to delete in the Workbenches section and click Delete workbench.

    The Delete workbench dialog opens.

  4. Enter the name of the workbench in the text field to confirm that you intend to delete it.
  5. Click Delete workbench.

Verification

  • The workbench that you deleted is no longer displayed in the Workbenches section on the project Details page.
  • The custom resource (CR) associated with the workbench’s Jupyter notebook is deleted.

3.3. Using data connections

3.3.1. Adding a data connection to your data science project

You can enhance your data science project by adding a connection to a data source. When you want to work with very large data sets, you can store your data in an Amazon Web Services (AWS) Simple Storage Service (S3) bucket so that you do not fill up your local storage. You can also associate the data connection with an existing workbench that does not already have a connection.

Prerequisites

  • You have logged in to Red Hat OpenShift Data Science.
  • If you are using specialized OpenShift Data Science groups, you are part of the user group or admin group (for example, rhods-users or rhods-admins) in OpenShift.
  • You have created a data science project that you can add a data connection to.
  • If you intend to add the data connection to an existing workbench, you have saved any data in the workbench to avoid losing work.

Procedure

  1. From the OpenShift Data Science dashboard, click Data Science Projects.

    The Data science projects page opens.

  2. Click the name of the project that you want to add a data connection to.

    A project details page opens.

  3. In the Data connections section of the page, click Add data connection.

    The Add data connection dialog opens.

  4. Enter a name for the data connection.
  5. Enter your access key ID for Amazon Web Services in the AWS_ACCESS_KEY_ID field.
  6. Enter your secret access key for the account that you specified in the AWS_SECRET_ACCESS_KEY field.
  7. Enter the endpoint of your AWS S3 storage in the AWS_S3_ENDPOINT field.
  8. Enter the default region of your AWS account in the AWS_DEFAULT_REGION field.
  9. Enter the name of the AWS S3 bucket in the AWS_S3_BUCKET field.
  10. Click Add data connection.

Verification

  • The data connection that you added appears in the Data connections section on the Details page for the project.
  • If you selected a workbench, the data connection is visible in the Workbenches section on your data science project page.

3.3.2. Deleting a data connection

You can delete data connections from your data science projects to help you remove connections that are no longer relevant to your work.

Prerequisites

  • You have logged in to Red Hat OpenShift Data Science.
  • If you are using specialized OpenShift Data Science groups, you are part of the user group or admin group (for example, rhods-users or rhods-admins) in OpenShift.
  • You have created a data science project with a data connection.

Procedure

  1. From the OpenShift Data Science dashboard, click Data Science Projects.

    The Data science projects page opens.

  2. Click the name of the project that you want to delete the data connection from.

    The Details page for the project opens.

  3. Click the action menu (⋮) beside the data connection that you want to delete in the Data connections section and click Delete data connection.

    The Delete data connection dialog opens.

  4. Enter the name of the data connection in the text field to confirm that you intend to delete it.
  5. Click Delete data connection.

Verification

  • The data connection that you deleted is no longer displayed in the Data connections section on the project Details page.

3.3.3. Updating a connected data source

To use an existing data source with a different workbench, you can change which workbenches in your project the data connection is connected to.

Prerequisites

  • You have logged in to Red Hat OpenShift Data Science.
  • If you are using specialized OpenShift Data Science groups, you are part of the user group or admin group (for example, rhods-users or rhods-admins) in OpenShift.
  • You have created a data science project, created a workbench, and defined a data connection.

Procedure

  1. From the OpenShift Data Science dashboard, click Data Science Projects.

    The Data science projects page opens.

  2. Click the name of the project whose data source you want to change.

    A project details page opens.

  3. Click the action menu (⋮) beside the data source that you want to change in the Data connections section and click Change connected workbenches.

    The Update connected workbenches dialog opens.

  4. From the list, select an existing workbench to connect the data source to.
  5. Click Update connected workbenches.

Verification

  • The data connection that you changed is displayed in the Data connections section on the project Details page.
  • You can access your S3 data source using environment variables in the connected workbench.
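The following minimal Python sketch reads the connection details from the workbench environment. It makes these assumptions:

  • The variable names match the field names in the Add data connection dialog.
  • The secret key is exposed as AWS_SECRET_ACCESS_KEY, the standard AWS variable name.

Check the variables in your own workbench before you rely on the sketch. The S3Connection class and the load_s3_connection function are hypothetical.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional


@dataclass(frozen=True)
class S3Connection:
    """Connection details injected into a workbench by a data connection."""
    access_key_id: str
    secret_access_key: str = field(repr=False)  # keep the secret out of printed output
    bucket: str = ""
    endpoint: Optional[str] = None
    region: Optional[str] = None


def load_s3_connection(env: Mapping[str, str] = os.environ) -> S3Connection:
    """Build an S3Connection from environment variables.

    Variable names are assumed to match the data connection fields.
    Raises KeyError listing any required variable that is missing or empty.
    """
    required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_S3_BUCKET")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"missing data connection variables: {', '.join(missing)}")
    return S3Connection(
        access_key_id=env["AWS_ACCESS_KEY_ID"],
        secret_access_key=env["AWS_SECRET_ACCESS_KEY"],
        bucket=env["AWS_S3_BUCKET"],
        # Optional values: treat empty strings as "not set".
        endpoint=env.get("AWS_S3_ENDPOINT") or None,
        region=env.get("AWS_DEFAULT_REGION") or None,
    )
```

You could then pass these values to an S3 client of your choice. For example: boto3.client("s3", endpoint_url=conn.endpoint, region_name=conn.region, aws_access_key_id=conn.access_key_id, aws_secret_access_key=conn.secret_access_key).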

3.4. Configuring cluster storage

3.4.1. Adding cluster storage to your data science project

For data science projects that require data to be retained, you can add cluster storage to the project. You can also connect the cluster storage to a specific workbench in the project.

Prerequisites

  • You have logged in to Red Hat OpenShift Data Science.
  • If you are using specialized OpenShift Data Science groups, you are part of the user group or admin group (for example, rhods-users or rhods-admins) in OpenShift.
  • You have created a data science project that you can add cluster storage to.

Procedure

  1. From the OpenShift Data Science dashboard, click Data Science Projects.

    The Data science projects page opens.

  2. Click the name of the project that you want to add the cluster storage to.

    A project details page opens.

  3. In the Cluster storage section of the page, click Add cluster storage.

    The Add storage dialog opens.

  4. Enter a name for the cluster storage.
  5. Enter a description for the cluster storage.
  6. Under Persistent storage size, enter a new size in gibibytes. The minimum size is 1 GiB, and the maximum size is 16384 GiB.
  7. Optional: Select a workbench from the list to connect the cluster storage to an existing workbench.
  8. If you selected a workbench to connect the storage to, enter the storage directory in the Mount folder field.
  9. Click Add storage.
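You can check the size limits in step 6 before you submit the dialog. The following Python sketch does three things:

  • It validates a size in GiB.
  • It converts the size to bytes. 1 GiB is 2^30 bytes, so the 16384 GiB maximum is 16 TiB.
  • It formats the size as a Kubernetes quantity such as 20Gi, which is how a PVC storage request is usually written.

The function names are hypothetical.

```python
MIN_SIZE_GIB = 1
MAX_SIZE_GIB = 16384
BYTES_PER_GIB = 2 ** 30  # 1 GiB = 1,073,741,824 bytes


def validate_storage_size(size_gib: int) -> int:
    """Return the size in bytes, or raise ValueError if it is outside 1-16384 GiB."""
    if not MIN_SIZE_GIB <= size_gib <= MAX_SIZE_GIB:
        raise ValueError(
            f"size must be between {MIN_SIZE_GIB} and {MAX_SIZE_GIB} GiB, got {size_gib}"
        )
    return size_gib * BYTES_PER_GIB


def pvc_storage_quantity(size_gib: int) -> str:
    """Format a validated size as a Kubernetes quantity string, for example "20Gi"."""
    validate_storage_size(size_gib)
    return f"{size_gib}Gi"
```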

Verification

  • The cluster storage that you added appears in the Cluster storage section on the Details page for the project.
  • A new persistent volume claim (PVC) is created with the storage size that you defined.
  • If you connected the storage to a workbench, the persistent volume claim (PVC) is visible as attached storage in the Workbenches section on the Details page for the project.

3.4.2. Updating cluster storage

If your data science work requires you to change the identifying information of a project’s cluster storage or the workbench that the storage is connected to, you can update your project’s cluster storage to change these properties.

Prerequisites

  • You have logged in to Red Hat OpenShift Data Science.
  • If you are using specialized OpenShift Data Science groups, you are part of the user group or admin group (for example, rhods-users or rhods-admins) in OpenShift.
  • You have created a data science project that contains cluster storage.

Procedure

  1. From the OpenShift Data Science dashboard, click Data Science Projects.

    The Data science projects page opens.

  2. Click the name of the project whose storage you want to update.

    The Details page for the project opens.

  3. Click the action menu (⋮) beside the storage that you want to update in the Cluster storage section and click Edit storage.

    The Edit storage page opens.

  4. Update the storage’s properties.

    1. Update the name for the storage, if applicable.
    2. Update the description for the storage, if applicable.
    3. Increase the Persistent storage size for the storage, if applicable.

      Note that you can only increase the storage size. Updating the storage size restarts the workbench and makes it unavailable for a period of time that is usually proportional to the size change.

    4. Update the workbench that the storage is connected to, if applicable.
    5. If you selected a new workbench to connect the storage to, enter the storage directory in the Mount folder field.
  5. Click Update storage.

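Cluster storage can only grow, so a resize request either leaves the size unchanged or increases it. The following Python sketch encodes that rule. plan_storage_resize is a hypothetical helper. When it returns a nonzero value, expect the workbench to restart.

```python
def plan_storage_resize(current_gib: int, requested_gib: int) -> int:
    """Return how many GiB the storage grows by; raise ValueError on a decrease."""
    if requested_gib < current_gib:
        raise ValueError(
            f"cannot shrink storage from {current_gib} GiB to {requested_gib} GiB"
        )
    # 0 means the size is unchanged; a positive value is an increase,
    # which restarts the connected workbench.
    return requested_gib - current_gib
```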

Verification

  • The storage that you updated appears in the Cluster storage section on the Details page for the project.

3.4.3. Deleting cluster storage from a data science project

You can delete cluster storage from your data science projects to help you free up resources and delete unwanted storage space.

Prerequisites

  • You have logged in to Red Hat OpenShift Data Science.
  • If you are using specialized OpenShift Data Science groups, you are part of the user group or admin group (for example, rhods-users or rhods-admins) in OpenShift.
  • You have created a data science project with cluster storage.

Procedure

  1. From the OpenShift Data Science dashboard, click Data Science Projects.

    The Data science projects page opens.

  2. Click the name of the project that you want to delete the storage from.

    A project details page opens.

  3. In the Cluster storage section, click the action menu (⋮) beside the storage that you want to delete and then click Delete storage.

    The Delete storage dialog opens.

  4. Enter the name of the storage in the text field to confirm that you intend to delete it.
  5. Click Delete storage.

Verification

  • The storage that you deleted is no longer displayed in the Cluster storage section on the project Details page.
  • The persistent volume (PV) and persistent volume claim (PVC) associated with the cluster storage are both permanently deleted. This data is not recoverable.

3.5. Configuring data science pipelines

3.5.1. Configuring a pipeline server

Before you can successfully create a pipeline in OpenShift Data Science, you must configure a pipeline server. This includes configuring where your pipeline artifacts and data are stored.

Prerequisites

  • You have installed the OpenShift Pipelines operator.
  • You have logged in to Red Hat OpenShift Data Science.
  • If you are using specialized OpenShift Data Science groups, you are part of the user group or admin group (for example, rhods-users or rhods-admins) in OpenShift.
  • You have created a data science project that you can add a pipeline server to.

Procedure

  1. From the OpenShift Data Science dashboard, click Data Science Projects.

    The Data science projects page opens.

  2. Click the name of the project that you want to configure a pipeline server for.

    A project details page opens.

  3. In the Pipelines section, click Create a pipeline server.

    The Configure pipeline server dialog appears.

  4. In the Object storage connection section, to specify the S3-compatible data connection to store your pipeline artifacts, select one of the following sets of actions:

    Note

    After the pipeline server is created, the /metadata and /artifacts folders are automatically created in the default root folder. Therefore, you are not required to specify any storage directories when configuring a data connection for your pipeline server.

    • Select Existing data connection to use a data connection that you previously defined. If you selected this option, from the Name list, select the name of the relevant data connection and skip to step 6.
    • Select Create new data connection to add a new data connection that your pipeline server can access.
  5. If you selected Create new data connection, perform the following steps:

    1. In the Name field, enter a name for the data connection.
    2. In the AWS_ACCESS_KEY_ID field, enter your access key ID for Amazon Web Services.
    3. In the AWS_SECRET_ACCESS_KEY field, enter your secret access key for the account that you specified.
    4. Optional: In the AWS_S3_ENDPOINT field, enter the endpoint of your AWS S3 storage.
    5. Optional: In the AWS_DEFAULT_REGION field, enter the default region of your AWS account.
    6. In the AWS_S3_BUCKET field, enter the name of the AWS S3 bucket.

      Important

      If you are creating a new data connection, the AWS_S3_BUCKET field is mandatory, in addition to the other designated mandatory fields. If you specify incorrect data connection settings, you cannot update these settings on the same pipeline server. Instead, you must delete the pipeline server and configure another one.

  6. In the Database section, click Show advanced database options to specify the database to store your pipeline data and select one of the following sets of actions:

    • Select Use default database stored on your cluster to deploy a MariaDB database in your project.
    • Select Connect to external MySQL database to add a new connection to an external database that your pipeline server can access.

      1. In the Host field, enter the database’s host name.
      2. In the Port field, enter the database’s port.
      3. In the Username field, enter the default user name that is connected to the database.
      4. In the Password field, enter the password for the default user account.
      5. In the Database field, enter the database name.
  7. Click Configure.
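If you connect to an external MySQL database, consider confirming that the host and port are reachable from inside the cluster before you click Configure. For example, you can run a check from a notebook in a workbench.

The following Python sketch only tests TCP connectivity. Note these limitations:

  • It does not log in, so it cannot verify the user name, password, or database name.
  • Reachability from a workbench does not guarantee reachability from the pipeline server if network policies differ.
  • The ExternalDatabase class and the is_reachable function are hypothetical.

```python
import socket
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ExternalDatabase:
    """Values entered in the Connect to external MySQL database fields."""
    host: str
    port: int
    username: str
    password: str = field(repr=False)  # keep the password out of printed output
    database: str = ""

    def __post_init__(self) -> None:
        if not self.host:
            raise ValueError("host must not be empty")
        # TCP ports are 16-bit; 0 is not a usable destination port.
        if not 1 <= self.port <= 65535:
            raise ValueError(f"invalid port: {self.port}")


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```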

Verification

  • The pipeline server that you configured is displayed in the Pipelines section on the project details page.
  • The Import pipeline button is available in the Pipelines section on the project details page.

3.5.2. Defining a pipeline

The Kubeflow Pipelines SDK enables you to define end-to-end machine learning and data pipelines. Use the Kubeflow Pipelines SDK to build your data science pipeline in Python code. After you have built your pipeline, compile it into Tekton-formatted YAML code by using the kfp-tekton SDK (version 1.5.x only). You can then import the YAML file to the OpenShift Data Science dashboard and configure its execution settings. For more information about installing and using the Kubeflow Pipelines SDK for Tekton, see Kubeflow Pipelines SDK for Tekton.

You can also use the Elyra JupyterLab extension to create and run data science pipelines within JupyterLab. For more information on creating pipelines in JupyterLab, see Working with pipelines in JupyterLab. For more information on the Elyra JupyterLab extension, see Elyra Documentation.

3.5.3. Importing a data science pipeline

To help you begin working with data science pipelines in OpenShift Data Science, you can import a YAML file containing your pipeline’s code to an active pipeline server. This file contains a Kubeflow pipeline compiled with the Tekton compiler. After you have imported the pipeline to a pipeline server, you can execute the pipeline by creating a pipeline run.
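Before you upload a file, a quick heuristic check can catch a file that was compiled with the wrong compiler. The kfp-tekton compiler typically emits a Tekton PipelineRun (apiVersion tekton.dev/...). The standard Kubeflow Pipelines v1 compiler emits an Argo Workflow instead.

The following Python sketch reads only top-level key: value lines. It is not a YAML parser, and the function names are hypothetical.

```python
from pathlib import Path


def top_level_fields(text: str) -> dict[str, str]:
    """Collect key: value pairs from lines that start at column 0.

    Heuristic only: ignores nested lines, comments, list items, and
    document separators, and keeps the first value seen for each key.
    """
    fields: dict[str, str] = {}
    for line in text.splitlines():
        if not line or line[0].isspace() or line.startswith(("#", "-")):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields.setdefault(key.strip(), value.strip().strip("'\""))
    return fields


def looks_like_tekton_pipeline(text: str) -> bool:
    """Return True if the text appears to be a Tekton PipelineRun."""
    fields = top_level_fields(text)
    return (fields.get("apiVersion", "").startswith("tekton.dev/")
            and fields.get("kind") == "PipelineRun")


def check_pipeline_file(path: str) -> bool:
    """Read a compiled pipeline file and apply the heuristic check."""
    return looks_like_tekton_pipeline(Path(path).read_text(encoding="utf-8"))
```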

Prerequisites

  • You have installed the OpenShift Pipelines operator.
  • You have logged in to Red Hat OpenShift Data Science.
  • If you are using specialized OpenShift Data Science groups, you are part of the user group or admin group (for example, rhods-users or rhods-admins) in OpenShift.
  • You have previously created a data science project that is available and contains a configured pipeline server.

Procedure

  1. From the OpenShift Data Science dashboard, click Data Science Pipelines > Pipelines.

    The Pipelines page opens.

  2. From the Project list, select the project that you want to import a pipeline to.
  3. Click Import pipeline.

    The Import pipeline dialog opens.

  4. Enter the details for the pipeline that you are importing.

    1. In the Pipeline name field, enter a name for the pipeline that you are importing.
    2. In the Pipeline description field, enter a description for the pipeline that you are importing.
    3. Click Upload. Alternatively, drag the file from your local machine’s file system and drop it in the designated area in the Import pipeline dialog.

      A file browser opens.

    4. Navigate to the file containing the pipeline code and click Select.
    5. Click Import pipeline.

Verification

  • The pipeline that you imported is displayed on the Pipelines page.

For more information about using pipelines in OpenShift Data Science, see Working with data science pipelines.

3.6. Configuring model servers

3.6.1. Adding a model server to your data science project

Before you can successfully deploy a data science model on OpenShift Data Science, you must configure a model server. This includes configuring the number of replicas being deployed, the server size, the token authorization, and how the deployed models are accessed.

Prerequisites

  • You have logged in to Red Hat OpenShift Data Science.
  • If you are using specialized OpenShift Data Science groups, you are part of the user group or admin group (for example, rhods-users or rhods-admins) in OpenShift.
  • You have created a data science project that you can add a model server to.
  • If you want to use a custom model-serving runtime for your model server, you have added and enabled the runtime. See Adding a custom model-serving runtime.
  • If you want to use graphics processing units (GPUs) with your model server, you have enabled GPU support in OpenShift Data Science. See Enabling GPU support in OpenShift Data Science.

Procedure

  1. From the OpenShift Data Science dashboard, click Data Science Projects.

    The Data science projects page opens.

  2. Click the name of the project that you want to configure a model server for.

    A project details page opens.

  3. In the Models and model servers section, click Add server.

    The Add model server dialog opens.

  4. In the Model server name field, enter a unique name for the model server.
  5. From the Serving runtime list, select a model-serving runtime that is installed and enabled in your OpenShift Data Science deployment.
  6. In the Number of model replicas to deploy field, specify a value.
  7. From the Model server size list, select one of the following server sizes:

    • Small
    • Medium
    • Large
    • Custom
  8. Optional: If you selected Custom in the preceding step, configure the following settings in the Model server size section to customize your model server:

    1. In the CPUs requested field, specify a number of CPUs to use with your model server. Use the list beside this field to specify the value in cores or millicores.
    2. In the CPU limit field, specify the maximum number of CPUs to use with your model server. Use the list beside this field to specify the value in cores or millicores.
    3. In the Memory requested field, specify the requested memory for the model server in gibibytes (Gi).
    4. In the Memory limit field, specify the maximum memory limit for the model server in gibibytes (Gi).
  9. Optional: In the Model server GPUs field, specify a number of GPUs to use with your model server.

    Important

    OpenShift Data Science includes two versions of the OpenVINO Model Server (OVMS) runtime by default: a version that supports GPUs and one that does not. To use GPUs, from the Serving runtime list, you must select the version whose display name includes Supports GPUs.

    If you are using a custom model-serving runtime with your model server, you must ensure that your custom runtime supports GPUs and is appropriately configured to use them.

  10. Optional: In the Model route section, select the Make deployed models available through an external route check box to make your deployed models available to external clients.
  11. Optional: In the Token authorization section, select the Require token authentication check box to require token authentication for your model server. To finish configuring token authentication, perform the following actions:

    1. In the Service account name field, enter a service account name for which the token is generated. The token is created and displayed in the Token secret field when the model server is configured.
    2. To add an additional service account, click Add a service account and enter another service account name.
  12. Click Add.
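For a Custom model server size, Kubernetes requires each resource request to be less than or equal to its limit. The following Python sketch checks both the CPU and memory settings from step 8. It accepts CPU values in cores (for example, 2 or 0.5) or in millicores (for example, 500m), and memory values in Gi. The function names are hypothetical.

```python
from fractions import Fraction


def parse_cpu(quantity: str) -> Fraction:
    """Parse a CPU value in cores ("2", "0.5") or millicores ("500m") into cores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        # 1000 millicores = 1 core
        return Fraction(int(quantity[:-1]), 1000)
    return Fraction(quantity)


def check_model_server_size(cpu_request: str, cpu_limit: str,
                            memory_request_gi: float, memory_limit_gi: float) -> None:
    """Raise ValueError if any request exceeds its corresponding limit."""
    if parse_cpu(cpu_request) > parse_cpu(cpu_limit):
        raise ValueError(f"CPU request {cpu_request} exceeds CPU limit {cpu_limit}")
    if memory_request_gi > memory_limit_gi:
        raise ValueError(
            f"memory request {memory_request_gi}Gi exceeds memory limit {memory_limit_gi}Gi"
        )
```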

Verification

  • The model server that you configured is displayed in the Models and model servers section of the project details page.
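If you enabled token authentication, client requests must present the generated token. The following Python sketch builds, but does not send, a request that carries the token as a bearer token.

The sketch assumes that your model runtime accepts the KServe v2 REST inference protocol (POST /v2/models/<name>/infer). The base URL, model name, input name, and tensor values are hypothetical. Take the real endpoint and token from your deployment.

```python
import json
import urllib.request


def build_infer_request(base_url: str, model: str, token: str,
                        input_name: str, shape: list[int],
                        data: list[float]) -> urllib.request.Request:
    """Build (but do not send) a KServe v2 REST inference request with a bearer token."""
    body = {
        "inputs": [
            {"name": input_name, "shape": shape, "datatype": "FP32", "data": data}
        ]
    }
    url = f"{base_url.rstrip('/')}/v2/models/{model}/infer"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Value copied from the Token secret field of the model server.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, you could call urllib.request.urlopen(request) and decode the JSON response.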

3.6.2. Adding a custom model-serving runtime

A model-serving runtime provides integration with a specified model server and the model frameworks that it supports. By default, Red Hat OpenShift Data Science includes the OpenVINO Model Server runtime. However, if this runtime doesn't meet your needs (for example, if it doesn't support a particular model framework), you might want to add your own custom runtimes.

As an administrator, you can use the OpenShift Data Science interface to add and enable custom model-serving runtimes. You can then choose from your enabled runtimes when you create a new model server.

Prerequisites

  • You have logged in to OpenShift Data Science as an administrator.
  • You are familiar with how to add a model server to your project. After you add a custom model-serving runtime, you must configure a new model server to use the runtime.
  • You have reviewed the example runtimes in the kserve/modelmesh-serving repository. You can use these examples as starting points. However, each runtime requires some further modification before you can deploy it in OpenShift Data Science. The required modifications are described in the following procedure.

    Note

    OpenShift Data Science includes the OpenVINO Model Server model-serving runtime by default. You do not need to add this runtime to OpenShift Data Science.

Procedure

  1. From the OpenShift Data Science dashboard, click Settings > Serving runtimes.

    The Serving runtimes page opens and shows the model-serving runtimes that are already installed and enabled in your OpenShift Data Science deployment. By default, the OpenVINO Model Server runtime is pre-installed and enabled in OpenShift Data Science.

  2. To add a new, custom runtime, click Add serving runtime.

    The Add serving runtime page opens.

  3. To start adding a new runtime, perform one of the following sets of actions:

    • To upload a YAML file

      1. Click Upload files.

        A file browser opens.

      2. In the file browser, select a YAML file on your computer. This file might be one of the example runtimes that you downloaded from the kserve/modelmesh-serving repository.

        The embedded YAML editor opens and shows the contents of the file that you uploaded.

    • To enter YAML code directly in the editor

      1. Click Start from scratch.

        The embedded YAML editor opens with no content.

      2. Enter or paste YAML code directly in the embedded editor. The YAML that you paste might be copied from one of the example runtimes in the kserve/modelmesh-serving repository.
  4. Optional: If you are adding one of the example runtimes in the kserve/modelmesh-serving repository, perform the following modifications:

    1. In the YAML editor, locate the kind field for your runtime. Update the value of this field to ServingRuntime.
    2. In the YAML editor, locate the containers.image field for your runtime. Based on the runtime that you are adding, update the field to one of the following:

      Nvidia Triton Inference Server
      image: nvcr.io/nvidia/tritonserver:21.06.1-py3
      Seldon Python MLServer
      image: seldonio/mlserver:0.5.2
      TorchServe
      image: pytorch/torchserve:0.6.0-cpu
  5. In the metadata.name field, ensure that the value for the runtime you are adding is unique (that is, it differs from the name of any runtime that you have already added).
  6. Optional: To configure a custom display name for the runtime that you are adding, add a metadata.annotations.openshift.io/display-name field and specify a value, as shown in the following example:

    apiVersion: serving.kserve.io/v1alpha1
    kind: ServingRuntime
    metadata:
      name: mlserver-0.x
      annotations:
        openshift.io/display-name: MLServer

    Note

    If you do not configure a custom display name for your runtime, OpenShift Data Science shows the value of the metadata.name field.

  7. Click Add.

    The Serving runtimes page opens and shows the updated list of installed runtimes. The runtime that you added is automatically enabled.

  8. Optional: To edit your custom runtime, click the action menu (⋮) and select Edit.

    Note

    You cannot directly edit the OpenVINO Model Server runtime that is included in OpenShift Data Science by default. However, you can clone this runtime, edit the cloned version, and then add the edited clone as a new, custom runtime. To clone the runtime, click the action menu (⋮) beside the OpenVINO Model Server and select Clone.
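The edits in steps 4 to 6 are mechanical, so they can be expressed as a function over the runtime manifest. The following Python sketch works on a manifest that has already been parsed from YAML into a dictionary; it does not read or write YAML itself, and the dashboard still performs its own validation when you click Add. The function names and the runtime labels used as dictionary keys are hypothetical; the image values are the ones listed in step 4.

```python
import copy
from typing import Optional, Set

# Images from step 4 of the procedure; the keys are labels chosen for this sketch.
EXAMPLE_RUNTIME_IMAGES = {
    "triton": "nvcr.io/nvidia/tritonserver:21.06.1-py3",
    "mlserver": "seldonio/mlserver:0.5.2",
    "torchserve": "pytorch/torchserve:0.6.0-cpu",
}
DISPLAY_NAME_KEY = "openshift.io/display-name"


def prepare_example_runtime(manifest: dict, runtime: str, existing_names: Set[str],
                            display_name: Optional[str] = None) -> dict:
    """Return a copy of an example runtime manifest with the required edits applied."""
    rt = copy.deepcopy(manifest)  # leave the caller's manifest untouched
    rt["kind"] = "ServingRuntime"  # step 4.1
    # Step 4.2: assumes every listed container is the model-server container.
    for container in rt.get("spec", {}).get("containers", []):
        container["image"] = EXAMPLE_RUNTIME_IMAGES[runtime]
    metadata = rt.setdefault("metadata", {})
    if metadata.get("name") in existing_names:  # step 5: names must be unique
        raise ValueError(f"runtime name {metadata.get('name')!r} is already in use")
    if display_name:  # step 6: optional custom display name
        metadata.setdefault("annotations", {})[DISPLAY_NAME_KEY] = display_name
    return rt


def shown_name(rt: dict) -> str:
    """Name shown in the dashboard: the display-name annotation, else metadata.name."""
    metadata = rt.get("metadata", {})
    return metadata.get("annotations", {}).get(DISPLAY_NAME_KEY, metadata["name"])
```

If PyYAML is available, yaml.safe_load() and yaml.safe_dump() can convert between the YAML text and the dictionary that this sketch expects.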

Verification

  • The model-serving runtime you added is shown in an enabled state on the Serving runtimes page.

Additional resources

3.6.3. Updating a model server

You can update your data science project’s model server by changing details such as the number of deployed replicas, the server size, token authorization, and whether deployed models are available through an external route.

Prerequisites

  • You have logged in to Red Hat OpenShift Data Science.
  • If you are using specialized OpenShift Data Science groups, you are part of the user group or admin group (for example, rhods-users or rhods-admins) in OpenShift.
  • You have created a data science project that has a model server assigned.

Procedure

  1. From the OpenShift Data Science dashboard, click Data Science Projects.

    The Data science projects page opens.

  2. Click the name of the project whose model server details you want to update.

    A project details page opens.

  3. In the Models and model servers section, locate the model server you want to update. Click the action menu (⋮) and select Edit model server.

    The Configure model server dialog opens.

  4. Update the model server properties, as follows:

    Note

    You cannot change the Serving runtime selection for a model server that is already configured. This protects against changing to a runtime that does not support already-deployed models.

    1. In the Model server name field, enter a new, unique name for the model server.
    2. In the Number of model replicas to deploy field, specify a value.
    3. From the Model server size list, select one of the following server sizes:

      • Small
      • Medium
      • Large
      • Custom
    4. Optional: If you selected Custom in the preceding step, configure the following settings in the Model server size section to customize your model server:

      1. In the CPUs requested field, specify a number of CPUs to use with your model server. Use the list beside this field to specify the value in cores or millicores.
      2. In the CPU limit field, specify the maximum number of CPUs to use with your model server. Use the list beside this field to specify the value in cores or millicores.
      3. In the Memory requested field, specify the requested memory for the model server in gibibytes (Gi).
      4. In the Memory limit field, specify the maximum memory limit for the model server in gibibytes (Gi).
    5. Optional: In the Model server GPUs field, specify a number of GPUs to use with your model server.

      Important

      By default, OpenShift Data Science includes two versions of the OpenVINO Model Server (OVMS) runtime: one that supports GPUs and one that does not. To use GPUs, you must select the version whose display name includes Supports GPUs from the Serving runtime list.

      If you are using a custom model-serving runtime with your model server, you must ensure that your custom runtime supports GPUs and is appropriately configured to use them.

    6. Optional: In the Model route section, select the Make deployed models available through an external route check box to make your deployed models available to external clients.
    7. Optional: In the Token authorization section, select the Require token authentication check box to require token authentication for your model server. To finish configuring token authentication, perform the following actions:

      1. In the Service account name field, enter the name of the service account for which to generate a token. When the model server is configured, the generated token is displayed in the Token secret field.
      2. To add another service account, click Add a service account and enter its name.
  5. Click Configure.
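The custom CPU and memory fields follow the usual Kubernetes conventions: a CPU value in millicores uses the m suffix (500m is half a core), and a container's request must not exceed its limit. The following Python sketch checks a custom size before you enter it in the dialog. The function names are hypothetical, and memory values are passed as plain numbers of gibibytes to match the fields above.

```python
from decimal import Decimal
from typing import List


def cpu_to_cores(quantity: str) -> Decimal:
    """Convert a CPU quantity such as "2" or "500m" to a number of cores."""
    q = quantity.strip()
    if q.endswith("m"):
        return Decimal(q[:-1]) / 1000  # millicores -> cores
    return Decimal(q)


def check_custom_size(cpu_request: str, cpu_limit: str,
                      memory_request_gi: float, memory_limit_gi: float) -> List[str]:
    """Return a list of problems with a custom model server size (empty if none)."""
    problems = []
    if cpu_to_cores(cpu_request) > cpu_to_cores(cpu_limit):
        problems.append("CPU request exceeds CPU limit")
    if memory_request_gi > memory_limit_gi:
        problems.append("memory request exceeds memory limit")
    return problems
```

For example, check_custom_size("1", "500m", 4, 8) returns ["CPU request exceeds CPU limit"], because 500m is only half a core.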

Verification

  • The model server that you updated is displayed in the Models and model servers section on the project details page.

3.6.4. Deleting a model server

When you no longer need a model server to host models, you can remove it from your data science project.

Note

When you remove a model server, you also remove the models that are hosted on that model server. As a result, the models are no longer available to applications.

Prerequisites

  • You have created a data science project and an associated model server.
  • You have notified users of the applications that access the models that those models will no longer be available.
  • If you are using specialized OpenShift Data Science groups, you are part of the user group or admin group (for example, rhods-users or rhods-admins) in OpenShift.

Procedure

  1. From the OpenShift Data Science dashboard, click Data Science Projects.

    The Data science projects page opens.

  2. Click the name of the project from which you want to delete the model server.

    A project details page opens.

  3. In the Models and model servers section, click the action menu (⋮) beside the model server that you want to delete, and then click Delete model server.

    The Delete model server dialog opens.

  4. Enter the name of the model server in the text field to confirm that you intend to delete it.
  5. Click Delete model server.

Verification

  • The model server that you deleted is no longer displayed in the Models and model servers section on the project details page.

For more information about model serving, see Model serving on OpenShift Data Science.

3.7. Configuring access to data science projects

3.7.1. Configuring access to data science projects

To work collaboratively with other users on your data science projects, you can share access to your project. After you create your project, you can set the appropriate access permissions from the OpenShift Data Science user interface.

You can assign the following access permission levels to your data science projects:

  • Admin - Users can modify all areas of a project, including its details (project name and description), components, and access permissions.
  • Edit - Users can modify a project’s components, such as its workbench, but they cannot edit a project’s access permissions or its details (project name and description).
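The two permission levels can be read as a small lookup table. The following Python sketch encodes only what the list above states; the AccessLevel enum, the action labels, and the is_allowed function are hypothetical names for illustration and are not part of OpenShift Data Science.

```python
from enum import Enum


class AccessLevel(Enum):
    """Permission levels that you can assign to a data science project."""
    ADMIN = "Admin"
    EDIT = "Edit"


# Hypothetical action labels derived from the descriptions of each level.
ALLOWED_ACTIONS = {
    AccessLevel.ADMIN: {"modify_components", "modify_details", "manage_access"},
    AccessLevel.EDIT: {"modify_components"},
}


def is_allowed(level: AccessLevel, action: str) -> bool:
    """Return True if a user with the given level may perform the action."""
    return action in ALLOWED_ACTIONS[level]
```

For example, is_allowed(AccessLevel.EDIT, "manage_access") returns False, because only Admin users can change a project's access permissions.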

3.7.2. Sharing access to a data science project

To enable your organization to work collaboratively, you can share access to your data science project with other users and groups.

Prerequisites

  • You have logged in to Red Hat OpenShift Data Science.
  • If you are using specialized OpenShift Data Science groups, you are part of the user group or admin group (for example, rhods-users or rhods-admins) in OpenShift.
  • You have created a data science project.

Procedure

  1. From the OpenShift Data Science dashboard, click Data Science Projects.

    The Data science projects page opens.

  2. From the list of data science projects, click the name of the data science project that you want to share access to.

    A project details page opens.

  3. Click the Permissions tab.

    The Permissions page for the project opens.

  4. Provide one or more users with access to the project.

    1. In the Users section, click Add user.
    2. In the Name field, enter the user name of the user to whom you want to provide access to the project.
    3. From the Permissions list, select one of the following access permission levels:

      • Admin: Users with this access level can edit project details and manage access to the project.
      • Edit: Users with this access level can view and edit project components, such as its workbenches, data connections, and storage.
    4. To confirm your entry, click the Confirm icon.
    5. Optional: To add an additional user, click Add user and repeat the process.
  5. Provide one or more OpenShift groups with access to the project.

    1. In the Groups section, click Add group.
    2. From the Name list, select a group to provide access to the project.

      Note

      If you do not have cluster-admin permissions, the Name list is not visible. Instead, an input field is displayed that you can use to configure group permissions.

    3. From the Permissions list, select one of the following access permission levels:

      • Admin: Groups with this access permission level can edit project details and manage access to the project.
      • Edit: Groups with this access permission level can view and edit project components, such as its workbenches, data connections, and storage.
    4. To confirm your entry, click the Confirm icon.
    5. Optional: To add an additional group, click Add group and repeat the process.
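This guide does not describe the OpenShift objects behind the Permissions tab. The following Python sketch assumes that each user or group grant corresponds to an OpenShift RoleBinding in the project namespace that references the built-in admin or edit cluster role. It only builds the manifest as a dictionary and does not contact the cluster; the project_role_binding function and the binding-name scheme are hypothetical.

```python
import re

RBAC_API_GROUP = "rbac.authorization.k8s.io"


def project_role_binding(namespace: str, subject: str, subject_kind: str,
                         level: str) -> dict:
    """Build a RoleBinding manifest that grants `level` access to `subject`.

    level is "admin" or "edit"; subject_kind is "User" or "Group".
    """
    if level not in ("admin", "edit"):
        raise ValueError("level must be 'admin' or 'edit'")
    if subject_kind not in ("User", "Group"):
        raise ValueError("subject_kind must be 'User' or 'Group'")
    # Hypothetical naming scheme: object names must be lowercase DNS-style names.
    safe = re.sub(r"[^a-z0-9-]+", "-", subject.lower()).strip("-")
    return {
        "apiVersion": f"{RBAC_API_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{safe}-{level}", "namespace": namespace},
        "subjects": [{"kind": subject_kind, "apiGroup": RBAC_API_GROUP, "name": subject}],
        "roleRef": {"kind": "ClusterRole", "apiGroup": RBAC_API_GROUP, "name": level},
    }
```

The subject name is kept exactly as entered, because OpenShift matches it against the user or group name; only the binding's own name is normalized.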

Verification

  • Users to whom you provided access to the project can perform only the actions permitted by their access permission level.
  • The Users and Groups sections on the Permissions tab show the respective users and groups that you provided with access to the project.

3.7.3. Updating access to a data science project

To change the level of collaboration on your data science project, you can update the access permissions of users and groups who have access to your project.

Prerequisites

  • You have logged in to Red Hat OpenShift Data Science.
  • If you are using specialized OpenShift Data Science groups, you are part of the user group or admin group (for example, rhods-users or rhods-admins) in OpenShift.
  • You have created a data science project.
  • You have previously shared access to your project with other users or groups.
  • You have administrator permissions or you are the project owner.

Procedure

  1. From the OpenShift Data Science dashboard, click Data Science Projects.

    The Data science projects page opens.

  2. Click the name of the project whose access permissions you want to change.

    A project details page opens.

  3. Click the Permissions tab.

    The Permissions page for the project opens.

  4. Update the user access permissions to the project.

    1. In the Name field, update the user name of the user to whom you want to provide access to the project.
    2. From the Permissions list, update the user access permissions by selecting one of the following:

      • Admin: Users with this access level can edit project details and manage access to the project.
      • Edit: Users with this access level can view and edit project components, such as its workbenches, data connections, and storage.
    3. To confirm the update to the entry, click the Confirm icon.
  5. Update the OpenShift groups access permissions to the project.

    1. From the Name list, update the group that has access to the project by selecting another group from the list.

      Note

      If you do not have cluster-admin permissions, the Name list is not visible. Instead, an input field is displayed that you can use to configure group permissions.

    2. From the Permissions list, update the group access permissions by selecting one of the following:

      • Admin: Groups with this access permission level can edit project details and manage access to the project.
      • Edit: Groups with this access permission level can view and edit project components, such as its workbenches, data connections, and storage.
    3. To confirm the update to the entry, click the Confirm icon.

Verification

  • The Users and Groups sections on the Permissions tab show the respective users and groups whose project access permissions you changed.

3.7.4. Removing access to a data science project

If you no longer want to work collaboratively on your data science project, you can restrict access to your project by removing users and groups to whom you previously provided access.

Prerequisites

  • You have logged in to Red Hat OpenShift Data Science.
  • If you are using specialized OpenShift Data Science groups, you are part of the user group or admin group (for example, rhods-users or rhods-admins) in OpenShift.
  • You have created a data science project.
  • You have previously shared access to your project with other users or groups.
  • You have administrator permissions or you are the project owner.

Procedure

  1. From the OpenShift Data Science dashboard, click Data Science Projects.

    The Data science projects page opens.

  2. Click the name of the project whose access permissions you want to change.

    A project details page opens.

  3. Click the Permissions tab.

    The Permissions page for the project opens.

  4. Click the action menu (⋮) beside the user or group whose access permissions you want to revoke and click Delete.

Verification

  • Users whose access you have revoked can no longer perform the actions that were permitted by their access permission level.

3.8. Viewing Python packages installed on your notebook server

You can check which Python packages are installed on your notebook server, and which version of each package is installed, by running the pip tool in a notebook cell.

Prerequisites

  • You have logged in to Jupyter and opened a notebook.

Procedure

  1. Enter the following in a new cell in your notebook:

    !pip list
  2. Run the cell.

Verification

  • The output shows an alphabetical list of all installed Python packages and their versions. For example, if you use this command immediately after creating a notebook server that uses the Minimal image, the first packages shown are similar to the following:

    Package                           Version
    --------------------------------- ----------
    aiohttp                           3.7.3
    alembic                           1.5.2
    appdirs                           1.4.4
    argo-workflows                    3.6.1
    argon2-cffi                       20.1.0
    async-generator                   1.10
    async-timeout                     3.0.1
    attrdict                          2.0.1
    attrs                             20.3.0
    backcall                          0.2.0
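If you want the same information as Python data, for example to check a version from code, the standard library's importlib.metadata module can list installed distributions. The following sketch prints the first ten entries in a layout similar to the pip list output; ordering and name normalization can differ slightly from pip's, and the installed_packages function name is hypothetical.

```python
from importlib.metadata import distributions
from typing import List, Tuple


def installed_packages() -> List[Tuple[str, str]]:
    """Return (name, version) pairs for installed distributions, sorted by name."""
    pairs = {
        (dist.metadata["Name"], dist.version)
        for dist in distributions()
        if dist.metadata["Name"] and dist.version  # skip entries with broken metadata
    }
    return sorted(pairs, key=lambda pair: pair[0].lower())


# Print the first ten packages in two aligned columns.
for name, version in installed_packages()[:10]:
    print(f"{name:<34}{version}")
```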

3.9. Installing Python packages on your notebook server

You can install Python packages that are not part of the default notebook server image by adding the package and the version to a requirements.txt file and then running the pip install command in a notebook cell.

Note

You can also install packages directly, but Red Hat recommends using a requirements.txt file so that you can easily reuse the packages listed in the file across different notebooks. A requirements.txt file is also useful when you use a Source-to-Image (S2I) build to deploy a model.

Prerequisites

  • You have logged in to Jupyter and opened a notebook.

Procedure

  1. Create a new text file using one of the following methods:

    • Click + to open a new launcher and click Text file.
    • Click File > New > Text File.
  2. Rename the text file to requirements.txt.

    1. Right-click the name of the file and click Rename Text. The Rename File dialog opens.
    2. Enter requirements.txt in the New Name field and click Rename.
  3. Add the packages to install to the requirements.txt file.

    altair

    You can specify the exact version to install by using the == (equal to) operator, for example:

    altair==4.1.0

    Note

    Red Hat recommends specifying exact package versions to enhance the stability of your notebook server over time. New package versions can introduce undesirable or unexpected changes in your environment’s behavior.

    To install multiple packages at the same time, place each package on a separate line.

  4. Install the packages in requirements.txt on your server by using a notebook cell.

    1. Create a new cell in your notebook and enter the following command:

      !pip install -r requirements.txt
      2. Run the cell by pressing Shift+Enter.

    Important

    This command installs the packages on your notebook server, but you must still run an import statement in a code cell to use a package in your code, for example:

    import altair
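Because Red Hat recommends pinning exact package versions, you might want to check a requirements.txt file for unpinned entries before you install it. The following Python sketch flags every non-comment line that is not of the form name==version. It is deliberately simple and does not understand pip options, environment markers, or URLs, so treat a flagged line as something to review rather than as an error. The unpinned_requirements function name is hypothetical.

```python
import re
from typing import List

# "name==version", optionally with extras such as "pkg[extra]==1.0".
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^=\s]+$")


def unpinned_requirements(text: str) -> List[str]:
    """Return requirement lines that do not pin an exact version with ==."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if line and not PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

In a notebook cell, you might run unpinned_requirements(open("requirements.txt").read()); an empty list means that every entry is pinned.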

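Verification

  • The packages that you listed in requirements.txt appear in the output of the !pip list command, as described in Viewing Python packages installed on your notebook server.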

3.10. Updating notebook server settings by restarting your server

You can update the settings on your notebook server by stopping and relaunching the notebook server. For example, if your server runs out of memory, you can stop the server and restart it with a larger container size.

Prerequisites

  • You have a running notebook server.
  • You have logged in to Jupyter.

Procedure

  1. Click File > Hub Control Panel.

    The Notebook server control panel opens.

  2. Click the Stop notebook server button.

    The Stop server dialog opens.

  3. Click Stop server to confirm your decision.

    The Start a notebook server page opens.

  4. Update the relevant notebook server settings and click Start server.

Verification

  • The notebook server starts and contains your updated settings.

© 2024 Red Hat, Inc.