Chapter 5. Model serving on OpenShift Data Science
As a data scientist, you can deploy your trained machine-learning models to serve intelligent applications in production. After you have deployed your model, applications can send requests to the model using its deployed API endpoint.
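For example, a client application can send a prediction request to a deployed model's endpoint. The following Python sketch assumes a model served through the KServe V2 REST inference protocol (which the ModelMesh-based serving stack exposes); the endpoint URL, model name, input name, shape, and data are placeholders that you replace with values for your own model:

import requests

# Placeholder values: replace with the inference endpoint shown on the
# Model Serving page and with your model's actual input signature.
INFER_URL = "https://<your-endpoint>/v2/models/<model-name>/infer"

payload = {
    "inputs": [
        {
            "name": "input-0",      # input tensor name expected by the model
            "shape": [1, 4],        # example shape; use your model's input shape
            "datatype": "FP32",
            "data": [0.1, 0.2, 0.3, 0.4],
        }
    ]
}

response = requests.post(INFER_URL, json=payload, timeout=30)
response.raise_for_status()
print(response.json())  # predictions are returned in the "outputs" field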
5.1. Configuring model servers
5.1.1. Adding a model server to your data science project
Before you can successfully deploy a data science model on OpenShift Data Science, you must configure a model server. This includes configuring the number of replicas to deploy, the server size, token authorization, and how the project is accessed.
Prerequisites
- You have logged in to Red Hat OpenShift Data Science.
- If you are using specialized OpenShift Data Science groups, you are part of the user group or admin group (for example, rhods-users or rhods-admins) in OpenShift.
- You have created a data science project that you can add a model server to.
- If you want to use a custom model-serving runtime for your model server, you have added and enabled the runtime. See Adding a custom model-serving runtime.
- If you want to use graphics processing units (GPUs) with your model server, you have enabled GPU support in OpenShift Data Science. See Enabling GPU support in OpenShift Data Science.
Procedure
From the OpenShift Data Science dashboard, click Data Science Projects.
The Data science projects page opens.
Click the name of the project that you want to configure a model server for.
A project details page opens.
In the Models and model servers section, click Add server.
The Add model server dialog opens.
- In the Model server name field, enter a unique name for the model server.
- From the Serving runtime list, select a model-serving runtime that is installed and enabled in your OpenShift Data Science deployment.
- In the Number of model replicas to deploy field, specify a value.
From the Model server size list, select one of the following server sizes:
- Small
- Medium
- Large
- Custom
Optional: If you selected Custom in the preceding step, configure the following settings in the Model server size section to customize your model server:
- In the CPUs requested field, specify a number of CPUs to use with your model server. Use the list beside this field to specify the value in cores or millicores.
- In the CPU limit field, specify the maximum number of CPUs to use with your model server. Use the list beside this field to specify the value in cores or millicores.
- In the Memory requested field, specify the requested memory for the model server in gibibytes (Gi).
- In the Memory limit field, specify the maximum memory limit for the model server in gibibytes (Gi).
Optional: In the Model server GPUs field, specify a number of GPUs to use with your model server.
Important: OpenShift Data Science includes two versions of the OpenVINO Model Server (OVMS) runtime by default: a version that supports GPUs and one that does not. To use GPUs, you must select the version whose display name includes Supports GPUs from the Serving runtime list. If you are using a custom model-serving runtime with your model server, you must ensure that your custom runtime supports GPUs and is appropriately configured to use them.
- Optional: In the Model route section, select the Make deployed models available through an external route check box to make your deployed models available to external clients.
Optional: In the Token authorization section, select the Require token authentication check box to require token authentication for your model server. To finish configuring token authentication, perform the following actions:
- In the Service account name field, enter a name for the service account for which the token is generated. When the model server is configured, the token is generated and displayed in the Token secret field. For an example of presenting this token in a request, see the sketch at the end of this section.
- To add an additional service account, click Add a service account and enter another service account name.
- Click Add.
Verification
- The model server that you configured is displayed in the Models and model servers section of the project details page.
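If you enabled token authentication, clients must present the generated token with each request to the deployed models. The following is a minimal Python sketch; the token value comes from the Token secret field, and the endpoint URL, model name, and input values are placeholders:

import requests

# Placeholder values: the token comes from the Token secret field, and the
# URL is the inference endpoint of a model deployed on this model server.
TOKEN = "<token-from-token-secret-field>"
INFER_URL = "https://<your-endpoint>/v2/models/<model-name>/infer"

headers = {"Authorization": f"Bearer {TOKEN}"}
payload = {
    "inputs": [
        {"name": "input-0", "shape": [1, 4], "datatype": "FP32",
         "data": [0.1, 0.2, 0.3, 0.4]}
    ]
}

response = requests.post(INFER_URL, json=payload, headers=headers, timeout=30)
response.raise_for_status()
print(response.json())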
5.1.2. Adding a custom model-serving runtime
A model-serving runtime provides integration with a specified model server and the model frameworks that it supports. By default, Red Hat OpenShift Data Science includes the OpenVINO Model Server runtime. However, if this runtime doesn’t meet your needs (for example, if it doesn’t support a particular model framework), you might want to add your own custom runtimes.
As an administrator, you can use the OpenShift Data Science interface to add and enable custom model-serving runtimes. You can then choose from your enabled runtimes when you create a new model server.
Prerequisites
- You have logged in to OpenShift Data Science as an administrator.
- You are familiar with how to add a model server to your project. When you have added a custom model-serving runtime, you must configure a new model server to use the runtime.
- You have reviewed the example runtimes in the kserve/modelmesh-serving repository. You can use these examples as starting points. However, each runtime requires some further modification before you can deploy it in OpenShift Data Science. The required modifications are described in the following procedure.
Note: OpenShift Data Science includes the OpenVINO Model Server model-serving runtime by default. You do not need to add this runtime to OpenShift Data Science.
Procedure
From the OpenShift Data Science dashboard, click Settings > Serving runtimes.
The Serving runtimes page opens and shows the model-serving runtimes that are already installed and enabled in your OpenShift Data Science deployment. By default, the OpenVINO Model Server runtime is pre-installed and enabled in OpenShift Data Science.
To add a new, custom runtime, click Add serving runtime.
The Add serving runtime page opens.
To start adding a new runtime, perform one of the following sets of actions:
To upload a YAML file
Click Upload files.
A file browser opens.
In the file browser, select a YAML file on your computer. This file might be one of the example runtimes that you downloaded from the kserve/modelmesh-serving repository.
The embedded YAML editor opens and shows the contents of the file that you uploaded.
To enter YAML code directly in the editor
Click Start from scratch.
The embedded YAML editor opens with no content.
- Enter or paste YAML code directly in the embedded editor. The YAML that you paste might be copied from one of the example runtimes in the kserve/modelmesh-serving repository.
Optional: If you are adding one of the example runtimes in the kserve/modelmesh-serving repository, perform the following modifications:
- In the YAML editor, locate the kind field for your runtime. Update the value of this field to ServingRuntime.
- In the YAML editor, locate the containers.image field for your runtime. Based on the runtime that you are adding, update the field to one of the following:
  - NVIDIA Triton Inference Server: image: nvcr.io/nvidia/tritonserver:21.06.1-py3
  - Seldon Python MLServer: image: seldonio/mlserver:0.5.2
  - TorchServe: image: pytorch/torchserve:0.6.0-cpu
- In the YAML editor, locate the metadata.name field for your runtime. Ensure that the value of this field is unique (that is, the value is not the same as for a runtime that you have already added).
- Optional: To configure a custom display name for the runtime that you are adding, add a metadata.annotations.openshift.io/display-name field and specify a value, as shown in the following example:

apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: mlserver-0.x
  annotations:
    openshift.io/display-name: MLServer
Note: If you do not configure a custom display name for your runtime, OpenShift Data Science shows the value of the metadata.name field.
Click Add.
The Serving runtimes page opens and shows the updated list of runtimes that are installed. Observe that the runtime you added is automatically enabled.
Optional: To edit your custom runtime, click the action menu (⋮) and select Edit.
Note: You cannot directly edit the OpenVINO Model Server runtime that is included in OpenShift Data Science by default. However, you can clone this runtime and edit the cloned version. You can then add the edited clone as a new, custom runtime. To do this, click the action menu beside the OpenVINO Model Server and select Clone.
Verification
- The model-serving runtime you added is shown in an enabled state on the Serving runtimes page.
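Beyond the dashboard, when a model server uses your runtime, a corresponding ServingRuntime resource exists in the project’s namespace. The following sketch uses the kubernetes Python client to list those resources; it assumes a valid kubeconfig and sufficient permissions, and the namespace is a placeholder:

from kubernetes import client, config

# Assumes a valid kubeconfig (for example, after logging in with oc login);
# use config.load_incluster_config() instead when running inside a pod.
config.load_kube_config()
api = client.CustomObjectsApi()

runtimes = api.list_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1alpha1",
    namespace="<your-project-namespace>",  # placeholder: your project's namespace
    plural="servingruntimes",
)
for runtime in runtimes["items"]:
    print(runtime["metadata"]["name"])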
Additional resources
- To learn how to configure a model server that uses a custom model-serving runtime that you have added, see Adding a model server to your data science project.
5.1.3. Updating a model server
You can update your data science project’s model server by changing details, such as the number of deployed replicas, the server size, the token authorization, and how the project is accessed.
Prerequisites
- You have logged in to Red Hat OpenShift Data Science.
- If you are using specialized OpenShift Data Science groups, you are part of the user group or admin group (for example, rhods-users or rhods-admins) in OpenShift.
- You have created a data science project that has a model server assigned.
Procedure
From the OpenShift Data Science dashboard, click Data Science Projects.
The Data science projects page opens.
Click the name of the project whose model server details you want to update.
A project details page opens.
In the Models and model servers section, locate the model server you want to update. Click the action menu (⋮) and select Edit model server.
The Configure model server dialog opens.
Update the model server properties, as follows:
Note: You cannot change the Serving runtime selection for a model server that is already configured. This protects against changing to a runtime that does not support already-deployed models.
- In the Model server name field, enter a new, unique name for the model server.
- In the Number of model replicas to deploy field, specify a value.
From the Model server size list, select one of the following server sizes:
- Small
- Medium
- Large
- Custom
Optional: If you selected Custom in the preceding step, configure the following settings in the Model server size section to customize your model server:
- In the CPUs requested field, specify a number of CPUs to use with your model server. Use the list beside this field to specify the value in cores or millicores.
- In the CPU limit field, specify the maximum number of CPUs to use with your model server. Use the list beside this field to specify the value in cores or millicores.
- In the Memory requested field, specify the requested memory for the model server in gibibytes (Gi).
- In the Memory limit field, specify the maximum memory limit for the model server in gibibytes (Gi).
Optional: In the Model server GPUs field, specify a number of GPUs to use with your model server.
Important: OpenShift Data Science includes two versions of the OpenVINO Model Server (OVMS) runtime by default: a version that supports GPUs and one that does not. To use GPUs, you must select the version whose display name includes Supports GPUs from the Serving runtime list. If you are using a custom model-serving runtime with your model server, you must ensure that your custom runtime supports GPUs and is appropriately configured to use them.
- Optional: In the Model route section, select the Make deployed models available through an external route check box to make your deployed models available to external clients.
Optional: In the Token authorization section, select the Require token authentication check box to require token authentication for your model server. To finish configuring token authentication, perform the following actions:
- In the Service account name field, enter a name for the service account for which the token is generated. When the model server is configured, the token is generated and displayed in the Token secret field.
- To add an additional service account, click Add a service account and enter another service account name.
- Click Configure.
Verification
- The model server that you updated is displayed in the Models and model servers section on the project details page.
5.1.4. Deleting a model server
When you no longer need a model server to host models, you can remove it from your data science project.
When you remove a model server, you also remove the models that are hosted on that model server. As a result, the models are no longer available to applications.
Prerequisites
- You have created a data science project and an associated model server.
- You have notified the users of any applications that access the models that those models will no longer be available.
- If you are using specialized OpenShift Data Science groups, you are part of the user group or admin group (for example, rhods-users or rhods-admins) in OpenShift.
Procedure
From the OpenShift Data Science dashboard, click Data Science Projects.
The Data science projects page opens.
Click the name of the project from which you want to delete the model server.
A project details page opens.
In the Models and model servers section, click the action menu (⋮) beside the model server that you want to delete, and then click Delete model server.
The Delete model server dialog opens.
- Enter the name of the model server in the text field to confirm that you intend to delete it.
- Click Delete model server.
Verification
- The model server that you deleted is no longer displayed in the Models and model servers section on the project details page.
5.2. Working with deployed models
5.2.1. Deploying a model in OpenShift Data Science
You can deploy trained models on OpenShift Data Science so that you can test them and integrate them into intelligent applications. Deploying a model makes it available as a service that you can access by using an API, enabling you to return predictions based on data inputs.
Prerequisites
- You have logged in to Red Hat OpenShift Data Science.
- If you are using specialized OpenShift Data Science groups, you are part of the user group or admin group (for example, rhods-users) in OpenShift.
- You have created a data science project that contains an associated model server.
- You know the folder path for the data connection that you want the model to access.
Procedure
From the OpenShift Data Science dashboard, click Data Science Projects.
The Data science projects page opens.
Click the name of the project containing the model that you want to deploy.
A project details page opens.
In the Models and model servers section, next to the name of your model server, click Deploy model.
The Deploy model dialog opens.
Configure properties for deploying your model as follows:
- In the Model Name field, enter a unique name for the model that you are deploying.
From the Model framework list, select a framework for your model.
Note: The Model framework list shows only the frameworks that are supported by the model-serving runtime that you specified when you configured your model server.
To specify the location of your model, perform one of the following sets of actions:
To use an existing data connection
- Select Existing data connection.
- From the Name list, select a data connection that you previously defined.
- In the Folder path field, enter the folder path that contains the model in your specified data source.
To use a new data connection
- To define a new data connection that your model can access, select New data connection.
- In the Name field, enter a unique name for the data connection.
- In the AWS_ACCESS_KEY_ID field, enter your access key ID for Amazon Web Services (AWS).
- In the AWS_SECRET_ACCESS_KEY field, enter your secret access key for the AWS account you specified.
- In the AWS_S3_ENDPOINT field, enter the endpoint of your AWS S3 storage.
- In the AWS_DEFAULT_REGION field, enter the default region of your AWS account.
- In the AWS_S3_BUCKET field, enter the name of the AWS S3 bucket.
- In the Folder path field, enter the folder path in your AWS S3 bucket that contains your data file.
- Click Deploy.
Verification
- The model you deployed is displayed on the Model Serving page of the dashboard.
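If the model does not appear on the Model Serving page, one common cause is a data connection whose folder path does not contain the model files. The following boto3 sketch lists the objects under a folder path; every value shown is a placeholder that you replace with the values from your data connection:

import boto3

# Placeholder values: use the same values that you entered in the
# data connection fields of the Deploy model dialog.
s3 = boto3.client(
    "s3",
    aws_access_key_id="<AWS_ACCESS_KEY_ID>",
    aws_secret_access_key="<AWS_SECRET_ACCESS_KEY>",
    endpoint_url="<AWS_S3_ENDPOINT>",
    region_name="<AWS_DEFAULT_REGION>",
)

response = s3.list_objects_v2(Bucket="<AWS_S3_BUCKET>", Prefix="<folder-path>/")
for obj in response.get("Contents", []):
    print(obj["Key"])  # your model files should be listed here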
5.2.2. Viewing a deployed model
To analyze the results of your work, you can view a list of deployed models on Red Hat OpenShift Data Science. You can also view the current statuses of deployed models and their endpoints.
Prerequisites
- You have logged in to Red Hat OpenShift Data Science.
- If you are using specialized OpenShift Data Science groups, you are part of the user group or admin group (for example, rhods-users) in OpenShift.
- There are active and deployed data science models available on the Model Serving page.
Procedure
From the OpenShift Data Science dashboard, click Model Serving.
The Model Serving page opens.
Review the list of deployed models.
Inference endpoints are displayed in the Inference endpoint column in the Deployed models table.
- Optional: Click the Copy button on the relevant row to copy the model’s inference endpoint to the clipboard.
Verification
- A list of previously deployed data science models is displayed on the Model Serving page.
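After copying an inference endpoint, you can check that the model is ready to serve before sending requests. The following Python sketch assumes the endpoint follows the KServe V2 REST protocol; the URL is a placeholder derived from the copied endpoint by replacing the trailing /infer path segment with /ready:

import requests

# Placeholder: take the endpoint copied from the dashboard and replace the
# trailing /infer path segment with /ready.
READY_URL = "https://<your-endpoint>/v2/models/<model-name>/ready"

response = requests.get(READY_URL, timeout=30)
print(response.status_code)  # 200 indicates that the model is ready to serve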
5.2.3. Updating the deployment properties of a deployed model
You can update the deployment properties of a model that has been deployed previously. This allows you to change the model’s data connection and name.
Prerequisites
- You have logged in to Red Hat OpenShift Data Science.
- If you are using specialized OpenShift Data Science groups, you are part of the user group or admin group (for example, rhods-users) in OpenShift.
- You have deployed a model on OpenShift Data Science.
Procedure
From the OpenShift Data Science dashboard, click Model Serving.
The Model Serving page opens.
Click the action menu (⋮) beside the model whose deployment properties you want to update and click Edit.
The Deploy model dialog opens.
Update the deployment properties of the model as follows:
- In the Model Name field, enter a new, unique name for the model.
From the Model framework list, select a framework for your model.
Note: The Model framework list shows only the frameworks that are supported by the model-serving runtime that you specified when you configured your model server.
To update how you have specified the location of your model, perform one of the following sets of actions:
If you previously specified an existing data connection
- In the Folder path field, update the folder path that contains the model in your specified data source.
If you previously specified a new data connection
- In the Name field, update the name of the data connection.
- In the AWS_ACCESS_KEY_ID field, update your access key ID for Amazon Web Services (AWS).
- In the AWS_SECRET_ACCESS_KEY field, update your secret access key for the AWS account you specified.
- In the AWS_S3_ENDPOINT field, update the endpoint of your AWS S3 storage.
- In the AWS_DEFAULT_REGION field, update the default region of your AWS account.
- In the AWS_S3_BUCKET field, update the name of the AWS S3 bucket.
- In the Folder path field, update the folder path in your AWS S3 bucket that contains your data file.
- Click Configure.
Verification
- The model whose deployment properties you updated is displayed on the Model Serving page.
5.2.4. Deleting a deployed model
You can delete models you have previously deployed. This enables you to remove deployed models that are no longer required.
Prerequisites
- You have logged in to Red Hat OpenShift Data Science.
- If you are using specialized OpenShift Data Science groups, you are part of the user group or admin group (for example, rhods-users) in OpenShift.
- You have deployed a model.
Procedure
From the OpenShift Data Science dashboard, click Model Serving.
The Model Serving page opens.
Click the action menu (⋮) beside the deployed model that you want to delete and click Delete.
The Delete deployed model dialog opens.
- Enter the name of the deployed model in the text field to confirm that you intend to delete it.
- Click Delete deployed model.
Verification
- The model that you deleted is no longer displayed on the Model Serving page.