Chapter 1. Using Red Hat OpenShift AI with MicroShift
Learn how to serve artificial intelligence and machine learning (AI/ML) models on your MicroShift edge deployments.
Red Hat OpenShift AI is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
1.1. How Red Hat OpenShift AI works in MicroShift
Edge deployments are where data is generated and decisions need to be made. You can use Red Hat OpenShift AI (Red Hat OpenShift AI Self-Managed) to integrate a fleet of MicroShift-driven edge devices into the artificial intelligence and machine learning (AI/ML) operations cycle. MicroShift is compatible with a single-model serving platform based on the KServe component of Kubernetes. KServe is a platform that orchestrates model serving.
Red Hat OpenShift AI Self-Managed is a platform for data scientists and developers of AI/ML applications. First, use Red Hat OpenShift AI Self-Managed in the cloud or data center to develop, train, and test an AI model. Then, run your model in your edge deployments on MicroShift.
After you deploy your AI model, application data can be sent to the model where the model can make data-driven decisions without a human user. This is an ideal scenario for edge applications where interaction with an administrator is naturally limited.
- Implemented with KServe
- The KServe component includes model-serving runtimes that implement the loading of various types of model servers. These runtimes are configured with custom resources (CRs). KServe custom resource definitions (CRDs) also define the lifecycle of the deployment object, storage access, and networking setup.
- Specifics of using Red Hat OpenShift AI Self-Managed with MicroShift
As an edge-optimized Kubernetes deployment, MicroShift has the following limitations when using Red Hat OpenShift AI Self-Managed:
- AI model serving on MicroShift is only available on the x86_64 architecture.
- A subset of Red Hat OpenShift AI Self-Managed Operator components are supported on MicroShift.
- As a single-node Kubernetes distribution, MicroShift does not support multi-model deployments. You must use the single-model serving platform.
- You must develop the AI models you want to run on the MicroShift model-serving platform in the cloud or your data center. Using MicroShift as a development platform for AI models is not supported.
- You must plan for any additional RAM, disk space, and storage configurations required to serve your AI model.
- Not all model servers support the IPv6 networking protocol. Check each model server’s documentation to verify that your networking configuration is supported.
- You must secure the exposed model server endpoint, for example, with OAUTH2.
- ClusterServingRuntimes CRDs are not supported by Red Hat OpenShift AI Self-Managed, which means that you must copy the ServingRuntime CR shipped within the microshift-ai-model-serving RPM to your workload namespace.
- To administer model serving on MicroShift, you must use the CLI. The Red Hat OpenShift AI Self-Managed dashboard is not supported.
1.2. Workflow for using Red Hat OpenShift AI Self-Managed with MicroShift
Using Red Hat OpenShift AI Self-Managed with MicroShift requires the following general workflow:
- Getting your AI model ready
- Choose the artificial intelligence (AI) model that best aligns with your edge application and the decisions that need to be made at MicroShift deployment sites.
- In the cloud or data center, develop, train, and test your model.
- Plan for the system requirements and additional resources your AI model requires to run.
- Setting up the deployment environment
- Configure your Red Hat Device Edge for the specific hardware your deployment runs on, including driver and device plugins.
- To enable GPU or other hardware accelerators for MicroShift, follow the guidance specific for your edge device about what you need to install. For example, to use an NVIDIA GPU accelerator, begin by reading the following NVIDIA documentation: Running a GPU-Accelerated Workload on Red Hat Device Edge (NVIDIA documentation).
For troubleshooting, consult the device documentation or product support.
Tip: Using only a driver and device plugin instead of an Operator might be more resource efficient.
- Installing the MicroShift Red Hat OpenShift AI Self-Managed RPM
- Install the microshift-ai-model-serving RPM package.
- Restart MicroShift if you are adding the RPM while MicroShift is running.
- Getting ready to deploy
- Package your AI model into an OCI image, otherwise known as the ModelCar format. If you already have S3-compatible storage or a persistent volume claim set up, you can skip this step, but only the ModelCar format is tested and supported for MicroShift.
- Select a model-serving runtime, which acts as your model server. Configure the runtime with the serving runtime and inference service.
- Copy the ServingRuntime custom resource (CR) from the default redhat-ods-applications namespace to your own namespace.
- Create the InferenceService CR.
- Optional: Create a Route object so that your model can be reached from outside the cluster.
- Using your model
- Make requests against the model server. For example, another pod running in your MicroShift deployment that is attached to a camera can stream an image back to the model-serving runtime. The model-serving runtime prepares that image as data for model inferencing. If the model was trained for the binary identification of bees, the AI model outputs the likelihood that the image data is a bee.
1.3. Installing the Red Hat OpenShift AI RPM
To use AI models in MicroShift deployments, use this procedure to install the Red Hat OpenShift AI (Red Hat OpenShift AI Self-Managed) RPM with a new MicroShift installation. You can also install the RPM on an existing MicroShift instance if you restart the system.
Red Hat OpenShift AI is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
Prerequisites
- The system requirements for installing MicroShift have been met.
- You have root user access to your machine.
- The OpenShift CLI (oc) is installed.
- You configured your LVM VG with the capacity needed for the PVs of your workload.
- You have the RAM and disk space required for your AI model.
- You configured the required accelerators, hardware, operating system, and MicroShift to provide the resources your model needs.
- Your AI model is ready to use.
The microshift-ai-model-serving RPM contains manifests that deploy kserve, with the raw deployment mode enabled, and ServingRuntimes objects in the redhat-ods-applications namespace.
Procedure
Install the MicroShift AI-model-serving RPM package by running the following command:
$ sudo dnf install microshift-ai-model-serving

As a root user, restart the MicroShift service by entering the following command:
$ sudo systemctl restart microshift

Optional: Install the release information package by running the following command:
$ sudo dnf install microshift-ai-model-serving-release-info

The release information package contains a JSON file with image references that are useful for offline procedures or for deploying a copy of a ServingRuntime to your namespace during a bootc image build.
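For example, you can inspect the image references in the JSON file by running a command such as the following; the file path matches the one used later in the ServingRuntime procedure:

$ jq . /usr/share/microshift/release/release-ai-model-serving-"$(uname -i)".json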
Verification
Verify that the kserve pod is running in the redhat-ods-applications namespace by entering the following command:

$ oc get pods -n redhat-ods-applications

Example output

NAME                                        READY   STATUS    RESTARTS   AGE
kserve-controller-manager-7fc9fc688-kttmm   1/1     Running   0          1h
Next steps
- Create a namespace for your AI model.
- Package your model into an OCI image.
- Configure a model-serving runtime.
- Verify that your model is ready for inferencing.
- Make requests against the model server.
1.4. Creating a namespace for your AI model on MicroShift
Create a namespace for your AI model and all other resources.
Prerequisites
- You have root user access to your machine.
- The OpenShift CLI (oc) is installed.
Procedure
Create a new namespace by running the following command:
$ oc create ns <namespace_name>

Replace <namespace_name> with the namespace name you want to use. In the following examples, ai-demo is used.
Verification
Verify that you created the desired namespace by running the following command:
$ oc get ns <namespace_name>

Replace <namespace_name> with the namespace name you want to use. In the following examples, ai-demo is used.

Example output

NAME      STATUS   AGE
ai-demo   Active   1h
1.5. Packaging your AI model into an OCI image
You can package your model into an OCI image and use the ModelCar approach to help you set up offline environments. With the ModelCar approach, your model can be embedded just like any other container image.
If you already have S3-compatible object storage or a configured persistent volume claim, you can upload your AI model to those resources, but only the ModelCar approach is tested and supported.
Prerequisites
- You have root user access to your machine.
- The OpenShift CLI (oc) is installed.
- Podman is installed.
- Your model is ready to use.
- You understand the concepts in the "How to build a ModelCar container" section of the article Build and deploy a ModelCar container in OpenShift AI, which describes building an OCI image suitable for a vLLM model server.
Note: The exact directory structure depends on the model server. The following example uses a Containerfile with a ResNet-50 model that is compatible with the OpenVINO Model Server (OVMS). OVMS generally does not require an additional hardware accelerator.
Procedure
Prepare a Containerfile with a compatible model and model server.
Example Containerfile with a ResNet-50 model used with the OVMS
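A minimal sketch, assuming the ResNet-50 OpenVINO IR files (the .xml and .bin files) were already downloaded into a local models/1/ directory; the base image and directory layout in this example are assumptions:

# Sketch only: package a locally prepared OpenVINO IR model as a ModelCar image.
# Assumes ./models/1/ contains the ResNet-50 .xml and .bin files.
FROM registry.access.redhat.com/ubi9/ubi-minimal:latest
COPY ./models /models
USER 1001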
Set the IMAGE_REF environment variable to simplify your process by running the following command:

$ IMAGE_REF=<ovms-resnet50:test>

Replace <ovms-resnet50:test> with the name of your image reference. In this example, the <repo:tag> format is used. Your image reference name is specific to your use case.
Build the Containerfile by running the following command:
$ sudo podman build -t $IMAGE_REF

Because CRI-O and Podman share storage, using sudo is required to make the image part of the root’s container storage and usable by MicroShift.

Optional: Push the image to your registry by running the following command:
$ sudo podman push $IMAGE_REF

Important: For offline use cases, include a tag other than latest. If the latest tag is used, the container that fetches and sets up the model is configured with the imagePullPolicy: parameter set to Always and the local image is ignored. If you use any tag other than latest, the imagePullPolicy: parameter is set to IfNotPresent.
Verification
Verify that the image exists by running the following command:
$ sudo podman images ovms-resnet50

Example output

REPOSITORY                TAG    IMAGE ID       CREATED         SIZE
localhost/ovms-resnet50   test   375b265c1c4b   3 minutes ago   136 MB
Next steps
- Configure a model-serving runtime.
- Confirm that your AI model is ready for inferencing.
- Make requests against the model server.
1.6. Serving AI models on MicroShift
You can serve models on the Red Hat OpenShift AI Self-Managed single-model serving platform in MicroShift by configuring a model-serving runtime using the ServingRuntime and InferenceService custom resources (CRs).
- Model-serving runtimes for AI models in MicroShift
- A model-serving runtime is an environment for deploying and managing an AI model, providing integration with a specified model server and the model frameworks that it supports. Creating the model-serving runtime means configuring the objects that select the right model format for the AI model and serve the queries, among other detailed functions specific to your deployment.
- The ServingRuntime custom resource
- The ServingRuntime CR is a YAML file that defines the templates for pods that can dynamically load and unload AI model formats and exposes a service endpoint for querying the model through the API. Each ServingRuntime CR contains the information needed to run AI models, including the container image of the runtime and a list of the model formats that the model-serving runtime supports. Other configuration settings for the model-serving runtime can be set with environment variables defined in the container specification.
- The InferenceService custom resource
- The InferenceService CR is a YAML file that creates a server or inference service to process inference queries, pass them to the model, then return the inference output. In MicroShift, the output is returned in the CLI. This inference service configuration file can also include many other options, such as specifying a hardware accelerator.
As a single-node Kubernetes distribution, MicroShift does not support multi-model deployments. You must use the single-model serving platform. In each MicroShift deployment, you can use one AI model, but potentially more than one model runtime.
- Workflow for configuring a model-serving runtime
- Select the model-serving runtime that supports the format of your AI model.
- Create the ServingRuntime CR in your workload namespace.
- If the MicroShift cluster is already running, you can export the required ServingRuntime CR to a file and edit it.
- If the MicroShift cluster is not running, or if you want to manually prepare a manifest, you can use the original definition on the disk, which is part of the microshift-ai-model-serving RPM.
- Create the InferenceService CR in your workload namespace.
1.6.1. Supported Red Hat OpenShift AI Self-Managed custom resource definitions
The following Red Hat OpenShift AI Self-Managed custom resource definitions (CRDs) are supported:
- InferenceServices
- TrainedModels
- ServingRuntimes
- InferenceGraphs
- ClusterStorageContainers
- ClusterLocalModels
- LocalModelNodeGroups
1.6.2. Supported Red Hat OpenShift AI Self-Managed model-serving runtimes
The following Red Hat OpenShift AI Self-Managed model-serving runtimes are verified for MicroShift deployments:
- vLLM ServingRuntime for KServe
- OpenVINO Model Server

Important: The OpenVINO Model Server does not support the IPv6 networking protocol. Check each model server before use to ensure that it supports your networking configuration.
The following runtimes are available for development purposes with MicroShift:
- Caikit Text Generation Inference Server (Caikit-TGIS) ServingRuntime for KServe
- Caikit Standalone ServingRuntime for KServe
- Text Generation Inference Server (TGIS) Standalone ServingRuntime for KServe
- vLLM ServingRuntime with Gaudi accelerators support for KServe
- vLLM ROCm ServingRuntime for KServe
- Custom runtimes that you create and test
1.7. Creating a ServingRuntime CR for use in MicroShift
Create a ServingRuntime custom resource (CR) based on installed manifests and release information. The included steps are an example of reusing the included microshift-ai-model-serving manifest files to re-create the OpenVINO Model Server (OVMS) model-serving runtime in the workload namespace.
This approach does not require a live cluster, so it can be part of CI/CD automation.
Prerequisites
- Both the microshift-ai-model-serving and microshift-ai-model-serving-release-info RPMs are installed.
- You have root user access to your machine.
- The OpenShift CLI (oc) is installed.
Procedure
Extract the image reference of the ServingRuntime CR you want to use from the MicroShift release information file by running the following command:

$ OVMS_IMAGE="$(jq -r '.images | with_entries(select(.key == "ovms-image")) | .[]' /usr/share/microshift/release/release-ai-model-serving-"$(uname -i)".json)"

In this example, the image reference for the OVMS model-serving runtime is extracted.

Copy the original ServingRuntime YAML file by running the following command:

$ cp /usr/lib/microshift/manifests.d/050-microshift-ai-model-serving-runtimes/ovms-kserve.yaml ./ovms-kserve.yaml

Add the actual image reference to the image: parameter field value of the ServingRuntime YAML by running the following command:

$ sed -i "s,image: ovms-image,image: ${OVMS_IMAGE}," ./ovms-kserve.yaml

Create the ServingRuntime object in a custom namespace using the YAML file by running the following command:

$ oc create -n <ai_demo> -f ./ovms-kserve.yaml

Replace <ai_demo> with the name of your namespace.

If the ServingRuntime CR is part of a new manifest, set the namespace in the kustomization.yaml file, for example:
Example Kustomize manifest namespace value
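A minimal sketch, assuming the copied ovms-kserve.yaml file is included in the manifest and that the workload namespace is ai-demo:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: ai-demo
resources:
  - ovms-kserve.yaml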
Next steps
- Create the InferenceService object.
- Verify that your model is ready for inferencing.
- Query the model.
- Optional: examine the model metrics.
1.8. Creating an InferenceService custom resource
Create an InferenceService custom resource (CR) to instruct KServe how to create a deployment for serving your AI model. KServe uses the ServingRuntime based on the modelFormat value specified in the InferenceService CR.
Prerequisites
- You configured the ServingRuntimes CR.
- You have root user access to your machine.
- The OpenShift CLI (oc) is installed.
Procedure
Create the InferenceService CR.

Example InferenceService object with an openvino_ir model format
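A minimal sketch, assuming the ModelCar image built earlier (localhost/ovms-resnet50:test); the storageUri value and the layout argument are assumptions for this example:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: ovms-resnet50
spec:
  predictor:
    model:
      protocolVersion: v2
      modelFormat:
        name: openvino_ir
      storageUri: "oci://localhost/ovms-resnet50:test"
      args:
      - --layout=NHWC:NCHW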
The entry in the args section is an additional argument that makes OpenVINO Model Server (OVMS) accept the request input data in a different layout than the model was originally exported with. Extra arguments are passed through to the OVMS container.
Save the InferenceService example to a file, then create it on the cluster by running the following command:

$ oc create -n <ai_demo> -f ./FILE.yaml

Replace <ai_demo> with your namespace name.
Example output
inferenceservice.serving.kserve.io/ovms-resnet50 created
Note: A deployment and a pod are expected to appear in the specified namespace. Depending on the size of the image specified in the ServingRuntime CR and the size of the ModelCar OCI image, it might take several minutes for the pod to be ready.
Next step
- Verify that the model-serving runtime is ready.
1.8.1. Exporting model-server metrics by using Open Telemetry
You can export model-server metrics by using Open Telemetry if you installed the microshift-observability RPM for MicroShift.

You can alternatively get the Prometheus-format metrics of the model server by making a request on the /metrics endpoint. See "Getting the model-server metrics" for more information.
Prerequisites
- You configured the ServingRuntimes CR.
- You have root user access to your machine.
- The OpenShift CLI (oc) is installed.
- You installed the microshift-observability RPM.
- Your MicroShift Open Telemetry configuration includes the Prometheus Receiver. For more information, see Prometheus Receiver.
Procedure
Add the following Open Telemetry annotation to your InferenceService custom resource:

Example InferenceService object with Open Telemetry
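A minimal sketch; the prometheus.io/scrape annotation shown here is an assumption based on the common pod-scraping convention, and the exact annotation depends on your Prometheus Receiver scrape configuration:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: ovms-resnet50
  annotations:
    prometheus.io/scrape: "true"
spec:
  predictor:
    model:
      protocolVersion: v2
      modelFormat:
        name: openvino_ir
      storageUri: "oci://localhost/ovms-resnet50:test"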
1.8.2. More InferenceService CR options
The inference service YAML file can include many different options. For example, you can include a resources section that is passed first to the deployment and then to the pod, so that the model server gets access to your hardware through the device plugin.

Example NVIDIA device resources snippet in an InferenceService CR
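A minimal sketch, assuming the NVIDIA device plugin is installed and advertises the nvidia.com/gpu resource:

spec:
  predictor:
    model:
      resources:
        limits:
          nvidia.com/gpu: 1
        requests:
          nvidia.com/gpu: 1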
For complete InferenceService specifications, see the Control Plane API Reference (KServe documentation).
1.9. Verifying that the model-serving runtime is ready
Verify that your model-serving runtime is ready for use by checking that the downstream generation activities are complete.
Prerequisites
- You configured the ServingRuntimes CR.
- You created the InferenceService CR.
- You have root user access to your machine.
- The OpenShift CLI (oc) is installed.
Procedure
Check that the AI model is deployed in your custom namespace by running the following command:
$ oc get -n ai-demo deployment

Example output

NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
ovms-resnet50-predictor   1/1     1            1           72s

Confirm that your deployment is in progress by running the following command:

$ oc rollout status -n ai-demo deployment ovms-resnet50-predictor

Example output

deployment "ovms-resnet50-predictor" successfully rolled out

Check that the AI model workload pod is deployed in your custom namespace by running the following command:

$ oc get -n ai-demo pod

Example output

NAME                                       READY   STATUS    RESTARTS      AGE
ovms-resnet50-predictor-6fdb566b7f-bc9k5   2/2     Running   1 (72s ago)   74s

Check for the service KServe created by running the following command:

$ oc get svc -n ai-demo

Example output

NAME                      TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
ovms-resnet50-predictor   ClusterIP   None         <none>        80/TCP    119s
Next step
- Create a Route object so that your applications can reach the MicroShift cluster.
Additional resources
- InferenceService (Red Hat OpenShift AI documentation)
1.10. Creating a route to use for AI queries in MicroShift
Create a route so that your AI model can receive queries and give output. You can either use the oc expose svc command or create a definition in a YAML file and apply it.
Prerequisites
- You have root user access to your machine.
- The OpenShift CLI (oc) is installed.
Procedure
Create a route using the following command:
$ oc expose svc -n ai-demo ovms-resnet50-predictor

Example output

route.route.openshift.io/ovms-resnet50-predictor exposed
Verification
Verify that the route you created exists by running the following command:
$ oc get route -n ai-demo

Example output

NAME                      HOST                                               ADMITTED   SERVICE                   TLS
ovms-resnet50-predictor   ovms-resnet50-predictor-ai-demo.apps.example.com   True       ovms-resnet50-predictor
1.11. About querying your AI model
Querying your model through the API is also called model inferencing. Model inferencing is most often used to retrieve information, automate tasks, make predictions, provide data insights, or perform actions.
In general, queries must be constructed using a format compatible with the AI model being used. A model-serving runtime formats queries automatically. The model processes the query according to the underlying training and data, then provides an output. The output is expected to align with the purpose of the model itself, whether that be to give an answer, make a prediction, or perform a task.
The following examples outline general steps to make sure your model is ready for inferencing, and what you might expect in a query output from the serving runtime.
1.11.1. Verifying that your AI model is accessible
Before querying the model through the API, you can check to be certain that the model is accessible and ready to provide answers based on the connected data. The following examples continue with the OpenVINO Model Server.
Prerequisites
- You configured the AI model-serving runtime.
- You uploaded your AI model to MicroShift.
- The MicroShift cluster is running.
- You installed the OpenShift CLI (oc).
Procedure
Get the IP address of the MicroShift cluster and assign it to the IP variable, as the following example command shows:

$ IP=$(oc get nodes -o json | jq -r '.items[0].status.addresses[0].address')

Identify the name of the route you created by running the following command:

$ oc get route -n ai-demo <route_name> -o yaml

Replace <route_name> with the actual name of your route.
Extract and assign the HOST value of the route to the DOMAIN variable by running the following command:

$ DOMAIN=$(oc get route -n ai-demo <route_name> -o=jsonpath="{ .status.ingress[0].host }")

Replace <route_name> with the actual name of your route.
Enable data transfer from the route to the MicroShift IP address by running the following command:
$ curl -i "${DOMAIN}/v2/models/ovms-resnet50/ready" --connect-to "${DOMAIN}::${IP}:"

Instead of using the --connect-to "${DOMAIN}::${IP}:" flag, you can also use real DNS, or add the IP address and the domain to the /etc/hosts file.

Example output

HTTP/1.1 200 OK
content-type: application/json
date: Wed, 12 Mar 2025 16:01:32 GMT
content-length: 0
set-cookie: 56bb4b6df4f80f0b59f56aa0a5a91c1a=4af1408b4a1c40925456f73033d4a7d1; path=/; HttpOnly

Query the model metadata by running the following command:
curl "${DOMAIN}/v2/models/ovms-resnet50" --connect-to "${DOMAIN}::${IP}:"
$ curl "${DOMAIN}/v2/models/ovms-resnet50" --connect-to "${DOMAIN}::${IP}:"
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
{"name":"ovms-resnet50","versions":["1"],"platform":"OpenVINO","inputs":[{"name":"0","datatype":"FP32","shape":[1,224,224,3]}],"outputs":[{"name":"1463","datatype":"FP32","shape":[1,1000]}]
{"name":"ovms-resnet50","versions":["1"],"platform":"OpenVINO","inputs":[{"name":"0","datatype":"FP32","shape":[1,224,224,3]}],"outputs":[{"name":"1463","datatype":"FP32","shape":[1,1000]}]
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Next steps
- Verify that your model is ready for inferencing.
- Query the model.
- Verify the model response.
- Optional: Get the model server metrics.
1.11.2. Getting your AI model ready for inference
Before querying your AI model through the API, get the model ready to provide answers based on the training data. The following examples continue with the OVMS model.
Prerequisites
- The MicroShift cluster is running.
- You have the xxd utility, which is part of the vim-common package.
- You configured the model-serving runtime.
- You uploaded your AI model to MicroShift.
Procedure
Download an image of a bee from the OpenVINO Model Server examples by running the following command:
$ curl -O https://raw.githubusercontent.com/openvinotoolkit/model_server/main/demos/common/static/images/bee.jpeg

Create the request data by running the following script:
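A sketch of such a script, based on the KServe v2 REST binary data extension that OVMS supports; run it in the same shell session that you use for the later curl commands so that the HEADER_LEN variable stays set. The IMAGE and REQ variable names are illustrative:

IMAGE=./bee.jpeg
REQ=./request.json

# Write the JSON inference header: a single BYTES input that carries the raw image.
echo -n '{"inputs" : [{"name" : "0", "shape" : [1], "datatype" : "BYTES"}]}' > "${REQ}"

# Record the header length; it is sent later as the Inference-Header-Content-Length header.
HEADER_LEN="$(stat -c %s "${REQ}")"

# Append the image size as a 4-byte little-endian integer, as the BYTES binary
# encoding expects, and then append the image bytes themselves.
printf '%08X' "$(stat -c %s "${IMAGE}")" | sed 's/\(..\)/\1\n/g' | tac | tr -d '\n' | xxd -r -p >> "${REQ}"
cat "${IMAGE}" >> "${REQ}"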
1.11.3. Querying your AI model
Make an inference request against the AI model server that is using the ovms-resnet50 model.
Prerequisites
- The MicroShift cluster is running.
- You configured the model-serving runtime.
- You uploaded your AI model to MicroShift.
Procedure
Make an inference request against the model server that is using the ovms-resnet50 model by running the following command:

$ curl \
    --data-binary "@./request.json" \
    --header "Inference-Header-Content-Length: ${HEADER_LEN}" \
    "${DOMAIN}/v2/models/ovms-resnet50/infer" \
    --connect-to "${DOMAIN}::${IP}:" > response.json

Example inferencing output, saved to response.json
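The structure shown here is an illustrative sketch that follows the KServe v2 inference response format and the model metadata shown earlier (output tensor 1463, datatype FP32, shape [1,1000]); the data array is elided:

{
  "model_name": "ovms-resnet50",
  "model_version": "1",
  "outputs": [
    {
      "name": "1463",
      "shape": [1, 1000],
      "datatype": "FP32",
      "data": [ ... ]
    }
  ]
}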
The contents of .outputs[0].data were omitted from the example for brevity.
Verification
To determine the model's prediction, get the index of the highest element in the .outputs[0].data array; this index is the model's predicted value. You can compute it with a short Python script.
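A minimal sketch, assuming the inference response was saved to response.json in the current directory:

import json

# Load the inference response saved by the curl command
with open("response.json") as f:
    response = json.load(f)

# The prediction is the index of the highest value in the output tensor
data = response["outputs"][0]["data"]
print(data.index(max(data)))

Example output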
309

In this example, the element labeled 309 is the model's response.
Validate the output against resnet’s input data, for example:
../../../../demos/common/static/images/bee.jpeg 309
Next steps
- Optional: Query the AI model using other images available in the resnet input data.
1.11.4. Getting the model-server metrics
After making a query, you can get the model server’s metrics to identify bottlenecks, optimize resource allocation, and ensure efficient infrastructure utilization.
You can alternatively configure Open Telemetry for MicroShift to get model-server metrics. See "Adding Open Telemetry to an InferenceService custom resource" for more information.
Prerequisites
- The MicroShift cluster is running.
- There have been enough queries to provide the metrics data you want to see.
Procedure
Get the Prometheus-format metrics of the model server by making a request on the /metrics endpoint by running the following command:

$ curl "${DOMAIN}/metrics" --connect-to "${DOMAIN}::${IP}:"

Partial example output

# HELP ovms_requests_success Number of successful requests to a model or a DAG.
# TYPE ovms_requests_success counter
ovms_requests_success{api="KServe",interface="REST",method="ModelReady",name="ovms-resnet50"} 4
ovms_requests_success{api="KServe",interface="REST",method="ModelMetadata",name="ovms-resnet50",version="1"} 1
1.12. Overriding KServe configuration
If you want to override KServe settings to customize your model-serving environment, you can follow the general steps for your operating system.
- Option 1
- Make a copy of the existing ConfigMap file, inferenceservice-config, in the redhat-ods-applications namespace.
- Edit the settings you want to change.
- Overwrite the existing ConfigMap object.
- Restart KServe either by deleting the pod or by scaling the Deployment pod parameter down to 0 and then back up to 1.
- Option 2
- Copy the ConfigMap file, /usr/lib/microshift/manifests.d/010-microshift-ai-model-serving-kserve/inferenceservice-config-microshift-patch.yaml.
- Edit the settings you want to change.
- Apply the ConfigMap object.
- Restart KServe either by deleting the pod or by scaling the Deployment pod parameter down to 0 and then back up to 1, as in the sketch after this list.
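A minimal sketch of the scaling approach, assuming the kserve-controller-manager deployment shown in the earlier verification output:

$ oc scale -n redhat-ods-applications deployment kserve-controller-manager --replicas=0
$ oc scale -n redhat-ods-applications deployment kserve-controller-manager --replicas=1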
- For RHEL for Edge and image mode for RHEL systems
- Create a new manifest with the ConfigMap file, based on either the /usr/lib/microshift/manifests.d/010-microshift-ai-model-serving-kserve/inferenceservice-config-microshift-patch.yaml or inferenceservice-config file, in the redhat-ods-applications namespace.
- Ensure that the new manifest is placed in the /usr/lib/microshift/manifests.d/ directory. Starting with the prefix 011 is recommended so that your manifest is applied after the /usr/lib/microshift/manifests.d/010-microshift-ai-model-serving-kserve/ directory contents.