Chapter 6. Serving large language models


For deploying large language models (LLMs), Red Hat OpenShift AI includes a single model serving platform that is based on the KServe component. Because each model is deployed from its own model server, the single model serving platform helps you to deploy, monitor, scale, and maintain LLMs.

6.1. About the single model serving platform

For serving large language models (LLMs), Red Hat OpenShift AI includes a single model serving platform that is based on KServe. The single model serving platform consists of the following components:

  • KServe: Kubernetes custom resource definition (CRD) that orchestrates model serving for all types of models. It includes model-serving runtimes that implement the loading of given types of model servers. KServe handles the lifecycle of the deployment object, storage access, and networking setup.
  • Red Hat OpenShift Serverless: Cloud-native development model that allows for serverless deployments of models. OpenShift Serverless is based on the open source Knative project.
  • Red Hat OpenShift Service Mesh: Service mesh networking layer that manages traffic flows and enforces access policies. OpenShift Service Mesh is based on the open source Istio project.

To install the single model serving platform, you have the following options:

Automated installation
If you have not already created a ServiceMeshControlPlane or KnativeServing resource on your OpenShift cluster, you can configure the Red Hat OpenShift Data Science Operator to install KServe and its dependencies.
Manual installation
If you have already created a ServiceMeshControlPlane or KnativeServing resource on your OpenShift cluster, you cannot configure the Red Hat OpenShift Data Science Operator to install KServe and its dependencies. In this situation, you must install KServe manually.

When you have installed KServe, you can use the OpenShift AI dashboard to deploy models using a Caikit and Text Generation Inference Server (TGIS)-based runtime that is included in OpenShift AI. You can also configure monitoring for the platform and use Prometheus to scrape the available metrics.

6.2. Configuring automated installation of KServe

If you have not already created a ServiceMeshControlPlane or KnativeServing resource on your OpenShift cluster, you can configure the Red Hat OpenShift Data Science Operator to install KServe and its dependencies.

Important

If you have created a ServiceMeshControlPlane or KnativeServing resource on your cluster, the Red Hat OpenShift Data Science Operator cannot install KServe and its dependencies and the installation does not proceed. In this situation, you must follow the manual installation instructions to install KServe.
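The choice between automated and manual installation can be checked from the CLI before you start. The following is a minimal sketch, not part of the product tooling: the function names count_resources and install_mode are hypothetical, and the sketch assumes that you are logged in with oc as a cluster administrator and that the resource types are registered as servicemeshcontrolplanes.maistra.io and knativeservings.operator.knative.dev.

```shell
#!/bin/sh
# Sketch: decide between automated and manual KServe installation.
# Function names are illustrative; `oc` must already be logged in.

# Print the number of resources of type $1 across all namespaces.
# Prints 0 when none exist or when the resource type is not registered.
count_resources() {
  oc get "$1" --all-namespaces -o name 2>/dev/null | grep -c . || true
}

# Print "automated" if neither resource exists, otherwise "manual".
install_mode() {
  smcp_count=$(count_resources servicemeshcontrolplanes.maistra.io)
  knative_count=$(count_resources knativeservings.operator.knative.dev)
  if [ "$smcp_count" -eq 0 ] && [ "$knative_count" -eq 0 ]; then
    echo automated
  else
    echo manual
  fi
}
```

Running install_mode prints automated when neither resource exists on the cluster, and manual otherwise.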

Prerequisites

  • You have cluster administrator privileges for your OpenShift Container Platform cluster.
  • Your cluster has a node with 4 CPUs and 16 GB memory.
  • You have downloaded and installed the OpenShift command-line interface (CLI). See Installing the OpenShift CLI.
  • You have installed the Red Hat OpenShift Service Mesh Operator and dependent Operators.

    Note

    To enable automated installation of KServe, install only the required Operators for Red Hat OpenShift Service Mesh. Do not perform any additional configuration or create a ServiceMeshControlPlane resource.

  • You have installed the Red Hat OpenShift Serverless Operator.

    Note

    To enable automated installation of KServe, install only the Red Hat OpenShift Serverless Operator. Do not perform any additional configuration or create a KnativeServing resource.

  • You have installed the Red Hat OpenShift Data Science Operator and created a DataScienceCluster object.

Procedure

  1. Log in to the OpenShift web console as a cluster administrator.
  2. In the web console, click Operators → Installed Operators and then click the Red Hat OpenShift Data Science Operator.
  3. Install OpenShift Service Mesh as follows:

    1. Click the DSC Initialization tab.
    2. Click the default-dsci object.
    3. Click the YAML tab.
    4. In the spec section, validate that the value of the managementState field for the serviceMesh component is set to Managed, as shown:

      spec:
       applicationsNamespace: redhat-ods-applications
       monitoring:
         managementState: Managed
         namespace: redhat-ods-monitoring
       serviceMesh:
         controlPlane:
           metricsCollection: Istio
           name: data-science-smcp
           namespace: istio-system
         managementState: Managed
      Note

      Do not change the istio-system namespace that is specified for the serviceMesh component by default. Other namespace values are not supported.

    5. Click Save.

      Based on the configuration you added to the DSCInitialization object, the Red Hat OpenShift Data Science Operator installs OpenShift Service Mesh.

  4. Install both KServe and OpenShift Serverless as follows:

    1. In the web console, click Operators → Installed Operators and then click the Red Hat OpenShift Data Science Operator.
    2. Click the Data Science Cluster tab.
    3. Click the default-dsc DSC object.
    4. Click the YAML tab.
    5. In the spec.components section, configure the kserve component as shown.

      spec:
       components:
         kserve:
           managementState: Managed
           serving:
             ingressGateway:
               certificate:
                 secretName: knative-serving-cert
                 type: SelfSigned
             managementState: Managed
             name: knative-serving
    6. Click Save.

      The preceding configuration creates an ingress gateway for OpenShift Serverless to receive traffic from OpenShift Service Mesh. In this configuration, observe the following details:

      • The configuration shown generates a self-signed certificate to secure incoming traffic to your OpenShift cluster and stores the certificate in the knative-serving-cert secret that is specified in the secretName field. To provide your own certificate, update the value of the secretName field to specify your secret name and change the value of the type field to Provided.

        Note

        If you provide your own certificate, the certificate must specify the domain name used by the ingress controller of your OpenShift cluster. You can check this value by running the following command:

        $ oc get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}'

      • You must set the value of the managementState field to Managed for both the kserve and serving components. Setting kserve.managementState to Managed triggers automated installation of KServe. Setting serving.managementState to Managed triggers automated installation of OpenShift Serverless. However, installation of OpenShift Serverless will not be triggered if kserve.managementState is not also set to Managed.
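After you save the DataScienceCluster object, the two managementState values can also be read back from the CLI. The following is a minimal sketch under stated assumptions: the function names dsc_field and kserve_auto_install_ready are hypothetical, and the object is assumed to be named default-dsc as in the procedure above.

```shell
#!/bin/sh
# Sketch: confirm that both managementState fields are set to Managed.
# Function names are illustrative; `oc` must already be logged in.

# Print the value at JSONPath $1 in the default-dsc DataScienceCluster.
dsc_field() {
  oc get datasciencecluster default-dsc -o jsonpath="{$1}"
}

# Return 0 only if kserve and kserve.serving are both Managed.
kserve_auto_install_ready() {
  kserve_state=$(dsc_field .spec.components.kserve.managementState)
  serving_state=$(dsc_field .spec.components.kserve.serving.managementState)
  [ "$kserve_state" = "Managed" ] && [ "$serving_state" = "Managed" ]
}
```

A call such as `kserve_auto_install_ready && echo "ready"` prints ready only when both fields are Managed, which matches the requirement described in the preceding list.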

Verification

  • Verify installation of OpenShift Service Mesh as follows:

    • In the web console, click Workloads → Pods.
    • From the project list, select istio-system. This is the project in which OpenShift Service Mesh is installed.
    • Confirm that there are running pods for the service mesh control plane, ingress gateway, and egress gateway. These pods have the naming patterns shown in the following example:

      NAME                                       READY   STATUS    RESTARTS   AGE
      istio-egressgateway-7c46668687-fzsqj       1/1     Running   0          22h
      istio-ingressgateway-77f94d8f85-fhsp9      1/1     Running   0          22h
      istiod-data-science-smcp-cc8cfd9b8-2rkg4   1/1     Running   0          22h
  • Verify installation of OpenShift Serverless as follows:

    • In the web console, click Workloads → Pods.
    • From the project list, select knative-serving. This is the project in which OpenShift Serverless is installed.
    • Confirm that there are numerous running pods in the knative-serving project, including activator, autoscaler, controller, and domain mapping pods, as well as pods for the Knative Istio controller (which controls the integration of OpenShift Serverless and OpenShift Service Mesh). An example is shown.

      NAME                                     	READY     STATUS    RESTARTS  AGE
      activator-7586f6f744-nvdlb               	2/2       Running   0         22h
      activator-7586f6f744-sd77w               	2/2       Running   0         22h
      autoscaler-764fdf5d45-p2v98             	2/2       Running   0         22h
      autoscaler-764fdf5d45-x7dc6              	2/2       Running   0         22h
      autoscaler-hpa-7c7c4cd96d-2lkzg          	1/1       Running   0         22h
      autoscaler-hpa-7c7c4cd96d-gks9j         	1/1       Running   0         22h
      controller-5fdfc9567c-6cj9d              	1/1       Running   0         22h
      controller-5fdfc9567c-bf5x7              	1/1       Running   0         22h
      domain-mapping-56ccd85968-2hjvp          	1/1       Running   0         22h
      domain-mapping-56ccd85968-lg6mw          	1/1       Running   0         22h
      domainmapping-webhook-769b88695c-gp2hk   	1/1       Running   0         22h
      domainmapping-webhook-769b88695c-npn8g   	1/1       Running   0         22h
      net-istio-controller-7dfc6f668c-jb4xk    	1/1       Running   0         22h
      net-istio-controller-7dfc6f668c-jxs5p    	1/1       Running   0         22h
      net-istio-webhook-66d8f75d6f-bgd5r       	1/1       Running   0         22h
      net-istio-webhook-66d8f75d6f-hld75      	1/1       Running   0         22h
      webhook-7d49878bc4-8xjbr                 	1/1       Running   0         22h
      webhook-7d49878bc4-s4xx4                 	1/1       Running   0         22h
  • Verify installation of KServe as follows:

    • In the web console, click Workloads → Pods.
    • From the project list, select redhat-ods-applications. This is the project in which OpenShift AI components are installed, including KServe.
    • Confirm that the project includes a running pod for the KServe controller manager, similar to the following example:

      NAME                                          READY   STATUS    RESTARTS   AGE
      kserve-controller-manager-7fbb7bccd4-t4c5g    1/1     Running   0          22h
      odh-model-controller-6c4759cc9b-cftmk         1/1     Running   0          129m
      odh-model-controller-6c4759cc9b-ngj8b         1/1     Running   0          129m
      odh-model-controller-6c4759cc9b-vnhq5         1/1     Running   0          129m
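The pod checks above can also be scripted against the CLI instead of the web console. The following is a minimal sketch: the function name all_pods_ready is hypothetical, and it treats any pod that is not Running with all containers ready as a failure, so a project that contains completed job pods would be reported as not ready.

```shell
#!/bin/sh
# Sketch: check that every pod in a project is Running and fully ready.
# The function name is illustrative; `oc` must already be logged in.

# Return 0 if project $1 has at least one pod and every pod is Running
# with all of its containers ready (READY column shows n/n).
all_pods_ready() {
  oc get pods -n "$1" --no-headers 2>/dev/null | awk '
    {
      total++
      split($2, ready, "/")
      if ($3 != "Running" || ready[1] != ready[2]) bad++
    }
    END { if (total == 0 || bad > 0) exit 1 }
  '
}
```

For example, a loop over istio-system, knative-serving, and redhat-ods-applications that calls all_pods_ready for each project covers the three verification steps in this section.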

6.3. Manually installing KServe

If you have already installed the Red Hat OpenShift Service Mesh Operator and created a ServiceMeshControlPlane resource or if you have installed the Red Hat OpenShift Serverless Operator and created a KnativeServing resource, the Red Hat OpenShift Data Science Operator cannot install KServe and its dependencies. In this situation, you must install KServe manually.

Important

The procedures in this section show how to perform a new installation of KServe and its dependencies and are intended as a complete installation and configuration reference. If you have already installed and configured OpenShift Service Mesh or OpenShift Serverless, you might not need to follow all steps. If you are unsure about what updates to apply to your existing configuration to use KServe, contact Red Hat Support.

6.3.1. Installing KServe dependencies

Before you install KServe, you must install and configure some dependencies. Specifically, you must create Red Hat OpenShift Service Mesh and Knative Serving instances and then configure secure gateways for Knative Serving.

6.3.1.1. Creating an OpenShift Service Mesh instance

The following procedure shows how to create a Red Hat OpenShift Service Mesh instance.

Prerequisites

  • You have cluster administrator privileges for your OpenShift Container Platform cluster.
  • Your cluster has a node with 4 CPUs and 16 GB memory.
  • You have downloaded and installed the OpenShift command-line interface (CLI). See Installing the OpenShift CLI.
  • You have installed the Red Hat OpenShift Service Mesh Operator and dependent Operators.

Procedure

  1. In a terminal window, if you are not already logged in to your OpenShift cluster as a cluster administrator, log in to the OpenShift CLI as shown in the following example:

    $ oc login <openshift_cluster_url> -u <admin_username> -p <password>
  2. Create the required namespace for Red Hat OpenShift Service Mesh.

    $ oc create ns istio-system

    You see the following output:

    namespace/istio-system created
  3. Define a ServiceMeshControlPlane object in a YAML file named smcp.yaml with the following contents:

    apiVersion: maistra.io/v2
    kind: ServiceMeshControlPlane
    metadata:
      name: minimal
      namespace: istio-system
    spec:
      tracing:
        type: None
      addons:
        grafana:
          enabled: false
        kiali:
          name: kiali
          enabled: false
        prometheus:
          enabled: false
        jaeger:
          name: jaeger
      security:
        dataPlane:
          mtls: true
        identity:
          type: ThirdParty
      techPreview:
        meshConfig:
          defaultConfig:
            terminationDrainDuration: 35s
      gateways:
        ingress:
          service:
            metadata:
              labels:
                knative: ingressgateway
      proxy:
        networking:
          trafficControl:
            inbound:
              excludedPorts:
                - 8444
                - 8022

    For more information about the values in the YAML file, see the Service Mesh control plane configuration reference.

  4. Create the service mesh control plane.

    $ oc apply -f smcp.yaml

Verification

  • Verify creation of the service mesh instance as follows:

    • In the OpenShift CLI, enter the following command:

      $ oc get pods -n istio-system

      The preceding command lists all running pods in the istio-system project. This is the project in which OpenShift Service Mesh is installed.

    • Confirm that there are running pods for the service mesh control plane, ingress gateway, and egress gateway. These pods have the following naming patterns:

      NAME                                     READY   STATUS    RESTARTS   AGE
      istio-egressgateway-7c46668687-fzsqj     1/1     Running   0          22h
      istio-ingressgateway-77f94d8f85-fhsp9    1/1     Running   0          22h
      istiod-minimal-cc8cfd9b8-2rkg4           1/1     Running   0          22h

6.3.1.2. Creating a Knative Serving instance

The following procedure shows how to install Knative Serving and then create an instance.

Prerequisites

  • You have cluster administrator privileges for your OpenShift Container Platform cluster.
  • Your cluster has a node with 4 CPUs and 16 GB memory.
  • You have downloaded and installed the OpenShift command-line interface (CLI). See Installing the OpenShift CLI.
  • You have created a Red Hat OpenShift Service Mesh instance.
  • You have installed the Red Hat OpenShift Serverless Operator.

Procedure

  1. In a terminal window, if you are not already logged in to your OpenShift cluster as a cluster administrator, log in to the OpenShift CLI as shown in the following example:

    $ oc login <openshift_cluster_url> -u <admin_username> -p <password>
  2. Check whether the required project (that is, namespace) for Knative Serving already exists.

    $ oc get ns knative-serving

    If the project exists, you see output similar to the following example:

    NAME              STATUS   AGE
    knative-serving   Active   4d20h
  3. If the knative-serving project does not already exist, create it.

    $ oc create ns knative-serving

    You see the following output:

    namespace/knative-serving created
  4. Define a ServiceMeshMember object in a YAML file called default-smm.yaml with the following contents:

    apiVersion: maistra.io/v1
    kind: ServiceMeshMember
    metadata:
      name: default
      namespace: knative-serving
    spec:
      controlPlaneRef:
        namespace: istio-system
        name: minimal
  5. Create the ServiceMeshMember object in the knative-serving namespace. The object references the service mesh control plane in the istio-system namespace.

    $ oc apply -f default-smm.yaml

    You see the following output:

    servicemeshmember.maistra.io/default created
  6. Define a KnativeServing object in a YAML file called knativeserving-istio.yaml with the following contents:

    apiVersion: operator.knative.dev/v1beta1
    kind: KnativeServing
    metadata:
      name: knative-serving
      namespace: knative-serving
      annotations:
        serverless.openshift.io/default-enable-http2: "true"
    spec:
      deployments:
        - annotations:
            sidecar.istio.io/inject: "true" # 1
            sidecar.istio.io/rewriteAppHTTPProbers: "true" # 2
          name: activator
        - annotations:
            sidecar.istio.io/inject: "true"
            sidecar.istio.io/rewriteAppHTTPProbers: "true"
          name: autoscaler
      ingress:
        istio:
          enabled: true
      config:
        features:
          kubernetes.podspec-affinity: enabled
          kubernetes.podspec-nodeselector: enabled
          kubernetes.podspec-tolerations: enabled

    The preceding file defines a custom resource (CR) for a KnativeServing object. The CR also adds the following actions to each of the activator and autoscaler pods:

    1
    Injects an Istio sidecar into the pod. This makes the pod part of the service mesh.
    2
    Enables the Istio sidecar to rewrite the HTTP liveness and readiness probes for the pod.
  7. Create the KnativeServing object in the specified knative-serving namespace.

    $ oc apply -f knativeserving-istio.yaml

    You see the following output:

    knativeserving.operator.knative.dev/knative-serving created

Verification

  • Review the default ServiceMeshMemberRoll object in the istio-system namespace.

    $ oc describe smmr default -n istio-system

    In the description of the ServiceMeshMemberRoll object, locate the Status.Members field and confirm that it includes the knative-serving namespace.

  • Verify creation of the Knative Serving instance as follows:

    • In the OpenShift CLI, enter the following command:

      $ oc get pods -n knative-serving

      The preceding command lists all running pods in the knative-serving project. This is the project in which you created the Knative Serving instance.

    • Confirm that there are numerous running pods in the knative-serving project, including activator, autoscaler, controller, and domain mapping pods, as well as pods for the Knative Istio controller, which controls the integration of OpenShift Serverless and OpenShift Service Mesh. An example is shown.

      NAME                                     	READY       STATUS    	RESTARTS   	AGE
      activator-7586f6f744-nvdlb               	2/2         Running   	0          	22h
      activator-7586f6f744-sd77w               	2/2         Running   	0          	22h
      autoscaler-764fdf5d45-p2v98             	2/2         Running   	0          	22h
      autoscaler-764fdf5d45-x7dc6              	2/2         Running   	0          	22h
      autoscaler-hpa-7c7c4cd96d-2lkzg          	1/1         Running   	0          	22h
      autoscaler-hpa-7c7c4cd96d-gks9j         	1/1         Running   	0          	22h
      controller-5fdfc9567c-6cj9d              	1/1         Running   	0          	22h
      controller-5fdfc9567c-bf5x7              	1/1         Running   	0          	22h
      domain-mapping-56ccd85968-2hjvp          	1/1         Running   	0          	22h
      domain-mapping-56ccd85968-lg6mw          	1/1         Running   	0          	22h
      domainmapping-webhook-769b88695c-gp2hk   	1/1         Running     0          	22h
      domainmapping-webhook-769b88695c-npn8g   	1/1         Running   	0          	22h
      net-istio-controller-7dfc6f668c-jb4xk    	1/1         Running   	0          	22h
      net-istio-controller-7dfc6f668c-jxs5p    	1/1         Running   	0          	22h
      net-istio-webhook-66d8f75d6f-bgd5r       	1/1         Running   	0          	22h
      net-istio-webhook-66d8f75d6f-hld75      	1/1         Running   	0          	22h
      webhook-7d49878bc4-8xjbr                 	1/1         Running   	0          	22h
      webhook-7d49878bc4-s4xx4                 	1/1         Running   	0          	22h
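The ServiceMeshMemberRoll check at the start of this verification can be scripted rather than read from the oc describe output. The following is a minimal sketch: the function name is_mesh_member is hypothetical, and it assumes that the member roll is named default and reports its members in the status.members field, as described above.

```shell
#!/bin/sh
# Sketch: check whether a namespace is a member of the service mesh.
# The function name is illustrative; `oc` must already be logged in.

# Return 0 if namespace $1 is listed in .status.members of the default
# ServiceMeshMemberRoll in mesh namespace $2 (default: istio-system).
is_mesh_member() {
  members=$(oc get smmr default -n "${2:-istio-system}" \
    -o jsonpath='{.status.members[*]}' 2>/dev/null)
  for member in $members; do
    [ "$member" = "$1" ] && return 0
  done
  return 1
}
```

For example, `is_mesh_member knative-serving && echo "member"` prints member only when the knative-serving namespace appears in the member roll.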

6.3.1.3. Creating secure gateways for Knative Serving

To secure traffic between your Knative Serving instance and the service mesh, you must create secure gateways for your Knative Serving instance.

The following procedure shows how to use OpenSSL to generate a wildcard certificate and key and then use them to create local and ingress gateways for Knative Serving.

Important

If you have your own wildcard certificate and key to specify when configuring the gateways, you can skip to step 11 of this procedure.

Prerequisites

  • You have cluster administrator privileges for your OpenShift Container Platform cluster.
  • You have downloaded and installed the OpenShift command-line interface (CLI). See Installing the OpenShift CLI.
  • You have created a Red Hat OpenShift Service Mesh instance.
  • You have created a Knative Serving instance.
  • If you intend to generate a wildcard certificate and key, you have downloaded and installed OpenSSL.

Procedure

  1. In a terminal window, if you are not already logged in to your OpenShift cluster as a cluster administrator, log in to the OpenShift CLI as shown in the following example:

    $ oc login <openshift_cluster_url> -u <admin_username> -p <password>
    Important

    If you have your own wildcard certificate and key to specify when configuring the gateways, skip to step 11 of this procedure.

  2. Set environment variables to define base directories for generation of a wildcard certificate and key for the gateways.

    $ export BASE_DIR=/tmp/kserve
    $ export BASE_CERT_DIR=${BASE_DIR}/certs
  3. Set an environment variable to define the common name used by the ingress controller of your OpenShift cluster.

    $ export COMMON_NAME=$(oc get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}' | awk -F'.' '{print $(NF-1)"."$NF}')
  4. Set an environment variable to define the domain name used by the ingress controller of your OpenShift cluster.

    $ export DOMAIN_NAME=$(oc get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}')
  5. Create the required base directories for the certificate generation, based on the environment variables that you previously set.

    $ mkdir ${BASE_DIR}
    $ mkdir ${BASE_CERT_DIR}
  6. Create the OpenSSL configuration for generation of a wildcard certificate.

    $ cat <<EOF> ${BASE_DIR}/openssl-san.config
    [ req ]
    distinguished_name = req
    [ san ]
    subjectAltName = DNS:*.${DOMAIN_NAME}
    EOF
  7. Generate a root certificate.

    $ openssl req -x509 -sha256 -nodes -days 3650 -newkey rsa:2048 \
    -subj "/O=Example Inc./CN=${COMMON_NAME}" \
    -keyout $BASE_DIR/root.key \
    -out $BASE_DIR/root.crt
  8. Generate a wildcard certificate signed by the root certificate.

    $ openssl req -x509 -newkey rsa:2048 \
    -sha256 -days 3560 -nodes \
    -subj "/CN=${COMMON_NAME}/O=Example Inc." \
    -extensions san -config ${BASE_DIR}/openssl-san.config \
    -CA $BASE_DIR/root.crt \
    -CAkey $BASE_DIR/root.key \
    -keyout $BASE_DIR/wildcard.key  \
    -out $BASE_DIR/wildcard.crt
    
    $ openssl x509 -in ${BASE_DIR}/wildcard.crt -text
  9. Verify the wildcard certificate.

    $ openssl verify -CAfile ${BASE_DIR}/root.crt ${BASE_DIR}/wildcard.crt
  10. Set environment variables that point to the wildcard certificate and key that you generated in the previous steps.

    $ export TARGET_CUSTOM_CERT=${BASE_DIR}/wildcard.crt
    $ export TARGET_CUSTOM_KEY=${BASE_DIR}/wildcard.key
  11. Optional: To export your own wildcard key and certificate to new environment variables, enter the following commands:

    $ export TARGET_CUSTOM_CERT=<path_to_certificate>
    $ export TARGET_CUSTOM_KEY=<path_to_key>
    Note

    In the certificate that you provide, you must specify the domain name used by the ingress controller of your OpenShift cluster. You can check this value by running the following command:

    $ oc get ingresses.config.openshift.io cluster -o jsonpath='{.spec.domain}'

  12. Create a TLS secret in the istio-system namespace using the environment variables that you set for the wildcard certificate and key.

    $ oc create secret tls wildcard-certs --cert=${TARGET_CUSTOM_CERT} --key=${TARGET_CUSTOM_KEY} -n istio-system
  13. Create a gateways.yaml YAML file with the following contents:

    apiVersion: v1
    kind: Service # 1
    metadata:
      labels:
        experimental.istio.io/disable-gateway-port-translation: "true"
      name: knative-local-gateway
      namespace: istio-system
    spec:
      ports:
        - name: http2
          port: 80
          protocol: TCP
          targetPort: 8081
      selector:
        knative: ingressgateway
      type: ClusterIP
    ---
    apiVersion: networking.istio.io/v1beta1
    kind: Gateway
    metadata:
      name: knative-ingress-gateway # 2
      namespace: knative-serving
    spec:
      selector:
        knative: ingressgateway
      servers:
        - hosts:
            - '*'
          port:
            name: https
            number: 443
            protocol: HTTPS
          tls:
            credentialName: wildcard-certs
            mode: SIMPLE
    ---
    apiVersion: networking.istio.io/v1beta1
    kind: Gateway
    metadata:
     name: knative-local-gateway # 3
     namespace: knative-serving
    spec:
     selector:
       knative: ingressgateway
     servers:
       - port:
           number: 8081
           name: https
           protocol: HTTPS
         tls:
           mode: ISTIO_MUTUAL
         hosts:
           - "*"
    1
    Defines a service in the istio-system namespace for the Knative local gateway.
    2
    Defines an ingress gateway in the knative-serving namespace. The gateway uses the TLS secret you created earlier in this procedure. The ingress gateway handles external traffic to Knative.
    3
    Defines a local gateway for Knative in the knative-serving namespace.
  14. Apply the gateways.yaml file to create the defined resources.

    $ oc apply -f gateways.yaml

    You see the following output:

    service/knative-local-gateway created
    gateway.networking.istio.io/knative-ingress-gateway created
    gateway.networking.istio.io/knative-local-gateway created
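Steps 3 and 6 of the preceding procedure derive two values from the ingress domain of the cluster: the common name, which keeps only the last two labels of the domain, and the wildcard subject alternative name. The following is a minimal sketch of those two derivations in isolation. The function names are hypothetical, and apps.mycluster.example.com in the usage note is an invented domain used only for illustration.

```shell
#!/bin/sh
# Sketch: the two values that the procedure derives from the ingress domain.
# Function names are illustrative and not part of the procedure.

# Print the last two labels of domain $1, as the awk expression in step 3 does.
common_name_from_domain() {
  printf '%s\n' "$1" | awk -F'.' '{print $(NF-1)"."$NF}'
}

# Print the subjectAltName value that step 6 writes to the OpenSSL configuration.
wildcard_san_from_domain() {
  printf 'DNS:*.%s\n' "$1"
}
```

For the invented domain apps.mycluster.example.com, common_name_from_domain prints example.com and wildcard_san_from_domain prints DNS:*.apps.mycluster.example.com.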

Verification

  • Review the gateways that you created.

    $ oc get gateway --all-namespaces

    Confirm that you see the local and ingress gateways that you created in the knative-serving namespace, as shown in the following example:

    NAMESPACE         NAME                      AGE
    knative-serving   knative-ingress-gateway   69s
    knative-serving   knative-local-gateway     2m
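The gateway check can also be expressed as a script that reports which of the two expected gateways is missing. The following is a minimal sketch: the function name missing_gateways is hypothetical, and the sketch queries the fully qualified gateways.networking.istio.io resource type to avoid ambiguity with other gateway resource types that might be installed on the cluster.

```shell
#!/bin/sh
# Sketch: report which expected Istio gateways are missing.
# The function name is illustrative; `oc` must already be logged in.

# Print the name of each expected gateway that is missing from the
# knative-serving namespace. Prints nothing when both gateways exist.
missing_gateways() {
  existing=$(oc get gateways.networking.istio.io -n knative-serving -o name 2>/dev/null)
  for name in knative-ingress-gateway knative-local-gateway; do
    if ! printf '%s\n' "$existing" | grep -q "/${name}\$"; then
      echo "$name"
    fi
  done
}
```

Empty output from missing_gateways indicates that both the ingress gateway and the local gateway exist in the knative-serving namespace.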

6.3.2. Installing KServe

To complete manual installation of KServe, you must install the Red Hat OpenShift Data Science Operator. Then, you can configure the Operator to install KServe.

Prerequisites

  • You have cluster administrator privileges for your OpenShift Container Platform cluster.
  • Your cluster has a node with 4 CPUs and 16 GB memory.
  • You have downloaded and installed the OpenShift command-line interface (CLI). See Installing the OpenShift CLI.
  • You have created a Red Hat OpenShift Service Mesh instance.
  • You have created a Knative Serving instance.
  • You have created secure gateways for Knative Serving.
  • You have installed the Red Hat OpenShift Data Science Operator and created a DataScienceCluster object.

Procedure

  1. Log in to the OpenShift web console as a cluster administrator.
  2. In the web console, click Operators → Installed Operators and then click the Red Hat OpenShift Data Science Operator.
  3. For installation of KServe, configure the OpenShift Service Mesh component as follows:

    1. Click the DSC Initialization tab.
    2. Click the default-dsci object.
    3. Click the YAML tab.
    4. In the spec section, add and configure the serviceMesh component as shown:

      spec:
       serviceMesh:
         managementState: Unmanaged
    5. Click Save.
  4. For installation of KServe, configure the KServe and OpenShift Serverless components as follows:

    1. In the web console, click Operators → Installed Operators and then click the Red Hat OpenShift Data Science Operator.
    2. Click the Data Science Cluster tab.
    3. Click the default-dsc DSC object.
    4. Click the YAML tab.
    5. In the spec.components section, configure the kserve component as shown:

      spec:
       components:
         kserve:
           managementState: Managed
    6. Within the kserve component, add the serving component, and configure it as shown:

      spec:
       components:
         kserve:
           managementState: Managed
           serving:
             managementState: Unmanaged
    7. Click Save.
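This section does not list a verification step. One option is to check for the KServe controller in the redhat-ods-applications project, as in the verification for automated installation. The following is a minimal sketch: the function name kserve_controller_available is hypothetical, and the deployment name kserve-controller-manager is an assumption inferred from the pod name shown in that earlier verification example.

```shell
#!/bin/sh
# Sketch: check that the KServe controller has an available replica.
# The function name is illustrative, and the deployment name is inferred
# from the pod naming pattern; `oc` must already be logged in.

# Return 0 if the deployment reports at least one available replica.
kserve_controller_available() {
  available=$(oc get deployment kserve-controller-manager \
    -n redhat-ods-applications \
    -o jsonpath='{.status.availableReplicas}' 2>/dev/null)
  [ "${available:-0}" -ge 1 ]
}
```

For example, `kserve_controller_available && echo "KServe controller is running"` prints the message only when at least one replica is available.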

6.4. Deploying models by using the single model serving platform

You can deploy trained models on OpenShift AI to test them and integrate them into intelligent applications. Deploying a model makes it available as a service that you can access using an API. This enables you to return predictions based on data inputs.

For serving large language models (LLMs), Red Hat OpenShift AI includes a single model serving platform that is based on KServe. Because each model is deployed from its own model server, the single model serving platform helps you to deploy, monitor, scale, and maintain LLMs.

Important

The single model serving platform does not support self-signed certificates. Therefore, to deploy a model from S3 storage, you need to follow a workaround to disable SSL authentication. For more information, see the following Red Hat Solution article: How to skip the validation of SSL for KServe.

6.4.1. Enabling the single model serving platform

When you have installed KServe, you can use the Red Hat OpenShift AI dashboard to enable the single model serving platform that uses it. You can also use the dashboard to enable the Caikit-TGIS model-serving runtime that this platform uses.

Prerequisites

  • You have logged in to Red Hat OpenShift AI.
  • If you are using specialized OpenShift AI groups, you are part of the admin group (for example, oai-admins) in OpenShift.
  • You have installed KServe.

Procedure

  1. Enable the single model serving platform as follows:

    1. In the left menu, click Settings → Cluster settings.
    2. Locate the Model serving platforms section.
    3. To enable the single model serving platform for projects, select the Single model serving platform checkbox.
    4. Click Save changes.
  2. Enable the Caikit-TGIS runtime as follows:

    1. In the left menu of the OpenShift AI dashboard, click Settings → Serving runtimes.
    2. On the Serving runtimes page, set the Caikit TGIS ServingRuntime for KServe runtime to Enabled.

      The single model serving platform is now available for model deployments.

6.4.2. Deploying models on the single model serving platform

When you have enabled the single model serving platform (which uses the KServe component of OpenShift AI), you can enable the Caikit-TGIS runtime that this platform uses and start to deploy models on the platform.

Prerequisites

  • You have logged in to Red Hat OpenShift AI.
  • If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, oai-users or oai-admins) in OpenShift.
  • You have installed KServe.
  • You have enabled the single model serving platform.
  • You have created a data science project.
  • To use the Caikit-TGIS runtime, you have converted your model to Caikit format. For an example, see Converting Hugging Face Hub models to Caikit format in the caikit-tgis-serving repository.
  • You know the folder path for the data connection that you want the model to access.
  • If you want to use graphics processing units (GPUs) with your model server, you have enabled GPU support in OpenShift AI. See Enabling GPU support in OpenShift AI.

Procedure

  1. Enable the Caikit-TGIS runtime as follows:

    1. In the left menu of the OpenShift AI dashboard, click Settings → Serving runtimes.
    2. On the Serving runtimes page, set the Caikit TGIS ServingRuntime for KServe runtime to Enabled.
  2. In the left menu, click Data Science Projects.
  3. Click the name of the project that you want to deploy a model in.
  4. In the Models and model servers section, perform one of the following actions:

    • If you see a Single model serving platform tile, click Deploy model on the tile.
    • If you do not see any tiles, click the Deploy model button.

    The Deploy model dialog opens.

  5. Configure properties for deploying your model as follows:

    1. In the Model name field, enter a unique name for the model that you are deploying.
    2. In the Serving runtime field, select Caikit TGIS ServingRuntime for KServe.
    3. From the Model framework list, select caikit.
    4. In the Number of model replicas to deploy field, specify a value.
    5. From the Model server size list, select a value.
    6. To specify the location of your model, perform one of the following sets of actions:

      • To use an existing data connection

        1. Select Existing data connection.
        2. From the Name list, select a data connection that you previously defined.
        3. In the Folder path field, enter the folder path that contains the model in your specified data source.
      • To use a new data connection

        1. To define a new data connection that your model can access, select New data connection.
        2. In the Name field, enter a unique name for the data connection.
        3. In the Access key field, enter the access key ID for your S3-compatible object storage provider.
        4. In the Secret key field, enter the secret access key for the S3-compatible object storage account that you specified.
        5. In the Endpoint field, enter the endpoint of your S3-compatible object storage bucket.
        6. In the Region field, enter the default region of your S3-compatible object storage account.
        7. In the Bucket field, enter the name of your S3-compatible object storage bucket.
        8. In the Folder path field, enter the folder path in your S3-compatible object storage that contains your data file.
    7. Click Deploy.
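
The relationship between the Bucket and Folder path fields in the preceding steps can be illustrated with a minimal Python sketch. It assumes that the folder path is interpreted relative to the root of the bucket; the function name model_storage_uri and the example bucket and folder names are hypothetical and are not part of OpenShift AI.

```python
def model_storage_uri(bucket: str, folder_path: str) -> str:
    """Combine a bucket name and a folder path into an s3:// style location.

    Assumption: the folder path is relative to the bucket root, so leading
    and trailing slashes are not significant.
    """
    bucket = bucket.strip().strip("/")
    folder = folder_path.strip().strip("/")
    if not bucket:
        raise ValueError("bucket name must not be empty")
    # Return only the bucket when no folder path is given.
    return f"s3://{bucket}/{folder}" if folder else f"s3://{bucket}"
```

For example, a hypothetical bucket named models with the folder path flan-t5-small-caikit resolves to s3://models/flan-t5-small-caikit. Checking the combined location in this way before you click Deploy can help you catch a folder path that points to the wrong level of the bucket.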

Verification

  • Confirm that the deployed model is shown in the Models and model servers section of your project, and on the Model Serving page of the dashboard with a check mark in the Status column.

6.4.3. Accessing the API endpoints for models deployed on the single model serving platform

When you deploy a model by using the single model serving platform, the model is available as a service that you can access by using API requests. This enables you to return predictions based on data inputs. To use API requests to interact with your deployed model, you must know how to access the API endpoints that are available.

Prerequisites

  • You have logged in to Red Hat OpenShift AI.
  • If you are using specialized OpenShift AI groups, you are part of the user group or admin group (for example, oai-users or oai-admins) in OpenShift.
  • You have deployed a model by using the single model serving platform.

Procedure

  1. From the OpenShift AI dashboard, click Model Serving.
  2. From the Project list, select the project that you deployed a model in.
  3. In the Deployed models table, for the model that you want to access, copy the URL shown in the Inference endpoint column.
  4. Depending on what action you want to perform with the model (and if the model supports that action), add one of the following paths to the end of the inference endpoint URL:

    • :443/api/v1/task/server-streaming-text-generation
    • :443/api/v1/task/text-generation

    As indicated by the paths shown, the single model serving platform uses the HTTPS port of your OpenShift router (usually port 443) to serve external API requests.

  5. Use the endpoints to make API requests to your deployed model, as shown in the following example curl command (the --json option requires curl 7.82.0 or later):

    curl --json '{
        "model_id": "<model_name>",
        "inputs": "<query_text>"
    }' https://<inference_endpoint_url>:443/api/v1/task/server-streaming-text-generation
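
If you prefer to call the endpoint from a script, the following minimal Python sketch builds a request that is equivalent to the curl example, using only the standard library. The function names build_inference_url and build_request are hypothetical helpers, and the sketch assumes that the copied inference endpoint does not already include a port.

```python
import json
import urllib.request

# Task paths listed in the procedure above.
TASK_PATHS = {
    "text-generation": "/api/v1/task/text-generation",
    "server-streaming-text-generation": "/api/v1/task/server-streaming-text-generation",
}


def build_inference_url(endpoint: str, task: str, port: int = 443) -> str:
    """Append the port and task path to the endpoint copied from the dashboard."""
    host = endpoint.strip().rstrip("/")
    # Accept the endpoint with or without a leading scheme.
    for scheme in ("https://", "http://"):
        if host.startswith(scheme):
            host = host[len(scheme):]
    return f"https://{host}:{port}{TASK_PATHS[task]}"


def build_request(endpoint: str, model_name: str, query_text: str,
                  task: str = "text-generation") -> urllib.request.Request:
    """Build (but do not send) a POST request with the same JSON body as the curl example."""
    body = json.dumps({"model_id": model_name, "inputs": query_text}).encode("utf-8")
    return urllib.request.Request(
        build_inference_url(endpoint, task),
        data=body,
        # curl --json sets both of these headers.
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
```

To send the request, pass the returned object to urllib.request.urlopen and read the response body. The sketch keeps request construction separate from sending so that you can inspect the final URL and payload before you contact the model.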

6.5. Configuring monitoring for the single model serving platform

The single model serving platform includes metrics for Caikit and TGIS. You can also configure monitoring for OpenShift Service Mesh. The service mesh metrics help you to understand dependencies and traffic flow between components in the mesh. When you have configured monitoring, you can grant Prometheus access to scrape the available metrics.

Prerequisites

  • You have cluster administrator privileges for your OpenShift Container Platform cluster.
  • You have created OpenShift Service Mesh and Knative Serving instances and installed KServe.
  • You have downloaded and installed the OpenShift command-line interface (CLI). See Installing the OpenShift CLI.
  • You are familiar with creating a config map for monitoring a user-defined workload. You will perform similar steps in this procedure.
  • You are familiar with enabling monitoring for user-defined projects in OpenShift. You will perform similar steps in this procedure.
  • You have assigned the monitoring-rules-view role to users who will monitor metrics.

Procedure

  1. In a terminal window, if you are not already logged in to your OpenShift cluster as a cluster administrator, log in to the OpenShift CLI as shown in the following example:

    $ oc login <openshift_cluster_url> -u <admin_username> -p <password>
  2. Define a ConfigMap object in a YAML file called uwm-cm-conf.yaml with the following contents:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: user-workload-monitoring-config
      namespace: openshift-user-workload-monitoring
    data:
      config.yaml: |
        prometheus:
          logLevel: debug
          retention: 15d

    The user-workload-monitoring-config object configures the components that monitor user-defined projects. Observe that the retention time is set to the recommended value of 15 days.

  3. Apply the configuration to create the user-workload-monitoring-config object.

    $ oc apply -f uwm-cm-conf.yaml
  4. Define another ConfigMap object in a YAML file called uwm-cm-enable.yaml with the following contents:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        enableUserWorkload: true

    The cluster-monitoring-config object enables monitoring for user-defined projects.

  5. Apply the configuration to create the cluster-monitoring-config object.

    $ oc apply -f uwm-cm-enable.yaml
  6. Create ServiceMonitor and PodMonitor objects to monitor metrics in the service mesh control plane as follows:

    1. Create an istiod-monitor.yaml YAML file with the following contents:

      apiVersion: monitoring.coreos.com/v1
      kind: ServiceMonitor
      metadata:
        name: istiod-monitor
        namespace: istio-system
      spec:
        targetLabels:
        - app
        selector:
          matchLabels:
            istio: pilot
        endpoints:
        - port: http-monitoring
          interval: 30s
    2. Deploy the ServiceMonitor CR in the specified istio-system namespace.

      $ oc apply -f istiod-monitor.yaml

      You see the following output:

      servicemonitor.monitoring.coreos.com/istiod-monitor created
    3. Create an istio-proxies-monitor.yaml YAML file with the following contents:

      apiVersion: monitoring.coreos.com/v1
      kind: PodMonitor
      metadata:
        name: istio-proxies-monitor
        namespace: istio-system
      spec:
        selector:
          matchExpressions:
          - key: istio-prometheus-ignore
            operator: DoesNotExist
        podMetricsEndpoints:
        - path: /stats/prometheus
          interval: 30s
    4. Deploy the PodMonitor CR in the specified istio-system namespace.

      $ oc apply -f istio-proxies-monitor.yaml

      You see the following output:

      podmonitor.monitoring.coreos.com/istio-proxies-monitor created

6.6. Viewing metrics for the single model serving platform

When a cluster administrator has configured monitoring for the single model serving platform, non-admin users can use the OpenShift web console to view metrics.

Prerequisites

  • A cluster administrator has configured monitoring for the single model serving platform.
  • You have been assigned the monitoring-rules-view role.
  • You are familiar with how to monitor project metrics in the OpenShift Container Platform web console.

Procedure

  1. Log in to the OpenShift Container Platform web console.
  2. Switch to the Developer perspective.
  3. In the left menu, click Observe.
  4. As described in monitoring project metrics, use the web console to run queries for caikit_*, tgi_*, or istio_* metrics.
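
To discover which metrics are available for the prefixes named in the preceding step, you can query by metric name. The following minimal Python sketch builds a PromQL selector that matches metric names by prefix. The function name metric_family_selector is hypothetical, and the sketch assumes that your Prometheus instance accepts a regular expression matcher on the __name__ label, which is standard PromQL.

```python
# Metric name prefixes named in the procedure above.
METRIC_PREFIXES = ("caikit_", "tgi_", "istio_")


def metric_family_selector(prefixes=METRIC_PREFIXES) -> str:
    """Return a PromQL selector that matches every metric whose name starts with a prefix."""
    if not prefixes:
        raise ValueError("at least one prefix is required")
    # PromQL regular expressions are fully anchored, so each alternative needs a trailing .*
    pattern = "|".join(f"{prefix}.*" for prefix in prefixes)
    return '{__name__=~"' + pattern + '"}'
```

Calling metric_family_selector(("tgi_",)) produces a selector for the TGIS metrics only. A selector that spans all three prefixes can return a large number of series, so narrowing it to one prefix at a time is usually easier to read in the web console.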

© 2024 Red Hat, Inc.