Chapter 6. Serving large language models


For serving large language models (LLMs), Red Hat OpenShift Data Science includes a model-serving stack that is based on the Compositional AI Toolkit (Caikit), Text Generation Inference Server (TGIS), and KServe components. The stack helps you to deploy, monitor, scale, and maintain LLMs.

Important

The Caikit-TGIS-KServe model-serving stack is a Limited Availability feature. Limited Availability means that you can install and receive support for the feature only with specific approval from Red Hat. Without such approval, the feature is unsupported.

You must use version 2 (that is, version 2.1 or 2.2) of the Red Hat OpenShift Data Science Operator to install the Caikit-TGIS-KServe stack. Version 2 of the Red Hat OpenShift Data Science Operator is also a Limited Availability feature.

The Caikit-TGIS-KServe model-serving stack consists of the following components:

  • Caikit: Wrapper layer that handles the lifecycle of the TGIS process, provides the inference endpoints, and has modules to handle different model types.
  • Caikit-nlp: Caikit module that handles natural language processing (NLP)-style models.
  • Text Generation Inference Server (TGIS): Model-serving backend that loads models and provides an inference engine.
  • KServe: Kubernetes custom resource definition (CRD) that orchestrates model serving for all types of models. It includes serving runtimes that load specific types of model servers. KServe handles the lifecycle of the deployment object, storage access, and networking setup.
  • Red Hat OpenShift Serverless: Cloud-native development model that allows for serverless deployments of models. OpenShift Serverless is based on the open source Knative project.
  • Red Hat OpenShift Service Mesh: Service mesh networking layer that manages traffic flows and enforces access policies. OpenShift Service Mesh is based on the open source Istio project.
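
To make the role of KServe more concrete, the following minimal Python sketch builds the kind of InferenceService resource that KServe reconciles into a running deployment. It is an illustration only: the resource name, the caikit model format, the caikit-runtime serving runtime name, and the storage URI are hypothetical placeholders that this document does not define, so substitute the values used in your own cluster.

```python
import json


def build_inference_service(name, runtime, storage_uri, model_format="caikit"):
    """Return a KServe InferenceService manifest as a plain dict.

    Every value passed in here is an assumption made for illustration;
    none of them is prescribed by the product documentation.
    """
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                "model": {
                    # Model format that the serving runtime must support.
                    "modelFormat": {"name": model_format},
                    # Name of the serving runtime that loads the model server.
                    "runtime": runtime,
                    # Location from which KServe fetches the model files.
                    "storageUri": storage_uri,
                }
            }
        },
    }


manifest = build_inference_service(
    name="example-llm",                                   # hypothetical name
    runtime="caikit-runtime",                             # hypothetical runtime
    storage_uri="s3://example-bucket/models/example-llm", # hypothetical location
)
print(json.dumps(manifest, indent=2))
```

In this sketch, the runtime field is where the Caikit and TGIS layers would come in, and the storageUri field reflects the storage access that KServe handles.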

6.1. Overview of required Operators for the Caikit-TGIS-KServe stack

In addition to version 2 of the Red Hat OpenShift Data Science Operator, the Caikit-TGIS-KServe stack requires installation of the following Operators:

  • Red Hat OpenShift distributed tracing platform (Jaeger)
  • Kiali Operator provided by Red Hat
  • Red Hat OpenShift Serverless
  • Red Hat OpenShift Service Mesh

The installation instructions referenced in the following section describe how to install each of these required Operators.
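
As a rough sketch of what installing these Operators from the CLI can look like, the following Python script generates one Operator Lifecycle Manager (OLM) Subscription per Operator and prints them as a single JSON list. The package names, namespaces, channel, and catalog source shown here are assumptions, not values taken from this document; confirm them against the catalog in your cluster and against the referenced installation instructions before you use them.

```python
import json

# (package name, namespace) pairs. All of these are assumptions to verify
# in your cluster, for example in OperatorHub.
OPERATORS = [
    ("jaeger-product", "openshift-operators"),       # distributed tracing platform
    ("kiali-ossm", "openshift-operators"),           # Kiali Operator
    ("serverless-operator", "openshift-serverless"), # OpenShift Serverless
    ("servicemeshoperator", "openshift-operators"),  # OpenShift Service Mesh
]


def build_subscription(package, namespace, channel="stable",
                       source="redhat-operators"):
    """Return an OLM Subscription manifest for one Operator as a dict."""
    return {
        "apiVersion": "operators.coreos.com/v1alpha1",
        "kind": "Subscription",
        "metadata": {"name": package, "namespace": namespace},
        "spec": {
            "channel": channel,  # assumed update channel
            "name": package,
            "source": source,    # assumed catalog source
            "sourceNamespace": "openshift-marketplace",
        },
    }


subscriptions = {
    "apiVersion": "v1",
    "kind": "List",
    "items": [build_subscription(pkg, ns) for pkg, ns in OPERATORS],
}
print(json.dumps(subscriptions, indent=2))
```

The sketch covers only the Subscription objects. A namespace other than openshift-operators must already exist and contain an OperatorGroup before OLM can act on a Subscription in it.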

6.2. Installing the Caikit-TGIS-KServe stack

To install the Caikit-TGIS-KServe stack, see the step-by-step instructions in the caikit-tgis-serving repository.

Note

The step-by-step instructions use OpenShift manifest files that are also in the caikit-tgis-serving repository. These YAML-based manifest files are not part of Red Hat OpenShift Data Science, but are a suitable reference to use for the installation.

Note

Step 11 of the step-by-step instructions shows how to deploy KServe using the Open Data Hub Operator. For equivalent instructions that use version 2 of the Red Hat OpenShift Data Science Operator, see Installing Red Hat OpenShift Data Science components by using the CLI.
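
For orientation, the following Python sketch shows the general shape of a DataScienceCluster resource that asks version 2 of the Operator to manage the KServe component. The API version, the resource name default-dsc, and the managementState field layout are assumptions based on the upstream DataScienceCluster resource, so treat the linked installation instructions as the authoritative source.

```python
import json


def build_data_science_cluster(name, managed_components):
    """Return a DataScienceCluster manifest that marks components as Managed.

    The apiVersion and field layout are assumptions; verify them against
    the CRD installed in your cluster.
    """
    return {
        "apiVersion": "datasciencecluster.opendatahub.io/v1",
        "kind": "DataScienceCluster",
        "metadata": {"name": name},
        "spec": {
            "components": {
                component: {"managementState": "Managed"}
                for component in managed_components
            }
        },
    }


dsc = build_data_science_cluster("default-dsc", ["kserve"])  # hypothetical name
print(json.dumps(dsc, indent=2))
```

Because the oc client accepts JSON as well as YAML, the printed output of a script like this could be piped to oc apply -f - once the values have been verified.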

6.3. Additional resources

  • To learn how to access metrics provided by the Caikit-TGIS-KServe model-serving stack, see Accessing metrics.

© 2024 Red Hat, Inc.