Chapter 6. Deploying Distributed Inference with llm-d with Helm
The Distributed Inference with llm-d Helm chart deploys a complete inference stack on OpenShift Container Platform or managed Kubernetes. On OpenShift Container Platform, the chart uses Operator Lifecycle Manager (OLM) to install and configure the required Operators automatically. On managed Kubernetes, the chart installs all dependencies directly.
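As a rough sketch, installation follows the standard Helm workflow. The repository URL, chart name, release name, and namespace below are placeholders for illustration, not the published chart coordinates; consult the product documentation for the actual values.

$ # Hypothetical repository URL and chart name, shown for illustration only
$ helm repo add rhaii https://example.com/rhaii-charts
$ helm repo update
$ helm install my-inference rhaii/rhaii \
    --namespace rhaii-inference \
    --create-namespace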
The Helm chart uses a three-tier deployment model:
- Red Hat OpenShift AI Operator: The chart installs the Red Hat OpenShift AI Operator through an OLM Subscription. The chart controls the DataScienceCluster and DSCInitialization custom resources that manage the inference stack lifecycle.
- Helm chart components: The Helm chart includes the rhaii profile, which provides an inference-focused deployment. For distributed inference, the primary component is KServe, which provides the LLMInferenceService custom resource (CR) for deploying and managing inference services; a minimal CR sketch follows this list.
- Operator dependencies: Each component declares the Operators it requires. KServe depends on cert-manager, LeaderWorkerSet, and Red Hat Connectivity Link. The Helm chart resolves these dependencies, including transitive dependencies, and installs each Operator through OLM automatically.
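For orientation, an LLMInferenceService CR follows the usual Kubernetes resource shape. The apiVersion, field names, and model reference below are assumptions made for illustration; the authoritative schema is the LLMInferenceService CRD that KServe installs on the cluster.

apiVersion: serving.kserve.io/v1alpha1   # assumed API version
kind: LLMInferenceService
metadata:
  name: my-llm                 # hypothetical service name
  namespace: rhaii-inference   # hypothetical namespace
spec:
  model:
    uri: hf://meta-llama/Llama-3.1-8B-Instruct   # hypothetical model reference
  replicas: 1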
Figure 6.1. Deploying Distributed Inference with llm-d with Helm charts
The Red Hat AI Inference (RHAII) Helm chart packages, deploys, and configures the Red Hat AI (RHAI) Operator and the RHAI cloud controller manager. The RHAI Operator handles KServe and model serving, while the RHAI cloud controller manager handles the underlying cluster infrastructure.
The RHAI Operator manages the KServe Controller and reconciles custom resource definitions (CRDs).
The RHAI cloud controller manager manages Helm-based infrastructure components, including cert-manager, Gateway API, Istio, and LeaderWorkerSet.
The RHAI cloud controller manager configures the managed Kubernetes or OpenShift Container Platform cluster.
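After the chart has reconciled, you can verify the stack with standard cluster commands. The namespace and the CRD name below are assumptions based on typical OLM and KServe naming; adjust them to match your installation.

$ # Check that the Operators installed through OLM are ready (namespace is an assumption)
$ oc get csv -n openshift-operators
$ # Confirm that KServe registered the LLMInferenceService CRD (name is an assumption)
$ oc get crd llminferenceservices.serving.kserve.io
$ # Inspect the DataScienceCluster custom resource that the chart controls
$ oc get datasciencecluster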