Chapter 6. Deploying Distributed Inference with llm-d with Helm
The Distributed Inference with llm-d Helm chart deploys a complete inference stack on OpenShift Container Platform or managed Kubernetes. On OpenShift Container Platform, the chart uses Operator Lifecycle Manager (OLM) to install and configure the required Operators automatically. On managed Kubernetes, the chart installs all dependencies directly.
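As a rough sketch, installation follows the standard Helm workflow. The repository URL, chart name, release name, and namespace below are placeholders for illustration, not the published chart coordinates; consult the product documentation for the actual values.

$ # Hypothetical repository URL and chart name, shown for illustration only
$ helm repo add rhaii https://example.com/rhaii-charts
$ helm repo update
$ helm install my-inference rhaii/rhaii \
    --namespace rhaii-inference \
    --create-namespace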
The Helm chart uses a three-tier deployment model:
- Red Hat OpenShift AI Operator: The chart installs the Red Hat OpenShift AI Operator through an OLM Subscription. The chart controls the DataScienceCluster and DSCInitialization custom resources that manage the inference stack lifecycle.
- Helm chart components: The Helm chart includes the rhaii profile, which provides an inference-focused deployment. For distributed inference, the primary component is KServe, which provides the LLMInferenceService custom resource (CR) for deploying and managing inference services; a minimal CR sketch follows this list.
- Operator dependencies: Each component declares the Operators it requires. KServe depends on cert-manager, LeaderWorkerSet, and Red Hat Connectivity Link. The Helm chart resolves these dependencies, including transitive dependencies, and installs each Operator through OLM automatically.
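For orientation, an LLMInferenceService CR follows the usual Kubernetes resource shape. The apiVersion, field names, and model reference below are assumptions made for illustration; the authoritative schema is the LLMInferenceService CRD that KServe installs on the cluster.

apiVersion: serving.kserve.io/v1alpha1   # assumed API version
kind: LLMInferenceService
metadata:
  name: my-llm                 # hypothetical service name
  namespace: rhaii-inference   # hypothetical namespace
spec:
  model:
    uri: hf://meta-llama/Llama-3.1-8B-Instruct   # hypothetical model reference
  replicas: 1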
Figure 6.1. Deploying Distributed Inference with llm-d with Helm charts
The Red Hat AI Inference (RHAII) Helm chart packages, deploys, and configures the Red Hat AI (RHAI) Operator and the RHAI cloud controller manager. The RHAI Operator handles KServe and model serving, while the RHAI cloud controller manager handles the underlying cluster infrastructure.
The RHAI Operator manages the KServe Controller and reconciles custom resource definitions (CRDs).
The RHAI cloud controller manager manages Helm-based infrastructure components, including cert-manager, Gateway API, Istio, and LeaderWorkerSet.
The RHAI cloud controller manager configures the managed Kubernetes or OpenShift Container Platform cluster.
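After the chart has reconciled, you can verify the stack with standard cluster commands. The namespace and the CRD name below are assumptions based on typical OLM and KServe naming; adjust them to match your installation.

$ # Check that the Operators installed through OLM are ready (namespace is an assumption)
$ oc get csv -n openshift-operators
$ # Confirm that KServe registered the LLMInferenceService CRD (name is an assumption)
$ oc get crd llminferenceservices.serving.kserve.io
$ # Inspect the DataScienceCluster custom resource that the chart controls
$ oc get datasciencecluster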