Deploying Red Hat AI Inference Server in a disconnected environment
Deploy Red Hat AI Inference Server in a disconnected environment using OpenShift Container Platform and a disconnected mirror image registry
Preface
You can run inference on large language models with Red Hat AI Inference Server without any connection to the public internet by installing OpenShift Container Platform and configuring a mirrored container image registry in the disconnected environment.
Currently, only NVIDIA CUDA AI accelerators are supported for OpenShift Container Platform in disconnected environments.
Chapter 1. Setting up a mirror registry for your disconnected environment
To serve container images in a disconnected environment, you must configure a disconnected mirror registry on a bastion host. The bastion host acts as a secure gateway between your disconnected environment and the internet. You then mirror images from Red Hat’s online image registries, and serve them in the disconnected environment.
Prerequisites
- Deploy the bastion host.
- Install oc on the bastion host.
- Install Podman on the bastion host.
- Install OpenShift Container Platform in the disconnected environment.
Procedure
- Open a shell prompt on the bastion host and create the disconnected mirror registry, as shown in the example after this list.
- Configure credentials that allow images to be mirrored.
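For example, one way to complete both steps is with the mirror-registry installer and podman. The following is a minimal sketch only; the registry hostname registry.example.com, the installation path /opt/quay, and the pull secret file name are placeholder assumptions, and the installer options can differ between versions.

# Install a minimal Red Hat Quay mirror registry on the bastion host
$ ./mirror-registry install --quayHostname registry.example.com --quayRoot /opt/quay

# Log in to the new mirror registry with the credentials that the installer prints,
# storing them in the authentication file that oc mirror uses later
$ podman login --authfile pull-secret.json registry.example.com:8443

# Confirm that the same file also contains working credentials for registry.redhat.io
$ podman login --authfile pull-secret.json registry.redhat.io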
Chapter 2. Mirroring the required images for model inference
Once you have created a mirror registry for the disconnected environment, you are ready to mirror the required AI Inference Server image, AI accelerator Operator images, and language model image.
Prerequisites
- You have installed the OpenShift CLI (oc).
- You have logged in as a user with cluster-admin privileges.
- You have installed a mirror registry on the bastion host.
Procedure
- Find the versions of the required images that match your environment and pull the images with podman. Example podman commands follow this procedure.
- Create an image set configuration custom resource (CR) that includes the NFD Operator, NVIDIA GPU Operator, and AI Inference Server images that you pulled in the previous step, and save the ImageSetConfiguration CR as the file imageset-config.yaml. An example CR follows this procedure.
- Mirror the required images into the mirror registry by running the following command:
  $ oc mirror --config imageset-config.yaml docker://<target_mirror_registry_url> --registry-config <path_to_pull_secret_json>
  Alternatively, if you have already installed the NFD and NVIDIA GPU Operators in the cluster, create an ImageSetConfiguration CR that configures the AI Inference Server and model images only. An example of this smaller CR also follows this procedure.
- Mirror the image set in the disconnected environment.
- Configure the cluster for the mirror registry.
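For the first step, the exact image names and tags depend on your environment and on the model that you want to serve. The following podman commands are a sketch only; the AI Inference Server repository path and the model image reference are assumptions that you must replace with the exact names and versions listed in the Red Hat Ecosystem Catalog.

# Pull the AI Inference Server (vLLM) image for NVIDIA CUDA accelerators
$ podman pull registry.redhat.io/rhaiis/vllm-cuda-rhel9:<version>

# Pull the language model image that you want to serve
$ podman pull <model_registry>/<model_image>:<tag>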
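The following ImageSetConfiguration CR is a minimal sketch of the full image set. It assumes the oc-mirror plugin v2, OpenShift Container Platform 4.17 catalogs, and the placeholder image references used above; verify the catalog tags, Operator package names, and apiVersion against your oc-mirror version before you use it.

apiVersion: mirror.openshift.io/v2alpha1
kind: ImageSetConfiguration
mirror:
  operators:
  - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.17
    packages:
    - name: nfd                        # Node Feature Discovery Operator
  - catalog: registry.redhat.io/redhat/certified-operator-index:v4.17
    packages:
    - name: gpu-operator-certified     # NVIDIA GPU Operator
  additionalImages:
  - name: registry.redhat.io/rhaiis/vllm-cuda-rhel9:<version>   # AI Inference Server image
  - name: <model_registry>/<model_image>:<tag>                  # language model image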
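For the alternative case, where the NFD and NVIDIA GPU Operators are already installed in the cluster, a sketch of an ImageSetConfiguration CR that mirrors only the AI Inference Server and model images might look like the following; the image references are again placeholders.

apiVersion: mirror.openshift.io/v2alpha1
kind: ImageSetConfiguration
mirror:
  additionalImages:
  - name: registry.redhat.io/rhaiis/vllm-cuda-rhel9:<version>   # AI Inference Server image
  - name: <model_registry>/<model_image>:<tag>                  # language model image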
Chapter 3. Installing the Node Feature Discovery Operator
Install the Node Feature Discovery Operator so that the cluster can use the AI accelerators that are available in the cluster.
Prerequisites
- You have installed the OpenShift CLI (oc).
- You have logged in as a user with cluster-admin privileges.
Procedure
- Create the Namespace CR for the Node Feature Discovery Operator.
- Create the OperatorGroup CR.
- Create the Subscription CR.

Example Namespace, OperatorGroup, and Subscription CRs follow this procedure.
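The following Namespace CR is a typical example for the Node Feature Discovery Operator; the cluster-monitoring label is a common convention rather than a requirement.

apiVersion: v1
kind: Namespace
metadata:
  name: openshift-nfd
  labels:
    name: openshift-nfd
    openshift.io/cluster-monitoring: "true"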
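An example OperatorGroup CR that scopes the Operator to the openshift-nfd namespace:

apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: nfd
  namespace: openshift-nfd
spec:
  targetNamespaces:
  - openshift-nfd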
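An example Subscription CR. In a disconnected cluster, set the source field to the CatalogSource that oc mirror created for the mirrored Red Hat Operator index; the source name shown here is an assumption, and the channel depends on your OpenShift Container Platform version.

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: nfd
  namespace: openshift-nfd
spec:
  channel: "stable"
  installPlanApproval: Automatic
  name: nfd
  source: cs-redhat-operator-index        # CatalogSource created by oc mirror (assumption)
  sourceNamespace: openshift-marketplace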
Verification
Verify that the Node Feature Discovery Operator deployment is successful by running the following command:
$ oc get pods -n openshift-nfd
Example output
NAME READY STATUS RESTARTS AGE
nfd-controller-manager-7f86ccfb58-vgr4x 2/2 Running 0 10m
Chapter 4. Installing the NVIDIA GPU Operator
Install the NVIDIA GPU Operator to use the underlying NVIDIA CUDA AI accelerators that are available in the cluster.
Prerequisites
- You have installed the OpenShift CLI (oc).
- You have logged in as a user with cluster-admin privileges.
- You have installed the Node Feature Discovery Operator.
Procedure
- Create the Namespace CR for the NVIDIA GPU Operator.
- Create the OperatorGroup CR.
- Create the Subscription CR.

Example Namespace, OperatorGroup, and Subscription CRs follow this procedure.
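An example Namespace CR for the NVIDIA GPU Operator:

apiVersion: v1
kind: Namespace
metadata:
  name: nvidia-gpu-operator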
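An example OperatorGroup CR that scopes the Operator to the nvidia-gpu-operator namespace:

apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: nvidia-gpu-operator-group
  namespace: nvidia-gpu-operator
spec:
  targetNamespaces:
  - nvidia-gpu-operator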
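An example Subscription CR. As with the NFD Operator, point source at the CatalogSource that oc mirror created for the mirrored certified Operator index; the source name and channel below are assumptions. You can check the available channels with oc get packagemanifest gpu-operator-certified -n openshift-marketplace.

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: gpu-operator-certified
  namespace: nvidia-gpu-operator
spec:
  channel: "stable"                       # replace with the channel reported by the catalog
  installPlanApproval: Automatic
  name: gpu-operator-certified
  source: cs-certified-operator-index     # CatalogSource created by oc mirror (assumption)
  sourceNamespace: openshift-marketplace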
Verification
Verify that the NVIDIA GPU Operator deployment is successful by running the following command:
$ oc get pods -n nvidia-gpu-operator
Example output
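The exact listing depends on your cluster and Operator version. Output similar to the following, with all pods in the Running or Completed state, indicates a successful deployment; the pod names, hash suffixes, and ages shown here are illustrative only.

NAME                                       READY   STATUS      RESTARTS   AGE
gpu-feature-discovery-c2rfm                1/1     Running     0          8m
gpu-operator-84b7f5bcb9-vqds7              1/1     Running     0          39m
nvidia-container-toolkit-daemonset-pgcrf   1/1     Running     0          8m
nvidia-cuda-validator-p8gv2                0/1     Completed   0          3m
nvidia-dcgm-exporter-kv6k8                 1/1     Running     0          8m
nvidia-device-plugin-daemonset-dvp5f       1/1     Running     0          8m
nvidia-driver-daemonset-fpx44              1/1     Running     0          8m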
Chapter 5. Inference serving the model in the disconnected environment
Use Red Hat AI Inference Server deployed in a disconnected OpenShift Container Platform environment to inference serve the language model from cluster persistent storage.
Prerequisites
- You have installed a mirror registry on the bastion host that is accessible to the disconnected cluster.
- You have added the model and Red Hat AI Inference Server images to the mirror registry.
- You have installed the Node Feature Discovery Operator and NVIDIA GPU Operator in the disconnected cluster.
Procedure
- In the disconnected cluster, configure persistent storage by using Network File System (NFS) and make the model available in the persistent storage that you configure. A sketch of an example NFS-backed PersistentVolume and PersistentVolumeClaim follows this procedure.
  Note: For more information, see Persistent storage using NFS.
- Create a Deployment custom resource (CR). For example, the Deployment CR shown after this procedure uses AI Inference Server to serve a Granite model on a CUDA accelerator.
- Create a Service CR for the model inference. An example Service CR follows this procedure.
- Optional: Create a Route CR to enable public access to the model. An example Route CR follows this procedure.
- Get the URL for the exposed route:
  $ oc get route granite -n rhaiis-namespace -o jsonpath='{.spec.host}'
  Example output
  granite-rhaiis-namespace.apps.example.com
- Query the model by using the OpenAI-compatible API over the exposed route. An example query command follows this procedure.
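The following PersistentVolume and PersistentVolumeClaim sketch shows one way to expose an NFS export that contains the model files to the rhaiis-namespace namespace; the NFS server address, export path, capacity, and resource names are placeholder assumptions.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: granite-model-pv
spec:
  capacity:
    storage: 50Gi
  accessModes:
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: <nfs_server_ip>
    path: /exports/models            # NFS export that contains the downloaded model files
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: granite-model-pvc
  namespace: rhaiis-namespace
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: ""               # bind directly to the pre-provisioned PV
  resources:
    requests:
      storage: 50Gi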
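The following Deployment CR is a sketch of serving a Granite model with AI Inference Server on a single NVIDIA CUDA accelerator. The mirrored image reference, the model directory under the mounted volume, and the vLLM serving arguments are assumptions that you must adjust for your environment.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: granite
  namespace: rhaiis-namespace
  labels:
    app: granite
spec:
  replicas: 1
  selector:
    matchLabels:
      app: granite
  template:
    metadata:
      labels:
        app: granite
    spec:
      containers:
      - name: granite
        # AI Inference Server image pulled from the disconnected mirror registry
        image: <target_mirror_registry_url>/rhaiis/vllm-cuda-rhel9:<version>
        command:
        - vllm
        - serve
        - /models/granite-3.1-8b-instruct      # model directory on the NFS-backed volume (assumption)
        args:
        - --served-model-name=granite
        - --port=8000
        ports:
        - containerPort: 8000
        resources:
          limits:
            nvidia.com/gpu: "1"                # request one NVIDIA CUDA accelerator
        volumeMounts:
        - name: model-storage
          mountPath: /models
        - name: shm
          mountPath: /dev/shm
      volumes:
      - name: model-storage
        persistentVolumeClaim:
          claimName: granite-model-pvc         # PVC from the NFS storage step
      - name: shm
        emptyDir:
          medium: Memory
          sizeLimit: 2Gi                       # vLLM benefits from a larger shared memory segment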
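An example Service CR that exposes the inference port inside the cluster:

apiVersion: v1
kind: Service
metadata:
  name: granite
  namespace: rhaiis-namespace
spec:
  selector:
    app: granite
  ports:
  - protocol: TCP
    port: 8000
    targetPort: 8000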
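An example Route CR that exposes the Service outside the cluster:

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: granite
  namespace: rhaiis-namespace
spec:
  to:
    kind: Service
    name: granite
  port:
    targetPort: 8000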
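Because AI Inference Server exposes an OpenAI-compatible API, you can query the model over the route host with curl. The host name below comes from the example route output; the prompt and token limit are illustrative, and use https instead of http if TLS termination is configured on the route.

$ curl -X POST http://granite-rhaiis-namespace.apps.example.com/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "granite",
          "prompt": "What is OpenShift Container Platform?",
          "max_tokens": 100
        }'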