Chapter 5. Inference serving the model in the disconnected environment
Use Red Hat AI Inference Server deployed in a disconnected OpenShift Container Platform environment to inference serve the language model from cluster persistent storage.
Prerequisites
- You have installed a mirror registry on the bastion host that is accessible to the disconnected cluster.
- You have added the model and Red Hat AI Inference Server images to the mirror registry.
- You have installed the Node Feature Discovery Operator and NVIDIA GPU Operator in the disconnected cluster.
Procedure
1. In the disconnected cluster, configure persistent storage using Network File System (NFS) and make the model available in the persistent storage that you configure.
Note: For more information, see Persistent storage using NFS.
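The following is a minimal sketch of an NFS-backed PersistentVolume and PersistentVolumeClaim for holding the model files. The NFS server address, export path, claim name (granite-model-pvc), namespace, and capacity are assumptions; adjust them to match your environment:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: granite-model-pv
spec:
  capacity:
    storage: 50Gi
  accessModes:
  - ReadWriteMany
  storageClassName: ""
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: nfs.example.com      # NFS server reachable from the disconnected cluster (example value)
    path: /exports/models        # export that contains the downloaded model files (example value)
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: granite-model-pvc
  namespace: rhaiis-namespace
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: ""
  volumeName: granite-model-pv
  resources:
    requests:
      storage: 50Gi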
2. Create a Deployment custom resource (CR). For example, the following Deployment CR uses AI Inference Server to serve a Granite model on a CUDA accelerator.
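A minimal sketch of such a Deployment follows. The image reference, model path, and PVC name (granite-model-pvc) are assumptions; substitute the AI Inference Server image you mirrored and the persistent storage you configured:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: granite
  namespace: rhaiis-namespace
  labels:
    app: granite
spec:
  replicas: 1
  selector:
    matchLabels:
      app: granite
  template:
    metadata:
      labels:
        app: granite
    spec:
      containers:
      - name: server
        # AI Inference Server image pulled from the mirror registry (example path and tag)
        image: <mirror-registry>/rhaiis/vllm-cuda-rhel9:<version>
        args:
        - --model
        - /models/granite          # model files available on the NFS-backed volume (example path)
        ports:
        - containerPort: 8000
        resources:
          limits:
            nvidia.com/gpu: "1"    # schedule on a CUDA accelerator exposed by the NVIDIA GPU Operator
        volumeMounts:
        - name: model-storage
          mountPath: /models
      volumes:
      - name: model-storage
        persistentVolumeClaim:
          claimName: granite-model-pvc   # PVC backed by the NFS persistent storage configured earlier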
3. Create a Service CR for the model inference. For example:
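A sketch of a matching Service, assuming the app: granite selector label and container port 8000 used in the Deployment sketch above:

apiVersion: v1
kind: Service
metadata:
  name: granite
  namespace: rhaiis-namespace
spec:
  selector:
    app: granite
  ports:
  - protocol: TCP
    port: 8000
    targetPort: 8000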
4. Optional: Create a Route CR to enable public access to the model. For example:
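A sketch of such a Route, pointing at the granite Service sketched above. No TLS termination is configured here; add tls settings as required for your environment:

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: granite
  namespace: rhaiis-namespace
spec:
  to:
    kind: Service
    name: granite
  port:
    targetPort: 8000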
5. Get the URL for the exposed route:

$ oc get route granite -n rhaiis-namespace -o jsonpath='{.spec.host}'

Example output
granite-rhaiis-namespace.apps.example.com
6. Query the model by running the following command:
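For example, the following curl command sends a completion request to the OpenAI-compatible endpoint that AI Inference Server exposes. The route host is the value returned in the previous step; the model value and prompt are examples, and the model value must match the model name the server was started with:

$ curl -X POST http://granite-rhaiis-namespace.apps.example.com/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "/models/granite",
          "prompt": "What is a disconnected OpenShift cluster?",
          "max_tokens": 100
        }'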