Chapter 3. Inference serving modelcar images with AI Inference Server in OpenShift Container Platform
Deploy a language model in a modelcar container with OpenShift Container Platform by configuring secrets, persistent storage, and a deployment custom resource (CR) that uses Red Hat AI Inference Server to inference serve the modelcar container image.
Prerequisites
- You have installed the OpenShift CLI (oc).
- You have logged in as a user with cluster-admin privileges.
- You have installed the Node Feature Discovery (NFD) Operator and the required GPU Operator for your underlying AI accelerator hardware.
- You have created a modelcar container image for the language model and pushed it to a container image registry.
Procedure
Create the Docker secret so that the cluster can download the Red Hat AI Inference Server image from the container registry. For example, to create a Secret CR that contains the contents of your local ~/.docker/config.json file, run the following command:

oc create secret generic docker-secret --from-file=.dockercfg=$HOME/.docker/config.json --type=kubernetes.io/dockercfg -n rhaiis-namespace

Create a PersistentVolumeClaim (PVC) custom resource (CR) and apply it in the cluster. The following example PVC CR uses a default IBM VPC Block persistent volume.

Note: Configuring cluster storage to meet your requirements is outside the scope of this procedure. For more detailed information, see Configuring persistent storage.
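A minimal sketch of such a PVC CR follows. The claim name (granite-models), the 50Gi request, and the ibmc-vpc-block-10iops-tier storage class are assumptions; adjust them for your cluster and the size of your model weights.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: granite-models                     # assumption: referenced later by the Deployment claimName
  namespace: rhaiis-namespace
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi                        # assumption: size the volume for your model weights
  storageClassName: ibmc-vpc-block-10iops-tier   # assumption: default IBM VPC Block storage class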
Create a Deployment custom resource (CR) that pulls the modelcar image and deploys the Red Hat AI Inference Server container. The example Deployment CR, sketched after this list, uses AI Inference Server to serve a modelcar image. The following points apply to the example:

1. spec.template.spec.volumes.persistentVolumeClaim.claimName must match the name of the PVC that you created.
2. This example deployment uses a simple initContainers configuration that runs before the main app container to download the required modelcar image. The model pull step is skipped if the model directory has already been populated, for example, from a previous deployment.
3. Update the value for --served-model-name to match the model that you are deploying.
4. The /dev/shm volume mount is required by the NVIDIA Collective Communications Library (NCCL). Tensor parallel vLLM deployments fail when the /dev/shm volume mount is not set.
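A minimal sketch of such a Deployment follows. The modelcar image reference, the AI Inference Server image and tag, the /models path inside the modelcar image, the init container copy script, the container port, and the single-GPU resource request are assumptions to adapt to your registry, model, and hardware; the numbered comments map to the callouts above.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: granite
  namespace: rhaiis-namespace
spec:
  replicas: 0                              # assumption: scaled up in the next step
  selector:
    matchLabels:
      app: granite
  template:
    metadata:
      labels:
        app: granite
    spec:
      imagePullSecrets:
        - name: docker-secret              # the pull secret created earlier in this procedure
      volumes:
        - name: models
          persistentVolumeClaim:
            claimName: granite-models      # 1: must match the name of the PVC that you created
        - name: shm
          emptyDir:
            medium: Memory                 # 4: backs the /dev/shm mount required by NCCL
      initContainers:
        - name: fetch-model                # 2: runs before the app container to download the modelcar image
          image: registry.example.com/models/granite-3.1-2b-instruct-modelcar:latest   # assumption: your modelcar image
          command: ["sh", "-c"]
          args:
            - |
              # assumption: the modelcar image stores the model under /models and includes a shell;
              # the copy is skipped if the model directory is already populated
              if [ ! -d /cache/models/granite-3.1-2b-instruct ]; then
                cp -R /models /cache/models/granite-3.1-2b-instruct
              fi
          volumeMounts:
            - name: models
              mountPath: /cache/models
      containers:
        - name: rhaiis
          image: registry.redhat.io/rhaiis/vllm-cuda-rhel9:latest       # assumption: AI Inference Server image for NVIDIA GPUs
          args:
            - --model=/cache/models/granite-3.1-2b-instruct
            - --served-model-name=ibm-granite/granite-3.1-2b-instruct   # 3: update to match the model that you are deploying
          ports:
            - containerPort: 8000          # assumption: default AI Inference Server (vLLM) port
          resources:
            limits:
              nvidia.com/gpu: "1"          # assumption: one GPU per replica
          volumeMounts:
            - name: models
              mountPath: /cache/models
            - name: shm
              mountPath: /dev/shm          # 4: required by NCCL for tensor parallel deployments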
Increase the deployment replica count to the required number. For example, run the following command:
oc scale deployment granite -n rhaiis-namespace --replicas=1
Optional: Watch the deployment and ensure that it succeeds:
oc get deployment -n rhaiis-namespace --watch
Example output
NAME                READY   UP-TO-DATE   AVAILABLE   AGE
rhaiis-oci-deploy   0/1     1            0           2s
rhaiis-oci-deploy   1/1     1            1           14s

Create a Service CR for the model inference. For example:
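A minimal sketch of such a Service follows, assuming the Deployment pods carry the label app: granite and that AI Inference Server listens on port 8000:

apiVersion: v1
kind: Service
metadata:
  name: granite
  namespace: rhaiis-namespace
spec:
  selector:
    app: granite                           # assumption: must match the Deployment pod labels
  ports:
    - name: http
      protocol: TCP
      port: 8000
      targetPort: 8000                     # assumption: the port that AI Inference Server listens on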
Optional: Create a Route CR to enable public access to the model. For example:
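A minimal sketch of such a Route follows, assuming the Service from the previous step is named granite and exposes a port named http; the route name granite matches the oc get route command in the next step:

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: granite
  namespace: rhaiis-namespace
spec:
  to:
    kind: Service
    name: granite                          # assumption: the Service created in the previous step
  port:
    targetPort: http                       # assumption: the named port on the Service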
Get the URL for the exposed route. Run the following command:

oc get route granite -n rhaiis-namespace -o jsonpath='{.spec.host}'

Example output

rhaiis-oci-deploy-rhaiis-namespace.apps.example.com
Verification
Ensure that the deployment is successful by querying the model. Run the following command:
curl -v -k http://rhaiis-oci-deploy-rhaiis-namespace.apps.example.com/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "ibm-granite/granite-3.1-2b-instruct",
  "messages": [{"role": "user", "content": "Hello?"}],
  "temperature": 0.1
}' | jq
Example output
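AI Inference Server exposes an OpenAI-compatible chat completions endpoint, so the response has the following shape. The identifier, timestamp, token counts, and generated text below are illustrative placeholders, not captured output:

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1723456789,
  "model": "ibm-granite/granite-3.1-2b-instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 10,
    "total_tokens": 19
  }
}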