Chapter 4. Configuring persistent storage and inferencing the model
Configure persistent storage for AI Inference Server to store the model files before you inference the model.
Configuring persistent storage is an optional but recommended step.
Prerequisites
- You have installed a mirror registry on the bastion host.
- You have installed the Node Feature Discovery Operator and NVIDIA GPU Operator in the disconnected cluster.
Procedure
- In the disconnected OpenShift Container Platform cluster, configure persistent storage using Network File System (NFS).
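For example, the following PersistentVolume and PersistentVolumeClaim manifests are a minimal sketch that assumes an NFS server at nfs.example.com exporting /exports/models; the server address, export path, capacity, and access mode are illustrative placeholders. The claim name granite-31-w8a8 matches the claimName that the Deployment in the next step references.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: granite-31-w8a8-pv
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: nfs.example.com   # illustrative NFS server; replace with your own
    path: /exports/models     # illustrative export path
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: granite-31-w8a8
  namespace: rhaiis-namespace
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""        # empty string binds to the statically provisioned PV above
  resources:
    requests:
      storage: 20Gi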
- Create a Deployment custom resource (CR). For example, the following Deployment CR uses AI Inference Server to serve a Granite model on a CUDA accelerator:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: granite
  namespace: rhaiis-namespace
  labels:
    app: granite
spec:
  replicas: 1
  selector:
    matchLabels:
      app: granite
  template:
    metadata:
      labels:
        app: granite
    spec:
      containers:
        - name: granite
          image: 'registry.redhat.io/rhaiis/vllm-cuda-rhel9@sha256:137ac606b87679c90658985ef1fc9a26a97bb11f622b988fe5125f33e6f35d78'
          imagePullPolicy: IfNotPresent
          command:
            - python
            - '-m'
            - vllm.entrypoints.openai.api_server
          args:
            - '--port=8000'
            - '--model=/mnt/models'
            - '--served-model-name=granite-3.1-2b-instruct-quantized.w8a8'
            - '--tensor-parallel-size=1'
          resources:
            limits:
              cpu: '10'
              nvidia.com/gpu: '1'
            requests:
              cpu: '2'
              memory: 6Gi
              nvidia.com/gpu: '1'
          volumeMounts:
            - name: cache-volume
              mountPath: /mnt/models
            - name: shm
              mountPath: /dev/shm
      volumes:
        - name: cache-volume
          persistentVolumeClaim:
            claimName: granite-31-w8a8
        - name: shm
          emptyDir:
            medium: Memory
            sizeLimit: 2Gi
      restartPolicy: Always
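- Optional: Verify that the granite deployment rolls out and that the model server pod is running:
$ oc rollout status deployment/granite -n rhaiis-namespace
$ oc get pods -n rhaiis-namespace -l app=granite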
- Create a Service CR for the model inference. For example:
apiVersion: v1
kind: Service
metadata:
  name: granite
  namespace: rhaiis-namespace
spec:
  selector:
    app: granite
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8000
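- Optional: Confirm that the service endpoints resolve to the model server pod:
$ oc get endpoints granite -n rhaiis-namespace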
- Optional: Create a Route CR to enable public access to the model. For example:
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: granite
  namespace: rhaiis-namespace
spec:
  to:
    kind: Service
    name: granite
  port:
    targetPort: 80
- Get the URL for the exposed route:
$ oc get route granite -n rhaiis-namespace -o jsonpath='{.spec.host}'
Example output
granite-rhaiis-namespace.apps.example.com
- Query the model by running the following command:
$ curl -X POST http://granite-rhaiis-namespace.apps.example.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "granite-3.1-2b-instruct-quantized.w8a8",
    "messages": [{"role": "user", "content": "What is AI?"}],
    "temperature": 0.1
  }'
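The model server exposes an OpenAI-compatible API, so the response has the standard chat completion shape. The following is an illustrative sketch only; the id, message content, and token counts vary per request and are elided here:
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "granite-3.1-2b-instruct-quantized.w8a8",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "..."
      },
      "finish_reason": "stop"
    }
  ]
}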