Deploying Red Hat AI Inference Server in a disconnected environment


Red Hat AI Inference Server 3.1

Install Red Hat AI Inference Server in a disconnected environment by using OpenShift Container Platform and a disconnected mirror registry.

Red Hat AI Documentation Team

Abstract

Learn how to use Red Hat AI Inference Server for model serving and inferencing in a disconnected environment.

Preface

You can use Red Hat AI Inference Server to serve large language models without a connection to the external internet by installing OpenShift Container Platform and configuring a mirror container image registry in your disconnected environment.

Important

Currently, disconnected environments on OpenShift Container Platform support NVIDIA accelerators only.

Chapter 1. Setting up a mirror registry for a disconnected environment

To serve container images in a disconnected environment, you must configure a disconnected mirror registry on a bastion host. The bastion host acts as a secure gateway between your disconnected environment and the internet. You then mirror images from the Red Hat online image registry and serve them in the disconnected environment.

Prerequisites

Procedure

  1. Open a shell prompt on the bastion host and create the disconnected mirror registry. See the example after this list.
  2. Configure credentials that allow images to be mirrored.
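
For example, you can install the registry and then log in to it with podman. This is a minimal sketch that assumes the Quay-based mirror registry for Red Hat OpenShift CLI; bastion.example.com, /opt/quay, and the authfile path are placeholder values:

$ ./mirror-registry install --quayHostname bastion.example.com --quayRoot /opt/quay
$ podman login --authfile ~/.config/containers/auth.json bastion.example.com:8443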

Chapter 2. Mirroring the AI Inference Server container images

After you create the mirror registry for your disconnected environment, you can mirror the AI Inference Server container images.

Prerequisites

  • You have installed the OpenShift CLI (oc).
  • You have logged in as a user with cluster-admin privileges.
  • You have installed a mirror registry on the bastion host.

Procedure

  1. Find the image version that matches your environment and pull the image by using podman:
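
    For example, the following command pulls the CUDA variant of the image. This is a sketch: the registry.redhat.io/rhaiis/vllm-cuda-rhel9:latest reference is taken from the ImageSetConfiguration examples later in this procedure; substitute the image and version tag that match your accelerator.

    $ podman pull registry.redhat.io/rhaiis/vllm-cuda-rhel9:latest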

  2. Create an image set configuration custom resource (CR) that includes the NFD Operator, the NVIDIA GPU Operator, and the AI Inference Server image. For example, the following ImageSetConfiguration CR includes image specifications for the AMD64 CPU architecture and CUDA accelerators:

    apiVersion: mirror.openshift.io/v2alpha1
    kind: ImageSetConfiguration
    mirror:
      operators:
      # Node Feature Discovery (NFD) Operator
      # Helps OpenShift detect hardware capabilities such as GPUs
      - catalog: registry.redhat.io/openshift4/ose-cluster-nfd-operator:latest
        packages:
          - name: nfd
            defaultChannel: stable
            channels:
              - name: stable
    
      # GPU Operator
      # Manages NVIDIA GPUs on OpenShift
      - catalog: registry.connect.redhat.com/nvidia/gpu-operator-bundle:latest
        packages:
          - name: gpu-operator-certified
            defaultChannel: stable
            channels:
              - name: stable
      additionalImages:
        # Red Hat AI Inference Server image
      - name: registry.redhat.io/rhaiis/vllm-cuda-rhel9:latest
  3. Alternatively, if the NFD and NVIDIA GPU Operators are already installed in the cluster, create an ImageSetConfiguration CR that configures only the AI Inference Server:

    apiVersion: mirror.openshift.io/v2alpha1
    kind: ImageSetConfiguration
    mirror:
      additionalImages:
      - name: registry.redhat.io/rhaiis/vllm-cuda-rhel9:latest
  4. Mirror the image set to the disconnected environment, as shown in the example after this list.
  5. Configure the disconnected cluster to use the updated image set.
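
For example, the following command mirrors the image set directly to the disconnected registry. This is a minimal sketch that assumes the oc-mirror plugin v2, matching the mirror.openshift.io/v2alpha1 API version used above; the imageset-config.yaml file name, the workspace path, and registry.example.com:8443 are placeholders:

$ oc mirror --config imageset-config.yaml \
    --workspace file:///var/lib/oc-mirror \
    docker://registry.example.com:8443 --v2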

Chapter 3. Installing the Node Feature Discovery Operator and the NVIDIA GPU Operator

Install the Node Feature Discovery Operator and the NVIDIA GPU Operator so that you can use the AI accelerators of the underlying hosts.

Prerequisites

  • You have installed the OpenShift CLI (oc).
  • You have logged in as a user with cluster-admin privileges.
  • You have successfully mirrored the required Operator images to the disconnected environment.

Procedure

  1. Disable the default OperatorHub sources by running the following command:

    $ oc patch OperatorHub cluster --type json \
        -p '[{"op": "add", "path": "/spec/disableAllDefaultSources", "value": true}]'
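
    Optionally, confirm that only your mirrored catalog sources remain in the cluster. This check assumes that you have already applied the CatalogSource resources that oc-mirror generated when you mirrored the image set:

    $ oc get catalogsource -n openshift-marketplace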
  2. Apply the Namespace, OperatorGroup, and Subscription CRs for the Node Feature Discovery Operator and the NVIDIA GPU Operator.

    1. Create the Namespace CRs:

      $ oc apply -f - <<EOF
      apiVersion: v1
      kind: Namespace
      metadata:
        name: nvidia-gpu-operator
      ---
      apiVersion: v1
      kind: Namespace
      metadata:
        name: openshift-nfd
        labels:
          name: openshift-nfd
          openshift.io/cluster-monitoring: "true"
      EOF
    2. Create the OperatorGroup CRs:

      $ oc apply -f - <<EOF
      apiVersion: operators.coreos.com/v1
      kind: OperatorGroup
      metadata:
        name: gpu-operator-certified
        namespace: nvidia-gpu-operator
      spec:
        targetNamespaces:
        - nvidia-gpu-operator
      ---
      apiVersion: operators.coreos.com/v1
      kind: OperatorGroup
      metadata:
        generateName: openshift-nfd-
        name: openshift-nfd
        namespace: openshift-nfd
      spec:
        targetNamespaces:
        - openshift-nfd
      EOF
    3. Create the Subscription CRs:

      $ oc apply -f - <<EOF
      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: gpu-operator-certified
        namespace: nvidia-gpu-operator
      spec:
        channel: "stable"
        installPlanApproval: Manual
        name: gpu-operator-certified
        source: certified-operators
        sourceNamespace: openshift-marketplace
      ---
      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: nfd
        namespace: openshift-nfd
      spec:
        channel: "stable"
        installPlanApproval: Automatic
        name: nfd
        source: redhat-operators
        sourceNamespace: openshift-marketplace
      EOF
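
      Because the gpu-operator-certified Subscription sets installPlanApproval: Manual, the Operator does not install until you approve its install plan. The following is a minimal sketch of the approval; the actual InstallPlan name in your cluster will differ:

      $ oc get installplan -n nvidia-gpu-operator
      $ oc patch installplan <install_plan_name> -n nvidia-gpu-operator \
          --type merge -p '{"spec":{"approved":true}}'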
  3. Create a Secret custom resource (CR) for your Hugging Face token.

    1. Set the HF_TOKEN variable by using the token that you set up in Hugging Face:

      $ HF_TOKEN=<your_huggingface_token>
    2. Set the cluster namespace to match the location where you deploy the Red Hat AI Inference Server image, for example:

      $ NAMESPACE=rhaiis-namespace
    3. Create the Secret CR in the cluster:

      $ oc create secret generic hf-secret --from-literal=HF_TOKEN=$HF_TOKEN -n $NAMESPACE
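
      Optionally, verify that the Secret exists. The following command checks only for the object and does not print the token value:

      $ oc get secret hf-secret -n $NAMESPACE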

Verification

Verify that the Operator deployments succeeded by running the following command:

$ oc get pods

Example output

NAME                                                  READY   STATUS     RESTARTS   AGE
nfd-controller-manager-7f86ccfb58-vgr4x               2/2     Running    0          10m
gpu-feature-discovery-c2rfm                           1/1     Running    0          6m28s
gpu-operator-84b7f5bcb9-vqds7                         1/1     Running    0          39m
nvidia-container-toolkit-daemonset-pgcrf              1/1     Running    0          6m28s
nvidia-cuda-validator-p8gv2                           0/1     Completed  0          99s
nvidia-dcgm-exporter-kv6k8                            1/1     Running    0          6m28s
nvidia-dcgm-tpsps                                     1/1     Running    0          6m28s
nvidia-device-plugin-daemonset-gbn55                  1/1     Running    0          6m28s
nvidia-device-plugin-validator-z7ltr                  0/1     Completed  0          82s
nvidia-driver-daemonset-410.84.202203290245-0-xxgdv   2/2     Running    0          6m28s
nvidia-node-status-exporter-snmsm                     1/1     Running    0          6m28s
nvidia-operator-validator-6pfk6                       1/1     Running    0          6m28s
...

Chapter 4. Configuring persistent storage and inferencing models

You should configure persistent storage for AI Inference Server so that you can store the model image before you serve the model for inferencing.

Note

Configuring persistent storage is optional but recommended.

Prerequisites

  • You have installed a mirror registry on the bastion host.
  • You have installed the Node Feature Discovery Operator and the NVIDIA GPU Operator in the disconnected cluster.

Procedure

  1. In the disconnected OpenShift Container Platform cluster, configure persistent storage by using a Network File System (NFS). See the example claim after this step.
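
    The Deployment CR in the next step mounts a PersistentVolumeClaim named granite-31-w8a8. The following is a minimal sketch of that claim; the storage size and the storageClassName are assumptions that you must adapt to your NFS-backed storage:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: granite-31-w8a8
      namespace: rhaiis-namespace
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 20Gi # assumption: size this for your model weights
      storageClassName: <your_nfs_storage_class> # assumption: your NFS storage class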
  2. Create a Deployment custom resource (CR). For example, the following Deployment CR uses AI Inference Server to serve a Granite model on CUDA accelerators.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: granite
      namespace: rhaiis-namespace 1
      labels:
        app: granite
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: granite
      template:
        metadata:
          labels:
            app: granite
        spec:
          containers:
            - name: granite
              image: 'registry.redhat.io/rhaiis/vllm-cuda-rhel9@sha256:137ac606b87679c90658985ef1fc9a26a97bb11f622b988fe5125f33e6f35d78'
              imagePullPolicy: IfNotPresent
              command:
                - python
                - '-m'
                - vllm.entrypoints.openai.api_server
              args:
                - '--port=8000'
                - '--model=/mnt/models'
                - '--served-model-name=granite-3.1-2b-instruct-quantized.w8a8'
                - '--tensor-parallel-size=1'
              resources:
                limits:
                  cpu: '10'
                  nvidia.com/gpu: '1'
                requests:
                  cpu: '2'
                  memory: 6Gi
                  nvidia.com/gpu: '1'
              volumeMounts:
                - name: cache-volume
                  mountPath: /mnt/models
                - name: shm
                  mountPath: /dev/shm 2
          volumes:
            - name: cache-volume
              persistentVolumeClaim:
                claimName: granite-31-w8a8
            - name: shm
              emptyDir:
                medium: Memory
                sizeLimit: 2Gi
          restartPolicy: Always
    1
    The metadata.namespace value must match the namespace where you configured the Hugging Face Secret CR.
    2
    The NVIDIA Collective Communications Library (NCCL) requires the /dev/shm volume mount. Tensor parallel vLLM deployments fail when the /dev/shm volume mount is not set.
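
    After you create the Deployment, you can watch the model server start. A brief sketch, assuming that you saved the CR above as granite-deployment.yaml:

    $ oc apply -f granite-deployment.yaml
    $ oc get pods -n rhaiis-namespace -w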
  3. Create a Service CR for model inferencing. For example:

    apiVersion: v1
    kind: Service
    metadata:
      name: granite
      namespace: rhaiis-namespace
    spec:
      selector:
        app: granite
      ports:
        - protocol: TCP
          port: 80
          targetPort: 8000
  4. Optional: Create a Route CR to enable public access to the model. For example:

    apiVersion: route.openshift.io/v1
    kind: Route
    metadata:
      name: granite
      namespace: rhaiis-namespace
    spec:
      to:
        kind: Service
        name: granite
      port:
        targetPort: 80
  5. Get the URL of the exposed route:

    $ oc get route granite -n rhaiis-namespace -o jsonpath='{.spec.host}'

    Example output

    granite-rhaiis-namespace.apps.example.com

  6. Query the model by running the following command:

    curl -X POST http://granite-rhaiis-namespace.apps.example.com/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "granite-3.1-2b-instruct-quantized.w8a8",
        "messages": [{"role": "user", "content": "What is AI?"}],
        "temperature": 0.1
      }'
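
    The server returns an OpenAI-compatible chat completion response. The following is a sketch of the response shape rather than literal output; fields such as id and the generated content will differ:

    {
      "id": "chatcmpl-...",
      "object": "chat.completion",
      "model": "granite-3.1-2b-instruct-quantized.w8a8",
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "AI, or artificial intelligence, refers to ..."
          },
          "finish_reason": "stop"
        }
      ]
    }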

Legal Notice

Copyright © 2025 Red Hat, Inc.
The text of and illustrations in this document are licensed by Red Hat under a Creative Commons Attribution–Share Alike 3.0 Unported license ("CC-BY-SA"). An explanation of CC-BY-SA is available at http://creativecommons.org/licenses/by-sa/3.0/. In accordance with CC-BY-SA, if you distribute this document or an adaptation of it, you must provide the URL for the original version.
Red Hat, as the licensor of this document, waives the right to enforce, and agrees not to assert, Section 4d of CC-BY-SA to the fullest extent permitted by applicable law.
Red Hat, Red Hat Enterprise Linux, the Shadowman logo, the Red Hat logo, JBoss, OpenShift, Fedora, the Infinity logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries.
Linux® is the registered trademark of Linus Torvalds in the United States and other countries.
Java® is a registered trademark of Oracle and/or its affiliates.
XFS® is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries.
MySQL® is a registered trademark of MySQL AB in the United States, the European Union and other countries.
Node.js® is an official trademark of Joyent. Red Hat is not formally related to or endorsed by the official Joyent Node.js open source or commercial project.
The OpenStack® Word Mark and OpenStack logo are either registered trademarks/service marks or trademarks/service marks of the OpenStack Foundation, in the United States and other countries and are used with the OpenStack Foundation's permission. We are not affiliated with, endorsed or sponsored by the OpenStack Foundation, or the OpenStack community.
All other trademarks are the property of their respective owners.