Getting started

Red Hat AI Inference Server 3.0

Getting started with Red Hat AI Inference Server

Red Hat AI Documentation Team

Legal Notice

Abstract

Learn how to work with Red Hat AI Inference Server for model serving and inferencing.

Preface

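Red Hat AI Inference Server is a container image that optimizes serving and inferencing with LLMs. Using AI Inference Server, you can serve and inference models in a way that boosts their performance while reducing their costs.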

Chapter 1. About AI Inference Server

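AI Inference Server provides enterprise-grade stability and security, building on upstream open source software. AI Inference Server leverages the upstream vLLM project, which provides state-of-the-art inferencing features.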

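For example, AI Inference Server uses continuous batching to process requests as they arrive instead of waiting for a full batch to be accumulated. It also uses tensor parallelism to distribute LLM workloads across multiple GPUs. These features reduce latency and increase throughput.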

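To reduce the cost of inferencing models, AI Inference Server uses paged attention. LLMs use a mechanism called attention to understand conversations with users. Normally, attention uses a large amount of memory to store the key-value (KV) cache for each request. Paged attention manages this memory by provisioning it for LLMs in a way similar to how virtual memory works for operating systems: memory is allocated in fixed-size blocks as it is needed instead of being reserved up front. This approach consumes less memory, which lowers costs.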
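The paging idea can be illustrated with simple arithmetic. The following shell sketch is a toy model, not a description of how vLLM allocates memory internally: it assumes a hypothetical block size of 16 tokens and compares the token slots a request reserves when blocks are allocated on demand with the slots it would reserve if memory were set aside up front for a hypothetical maximum length of 2048 tokens. All function names are hypothetical.

```shell
#!/usr/bin/env bash
# Toy illustration of paged allocation (hypothetical parameters, not vLLM internals).

# blocks_needed TOKENS BLOCK_SIZE
# Number of fixed-size blocks needed to hold TOKENS tokens (integer ceiling division).
blocks_needed() {
  local tokens=$1 block_size=$2
  echo $(( (tokens + block_size - 1) / block_size ))
}

# paged_slots TOKENS BLOCK_SIZE
# Token slots reserved when blocks are allocated only as they are needed.
paged_slots() {
  local tokens=$1 block_size=$2
  echo $(( $(blocks_needed "$tokens" "$block_size") * block_size ))
}

# wasted_slots USED RESERVED
# Slots that were reserved but never used by the request.
wasted_slots() {
  echo $(( $2 - $1 ))
}
```

For a 100-token request, `paged_slots 100 16` reserves 112 slots, so `wasted_slots 100 112` is 12, whereas reserving 2048 slots up front leaves `wasted_slots 100 2048`, that is 1948, unused.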

To verify the cost savings and performance gains of AI Inference Server, complete the following procedures:

- Serving and inferencing with AI Inference Server
- Validating Red Hat AI Inference Server benefits using key metrics

Chapter 2. Product and version compatibility

The following table lists the product versions supported by Red Hat AI Inference Server 3.0.


Table 2.1. Product and version compatibility

| Product | Supported version |
|---|---|
| Red Hat AI Inference Server | 3.0 |
| vLLM core | 0.8.4 |
| LLM Compressor | 0.5.1 Technology Preview |

Chapter 3. Serving and inferencing with AI Inference Server

Serve and inference a large language model with Red Hat AI Inference Server.

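Prerequisites

- You have installed Podman or Docker.
- You have access to a Linux server with NVIDIA or AMD GPUs and are logged in as a user with root privileges.
- For NVIDIA GPUs:
  - You have installed the NVIDIA drivers.
  - You have installed the NVIDIA Container Toolkit.
  - If your system has multiple NVIDIA GPUs that use NVSwitch, you have root access to start Fabric Manager.
- For AMD GPUs:
  - You have installed the ROCm software.
  - You have verified that you can run ROCm containers.
- You have access to registry.redhat.io and have logged in.
- You have a Hugging Face account and have generated a Hugging Face token.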

Note

AMD GPUs support only the FP8 (W8A8) and GGUF quantization schemes. For more information, see Supported hardware.

Procedure

Use the following table to identify the correct image for your infrastructure.


| GPU | AI Inference Server image |
|---|---|
| NVIDIA CUDA (T4, A100, L4, L40S, H100, H200) | `registry.redhat.io/rhaiis/vllm-cuda-rhel9:3.0.0` |
| AMD ROCm (MI210, MI300X) | `registry.redhat.io/rhaiis/vllm-rocm-rhel9:3.0.0` |
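If you script the setup, the image choice in the table can be captured in a small helper. This is a sketch: the function name `rhaiis_image` and its `nvidia` and `amd` arguments are hypothetical, while the image references are copied from the table.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the AI Inference Server 3.0.0 image for a GPU vendor.
rhaiis_image() {
  case "$1" in
    nvidia) echo "registry.redhat.io/rhaiis/vllm-cuda-rhel9:3.0.0" ;;
    amd)    echo "registry.redhat.io/rhaiis/vllm-rocm-rhel9:3.0.0" ;;
    *)
      # Any other value is rejected so that a typo does not pull the wrong image.
      echo "unsupported GPU vendor: $1 (expected nvidia or amd)" >&2
      return 1
      ;;
  esac
}
```

For example, `podman pull "$(rhaiis_image nvidia)"` pulls the CUDA image.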

Open a terminal on your server host, and log in to registry.redhat.io:
```
podman login registry.redhat.io
```

Pull the relevant image for your GPUs, replacing <gpu_type> with cuda or rocm:

```
podman pull registry.redhat.io/rhaiis/vllm-<gpu_type>-rhel9:3.0.0
```

If your system has SELinux enabled, configure SELinux to allow device access:
```
sudo setsebool -P container_use_devices 1
```
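To make this step conditional, you can check the SELinux mode first; `getenforce` prints `Enforcing`, `Permissive`, or `Disabled`. The helper name `selinux_active` is hypothetical, and its optional mode argument exists only so that the function can be tested without SELinux.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when SELinux is enforcing or permissive.
# An optional argument overrides the mode; otherwise getenforce is queried,
# and a missing getenforce command is treated as "Disabled".
selinux_active() {
  local mode=${1:-$(getenforce 2>/dev/null || echo Disabled)}
  [ "$mode" = "Enforcing" ] || [ "$mode" = "Permissive" ]
}
```

For example: `if selinux_active; then sudo setsebool -P container_use_devices 1; fi`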
Create a cache directory to mount as a volume in the container, and adjust its permissions so that the container can use it.
```
mkdir -p rhaiis-cache
chmod g+rwX rhaiis-cache
```
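To confirm that the permissions took effect, you can check that the group has read, write, and execute access to the directory. The helper below is a hypothetical sketch; it reads the group permission bits from the `ls -ld` mode string (characters 5 to 7).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if DIR exists and its group permissions are rwx.
cache_dir_ready() {
  local dir=$1
  [ -d "$dir" ] || return 1
  # ls -ld prints a mode string such as "drwxrwxr-x"; characters 5-7 are the group bits.
  [ "$(ls -ld "$dir" | cut -c5-7)" = "rwx" ]
}
```

For example: `cache_dir_ready rhaiis-cache && echo "cache directory is ready"`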
Create the private.env file, or append to it, with your Hugging Face token as HF_TOKEN, and then source the private.env file.
```
echo "export HF_TOKEN=<your_HF_token>" >> private.env
source private.env
```
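Because private.env holds a secret, you might prefer to create it so that only your user can read it. This is a suggested practice rather than a product requirement, and the function name `save_hf_token` is hypothetical. It appends the export line under a restrictive `umask` inside a subshell, so the umask change does not leak into your interactive shell, and quotes the token with bash's `printf %q`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "export HF_TOKEN=..." to FILE.
# If FILE does not exist yet, it is created with mode 600 (owner read/write only).
save_hf_token() {
  local file=$1 token=$2
  (
    umask 077   # newly created files get no group or other permissions
    printf 'export HF_TOKEN=%q\n' "$token" >> "$file"
  )
}
```

For example, run `save_hf_token private.env <your_HF_token>` and then `source private.env`.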

Start the AI Inference Server container image.

For NVIDIA CUDA accelerators:

If the host system has multiple GPUs and uses NVSwitch, start NVIDIA Fabric Manager. To detect whether your system uses NVSwitch, check whether files exist in /proc/driver/nvidia-nvswitch/devices/. If they do, start NVIDIA Fabric Manager, which requires root privileges.
```
ls /proc/driver/nvidia-nvswitch/devices/
```
Example output
```
0000:0c:09.0  0000:0c:0a.0  0000:0c:0b.0  0000:0c:0c.0  0000:0c:0d.0  0000:0c:0e.0
```
```
systemctl start nvidia-fabricmanager
```
Important
NVIDIA Fabric Manager is required only on systems with multiple GPUs that use NVSwitch. For more information, see NVIDIA Server Architectures.
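The NVSwitch check and the Fabric Manager start can be combined into one conditional step. This is a minimal sketch: the function name `has_nvswitch` is hypothetical, and its optional directory argument exists only so that the check can be tested; by default it inspects /proc/driver/nvidia-nvswitch/devices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the NVSwitch device directory exists and is not empty.
has_nvswitch() {
  local dir=${1:-/proc/driver/nvidia-nvswitch/devices}
  [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}
```

For example: `if has_nvswitch; then sudo systemctl start nvidia-fabricmanager; fi`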

Run the following command to check that containers, such as the Red Hat AI Inference Server container, can access the NVIDIA GPUs on the host:

```
podman run --rm -it \
--security-opt=label=disable \
--device nvidia.com/gpu=all \
nvcr.io/nvidia/cuda:12.4.1-base-ubi9 \
nvidia-smi
```

Example output

```
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.06             Driver Version: 570.124.06     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100-SXM4-80GB          Off |   00000000:08:01.0 Off |                    0 |
| N/A   32C    P0             64W /  400W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100-SXM4-80GB          Off |   00000000:08:02.0 Off |                    0 |
| N/A   29C    P0             63W /  400W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
```

Start the container.

```
podman run --rm -it \
--device nvidia.com/gpu=all \
--security-opt=label=disable \
--shm-size=4g -p 8000:8000 \
--userns=keep-id:uid=1001 \
--env "HUGGING_FACE_HUB_TOKEN=$HF_TOKEN" \
--env "HF_HUB_OFFLINE=0" \
--env=VLLM_NO_USAGE_STATS=1 \
-v ./rhaiis-cache:/opt/app-root/src/.cache:Z \
registry.redhat.io/rhaiis/vllm-cuda-rhel9:3.0.0 \
--model RedHatAI/Llama-3.2-1B-Instruct-FP8 \
--tensor-parallel-size 2
```

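- `--security-opt=label=disable`: Required for systems where SELinux is enabled. This option prevents SELinux from relabeling files on the volume mount. If you choose not to use this argument, your container might not run successfully.
- `--shm-size=4g`: If you experience an issue with shared memory, increase `--shm-size` to `8GB`.
- `--userns=keep-id:uid=1001`: Maps the host UID to the effective UID of the vLLM process in the container. You can also pass `--user=0`, but this is less secure than the `--userns` option. Setting `--user=0` runs vLLM as root inside the container.
- `--env "HUGGING_FACE_HUB_TOKEN=$HF_TOKEN"`: Set and export `HF_TOKEN` with your Hugging Face API access token.
- `:Z` suffix on the `-v` volume mount: Required for systems where SELinux is enabled. On Debian or Ubuntu operating systems, or when using Docker without SELinux, the `:Z` suffix is not available.
- `--tensor-parallel-size 2`: When running the AI Inference Server container on multiple GPUs, set `--tensor-parallel-size` to match the number of GPUs.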
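Rather than hard-coding `--tensor-parallel-size 2`, you can derive the value from the number of GPUs that `nvidia-smi -L` lists (one line per GPU, each starting with `GPU <index>:`). The function name `count_gpus` is hypothetical; it reads `nvidia-smi -L` output on standard input so that it can be tested with sample text. vLLM also requires the model's attention head count to be divisible by the tensor parallel size, so treat the GPU count as a starting point rather than a guaranteed-valid value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count GPUs in `nvidia-smi -L` output read from stdin.
# Each GPU appears on its own line, for example:
#   GPU 0: NVIDIA A100-SXM4-80GB (UUID: GPU-...)
# Indented MIG device lines do not match and are not counted.
count_gpus() {
  grep -c '^GPU [0-9][0-9]*:'
}
```

For example, set `TP_SIZE=$(nvidia-smi -L | count_gpus)` and pass `--tensor-parallel-size "$TP_SIZE"` to the `podman run` command.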

For AMD ROCm accelerators:

Use amd-smi static -a to verify that the container can access the host system GPUs:

```
podman run -ti --rm --pull=newer \
--security-opt=label=disable \
--device=/dev/kfd --device=/dev/dri \
--group-add keep-groups \
--entrypoint="" \
registry.redhat.io/rhaiis/vllm-rocm-rhel9:3.0.0 \
amd-smi static -a
```

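- `--group-add keep-groups`: You must belong to both the video and render groups on AMD systems to use the GPUs. To access the GPUs, you must pass the `--group-add=keep-groups` supplementary groups option into the container.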
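To check group membership before starting the container, you can compare the output of `id -nG` with the required groups. The helper `missing_groups` is a hypothetical sketch: it takes the space-separated group list as its first argument and prints each required group that is not in it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each required group that is absent from GROUP_LIST.
# Usage: missing_groups "GROUP_LIST" required_group...
missing_groups() {
  local have=" $1 " g
  shift
  for g in "$@"; do
    case "$have" in
      *" $g "*) ;;          # already a member
      *) echo "$g" ;;       # not a member
    esac
  done
}
```

For example, `missing_groups "$(id -nG)" video render` prints nothing when your user is in both groups. If a group is missing, you can add it with `sudo usermod -aG video,render "$USER"` and then log in again.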

Start the container:

```
podman run --rm -it \
--device /dev/kfd --device /dev/dri \
--security-opt=label=disable \
--group-add keep-groups \
--shm-size=4GB -p 8000:8000 \
--env "HUGGING_FACE_HUB_TOKEN=$HF_TOKEN" \
--env "HF_HUB_OFFLINE=0" \
--env=VLLM_NO_USAGE_STATS=1 \
-v ./rhaiis-cache:/opt/app-root/src/.cache \
registry.redhat.io/rhaiis/vllm-rocm-rhel9:3.0.0 \
--model RedHatAI/Llama-3.2-1B-Instruct-FP8 \
--tensor-parallel-size 2
```

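- `--security-opt=label=disable`: Prevents SELinux from relabeling files on the volume mount. If you choose not to use this argument, your container might not run successfully.
- `--shm-size=4GB`: If you experience an issue with shared memory, increase `--shm-size` to `8GB`.
- `--tensor-parallel-size 2`: When running the AI Inference Server container on multiple GPUs, set `--tensor-parallel-size` to match the number of GPUs.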
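Downloading and loading the model can take a while, so a request sent immediately after the container starts might fail. A minimal readiness loop, assuming the `/health` endpoint of the vLLM OpenAI-compatible server (which returns HTTP 200 when the server is ready), might look like this; the function name, default attempt count, and delay are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll BASE_URL/health until it answers with a 2xx status.
# Arguments: BASE_URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_server() {
  local base_url=$1 attempts=${2:-60} delay=${3:-5} i
  for (( i = 1; i <= attempts; i++ )); do
    # -f makes curl fail on HTTP errors; -s silences progress and error output.
    if curl -fs --max-time 5 -o /dev/null "$base_url/health"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if (( i < attempts )); then
      sleep "$delay"
    fi
  done
  echo "server at $base_url not ready after $attempts attempts" >&2
  return 1
}
```

For example: `wait_for_server http://localhost:8000 && echo "server is ready"`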

In a separate tab in your terminal, make a request to the model with the API.

```
curl -X POST -H "Content-Type: application/json" -d '{
    "prompt": "What is the capital of France?",
    "max_tokens": 50
}' http://<your_server_ip>:8000/v1/completions | jq
```
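Writing JSON by hand inside a shell command breaks as soon as a prompt contains quotes. As a sketch, you can build the request body with `jq -n --arg`, which escapes the prompt for you, and post it with `curl --data @-`. The function names `completion_payload` and `send_completion` are hypothetical, and `<your_server_ip>` remains a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a /v1/completions request body for PROMPT.
# jq --arg passes PROMPT as a string, so quotes and newlines are escaped correctly.
completion_payload() {
  local prompt=$1 max_tokens=${2:-50}
  jq -n --arg prompt "$prompt" --argjson max_tokens "$max_tokens" \
    '{prompt: $prompt, max_tokens: $max_tokens}'
}

# Hypothetical helper: post the request body to BASE_URL and print the JSON response.
send_completion() {
  local base_url=$1 prompt=$2 max_tokens=${3:-50}
  completion_payload "$prompt" "$max_tokens" |
    curl -sS -X POST -H "Content-Type: application/json" --data @- \
      "$base_url/v1/completions"
}
```

For example: `send_completion http://<your_server_ip>:8000 'What is the capital of France?' | jq`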

Example output

```
{
    "id": "cmpl-b84aeda1d5a4485c9cb9ed4a13072fca",
    "object": "text_completion",
    "created": 1746555421,
    "model": "RedHatAI/Llama-3.2-1B-Instruct-FP8",
    "choices": [
        {
            "index": 0,
            "text": " Paris.\nThe capital of France is Paris.",
            "logprobs": null,
            "finish_reason": "stop",
            "stop_reason": null,
            "prompt_logprobs": null
        }
    ],
    "usage": {
        "prompt_tokens": 8,
        "total_tokens": 18,
        "completion_tokens": 10,
        "prompt_tokens_details": null
    }
}
```
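To extract individual fields from the response instead of printing the whole document, you can use jq filters. This sketch assumes the response shape shown in the example output, and the function names are hypothetical. The token counts should add up: in the example, 8 prompt tokens plus 10 completion tokens equal 18 total tokens.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that read a /v1/completions JSON response on stdin.

# Print the generated text of the first choice.
completion_text() {
  jq -r '.choices[0].text'
}

# Print "true" if prompt_tokens + completion_tokens equals total_tokens.
usage_consistent() {
  jq '.usage.prompt_tokens + .usage.completion_tokens == .usage.total_tokens'
}
```

For example: `curl ... http://<your_server_ip>:8000/v1/completions | completion_text`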

Chapter 4. Validating Red Hat AI Inference Server benefits using key metrics

Use the following metrics to evaluate the performance of the LLM served with AI Inference Server:

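- Time to first token (TTFT): How long does it take for the model to provide the first token of its response?
- Time per output token (TPOT): How long does it take for the model to provide an output token to each user who has sent a request?
- Latency: How long does it take for the model to generate a complete response?
- Throughput: How many output tokens can the model produce simultaneously, across all users and requests?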
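These metrics are related. Because TPOT excludes the first token (the benchmark output labels it "excl. 1st token"), the latency of a single request that generates N output tokens is approximately TTFT + TPOT × (N - 1). The helper below is a hypothetical sketch of that arithmetic; it ignores queuing and network overhead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: approximate end-to-end latency in milliseconds.
# Arguments: TTFT_MS TPOT_MS OUTPUT_TOKENS
# latency ~= TTFT + TPOT * (OUTPUT_TOKENS - 1)
estimate_latency_ms() {
  awk -v ttft="$1" -v tpot="$2" -v n="$3" \
    'BEGIN { printf "%.2f\n", ttft + tpot * (n - 1) }'
}
```

For example, with a TTFT of 200 ms, a TPOT of 10 ms, and 101 output tokens, `estimate_latency_ms 200 10 101` prints `1200.00`.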

Complete the following steps to run a benchmark test that shows how AI Inference Server and other inference servers perform according to these metrics.

Prerequisites

- AI Inference Server container image
- GitHub account
- Python 3.9 or later

Procedure

On your host system, start an AI Inference Server container and serve a model.

```
podman run --rm -it --device nvidia.com/gpu=all \
--shm-size=4GB -p 8000:8000 \
--env "HUGGING_FACE_HUB_TOKEN=$HF_TOKEN" \
--env "HF_HUB_OFFLINE=0" \
-v ./rhaiis-cache:/opt/app-root/src/.cache \
--security-opt=label=disable \
registry.redhat.io/rhaiis/vllm-cuda-rhel9:3.0.0 \
--model RedHatAI/Llama-3.2-1B-Instruct-FP8
```

In a separate terminal tab, install the benchmark tool dependencies.
```
pip install vllm pandas datasets
```

Clone the vLLM Git repository:

```
git clone https://github.com/vllm-project/vllm.git
```

Run the ./vllm/benchmarks/benchmark_serving.py script.

```
python vllm/benchmarks/benchmark_serving.py --backend vllm --model RedHatAI/Llama-3.2-1B-Instruct-FP8 --num-prompts 100 --dataset-name random --random-input 1024 --random-output 512 --port 8000
```

Verification

The results show how AI Inference Server performs according to key server metrics:

```
============ Serving Benchmark Result ============
Successful requests:                    100
Benchmark duration (s):                 4.61
Total input tokens:                     102300
Total generated tokens:                 40493
Request throughput (req/s):             21.67
Output token throughput (tok/s):        8775.85
Total Token throughput (tok/s):         30946.83
---------------Time to First Token----------------
Mean TTFT (ms):                         193.61
Median TTFT (ms):                       193.82
P99 TTFT (ms):                          303.90
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                         9.06
Median TPOT (ms):                       8.57
P99 TPOT (ms):                          13.57
---------------Inter-token Latency----------------
Mean ITL (ms):                          8.54
Median ITL (ms):                        8.49
P99 ITL (ms):                           13.14
==================================================
```
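When you change parameters and rerun the benchmark, it helps to compare individual metrics across runs. This sketch assumes that you saved the printed report to a file, for example by appending `| tee run1.txt` to the benchmark command. The function name `bench_metric` is hypothetical, and it relies only on the `Label: value` layout of the report.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value for LABEL from a saved benchmark report on stdin.
# Example: bench_metric "Mean TTFT (ms)" < run1.txt
bench_metric() {
  awk -v label="$1" '
    index($0, label ":") == 1 {      # line starts with LABEL followed by a colon
      sub(/^[^:]*:[ \t]*/, "")       # drop the label, the colon, and the padding
      print
      exit
    }'
}
```

For example, `bench_metric "Output token throughput (tok/s)" < run1.txt` prints the output token throughput of that run.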

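Try changing the parameters of this benchmark and running it again. Notice how vllm as a backend compares to other options. Throughput should be consistently higher, while latency should be lower.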

- Other options for `--backend` are: `tgi`, `lmdeploy`, `deepspeed-mii`, `openai`, and `openai-chat`.
- Other options for `--dataset-name` are: `sharegpt`, `burstgpt`, `sonnet`, `random`, and `hf`.
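One way to vary a parameter systematically is to loop over it. The sketch below sweeps the `--request-rate` option of `benchmark_serving.py` (the rate at which the benchmark client sends requests; check `--help` for your version) and saves each report with `tee`. The function name `sweep_request_rates` and the `BENCH_PYTHON` variable, which lets you substitute `echo` for a dry run, are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the serving benchmark once per request rate and
# save each report to bench-rate-<RATE>.txt in the current directory.
# Set BENCH_PYTHON=echo to print the commands instead of running them.
sweep_request_rates() {
  local rate
  for rate in "$@"; do
    "${BENCH_PYTHON:-python}" vllm/benchmarks/benchmark_serving.py \
      --backend vllm \
      --model RedHatAI/Llama-3.2-1B-Instruct-FP8 \
      --num-prompts 100 \
      --dataset-name random --random-input 1024 --random-output 512 \
      --request-rate "$rate" \
      --port 8000 | tee "bench-rate-${rate}.txt"
  done
}
```

For example, `sweep_request_rates 1 4 16 inf` runs four benchmarks, from a light load up to sending all requests at once, and leaves one report file per rate for comparison.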

Additional resources

- vLLM documentation
- LLM Inference Performance Engineering: Best Practices, Mosaic AI Research, which explains metrics such as throughput and latency

Legal Notice

The text of and illustrations in this document are licensed by Red Hat under a Creative Commons Attribution–Share Alike 3.0 Unported license ("CC-BY-SA"). An explanation of CC-BY-SA is available at http://creativecommons.org/licenses/by-sa/3.0/. In accordance with CC-BY-SA, if you distribute this document or an adaptation of it, you must provide the URL for the original version.

Red Hat, as the licensor of this document, waives the right to enforce, and agrees not to assert, Section 4d of CC-BY-SA to the fullest extent permitted by applicable law.

Red Hat, Red Hat Enterprise Linux, the Shadowman logo, the Red Hat logo, JBoss, OpenShift, Fedora, the Infinity logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries.

Linux® is the registered trademark of Linus Torvalds in the United States and other countries.

Java® is a registered trademark of Oracle and/or its affiliates.

XFS® is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries.

MySQL® is a registered trademark of MySQL AB in the United States, the European Union and other countries.

Node.js® is an official trademark of Joyent. Red Hat is not formally related to or endorsed by the official Joyent Node.js open source or commercial project.

The OpenStack® Word Mark and OpenStack logo are either registered trademarks/service marks or trademarks/service marks of the OpenStack Foundation, in the United States and other countries and are used with the OpenStack Foundation's permission. We are not affiliated with, endorsed or sponsored by the OpenStack Foundation, or the OpenStack community.

All other trademarks are the property of their respective owners.
