Chapter 5. Troubleshooting

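The following troubleshooting information for Red Hat AI Inference Server 3.0 describes common problems related to model loading, memory, model response quality, networking, and GPU drivers. Where available, workarounds for common issues are described.

Most common issues in vLLM relate to installation, model loading, memory management, and GPU communication. Most problems can be resolved by using a correctly configured environment, ensuring compatible hardware and software versions, and following the recommended configuration practices.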

Important

For persistent issues, export VLLM_LOGGING_LEVEL=DEBUG to enable debug logging, and then check the logs.

export VLLM_LOGGING_LEVEL=DEBUG
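
Because the server runs inside a container, a variable exported in the host shell typically reaches vLLM only if it is passed through to the container. The following is a minimal sketch of one way to do that, assuming the podman invocation used in section 5.1. The function name `start_rhaiis_debug` and the log file name `vllm-debug.log` are illustrative and not part of the product.

```shell
# Hypothetical helper: start the server with debug logging enabled.
# All flags except VLLM_LOGGING_LEVEL are copied from the podman command in section 5.1.
# -it is omitted here so that the output can be piped to tee.
start_rhaiis_debug() {
  podman run --rm \
    --device nvidia.com/gpu=all \
    --security-opt=label=disable \
    --shm-size=4GB -p 8000:8000 \
    --env "HUGGING_FACE_HUB_TOKEN=$HF_TOKEN" \
    --env=VLLM_LOGGING_LEVEL=DEBUG \
    -v ./rhaiis-cache:/opt/app-root/src/.cache \
    registry.redhat.io/rhaiis/vllm-cuda-rhel9:3.0.0 \
    --model RedHatAI/Llama-3.2-1B-Instruct-FP8 2>&1 | tee vllm-debug.log
}
```

Calling `start_rhaiis_debug` starts the server and keeps a copy of the debug output in the current directory for later inspection.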

5.1. Model loading errors

When you run the Red Hat AI Inference Server container image without specifying the user namespace, an unrecognized model error is returned.

podman run --rm -it \
--device nvidia.com/gpu=all \
--security-opt=label=disable \
--shm-size=4GB -p 8000:8000 \
--env "HUGGING_FACE_HUB_TOKEN=$HF_TOKEN" \
--env "HF_HUB_OFFLINE=0" \
--env=VLLM_NO_USAGE_STATS=1 \
-v ./rhaiis-cache:/opt/app-root/src/.cache \
registry.redhat.io/rhaiis/vllm-cuda-rhel9:3.0.0 \
--model RedHatAI/Llama-3.2-1B-Instruct-FP8

Example output

ValueError: Unrecognized model in RedHatAI/Llama-3.2-1B-Instruct-FP8. Should have a model_type key in its config.json

To resolve this error, pass --userns=keep-id:uid=1001 as a Podman parameter. This maps your host user to UID 1001 inside the container.
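
With the flag added, the command above could look like the following sketch. Everything except `--userns=keep-id:uid=1001` is copied from the failing command. The wrapper name `start_rhaiis` is hypothetical.

```shell
# Hypothetical wrapper around the command shown above.
# The only change is the added --userns flag.
start_rhaiis() {
  podman run --rm -it \
    --device nvidia.com/gpu=all \
    --security-opt=label=disable \
    --shm-size=4GB -p 8000:8000 \
    --userns=keep-id:uid=1001 \
    --env "HUGGING_FACE_HUB_TOKEN=$HF_TOKEN" \
    --env "HF_HUB_OFFLINE=0" \
    --env=VLLM_NO_USAGE_STATS=1 \
    -v ./rhaiis-cache:/opt/app-root/src/.cache \
    registry.redhat.io/rhaiis/vllm-cuda-rhel9:3.0.0 \
    --model RedHatAI/Llama-3.2-1B-Instruct-FP8
}
```

Run `start_rhaiis` from the directory that contains `rhaiis-cache`, with `HF_TOKEN` set in the environment.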

Sometimes when Red Hat AI Inference Server downloads the model, the download fails or gets stuck. To prevent the model download from hanging, first download the model using huggingface-cli. For example:
```
huggingface-cli download <MODEL_ID> --local-dir <DOWNLOAD_PATH>
```
When serving the model, pass the local model path to vLLM to prevent the model from being downloaded again.
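
The download-then-serve workflow described above can be sketched as follows. This is an illustration under stated assumptions, not a documented procedure: the `MODEL_DIR` location, the `/opt/app-root/src/models/local-model` mount point, and the function name `serve_local_model` are hypothetical, and `HF_HUB_OFFLINE=1` is set on the assumption that no further access to the Hugging Face Hub is needed once the files are local.

```shell
# Hypothetical values: adjust the model ID and the download directory as needed.
MODEL_ID="RedHatAI/Llama-3.2-1B-Instruct-FP8"
MODEL_DIR="$PWD/models/Llama-3.2-1B-Instruct-FP8"

serve_local_model() {
  # Step 1: fetch the model files outside the server process.
  huggingface-cli download "$MODEL_ID" --local-dir "$MODEL_DIR" || return 1

  # Step 2: mount the directory and point --model at the path inside the container.
  podman run --rm -it \
    --device nvidia.com/gpu=all \
    --security-opt=label=disable \
    --shm-size=4GB -p 8000:8000 \
    --userns=keep-id:uid=1001 \
    --env "HF_HUB_OFFLINE=1" \
    -v "$MODEL_DIR":/opt/app-root/src/models/local-model \
    registry.redhat.io/rhaiis/vllm-cuda-rhel9:3.0.0 \
    --model /opt/app-root/src/models/local-model
}
```

If the download step fails, the function returns before the container is started, so a partial download is not served by accident.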
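
When Red Hat AI Inference Server loads a model from disk, the process sometimes hangs. Large models consume memory, and if memory runs low, the system slows down as it swaps data between RAM and disk. Slow network file system speeds or a lack of available memory can trigger excessive swapping. This can happen in clusters where file systems are shared between cluster nodes.

Where possible, store the model on a local disk to prevent slowdowns during model loading. Ensure that the system has sufficient CPU memory and enough CPU capacity to handle the model.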

Sometimes, Red Hat AI Inference Server fails to inspect the model. Errors are reported in the log. For example:

#...
  File "vllm/model_executor/models/registry.py", line xxx, in _raise_for_unsupported
    raise ValueError(
ValueError: Model architectures [''] failed to be inspected. Please check the logs for more details.

This error occurs when vLLM fails to import the model file. It is usually related to missing dependencies or outdated binaries in the vLLM build.

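Some model architectures are not supported. Refer to the list of validated models. For example, the following errors indicate that the model you are trying to use is not supported: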

Traceback (most recent call last):
#...
  File "vllm/model_executor/models/registry.py", line xxx, in inspect_model_cls
    for arch in architectures:
TypeError: 'NoneType' object is not iterable

#...
  File "vllm/model_executor/models/registry.py", line xxx, in _raise_for_unsupported
    raise ValueError(
ValueError: Model architectures [''] are not supported for now. Supported architectures:
#...
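
Several of the errors in this section point at the model's config.json: the first error message names a missing model_type key, and the TypeError above suggests that no architectures list was found. As a quick first check, a sketch like the following reports whether a local model directory contains a readable config.json that mentions both keys. The helper name `check_model_config` is hypothetical, and the check is a plain text search rather than a JSON parse.

```shell
# Hypothetical helper: check a local model directory for a readable config.json
# that mentions the model_type and architectures keys.
check_model_config() {
  local config="$1/config.json"
  if [ ! -r "$config" ]; then
    echo "cannot read $config"
    return 1
  fi
  local key
  for key in model_type architectures; do
    # Text match on the quoted key name; this does not validate the JSON.
    if ! grep -q "\"$key\"" "$config"; then
      echo "missing key: $key"
      return 1
    fi
  done
  echo "ok"
}
```

Pass the directory that contains config.json, for example `check_model_config <DOWNLOAD_PATH>`. A "cannot read" result can indicate a permissions problem rather than an unsupported model.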

Note

Some architectures, such as DeepSeekV2VL, require the architecture to be explicitly specified using the --hf_overrides flag. For example:

--hf_overrides '{"architectures": ["DeepseekVLV2ForCausalLM"]}'

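When you load an 8-bit floating point (FP8) model, a runtime error sometimes occurs on certain hardware. FP8 requires GPU hardware acceleration. The error occurs when you load FP8 models such as deepseek-r1 or models tagged with the F8_E4M3 tensor type. For example: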

triton.compiler.errors.CompilationError: at 1:0:
def _per_token_group_quant_fp8(
^
ValueError("type fp8e4nv not supported in this architecture. The supported fp8 dtypes are ('fp8e4b15', 'fp8e5')")
[rank0]:[W502 11:12:56.323757996 ProcessGroupNCCL.cpp:1496] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())

Note

Review Getting started to ensure that your specific accelerator is supported. Accelerators that are currently supported for FP8 models include:

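Sometimes, serving a model results in a runtime error that is related to the host system. For example, you might see errors in the log like this: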

INFO 05-07 19:15:17 [config.py:1901] Chunked prefill is enabled with max_num_batched_tokens=2048.
OMP: Error #179: Function Can't open SHM failed:
OMP: System error #0: Success
Traceback (most recent call last):
  File "/opt/app-root/bin/vllm", line 8, in <module>
    sys.exit(main())
..........................    raise RuntimeError("Engine core initialization failed. "
RuntimeError: Engine core initialization failed. See root cause above.

You can work around this issue by passing the --shm-size=2g argument to podman run when you start the vllm container.
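
Because --shm-size is an option of podman run, it must appear before the image name. To confirm that the setting took effect, a sketch like the following prints the size of /dev/shm as seen inside the container. It assumes that the image provides the df utility. The function name `show_container_shm` is hypothetical.

```shell
# Hypothetical check: print the size of /dev/shm inside the container.
# Assumes the image ships the df utility; the default size of 2g matches the workaround above.
show_container_shm() {
  podman run --rm --shm-size="${1:-2g}" \
    --entrypoint df \
    registry.redhat.io/rhaiis/vllm-cuda-rhel9:3.0.0 \
    -h /dev/shm
}
```

Calling `show_container_shm` uses 2g, and `show_container_shm 4g` tries a larger value.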
