10.8. Troubleshooting IBM Power issues

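If you cannot access the model data in the AI Inference Server container, complete the following steps:

Verify that the /models folder is mapped to the container correctly.
Check the host SELinux settings.
Ensure that you have applied appropriate permissions to the $HOME/models folder, for example: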
```
chmod -R 755 $HOME/models
```

Ensure that you use the :Z option for the Podman volume mount:

podman run -d --device=/dev/vfio \
     -v $HOME/models:/models:Z \
     # ...


Ensure that VLLM_SPYRE_USE_CB=1 is set for decoding models.
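The permission and SELinux checks above can be scripted as a quick pre-flight test on the host. The following is a minimal sketch, not part of the product: the function name `check_models_dir` is hypothetical, `stat -c` assumes GNU coreutils, and the "other users need read and execute" rule is an assumption derived from the `chmod -R 755` example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check a host directory before mounting it at /models.
check_models_dir() {
    local dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi

    # Octal permission bits of the directory (GNU stat syntax, assumed).
    local mode
    mode="$(stat -c '%a' "$dir")"
    echo "mode: $mode"

    # Report the SELinux mode and the directory label only when the tools exist.
    if command -v getenforce >/dev/null 2>&1; then
        echo "selinux: $(getenforce)"
        ls -Zd "$dir"
    fi

    # A last octal digit of 5 or 7 means "other" has read and execute,
    # which is what chmod 755 grants (assumed requirement).
    case "$mode" in
        *[57]) return 0 ;;
        *) echo "other users lack read/execute on $dir"; return 1 ;;
    esac
}

# Example: check_models_dir "$HOME/models"
```

The sketch inspects only the top-level directory. Files below it can still carry stricter permissions, which is why the example above applies `chmod` recursively.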

10.8.1. IBM Spyre for Power AI accelerator card issues

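Ensure that the IBM Spyre AI accelerator cards are visible on the host. Use lspci to verify that the cards are available.
Ensure that your user is in the sentient group.
Use the Service Report tool to diagnose and correct card access issues. See IBM Power Systems service and productivity tools.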
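The first two checks above can be run from a shell on the host. This is a minimal sketch: the helper name `user_in_group` is hypothetical, and the `lspci` output is left unfiltered because this section does not state the device string that the Spyre cards report.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when the named user belongs to the named group.
user_in_group() {
    local user="$1" group="$2"
    # id -nG prints the user's group names separated by spaces.
    id -nG "$user" | tr ' ' '\n' | grep -qx "$group"
}

# Example usage on the host (commented out so the sketch has no side effects):
#   lspci -nn        # list PCI devices with vendor and device IDs for inspection
#   user_in_group "$(id -un)" sentient || echo "user is not in the sentient group"
```

Group changes usually take effect only in a new login session, so log out and back in before re-testing after adding the user to the group.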

10.8.2. IBM Spyre for Power performance issues

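Ensure that all Spyre cards are securely seated in the first four slots of the IBM Power server I/O drawer. The first four slots have the highest-speed PCIe interfaces.
Ensure that the cards assigned to an LPAR are all in the same drawer. Do not split the cards across drawers, because this increases I/O latency. For more information, see the IBM Power11 documentation.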

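If you experience problems with the IBM Spyre AI Accelerator cards, you can run the aiu-smi tool alongside the workload that you want to profile. Complete the following steps: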

Start the model.

From a second terminal, query the model. For example:

curl http://127.0.0.1:8000/v1/completions -H "Content-Type: application/json" \
    -d '{ "model": "/models/granite-3.3-8b-instruct",
          "prompt": "Write me a long story about surfing dogs in Malibu.",
          "max_tokens": 8128,
          "temperature": 1,
          "n": 10
        }'


In a third terminal, run the aiu-smi tool:
```
podman exec -it <CONTAINER_ID> bash -c aiu-smi
```

Alternatively, open a shell in the running container and run aiu-smi there. For example:

podman exec -it <CONTAINER_ID> bash


Inside the container, run the aiu-smi tool:

aiu-smi


Example aiu-smi output

#MetricFiles
0 /tmp/metrics.0181:50:00.0
1 /tmp/metrics.0182:60:00.0
2 /tmp/metrics.0183:70:00.0
3 /tmp/metrics.0184:80:00.0
#ID Date      Time      hostcpu hostmem    pwr  gtemp   busy    rdmem    wrmem    rxpci    txpci   rdrdma   wrrdma   rsvmem
#   YYYYMMDD  HH:MM:SS        %       %      W      C      %     GB/s     GB/s     GB/s     GB/s     GB/s     GB/s       MB
  0 20251103  20:18:36    951.6    11.5   33.8   34.1     96   41.221    5.480    0.967    0.964    0.000    0.000    0.000
  1 20251103  20:18:36    951.6    11.5   30.6   33.0     96   41.201    5.464    0.967    0.964    0.000    0.000    0.000
  2 20251103  20:18:36    951.6    11.5   40.5   34.7     96   41.266    5.473    0.969    0.966    0.000    0.000    0.000
  3 20251103  20:18:36    951.6    11.5   37.3   39.2     96   41.358    5.484    0.971    0.968    0.000    0.000    0.000
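When comparing cards, it can help to reduce each sample row to the columns of interest. The following sketch assumes the 15-column layout shown in the example output (ID, Date, Time, hostcpu, hostmem, pwr, gtemp, busy, and so on). The function name `summarize_aiu_smi` and the log file name in the usage note are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read aiu-smi output on stdin and print one summary line
# per sample row. Header lines (starting with '#') and the metric file list
# are skipped because they do not match the 15-field numeric row pattern.
summarize_aiu_smi() {
    awk '
        NF == 15 && $1 ~ /^[0-9]+$/ && length($2) == 8 {
            # $1=ID, $6=pwr (W), $7=gtemp (C), $8=busy (%), per the sample header
            printf "card=%s busy=%s%% pwr=%sW gtemp=%sC\n", $1, $8, $6, $7
        }
    '
}
```

For example, with the output saved to a file, `summarize_aiu_smi < aiu-smi.log` prints one line per card and sample. If the column layout differs in your version of the tool, adjust the field numbers accordingly.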

