Chapter 3. Supported AI accelerators for Red Hat AI Inference Server
The following tables list the supported AI accelerators for Red Hat AI Inference Server 3.4. Red Hat AI Inference Server supports data center grade AI accelerators only.
| vLLM release | AI accelerators | Requirements | vLLM architecture support | LLM Compressor support |
|---|---|---|---|---|
| vLLM v0.14.1 | NVIDIA data center GPUs | | | Supported, now packaged separately in the |
Red Hat AI Inference Server 3.4.0-ea.1 is built with CUDA 13.0. The container images are backward compatible with CUDA 12.9 drivers.
If your host driver version is older than the CUDA toolkit version shipped in the AI Inference Server container, you can use NVIDIA Forward Compatibility to avoid driver upgrades.
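As a rough illustration of how these two notes combine, the sketch below encodes the version rules in Python. The version constants come from the text above; the function names are illustrative only and are not part of any Red Hat or NVIDIA tooling.

```python
# Minimal sketch (not product code) of the driver-compatibility rules:
# the 3.4.0-ea.1 images ship a CUDA 13.0 toolkit and are backward
# compatible with CUDA 12.9 drivers; anything older can fall back to
# NVIDIA Forward Compatibility instead of a host driver upgrade.

CONTAINER_CUDA_TOOLKIT = (13, 0)      # toolkit version in the container image
OLDEST_BACKWARD_COMPATIBLE = (12, 9)  # oldest driver the image supports directly

def parse_cuda_version(text: str) -> tuple[int, int]:
    """Parse a 'major.minor' CUDA version string such as '12.9'."""
    major, minor = text.split(".")
    return (int(major), int(minor))

def driver_status(host_driver_cuda: str) -> str:
    """Classify a host driver's CUDA version against the container toolkit."""
    version = parse_cuda_version(host_driver_cuda)
    if version >= OLDEST_BACKWARD_COMPATIBLE:
        return "supported"                # covered by backward compatibility
    return "needs-forward-compatibility"  # use NVIDIA Forward Compatibility
```

For example, `driver_status("12.9")` returns `"supported"`, while an older `"12.4"` driver returns `"needs-forward-compatibility"`.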
NVIDIA T4 and A100 accelerators do not support FP8 (W8A8) quantization.
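The restriction above comes down to GPU compute capability: hardware FP8 arrived with the Ada Lovelace and Hopper generations, while T4 (Turing) and A100 (Ampere) predate it. The sketch below expresses this as a simple lookup; the capability values are well-known NVIDIA figures, but the 8.9 floor is an assumption for illustration, not a statement of vLLM's exact support matrix.

```python
# Illustrative sketch of why T4 and A100 cannot run FP8 (W8A8).
# The table is a small sample for illustration, not a support matrix.

COMPUTE_CAPABILITY = {
    "T4": (7, 5),    # Turing
    "A100": (8, 0),  # Ampere
    "L4": (8, 9),    # Ada Lovelace
    "H100": (9, 0),  # Hopper
}

# Assumed floor for FP8 tensor-core support (Ada Lovelace and newer).
FP8_MIN_CAPABILITY = (8, 9)

def supports_fp8(gpu: str) -> bool:
    """Return True if the GPU's compute capability meets the assumed FP8 floor."""
    return COMPUTE_CAPABILITY[gpu] >= FP8_MIN_CAPABILITY
```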
| vLLM release | AI accelerators | Requirements | vLLM architecture support | LLM Compressor support |
|---|---|---|---|---|
| vLLM v0.14.1 | AMD GPUs | | x86 | Not supported |
AMD GPUs support FP8 (W8A8) and GGUF quantization schemes only.
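A deployment script could enforce this restriction before starting the server. The pre-flight check below is a hypothetical sketch; the lowercase scheme identifiers are chosen for illustration and do not correspond to specific vLLM flag values.

```python
# Hypothetical pre-flight check mirroring the note above: on AMD GPUs,
# only the FP8 (W8A8) and GGUF quantization schemes are available.

AMD_SUPPORTED_SCHEMES = {"fp8", "gguf"}  # illustrative identifiers

def check_amd_scheme(scheme: str) -> None:
    """Raise ValueError if the requested scheme is unavailable on AMD GPUs."""
    if scheme.lower() not in AMD_SUPPORTED_SCHEMES:
        raise ValueError(
            f"{scheme!r} is not supported on AMD GPUs; use FP8 (W8A8) or GGUF"
        )
```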
| vLLM release | AI accelerators | Requirements | vLLM architecture support | LLM Compressor support |
|---|---|---|---|---|
| vLLM v0.14.1 | Google v4, v5e, v5p, v6e (Trillium) | | x86 (Technology Preview) | Not supported |
| vLLM release | AI accelerators | Requirements | vLLM architecture support | LLM Compressor support |
|---|---|---|---|---|
| vLLM v0.14.1 | IBM Spyre for Power (ppc64le) | | IBM Power (ppc64le) | Not supported |
| vLLM v0.14.1 | IBM Spyre for Z (s390x) | | IBM Z (s390x) | Not supported |
| vLLM v0.14.1 | IBM AIU (x86) | | x86 (Technology Preview) | Not supported |
IBM AIU support for x86 is available as a Technology Preview feature only. IBM AIU for x86 is not a Generally Available (GA) feature. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
| vLLM release | AI accelerators | Requirements | vLLM architecture support | LLM Compressor support |
|---|---|---|---|---|
| vLLM v0.14.1 | AWS Inferentia2 (Inf2), AWS Trainium (Trn1, Trn1n, Trn2) | | x86 (Dev Preview) | Not supported |
AWS Trainium and Inferentia support is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.