Chapter 3. Supported AI accelerators for Red Hat AI Inference Server
The following tables list the supported AI accelerators for Red Hat AI Inference Server 3.4. Red Hat AI Inference Server supports data center grade AI accelerators only.
| vLLM release | AI accelerators | Requirements | vLLM architecture support | LLM Compressor support |
|---|---|---|---|---|
| vLLM v0.14.1 | NVIDIA data center GPUs | | | Supported, now packaged separately in the |
Red Hat AI Inference Server 3.4.0-ea.1 is built with CUDA 13.0. The container images are backward compatible with CUDA 12.9 drivers.
If your host driver version is older than the CUDA toolkit version shipped in the AI Inference Server container, you can use NVIDIA Forward Compatibility to avoid driver upgrades.
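As a rough illustration of how these two notes combine, the sketch below encodes the version rules in Python. The version constants come from the text above; the function names are illustrative only and are not part of any Red Hat or NVIDIA tooling.

```python
# Minimal sketch (not product code) of the driver-compatibility rules:
# the 3.4.0-ea.1 images ship a CUDA 13.0 toolkit and are backward
# compatible with CUDA 12.9 drivers; anything older can fall back to
# NVIDIA Forward Compatibility instead of a host driver upgrade.

CONTAINER_CUDA_TOOLKIT = (13, 0)      # toolkit version in the container image
OLDEST_BACKWARD_COMPATIBLE = (12, 9)  # oldest driver the image supports directly

def parse_cuda_version(text: str) -> tuple[int, int]:
    """Parse a 'major.minor' CUDA version string such as '12.9'."""
    major, minor = text.split(".")
    return (int(major), int(minor))

def driver_status(host_driver_cuda: str) -> str:
    """Classify a host driver's CUDA version against the container toolkit."""
    version = parse_cuda_version(host_driver_cuda)
    if version >= OLDEST_BACKWARD_COMPATIBLE:
        return "supported"                # covered by backward compatibility
    return "needs-forward-compatibility"  # use NVIDIA Forward Compatibility
```

For example, `driver_status("12.9")` returns `"supported"`, while an older `"12.4"` driver returns `"needs-forward-compatibility"`.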
NVIDIA T4 and A100 accelerators do not support FP8 (W8A8) quantization.
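The restriction above comes down to GPU compute capability: hardware FP8 arrived with the Ada Lovelace and Hopper generations, while T4 (Turing) and A100 (Ampere) predate it. The sketch below expresses this as a simple lookup; the capability values are well-known NVIDIA figures, but the 8.9 floor is an assumption for illustration, not a statement of vLLM's exact support matrix.

```python
# Illustrative sketch of why T4 and A100 cannot run FP8 (W8A8).
# The table is a small sample for illustration, not a support matrix.

COMPUTE_CAPABILITY = {
    "T4": (7, 5),    # Turing
    "A100": (8, 0),  # Ampere
    "L4": (8, 9),    # Ada Lovelace
    "H100": (9, 0),  # Hopper
}

# Assumed floor for FP8 tensor-core support (Ada Lovelace and newer).
FP8_MIN_CAPABILITY = (8, 9)

def supports_fp8(gpu: str) -> bool:
    """Return True if the GPU's compute capability meets the assumed FP8 floor."""
    return COMPUTE_CAPABILITY[gpu] >= FP8_MIN_CAPABILITY
```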
| vLLM release | AI accelerators | Requirements | vLLM architecture support | LLM Compressor support |
|---|---|---|---|---|
| vLLM v0.14.1 | AMD GPUs | | x86 | Not supported |
AMD GPUs support FP8 (W8A8) and GGUF quantization schemes only.
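A deployment script could enforce this restriction before starting the server. The pre-flight check below is a hypothetical sketch; the lowercase scheme identifiers are chosen for illustration and do not correspond to specific vLLM flag values.

```python
# Hypothetical pre-flight check mirroring the note above: on AMD GPUs,
# only the FP8 (W8A8) and GGUF quantization schemes are available.

AMD_SUPPORTED_SCHEMES = {"fp8", "gguf"}  # illustrative identifiers

def check_amd_scheme(scheme: str) -> None:
    """Raise ValueError if the requested scheme is unavailable on AMD GPUs."""
    if scheme.lower() not in AMD_SUPPORTED_SCHEMES:
        raise ValueError(
            f"{scheme!r} is not supported on AMD GPUs; use FP8 (W8A8) or GGUF"
        )
```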
| vLLM release | AI accelerators | Requirements | vLLM architecture support | LLM Compressor support |
|---|---|---|---|---|
| vLLM v0.14.1 | Google v4, v5e, v5p, v6e (Trillium) | | x86 (Technology Preview) | Not supported |
| vLLM release | AI accelerators | Requirements | vLLM architecture support | LLM Compressor support |
|---|---|---|---|---|
| vLLM v0.14.1 | IBM Spyre for Power (ppc64le) | | IBM Power (ppc64le) | Not supported |
| vLLM v0.14.1 | IBM Spyre for Z (s390x) | | IBM Z (s390x) | Not supported |
| vLLM v0.14.1 | IBM AIU (x86) | | x86 (Technology Preview) | Not supported |
IBM AIU support for x86 is available as a Technology Preview feature only. IBM AIU for x86 is not a Generally Available (GA) feature. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
| vLLM release | AI accelerators | Requirements | vLLM architecture support | LLM Compressor support |
|---|---|---|---|---|
| vLLM v0.14.1 | AWS Inferentia2 (Inf2), AWS Trainium (Trn1, Trn1n, Trn2) | | x86 (Dev Preview) | Not supported |
AWS Trainium and Inferentia support is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.