Supported product and hardware configurations
Supported hardware and software configurations for deploying Red Hat AI software
Abstract
Supported hardware and software configurations for deploying Red Hat AI software.
Chapter 1. About Red Hat AI Inference Server supported hardware and software
Red Hat AI software runs on a variety of supported hardware, software, and delivery platforms in production environments.
Technology Preview and Developer Preview features provide early access to potential new features, but they are not supported and are not recommended for production workloads.
Supported configurations span multiple AI accelerator types including NVIDIA GPUs, AMD GPUs, Google TPUs, and IBM Spyre accelerators. Red Hat AI Inference Server can be deployed in OpenShift Container Platform clusters, on standalone Red Hat Enterprise Linux (RHEL) hosts with Podman, or integrated with Red Hat OpenShift AI for managed AI/ML workflows.
Chapter 2. Product and version compatibility
The following tables list the supported product versions for Red Hat AI Inference Server, Red Hat Enterprise Linux AI, and Red Hat OpenShift AI.

Table 2.1. Red Hat AI Inference Server versions
| Product version | vLLM core version | LLM Compressor version |
|---|---|---|
| 3.4.0-ea.1 | v0.14.1 | v0.9.0.2 |
| 3.3 | v0.13.0 | v0.9.0.1 |
| 3.2.5 | v0.11.2 | v0.8.1 |
| 3.2.4 | v0.11.0 | v0.8.1 |
| 3.2.3 | v0.11.0 | v0.8.1 |
| 3.2.2 | v0.10.1.1 | v0.7.1 |
| 3.2.1 | v0.10.0 | Not included in this release |
| 3.2.0 | v0.9.2 | Not included in this release |
Table 2.2. Red Hat Enterprise Linux AI versions

| Product version | vLLM core version | LLM Compressor version |
|---|---|---|
| 3.3 | v0.13.0 | v0.9.0.1 |
| 3.2 | v0.11.2 | v0.8.1 |
| 3.0 | v0.11.0 | v0.8.1 |
Table 2.3. Red Hat OpenShift AI versions

| Product version | vLLM core version | LLM Compressor version |
|---|---|---|
| 3.3 | v0.13.0 | v0.9.0.1 |
| 3.2 | v0.11.2 | v0.8.1 |
| 3.0 | v0.11.0 | v0.8.1 |
Chapter 3. Supported AI accelerators for Red Hat AI Inference Server
The following tables list the data center grade AI accelerators that are supported for Red Hat AI Inference Server 3.4.
Red Hat AI Inference Server supports data center grade AI accelerators only.
Table 3.1. NVIDIA AI accelerators

| vLLM release | AI accelerators | Requirements | vLLM architecture support | LLM Compressor support |
|---|---|---|---|---|
| vLLM v0.14.1 | NVIDIA data center GPUs (Turing, Ampere, Ada Lovelace, Hopper, and Blackwell architectures) | CUDA 13.0 | x86 | Supported, now packaged separately in its own container image |
Red Hat AI Inference Server 3.4.0-ea.1 is built with CUDA 13.0. The container images are backward compatible with CUDA 12.9 drivers.
If your host driver version is older than the CUDA toolkit version shipped in the AI Inference Server container, you can use NVIDIA Forward Compatibility to avoid driver upgrades.
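You can script this driver check before deploying the container. The following is a minimal Python sketch, assuming `nvidia-smi` is available on the host PATH; the R580 driver floor used here for CUDA 13.0 is an assumption that you should verify against the NVIDIA CUDA compatibility matrix.

```python
# Minimal sketch: compare the host NVIDIA driver against the driver floor
# of the CUDA toolkit shipped in the container. Assumes nvidia-smi is on
# the PATH; the (580, 65) floor for CUDA 13.0 is an assumed example value.
import subprocess

MIN_DRIVER_FOR_CUDA_13 = (580, 65)  # assumption: verify in the NVIDIA matrix

def host_driver_version() -> str:
    """Return the host driver version string, for example '580.65.06'."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip().splitlines()[0]

driver = host_driver_version()
major, minor = (int(part) for part in driver.split(".")[:2])

if (major, minor) < MIN_DRIVER_FOR_CUDA_13:
    print(f"Driver {driver} predates the container CUDA toolkit: "
          "enable NVIDIA Forward Compatibility or upgrade the driver.")
else:
    print(f"Driver {driver} satisfies the container CUDA toolkit.")
```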
NVIDIA T4 and A100 accelerators do not support FP8 (W8A8) quantization.
Table 3.2. AMD AI accelerators

| vLLM release | AI accelerators | Requirements | vLLM architecture support | LLM Compressor support |
|---|---|---|---|---|
| vLLM v0.14.1 | AMD Instinct MI210, MI300X, and MI325X | ROCm | x86 | Not supported |
AMD GPUs support FP8 (W8A8) and GGUF quantization schemes only.
Table 3.3. Google TPU AI accelerators

| vLLM release | AI accelerators | Requirements | vLLM architecture support | LLM Compressor support |
|---|---|---|---|---|
| vLLM v0.14.1 | Google TPU v4, v5e, v5p, v6e (Trillium) | - | x86 Technology Preview | Not supported |
Table 3.4. IBM AI accelerators

| vLLM release | AI accelerators | Requirements | vLLM architecture support | LLM Compressor support |
|---|---|---|---|---|
| vLLM v0.14.1 | IBM Spyre for Power (ppc64le) | - | IBM Power (ppc64le) | Not supported |
| vLLM v0.14.1 | IBM Spyre for Z (s390x) | - | IBM Z (s390x) | Not supported |
| vLLM v0.14.1 | IBM AIU (x86) | - | x86 Technology Preview | Not supported |
IBM AIU support for x86 is available as a Technology Preview feature only. IBM AIU for x86 is not a Generally Available (GA) feature. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
Table 3.5. AWS AI accelerators

| vLLM release | AI accelerators | Requirements | vLLM architecture support | LLM Compressor support |
|---|---|---|---|---|
| vLLM v0.14.1 | AWS Inferentia2 (Inf2), AWS Trainium (Trn1, Trn1n, Trn2) | - | x86 Technology Preview | Not supported |
AWS Trainium and Inferentia support is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
Chapter 4. Supported AI accelerator model quantization formats
Different AI accelerator architectures support different types of model quantization, depending on the compute capabilities of the hardware. The following tables list the AI accelerators that support INT8, INT4, FP8, and NVFP4 quantization formats.
- INT8 (W8A8) quantization reduces model weights and activations to 8-bit integers, providing significant memory savings while maintaining acceptable accuracy for many use cases.
- INT4 (W4A8) quantization reduces model weights to 4-bit integers while keeping activations at 8-bit precision. W4A8 improves weight memory efficiency compared to W8A8 while retaining 8-bit activation fidelity for inference.
- INT4 (W4A16) quantization reduces model weights to 4-bit integers while maintaining 16-bit activations, enabling larger models to fit in GPU memory with minimal accuracy loss.
- FP8 (W8A8) quantization uses 8-bit floating point representation for weights and activations, offering a balance between memory efficiency and numerical precision for training and inference workloads.
- NVFP4 quantization uses NVIDIA’s 4-bit floating point format with two-level scaling (FP8 fine-grained scales and FP32 tensor-level scale), providing maximum memory efficiency for inference on NVIDIA Blackwell hardware.
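To make these schemes concrete, the following is a minimal sketch of producing an FP8 (W8A8) dynamic-quantized checkpoint with LLM Compressor, modeled on the upstream LLM Compressor examples. The model ID is a placeholder, and the exact import paths can vary between the LLM Compressor versions listed in Chapter 2.

```python
# Minimal sketch: one-shot FP8 (W8A8) dynamic quantization with LLM
# Compressor. The model ID is a placeholder; FP8_DYNAMIC quantizes weights
# offline and computes activation scales at runtime, so no calibration
# dataset is needed.
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder model ID

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Quantize every Linear layer except the output head, which is kept at
# full precision to limit accuracy loss.
recipe = QuantizationModifier(
    targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"]
)
oneshot(model=model, recipe=recipe)

save_dir = MODEL_ID.split("/")[-1] + "-FP8-dynamic"
model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)
```

The saved directory can then be served by vLLM, which picks up the quantization configuration from the checkpoint metadata.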
Table 4.1. INT8 quantization: NVIDIA AI accelerators

| Architecture | Supported AI accelerators | Minimum compute capability |
|---|---|---|
| Turing | Tesla T4 | 7.5 |
| Ampere | A10, A30, A40, A100 | 8.0 |
| Ada Lovelace | L4, L40, L40S | 8.9 |
| Hopper | H100, H200, GH200 | 9.0 |
NVIDIA Blackwell architecture (B200, B300, GB200, GB300) does not support INT8 quantization in vLLM due to kernel limitations. Use FP8 or NVFP4 quantization instead.
Table 4.2. INT8 quantization: AMD AI accelerators

| Architecture | Supported AI accelerators |
|---|---|
| CDNA 2 | MI210 |
| CDNA 3 | MI300X, MI325X |
Table 4.3. INT4 quantization: NVIDIA AI accelerators

| Architecture | Supported AI accelerators | Minimum compute capability |
|---|---|---|
| Ampere | A10, A30, A40, A100 | 8.0 |
| Ada Lovelace | L4, L40, L40S | 8.9 |
| Hopper | H100, H200, GH200 | 9.0 |
| Blackwell | B200, B300, GB200, GB300 | 10.0 |
NVIDIA Turing architecture (Tesla T4) does not have optimized vLLM kernel support for INT4 quantization. Use Ampere or newer architectures for INT4 inference.
Table 4.4. INT4 quantization: AMD AI accelerators

| Architecture | Supported AI accelerators |
|---|---|
| CDNA 3 | MI300X, MI325X |
AMD CDNA 2 architecture (MI210) does not have optimized vLLM kernel support for INT4 quantization.
Table 4.5. FP8 quantization: NVIDIA AI accelerators

| Architecture | Supported AI accelerators | Minimum compute capability |
|---|---|---|
| Ada Lovelace | L4, L40, L40S | 8.9 |
| Hopper | H100, H200, GH200 | 9.0 |
| Blackwell | B200, B300, GB200, GB300 | 10.0 |
NVIDIA Turing architecture (Tesla T4) and Ampere architecture (A10, A30, A40, A100) AI accelerators do not support FP8 W8A8 quantization due to hardware limitations. However, FP8 weight-only (W8A16) quantization is available on these architectures by using Marlin kernels.
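Because the FP8 (W8A8) floor is compute capability 8.9, a short runtime check can route older accelerators to a fallback scheme. The following is a minimal sketch, assuming PyTorch and at least one visible CUDA device:

```python
# Minimal sketch: check the local GPU against the 8.9 compute capability
# floor for FP8 (W8A8) shown in the table above. Assumes PyTorch and at
# least one visible CUDA device.
import torch

major, minor = torch.cuda.get_device_capability(0)
capability = major + minor / 10

if capability >= 8.9:
    print(f"Compute capability {capability}: FP8 (W8A8) is supported.")
else:
    print(f"Compute capability {capability}: fall back to FP8 weight-only "
          "(W8A16) via Marlin kernels, or to an INT8 scheme.")
```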
Table 4.6. FP8 quantization: AMD AI accelerators

| Architecture | Supported AI accelerators |
|---|---|
| CDNA 3 | MI300X, MI325X |
AMD CDNA 2 architecture AI accelerators (MI210) do not support FP8 quantization due to hardware limitations.
Table 4.7. NVFP4 quantization: NVIDIA AI accelerators

| Architecture | Supported AI accelerators | Minimum compute capability |
|---|---|---|
| Blackwell | B200, B300, GB200, GB300 | 10.0 |
NVFP4 quantization is only available on NVIDIA Blackwell architecture AI accelerators. AMD AI accelerators do not support NVFP4 quantization.
Chapter 5. Supported AI accelerators for RHEL AI
The following AI accelerators are supported for inference serving with Red Hat AI Inference Server on RHEL AI.
Bare metal deployments of RHEL AI are supported for all NVIDIA CUDA and AMD ROCm AI accelerators listed in Supported AI accelerators for Red Hat AI Inference Server.
Actual requirements vary based on the specific models you deploy, quantization methods, context lengths, and concurrent request loads. Aggregate GPU memory refers to the total GPU memory available across all GPUs in the system that can be used for tensor parallelism or pipeline parallelism.
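As an illustration of how aggregate GPU memory is consumed, the following minimal vLLM sketch shards a single model across four GPUs with tensor parallelism; the model ID, GPU count, and memory fraction are placeholder assumptions:

```python
# Minimal sketch: use tensor parallelism so the aggregate memory of four
# GPUs serves a model too large for any single GPU. The model ID, GPU
# count, and memory fraction below are placeholder assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # placeholder model ID
    tensor_parallel_size=4,        # shard weights across 4 GPUs
    gpu_memory_utilization=0.90,   # fraction of each GPU's memory to use
)

outputs = llm.generate(
    ["Summarize tensor parallelism in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```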
For more information about inference serving on bare metal or cloud platforms, see Red Hat Enterprise Linux AI.
The recommended minimum additional disk storage for all platforms is 1 TB.
| NVIDIA AI accelerator | Aggregate GPU memory | AWS instance family |
|---|---|---|
| GB200 | 384 GB | P6e series |
| B200 | 192 GB | P6 series |
| RTX PRO 6000 Blackwell Server Edition | 96 GB | G7e series |
| H100 | 80 GB | P5 series |
| L40S | 48 GB | G6e series |
| A100 | 40 GB | P4d series |
| L4 | 24 GB | G6 series |
| NVIDIA AI accelerator | Aggregate GPU memory | IBM Cloud instance family |
|---|---|---|
| H200 | 141 GB | gx3 series |
| H100 | 80 GB | gx3 series |
| A100 | 80 GB | gx3 series |
| L40S | 48 GB | gx3 series |
| L4 | 24 GB | gx3 series |
| AI accelerator | Aggregate GPU memory | Azure instance family |
|---|---|---|
| NVIDIA GB200 | 384 GB | ND series |
| AMD Instinct MI300X | 192 GB | ND series |
| NVIDIA H100 | 80 GB | ND series |
| NVIDIA A100 | 80 GB | ND series |
| AMD Instinct MI210 | 64 GB | ND series |
| NVIDIA AI accelerator | Aggregate GPU memory | Google Cloud instance family |
|---|---|---|
| GB200 | 384 GB | A4X series |
| B200 | 192 GB | A4 series |
| 4xL4 | 96 GB | G2 series |
| H100 | 80 GB | A3 series |
| A100 | 40 GB | A2 series |
Chapter 6. Supported AI accelerators for Red Hat OpenShift AI
You must install the Operator that corresponds to the AI accelerator that you want to use with Red Hat OpenShift AI.
OpenShift AI provides Operators that support integration with AI accelerators. OpenShift AI also provides images that include libraries that work with NVIDIA, AMD, and Intel Gaudi data center grade AI accelerators.
Chapter 7. Supported deployment environments
The following deployment environments for Red Hat AI Inference Server are supported.
Red Hat AI Inference Server is available only as a container image. The host operating system and kernel must support the required accelerator drivers. For more information, see Supported AI accelerators.
| Environment | Supported versions | Deployment notes |
|---|---|---|
| OpenShift Container Platform (self‑managed) | 4.14+ | Deploy on bare‑metal hosts or virtual machines. |
| Red Hat OpenShift Service on AWS (ROSA) | 4.14+ | Requires a ROSA cluster with STS and GPU‑enabled P5 or G5 node types. See Prepare your environment for more information. |
| Red Hat Enterprise Linux AI | 3.0+ | Deploy on bare‑metal hosts or virtual machines. |
| Red Hat Enterprise Linux (RHEL) | 9.2+ | Deploy on bare‑metal hosts or virtual machines. |
| Linux (not RHEL) | - | Supported under the third‑party policy when deployed on bare‑metal hosts or virtual machines. OpenShift Container Platform Operators are not required. |
| Kubernetes (not OpenShift Container Platform) | - | Supported under the third‑party policy when deployed on bare‑metal hosts or virtual machines. |
- Single-host deployments for IBM Spyre AI accelerators on IBM Z and IBM Power are supported on RHEL 9.6+.
- Cluster deployments for IBM Spyre AI accelerators on IBM Z are supported as part of Red Hat OpenShift AI version 3.0+ only.
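In any of these environments, a running server can be smoke-tested through its OpenAI-compatible endpoint. The following is a minimal sketch, assuming the openai Python client, a server listening on localhost port 8000, and a placeholder model name:

```python
# Minimal sketch: query a running AI Inference Server instance through the
# OpenAI-compatible API that vLLM exposes. The endpoint URL, API key, and
# model name are placeholder assumptions for a local test deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.completions.create(
    model="placeholder-served-model",   # must match the served model name
    prompt="Say hello from the inference server.",
    max_tokens=32,
)
print(response.choices[0].text)
```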
Chapter 8. OpenShift Container Platform software prerequisites for GPU deployments
The following table lists the minimum OpenShift Container Platform software prerequisites for AI accelerator workloads.
| Component | Minimum version |
|---|---|
| NVIDIA GPU Operator | 24.3 |
| AMD GPU Operator | 6.2 |
| IBM Spyre Operator | 1.0 |
| Node Feature Discovery [1] | 4.14 |
[1] Included by default with OpenShift Container Platform. Node Feature Discovery is required for scheduling NUMA-aware workloads.
Chapter 9. Lifecycle and update policy
Security and critical bug fixes are delivered as container images available from the registry.access.redhat.com/rhaiis container registry and are announced through RHSA advisories. See RHAII container images on catalog.redhat.com for more details.