Supported product and hardware configurations


Red Hat AI 3

Supported hardware and software configurations for deploying Red Hat AI software

Red Hat AI Documentation Team

Abstract

Learn about supported hardware and software configurations for Red Hat AI.

Chapter 1. Overview

Red Hat AI software runs on a variety of supported hardware, software, and delivery platforms in production environments.

Important

Technology Preview and Developer Preview features are provided for early access to potential new features.

Technology Preview or Developer Preview features are not supported or recommended for production workloads.

Supported configurations span multiple AI accelerator types including NVIDIA GPUs, AMD GPUs, Google TPUs, and IBM Spyre accelerators. Red Hat AI Inference Server can be deployed in OpenShift Container Platform clusters, on standalone Red Hat Enterprise Linux (RHEL) hosts with Podman, or integrated with Red Hat OpenShift AI for managed AI/ML workflows.
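For example, after an AI Inference Server container is running on any of these platforms, clients interact with it through the vLLM OpenAI-compatible REST API. The following Python sketch assumes a server listening on localhost port 8000 and uses a placeholder model name; adjust both values for your deployment.

```python
# Minimal sketch: query a running AI Inference Server endpoint through its
# OpenAI-compatible REST API. The URL and model name below are placeholders
# for your own deployment.
import requests

BASE_URL = "http://localhost:8000/v1"        # assumption: default vLLM serve port
MODEL = "RedHatAI/granite-3.1-8b-instruct"   # placeholder model identifier

response = requests.post(
    f"{BASE_URL}/chat/completions",
    json={
        "model": MODEL,
        "messages": [{"role": "user", "content": "What is model quantization?"}],
        "max_tokens": 128,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```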

Chapter 2. Product and version compatibility

The following table lists the supported product versions for Red Hat AI Inference Server, Red Hat Enterprise Linux AI, and Red Hat OpenShift AI.

Table 2.1. AI Inference Server product and version compatibility

| Product version | vLLM core version | LLM Compressor version       |
| --------------- | ----------------- | ---------------------------- |
| 3.4.0-ea.1      | v0.14.1           | v0.9.0.2                     |
| 3.3             | v0.13.0           | v0.9.0.1                     |
| 3.2.5           | v0.11.2           | v0.8.1                       |
| 3.2.4           | v0.11.0           | v0.8.1                       |
| 3.2.3           | v0.11.0           | v0.8.1                       |
| 3.2.2           | v0.10.1.1         | v0.7.1                       |
| 3.2.1           | v0.10.0           | Not included in this release |
| 3.2.0           | v0.9.2            | Not included in this release |

Table 2.2. Red Hat OpenShift AI product and version compatibility

| Product version | vLLM core version | LLM Compressor version |
| --------------- | ----------------- | ---------------------- |
| 3.3             | v0.13.0           | v0.9.0.1               |
| 3.2             | v0.11.2           | v0.8.1                 |
| 3.0             | v0.11.0           | v0.8.1                 |

Table 2.3. Red Hat Enterprise Linux AI product and version compatibility

| Product version | vLLM core version | LLM Compressor version |
| --------------- | ----------------- | ---------------------- |
| 3.3             | v0.13.0           | v0.9.0.1               |
| 3.2             | v0.11.2           | v0.8.1                 |
| 3.0             | v0.11.0           | v0.8.1                 |

Chapter 3. Supported AI accelerators for Red Hat AI Inference Server

The following tables list the supported data center grade AI accelerators for Red Hat AI Inference Server 3.4.

Important

Red Hat AI Inference Server supports data center grade AI accelerators only.

Table 3.1. Supported NVIDIA AI accelerators for registry.redhat.io/rhaii-early-access/vllm-cuda-rhel9:3.4.0-ea.1

| vLLM release | AI accelerators | Requirements | vLLM architecture support | LLM Compressor support |
| ------------ | --------------- | ------------ | -------------------------- | ---------------------- |
| vLLM v0.14.1 | NVIDIA data center GPUs: Turing: T4; Ampere: A2, A10, A16, A30, A40, A100; Ada Lovelace: L4, L20, L40, L40S; Hopper: H100, H200, H20, GH200; Blackwell: GB200, GB300, B200, B300, RTX PRO 6000 Blackwell Server Edition, RTX PRO 4500 Blackwell Server Edition | - | x86, AArch64 | Supported, now packaged separately in the model-opt-cuda-rhel9 container image. |

Important

Red Hat AI Inference Server 3.4.0-ea.1 is built with CUDA 13.0. The container images are backward compatible with CUDA 12.9 drivers.

If your host driver version is older than the CUDA toolkit version shipped in the AI Inference Server container, you can use NVIDIA Forward Compatibility to avoid driver upgrades.
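To check whether Forward Compatibility applies to a host, you can compare the CUDA version that the installed driver supports with the toolkit version in the container. The following is a minimal sketch using the pynvml Python bindings, assuming an NVIDIA driver is installed on the host.

```python
# Minimal sketch: compare the CUDA version supported by the host driver with
# the CUDA toolkit the container was built against (13.0 for 3.4.0-ea.1).
# Assumes the NVIDIA driver and the pynvml Python bindings are installed.
import pynvml

pynvml.nvmlInit()
cuda_driver = pynvml.nvmlSystemGetCudaDriverVersion()  # e.g. 12090 -> CUDA 12.9
major, minor = cuda_driver // 1000, (cuda_driver % 1000) // 10
print(f"Driver supports CUDA {major}.{minor}")

TOOLKIT = (13, 0)  # CUDA toolkit shipped in the 3.4.0-ea.1 container image
if (major, minor) < TOOLKIT:
    print("Driver is older than the container toolkit: "
          "enable NVIDIA Forward Compatibility or upgrade the driver.")
pynvml.nvmlShutdown()
```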

Note

NVIDIA T4 and A100 accelerators do not support FP8 (W8A8) quantization.

Table 3.2. Supported AMD AI accelerators for registry.redhat.io/rhaii-early-access/vllm-rocm-rhel9:3.4.0-ea.1

| vLLM release | AI accelerators | Requirements | vLLM architecture support | LLM Compressor support |
| ------------ | --------------- | ------------ | -------------------------- | ---------------------- |
| vLLM v0.14.1 | AMD Instinct MI210, MI300X, MI325X | - | x86 | Not supported |

Note

AMD GPUs support FP8 (W8A8) and GGUF quantization schemes only.

Table 3.3. Supported Google TPU AI accelerators for registry.redhat.io/rhaii-early-access/vllm-tpu-rhel9:3.4.0-ea.1 (Technology Preview)

| vLLM release | AI accelerators | Requirements | vLLM architecture support | LLM Compressor support |
| ------------ | --------------- | ------------ | -------------------------- | ---------------------- |
| vLLM v0.14.1 | Google TPU v4, v5e, v5p, v6e (Trillium) | - | x86 (Technology Preview) | Not supported |

Table 3.4. Supported IBM Spyre AI accelerators for registry.redhat.io/rhaii-early-access/vllm-spyre-rhel9:3.4.0-ea.1

| vLLM release | AI accelerators | Requirements | vLLM architecture support | LLM Compressor support |
| ------------ | --------------- | ------------ | -------------------------- | ---------------------- |
| vLLM v0.14.1 | IBM Spyre for Power (ppc64le) | - | IBM Power (ppc64le) | Not supported |
| vLLM v0.14.1 | IBM Spyre for Z (s390x) | - | IBM Z (s390x) | Not supported |
| vLLM v0.14.1 | IBM AIU (x86) | - | x86 (Technology Preview) | Not supported |

Important

IBM AIU support for x86 is available as a Technology Preview feature only and is not Generally Available (GA). Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

Table 3.5. Supported AWS Neuron AI accelerators for registry.redhat.io/rhaii-early-access/vllm-neuron-rhel9:3.4.0-ea.1 (Technology Preview)

| vLLM release | AI accelerators | Requirements | vLLM architecture support | LLM Compressor support |
| ------------ | --------------- | ------------ | -------------------------- | ---------------------- |
| vLLM v0.14.1 | AWS Inferentia2 (Inf2), AWS Trainium (Trn1, Trn1n, Trn2) | - | x86 (Technology Preview) | Not supported |

Important

AWS Trainium and Inferentia support is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

Chapter 4. Supported quantization formats

Different AI accelerator architectures support different types of model quantization, depending on the compute capabilities of the hardware. The following tables list the AI accelerators that support the INT8, INT4, FP8, and NVFP4 quantization formats.

  • INT8 (W8A8) quantization reduces model weights and activations to 8-bit integers, providing significant memory savings while maintaining acceptable accuracy for many use cases.
  • INT4 (W4A8) quantization reduces model weights to 4-bit integers while keeping activations at 8-bit precision. W4A8 improves memory efficiency compared to W8A8 while preserving higher activation fidelity for inference.
  • INT4 (W4A16) quantization reduces model weights to 4-bit integers while maintaining 16-bit activations, enabling larger models to fit in GPU memory with minimal accuracy loss.
  • FP8 (W8A8) quantization uses 8-bit floating point representation for weights and activations, offering a balance between memory efficiency and numerical precision for training and inference workloads.
  • NVFP4 quantization uses NVIDIA’s 4-bit floating point format with two-level scaling (FP8 fine-grained scales and an FP32 tensor-level scale), providing maximum memory efficiency for inference on NVIDIA Blackwell hardware.
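As an illustration of how these formats are produced, the following sketch quantizes a model to INT4 (W4A16) with LLM Compressor's one-shot GPTQ flow. The model and dataset names are placeholders, and exact import paths can vary between LLM Compressor releases.

```python
# Minimal sketch: one-shot INT4 (W4A16) weight quantization with LLM Compressor.
# Model and dataset identifiers are placeholders; import paths may differ
# between LLM Compressor releases.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # placeholder model
    dataset="open_platypus",                     # placeholder calibration set
    recipe=recipe,
    output_dir="TinyLlama-1.1B-Chat-v1.0-W4A16",
    max_seq_length=2048,
    num_calibration_samples=512,
)
```

The resulting checkpoint can then be served directly; vLLM detects the quantization scheme from the checkpoint configuration.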
Table 4.1. Supported NVIDIA AI accelerators for INT8 (W8A8) quantization

| Architecture | Supported AI accelerators | Minimum compute capability |
| ------------ | ------------------------- | -------------------------- |
| Turing       | Tesla T4                  | 7.5                        |
| Ampere       | A10, A30, A40, A100       | 8.0                        |
| Ada Lovelace | L4, L40, L40S             | 8.9                        |
| Hopper       | H100, H200, GH200         | 9.0                        |

Note

NVIDIA Blackwell architecture (B200, B300, GB200, GB300) does not support INT8 quantization in vLLM due to kernel limitations. Use FP8 or NVFP4 quantization instead.
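One way to apply these tables programmatically is to read the device's compute capability and map it to a supported scheme. The following is a rough sketch with PyTorch, following the capability thresholds listed in the tables in this chapter; it assumes a CUDA-enabled PyTorch build.

```python
# Minimal sketch: suggest a quantization scheme for the local NVIDIA GPU based
# on its compute capability (see Tables 4.1 to 4.7). Assumes CUDA-enabled PyTorch.
import torch

major, minor = torch.cuda.get_device_capability(0)
cc = major + minor / 10

if cc >= 10.0:    # Blackwell: INT8 kernels unavailable, prefer FP8 or NVFP4
    scheme = "NVFP4 or FP8 (W8A8)"
elif cc >= 8.9:   # Ada Lovelace and Hopper: FP8 (W8A8) supported
    scheme = "FP8 (W8A8)"
elif cc >= 8.0:   # Ampere: INT8 (W8A8) or INT4 (W4A16), no FP8 (W8A8)
    scheme = "INT8 (W8A8) or INT4 (W4A16)"
elif cc >= 7.5:   # Turing: INT8 (W8A8) only, no optimized INT4 kernels
    scheme = "INT8 (W8A8)"
else:
    scheme = "unquantized (16-bit) serving"
print(f"Compute capability {major}.{minor}: consider {scheme}")
```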

Table 4.2. Supported AMD AI accelerators for INT8 (W8A8) quantization

| Architecture | Supported AI accelerators |
| ------------ | ------------------------- |
| CDNA 2       | MI210                     |
| CDNA 3       | MI300X, MI325X            |

Table 4.3. Supported NVIDIA AI accelerators for INT4 (W4A16) quantization

| Architecture | Supported AI accelerators | Minimum compute capability |
| ------------ | ------------------------- | -------------------------- |
| Ampere       | A10, A30, A40, A100       | 8.0                        |
| Ada Lovelace | L4, L40, L40S             | 8.9                        |
| Hopper       | H100, H200, GH200         | 9.0                        |
| Blackwell    | B200, B300, GB200, GB300  | 10.0                       |

Note

NVIDIA Turing architecture (Tesla T4) does not have optimized vLLM kernel support for INT4 quantization. Use Ampere or newer architectures for INT4 inference.

Table 4.4. Supported AMD AI accelerators for INT4 (W4A16) quantization

| Architecture | Supported AI accelerators |
| ------------ | ------------------------- |
| CDNA 3       | MI300X, MI325X            |

Note

AMD CDNA 2 architecture (MI210) does not have optimized vLLM kernel support for INT4 quantization.

Table 4.5. Supported NVIDIA AI accelerators for FP8 (W8A8) quantization

| Architecture | Supported AI accelerators | Minimum compute capability |
| ------------ | ------------------------- | -------------------------- |
| Ada Lovelace | L4, L40, L40S             | 8.9                        |
| Hopper       | H100, H200, GH200         | 9.0                        |
| Blackwell    | B200, B300, GB200, GB300  | 10.0                       |

Note

NVIDIA Turing architecture (Tesla T4) and Ampere architecture (A10, A30, A40, A100) AI accelerators do not support FP8 W8A8 quantization due to hardware limitations. However, FP8 weight-only (W8A16) quantization is available on these architectures by using Marlin kernels.
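To consume an FP8 (W8A8) checkpoint on supported hardware, vLLM reads the quantization scheme from the checkpoint's configuration. The following is a minimal offline-inference sketch; the model identifier is a placeholder for any FP8 checkpoint that your accelerator supports.

```python
# Minimal sketch: offline inference with an FP8 (W8A8) quantized checkpoint
# using vLLM's Python API. The model name below is a placeholder; vLLM reads
# the quantization scheme from the checkpoint configuration.
from vllm import LLM, SamplingParams

llm = LLM(model="RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8")  # placeholder
params = SamplingParams(temperature=0.7, max_tokens=64)

outputs = llm.generate(["Briefly explain FP8 quantization."], params)
print(outputs[0].outputs[0].text)
```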

Table 4.6. Supported AMD AI accelerators for FP8 (W8A8) quantization

| Architecture | Supported AI accelerators |
| ------------ | ------------------------- |
| CDNA 3       | MI300X, MI325X            |

Note

AMD CDNA 2 architecture AI accelerators (MI210) do not support FP8 quantization due to hardware limitations.

Table 4.7. Supported NVIDIA AI accelerators for NVFP4 quantization

| Architecture | Supported AI accelerators | Minimum compute capability |
| ------------ | ------------------------- | -------------------------- |
| Blackwell    | B200, B300, GB200, GB300  | 10.0                       |

Note

NVFP4 quantization is only available on NVIDIA Blackwell architecture AI accelerators. AMD AI accelerators do not support NVFP4 quantization.

Chapter 5. Supported AI accelerators for RHEL AI

The following AI accelerators are supported for inference serving with Red Hat AI Inference Server on RHEL AI.

Important

Bare metal deployments of RHEL AI are supported for all NVIDIA CUDA and AMD ROCm AI accelerators listed in Supported AI accelerators for Red Hat AI Inference Server.

Actual requirements vary based on the specific models you deploy, quantization methods, context lengths, and concurrent request loads. Aggregate GPU memory refers to the total GPU memory available across all GPUs in the system that can be used for tensor parallelism or pipeline parallelism.
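As a rough illustration of how quantization interacts with aggregate GPU memory, the following back-of-the-envelope sketch estimates the memory needed just to hold model weights. It deliberately ignores KV cache, activations, and runtime overhead, which add to the real requirement, so treat the result as a lower bound rather than a sizing tool.

```python
# Minimal sketch: rough aggregate GPU memory estimate for holding model weights.
# Real requirements also depend on KV cache, context length, and concurrency.
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Memory needed just to hold the model weights, in GB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for name, params, bits in [
    ("8B model, FP16", 8, 16),
    ("8B model, FP8 (W8A8)", 8, 8),
    ("70B model, INT4 (W4A16)", 70, 4),
]:
    print(f"{name}: ~{weight_memory_gb(params, bits):.0f} GB of weights")
```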

For more information about inference serving on bare-metal or cloud platforms, see Red Hat Enterprise Linux AI.

Important

The recommended minimum additional disk storage for all platforms is 1 TB.

Table 5.1. Supported AI accelerators for Amazon Web Services (AWS) deployments

| NVIDIA AI accelerator | Aggregate GPU memory | AWS instance family |
| --------------------- | -------------------- | ------------------- |
| GB200                 | 384 GB               | P6e series          |
| B200                  | 192 GB               | P6 series           |
| RTX PRO 6000 Blackwell Server Edition | 96 GB | G7e series          |
| H100                  | 80 GB                | P5 series           |
| L40S                  | 48 GB                | G6e series          |
| A100                  | 40 GB                | P4d series          |
| L4                    | 24 GB                | G6 series           |

Table 5.2. Supported AI accelerators for IBM Cloud deployments

| NVIDIA AI accelerator | Aggregate GPU memory | IBM Cloud instance family |
| --------------------- | -------------------- | ------------------------- |
| H200                  | 141 GB               | gx3 series                |
| H100                  | 80 GB                | gx3 series                |
| A100                  | 80 GB                | gx3 series                |
| L40S                  | 48 GB                | gx3 series                |
| L4                    | 24 GB                | gx3 series                |

Table 5.3. Supported AI accelerators for Microsoft Azure deployments

| AI accelerator      | Aggregate GPU memory | Azure instance family |
| ------------------- | -------------------- | --------------------- |
| NVIDIA GB200        | 384 GB               | ND series             |
| AMD Instinct MI300X | 192 GB               | ND series             |
| NVIDIA H100         | 80 GB                | ND series             |
| NVIDIA A100         | 80 GB                | ND series             |
| AMD Instinct MI210  | 64 GB                | ND series             |

Table 5.4. Supported AI accelerators for Google Cloud deployments

| NVIDIA AI accelerator | Aggregate GPU memory | Google Cloud instance family |
| --------------------- | -------------------- | ---------------------------- |
| GB200                 | 384 GB               | A4X series                   |
| B200                  | 192 GB               | A4 series                    |
| 4 x L4                | 96 GB                | G2 series                    |
| H100                  | 80 GB                | A3 series                    |
| A100                  | 40 GB                | A2 series                    |

Chapter 6. Supported AI accelerators for Red Hat OpenShift AI

You must install the Operator that corresponds to the AI accelerator that you want to use with Red Hat OpenShift AI.

OpenShift AI provides Operators that support integration with AI accelerators. OpenShift AI also provides images that include libraries that work with NVIDIA, AMD, and Intel Gaudi data center grade AI accelerators.

Chapter 7. Supported deployment environments

Red Hat AI Inference Server is supported in the following deployment environments.

Note

Red Hat AI Inference Server is available only as a container image. The host operating system and kernel must support the required accelerator drivers. For more information, see Supported AI accelerators.

Table 7.1. Red Hat AI Inference Server supported deployment environments

| Environment | Supported versions | Deployment notes |
| ----------- | ------------------ | ---------------- |
| OpenShift Container Platform (self-managed) | 4.14+ | Deploy on bare-metal hosts or virtual machines. |
| Red Hat OpenShift Service on AWS (ROSA) | 4.14+ | Requires a ROSA cluster with STS and GPU-enabled P5 or G5 node types. See Prepare your environment for more information. |
| Red Hat Enterprise Linux AI | 3.0+ | Deploy on bare-metal hosts or virtual machines. |
| Red Hat Enterprise Linux (RHEL) | 9.2+ | Deploy on bare-metal hosts or virtual machines. |
| Linux (not RHEL) | - | Supported under the third-party support policy when deployed on bare-metal hosts or virtual machines. OpenShift Container Platform Operators are not required. |
| Kubernetes (not OpenShift Container Platform) | - | Supported under the third-party support policy when deployed on bare-metal hosts or virtual machines. |

Important
  • Single-host deployments for IBM Spyre AI accelerators on IBM Z and IBM Power are supported on RHEL 9.6+.
  • Cluster deployments for IBM Spyre AI accelerators on IBM Z are supported as part of Red Hat OpenShift AI version 3.0+ only.

Chapter 8. Minimum software prerequisites for AI accelerator workloads

The following table lists the minimum OpenShift Container Platform software prerequisites for AI accelerator workloads.

Table 8.1. Minimum software prerequisites for AI accelerator workloads

| Component | Minimum version | Operator |
| --------- | --------------- | -------- |
| NVIDIA GPU Operator | 24.3 | NVIDIA GPU Operator OLM Operator |
| AMD GPU Operator | 6.2 | AMD GPU Operator OLM Operator |
| IBM Spyre Operator | 1.0 | IBM Spyre Operator |
| Node Feature Discovery [1] | 4.14 | Node Feature Discovery Operator |

[1] Included by default with OpenShift Container Platform. Node Feature Discovery is required for scheduling NUMA-aware workloads.

Chapter 9. Lifecycle and update policy

Security and critical bug fixes are delivered as container images available from the registry.access.redhat.com/rhaiis container registry and are announced through RHSA advisories. See RHAII container images on catalog.redhat.com for more details.

Legal Notice

Copyright © Red Hat.
Except as otherwise noted below, the text of and illustrations in this documentation are licensed by Red Hat under the Creative Commons Attribution–Share Alike 3.0 Unported license. If you distribute this document or an adaptation of it, you must provide the URL for the original version.
Red Hat, as the licensor of this document, waives the right to enforce, and agrees not to assert, Section 4d of CC-BY-SA to the fullest extent permitted by applicable law.
Red Hat, the Red Hat logo, JBoss, Hibernate, and RHCE are trademarks or registered trademarks of Red Hat, Inc. or its subsidiaries in the United States and other countries.
Linux® is the registered trademark of Linus Torvalds in the United States and other countries.
XFS is a trademark or registered trademark of Hewlett Packard Enterprise Development LP or its subsidiaries in the United States and other countries.
The OpenStack® Word Mark and OpenStack logo are trademarks or registered trademarks of the Linux Foundation, used under license.
All other trademarks are the property of their respective owners.