Chapter 5. Supported AI accelerators for RHEL AI
The following AI accelerators are supported for inference serving with Red Hat AI Inference Server on RHEL AI.
Bare metal deployments of RHEL AI are supported for all NVIDIA CUDA and AMD ROCm AI accelerators listed in Supported AI accelerators for Red Hat AI Inference Server.
Actual memory requirements vary based on the specific models you deploy, quantization methods, context lengths, and concurrent request loads. In the following tables, aggregate GPU memory refers to the total GPU memory available across all GPUs in the system that can be used for tensor parallelism or pipeline parallelism.
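The following back-of-envelope sketch illustrates how these factors combine. It is not part of any Red Hat tooling, and the model shape and workload figures are illustrative assumptions; real usage also includes activations and serving-framework overhead, so leave generous headroom.

```python
# Rough estimate of the GPU memory an inference workload needs.
# Illustrative only: real usage also includes activations, CUDA graphs,
# and serving-framework overhead.

def weights_gib(params_billion: float, bytes_per_param: float) -> float:
    """Weights: ~2 bytes/param for FP16/BF16, ~1 for FP8/INT8 quantization."""
    return params_billion * 1e9 * bytes_per_param / 2**30

def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 context_len: int, concurrent_seqs: int,
                 bytes_per_elem: float = 2.0) -> float:
    """KV cache: 2 tensors (K and V) per layer, per token, per sequence."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem
    return per_token * context_len * concurrent_seqs / 2**30

# Hypothetical 70B-parameter model in FP16 with a Llama-like shape
# (80 layers, 8 KV heads, head dim 128), 8K context, 16 concurrent requests.
needed = weights_gib(70, 2) + kv_cache_gib(80, 8, 128, 8192, 16)
print(f"Estimated minimum: {needed:.0f} GiB")
# ~170 GiB: more than a single 80 GB H100, so the model must be sharded
# across multiple GPUs (tensor or pipeline parallelism) on that hardware.
```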
For more information about inference serving on bare-metal or cloud platforms, see Red Hat Enterprise Linux AI.
The recommended minimum additional disk storage for all platforms is 1 TB.
Table 5.1. Supported NVIDIA AI accelerators on AWS

| NVIDIA AI accelerator | Aggregate GPU memory | AWS instance family |
|---|---|---|
| GB200 | 384 GB | P6e series |
| B200 | 192 GB | P6 series |
| RTX PRO 6000 Blackwell Server Edition | 96 GB | G7e series |
| H100 | 80 GB | P5 series |
| L40S | 48 GB | G6e series |
| A100 | 40 GB | P4d series |
| L4 | 24 GB | G6 series |

Table 5.2. Supported NVIDIA AI accelerators on IBM Cloud

| NVIDIA AI accelerator | Aggregate GPU memory | IBM Cloud instance family |
|---|---|---|
| H200 | 141 GB | gx3 series |
| H100 | 80 GB | gx3 series |
| A100 | 80 GB | gx3 series |
| L40S | 48 GB | gx3 series |
| L4 | 24 GB | gx3 series |

Table 5.3. Supported AI accelerators on Microsoft Azure

| AI accelerator | Aggregate GPU memory | Azure instance family |
|---|---|---|
| NVIDIA GB200 | 384 GB | ND series |
| AMD Instinct MI300X | 192 GB | ND series |
| NVIDIA H100 | 80 GB | ND series |
| NVIDIA A100 | 80 GB | ND series |
| AMD Instinct MI210 | 64 GB | ND series |

Table 5.4. Supported NVIDIA AI accelerators on Google Cloud

| NVIDIA AI accelerator | Aggregate GPU memory | Google Cloud instance family |
|---|---|---|
| GB200 | 384 GB | A4X series |
| B200 | 192 GB | A4 series |
| 4x L4 | 96 GB | G2 series |
| H100 | 80 GB | A3 series |
| A100 | 40 GB | A2 series |
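
Tensor parallelism is how a multi-GPU instance presents its aggregate memory to a single model: for example, the four 24 GB L4 GPUs on a G2 instance pool to 96 GB. The following is a minimal sketch, assuming the upstream vLLM Python API that Red Hat AI Inference Server packages; the model name and parameter values are illustrative.

```python
# Minimal sketch of serving one model across 4 GPUs with tensor parallelism,
# using the upstream vLLM Python API. The model name is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model
    tensor_parallel_size=4,  # shard weights across 4 GPUs, pooling their memory
)
outputs = llm.generate(
    ["Explain tensor parallelism in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```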