Validated models
Red Hat AI validated models
Chapter 1. About Red Hat AI validated models
Red Hat AI validated models have been tested and verified to work correctly across supported hardware and product configurations. These models are available as Hugging Face downloads, as OCI artifact images, and as ModelCar container images. Platform-specific validated models are also available for IBM Spyre on IBM Power and IBM Z systems.
If you are using AI Inference Server as part of a RHEL AI deployment, use OCI artifact images.
If you are using AI Inference Server as part of an OpenShift AI deployment, use ModelCar images.
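However you deploy it, AI Inference Server exposes an OpenAI-compatible API, so a served validated model can be queried with the standard `openai` Python client. The endpoint URL and served model name below are assumptions for illustration; substitute the values from your own deployment.

```python
from openai import OpenAI

# Endpoint and model name are illustrative; use the values from your
# AI Inference Server deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic",
    messages=[{"role": "user", "content": "Summarize what a validated model is."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```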
Red Hat uses GuideLLM for performance benchmarking and the Language Model Evaluation Harness for accuracy evaluations.
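As a rough illustration of the accuracy side, the Language Model Evaluation Harness (lm-evaluation-harness) exposes a Python entry point for running benchmark tasks against a local model. The task list, model repository ID, and sample limit below are placeholders, not the configuration Red Hat uses for validation.

```python
import lm_eval

# Illustrative settings only; Red Hat's validation task list is not
# documented here. "hf" loads the model through Hugging Face Transformers.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=RedHatAI/granite-3.1-8b-instruct,dtype=auto",
    tasks=["arc_easy"],
    limit=100,  # subsample for a quick smoke test
)
print(results["results"])
```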
Explore the Red Hat AI validated models collections on Hugging Face.
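If you want a local copy of a model from one of those collections, one option is the `huggingface_hub` client. The repository ID below is a hypothetical example; pick the exact model and quantized variant you need from the collection pages.

```python
from huggingface_hub import snapshot_download

# Example repository ID only; choose the validated model and quantized
# variant you need from the Red Hat AI collections on Hugging Face.
local_path = snapshot_download(
    repo_id="RedHatAI/Llama-3.3-70B-Instruct-quantized.w4a16",
    local_dir="models/llama-3.3-70b-instruct-int4",
)
print(f"Model downloaded to {local_path}")
```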
AMD GPUs support only FP8 (W8A8) and GGUF quantized model variants. For more information, see Supported hardware.
Chapter 2. Red Hat AI validated models - October 2025 collection
The following models, available from Red Hat AI on Hugging Face, are validated for use with Red Hat AI Inference Server.
| Model | Quantized variants | Hugging Face model card | Validated on |
|---|---|---|---|
| gpt-oss-120b | None | | |
| gpt-oss-20b | None | | |
| NVIDIA-Nemotron-Nano-9B-v2 | INT4, FP8 | | |
| Qwen3-Coder-480B-A35B-Instruct | FP8 | | |
| Voxtral-Mini-3B-2507 | FP8 | | |
| whisper-large-v3-turbo | INT4 | | |
Chapter 3. Validated models on Hugging Face - September 2025 collection
The following models, available from Red Hat AI on Hugging Face, are validated for use with Red Hat AI Inference Server.
| Model | Quantized variants | Hugging Face model card | Validated on |
|---|---|---|---|
| DeepSeek-R1-0528 | INT4 | | |
| gemma-3n-E4B-it | FP8 | | |
| Kimi-K2-Instruct | INT4 | | |
| Qwen3-8B | FP8 | | |
Chapter 4. Validated models on Hugging Face - May 2025 collection
The following models, available from Red Hat AI on Hugging Face, are validated for use with Red Hat AI Inference Server.
| Model | Quantized variants | Hugging Face model card | Validated on |
|---|---|---|---|
| gemma-2-9b-it | FP8 | | |
| granite-3.1-8b-base | INT4 | | |
| granite-3.1-8b-instruct | INT4, INT8, FP8 | | |
| Llama-3.1-8B-Instruct | None | | |
| Llama-3.1-Nemotron-70B-Instruct-HF | FP8 | | |
| Llama-3.3-70B-Instruct | INT4, INT8, FP8 | | |
| Llama-4-Maverick-17B-128E-Instruct | FP8 | | |
| Llama-4-Scout-17B-16E-Instruct | INT4, FP8 | | |
| Meta-Llama-3.1-8B-Instruct | INT4, INT8, FP8 | | |
| Mistral-Small-24B-Instruct-2501 | INT4, INT8, FP8 | | |
| Mistral-Small-3.1-24B-Instruct-2503 | INT4, INT8, FP8 | | |
| Mixtral-8x7B-Instruct-v0.1 | None | | |
| phi-4 | INT4, INT8, FP8 | | |
| Qwen2.5-7B-Instruct | INT4, INT8, FP8 | | |
Chapter 5. Validated OCI artifact model container images
The following table lists validated OCI artifact model container images available from the Red Hat container registry, including baseline and quantized variants for each supported model. For an example of pulling one of these images, see the sketch after the table.
| Model | Quantized variants | OCI artifact images |
|---|---|---|
| llama-4-scout-17b-16e-instruct | INT4, FP8 | |
| llama-4-maverick-17b-128e-instruct | FP8 | |
| mistral-small-3-1-24b-instruct-2503 | INT4, INT8, FP8 | |
| llama-3-3-70b-instruct | INT4, INT8, FP8 | |
| llama-3-1-8b-instruct | INT4, INT8, FP8 | |
| granite-3-1-8b-instruct | INT4, INT8, FP8 | |
| phi-4 | INT4, INT8, FP8 | |
| qwen2-5-7b-instruct | INT4, INT8, FP8 | |
| mistral-small-24b-instruct-2501 | INT4, INT8, FP8 | |
| mixtral-8x7b-instruct-v0-1 | None | |
| granite-3-1-8b-base | INT4 (baseline currently unavailable) | |
| granite-3-1-8b-starter-v2 | None | |
| llama-3-1-nemotron-70b-instruct-hf | FP8 | |
| gemma-2-9b-it | FP8 | |
| deepseek-r1-0528 | INT4 (baseline currently unavailable) | |
| qwen3-8b | FP8 (baseline currently unavailable) | |
| kimi-k2-instruct | INT4 (baseline currently unavailable) | |
| gemma-3n-e4b-it | FP8 (baseline currently unavailable) | |
| gpt-oss-120b | None | |
| gpt-oss-20b | None | |
| qwen3-coder-480b-a35b-instruct | FP8 (baseline currently unavailable) | |
| whisper-large-v3-turbo | INT4 (baseline currently unavailable) | |
| voxtral-mini-3b-2507 | FP8 (baseline currently unavailable) | |
| nvidia-nemotron-nano-9b-v2 | FP8 (baseline currently unavailable) | |
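OCI artifact model images are pulled like any other image from the Red Hat container registry. A minimal sketch, assuming podman is installed, you are logged in to registry.redhat.io, and the repository path below (which is hypothetical) matches the image you chose from the table:

```python
import subprocess

# Hypothetical repository path; substitute the OCI artifact image
# reference for the model and variant you selected from the table.
image = "registry.redhat.io/rhelai1/granite-3-1-8b-instruct:1.5"

# Requires podman and an authenticated session with registry.redhat.io.
subprocess.run(["podman", "pull", image], check=True)
```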
Chapter 6. ModelCar container images
The following table lists validated ModelCar container images available from the Red Hat container registry, including baseline and quantized variants for each supported model. For an example of referencing a ModelCar image from an InferenceService, see the sketch after the table.
| Model | Quantized variants | ModelCar images |
|---|---|---|
| llama-4-scout-17b-16e-instruct | INT4, FP8 | |
| llama-4-maverick-17b-128e-instruct | FP8 | |
| mistral-small-3-1-24b-instruct-2503 | INT4, INT8, FP8 | |
| llama-3-3-70b-instruct | INT4, INT8, FP8 | |
| llama-3-1-8b-instruct | INT4, INT8, FP8 | |
| granite-3-1-8b-instruct | INT4, INT8, FP8 | |
| phi-4 | INT4, INT8, FP8 | |
| qwen2-5-7b-instruct | INT4, INT8, FP8 | |
| mistral-small-24b-instruct-2501 | INT4, INT8, FP8 | |
| mixtral-8x7b-instruct-v0-1 | None | |
| granite-3-1-8b-base | INT4 (baseline currently unavailable) | |
| granite-3-1-8b-starter-v2 | None | |
| llama-3-1-nemotron-70b-instruct-hf | FP8 | |
| gemma-2-9b-it | FP8 | |
| deepseek-r1-0528 | INT4 (baseline currently unavailable) | |
| qwen3-8b | FP8 (baseline currently unavailable) | |
| kimi-k2-instruct | INT4 (baseline currently unavailable) | |
| gemma-3n-e4b-it | FP8 (baseline currently unavailable) | |
| gpt-oss-120b | None | |
| gpt-oss-20b | None | |
| qwen3-coder-480b-a35b-instruct | FP8 (baseline currently unavailable) | |
| whisper-large-v3-turbo | INT4 (baseline currently unavailable) | |
| voxtral-mini-3b-2507 | FP8 (baseline currently unavailable) | |
| nvidia-nemotron-nano-9b-v2 | FP8 (baseline currently unavailable) | |
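In an OpenShift AI deployment, a ModelCar image is typically referenced from a KServe InferenceService through an oci:// storage URI. The following sketch uses the kubernetes Python client; the namespace, runtime name, and image reference are assumptions to adapt to your cluster.

```python
from kubernetes import client, config

# Namespace, runtime name, and image reference are illustrative;
# substitute the ModelCar image you selected from the table above.
inference_service = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "granite-3-1-8b-instruct", "namespace": "demo"},
    "spec": {
        "predictor": {
            "model": {
                "modelFormat": {"name": "vLLM"},
                "runtime": "vllm-runtime",
                # ModelCar images are mounted through an oci:// storage URI.
                "storageUri": "oci://registry.redhat.io/rhelai1/modelcar-granite-3-1-8b-instruct:1.5",
            }
        }
    },
}

config.load_kube_config()
api = client.CustomObjectsApi()
api.create_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="demo",
    plural="inferenceservices",
    body=inference_service,
)
```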
Chapter 7. Validated models for use with IBM Power and IBM Spyre AI accelerators
The following large language models are supported for IBM Power systems with IBM Spyre AI accelerators.
IBM Spyre AI accelerator cards support only FP16 model weights. For compatible models, the Red Hat AI Inference Server inference engine automatically converts weights to FP16 at startup; no additional configuration is needed.
| Model | Hugging Face model card |
|---|---|
| granite-3.3-8b-instruct | |
| granite-embedding-125m-english | |
| granite-embedding-278m-multilingual | |
Pre-built IBM Granite models run with the specific Python packages that are included in the Red Hat AI Inference Server Spyre container image. The models are tied to fixed configurations for Spyre card count, batch size, and input/output context sizes.
Updating or replacing Python packages in the Red Hat AI Inference Server Spyre container image is not supported.
Chapter 8. Validated models for use with IBM Z and IBM Spyre AI accelerators
The following large language models are supported for IBM Z systems with IBM Spyre AI accelerators.
IBM Spyre AI accelerator cards support only FP16 model weights. For compatible models, the Red Hat AI Inference Server inference engine automatically converts weights to FP16 at startup; no additional configuration is needed.
| Model | Hugging Face model card |
|---|---|
| granite-3.3-8b-instruct | |
Pre-built IBM Granite models run with the specific Python packages that are included in the Red Hat AI Inference Server Spyre container image. The models are tied to fixed configurations for Spyre card count, batch size, and input/output context sizes.
Updating or replacing Python packages in the Red Hat AI Inference Server Spyre container image is not supported.