Choose a validated model for reliable serving

Validated models

Red Hat AI 3

Red Hat AI validated models

Red Hat AI Documentation Team

Abstract

Learn about the validated models that you can serve for inference with Red Hat AI.

Chapter 1. About Red Hat AI validated models

Red Hat AI validated models have been tested and verified to work correctly across supported hardware and product configurations. These models are available as Hugging Face downloads, as OCI artifact images, and as ModelCar container images. Platform-specific validated models are also available for IBM Spyre AI accelerators on IBM Power and IBM Z systems.

Note

If you are using AI Inference Server as part of a RHEL AI deployment, use OCI artifact images.

If you are using AI Inference Server as part of an OpenShift AI deployment, use ModelCar images.
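
The guidance in this note can be captured as a small lookup. The following is a minimal sketch only: the keys `rhel-ai` and `openshift-ai` and the function name are hypothetical labels chosen for illustration, not product identifiers.

```python
# Map a deployment type to the model image format recommended in the note above.
# The keys "rhel-ai" and "openshift-ai" are hypothetical labels for this sketch.
RECOMMENDED_IMAGE_FORMAT = {
    "rhel-ai": "OCI artifact image",
    "openshift-ai": "ModelCar image",
}


def recommended_image_format(deployment: str) -> str:
    """Return the recommended image format for a deployment type."""
    key = deployment.strip().lower()
    if key not in RECOMMENDED_IMAGE_FORMAT:
        raise ValueError(f"unknown deployment type: {deployment!r}")
    return RECOMMENDED_IMAGE_FORMAT[key]


print(recommended_image_format("rhel-ai"))
print(recommended_image_format("openshift-ai"))
```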

Red Hat uses GuideLLM for performance benchmarking and Language Model Evaluation Harness for accuracy evaluations.

Explore the Red Hat AI validated models collections on Hugging Face.

Important

AMD GPUs support FP8 (W8A8) and GGUF quantization variant models only. For more information, see Supported hardware.

Chapter 2. Red Hat AI validated models - February 2026

The following models, available from Red Hat AI on Hugging Face, are validated for use with Red Hat AI Inference Server.

Table 2.1. Red Hat AI validated models - February 2026 collection
Model	Quantized variants	Hugging Face model cards	Validated on
granite-4.0-h-small	FP8	FP8	RHAIIS 3.3 RHOAI 3.3
granite-4.0-h-tiny	FP8	FP8	RHAIIS 3.3 RHOAI 3.3
Ministral-3-14B-Instruct-2512	None	Baseline	RHAIIS 3.3 RHOAI 3.3
Phi-4-reasoning	FP8	FP8	RHAIIS 3.3 RHOAI 3.3
Qwen3-Next-80B-A3B-Instruct	INT4	INT4	RHAIIS 3.3 RHOAI 3.3
Qwen3-VL-235B-A22B-Instruct-NVFP4	None	Baseline	RHAIIS 3.3 RHOAI 3.3

Chapter 3. Red Hat AI validated models - January 2026

The following models, available from Red Hat AI on Hugging Face, are validated for use with Red Hat AI Inference Server.

Table 3.1. Red Hat AI validated models - January 2026 collection
Model	Quantized variants	Hugging Face model cards	Validated on
Apertus-8B-Instruct-2509	FP8	FP8	RHAIIS 3.2.5 RHOAI 3.2
Mistral-Large-3-675B-Instruct-2512	None	Baseline	RHAIIS 3.2.5 RHOAI 3.2
Mistral-Large-3-675B-Instruct-2512-NVFP4	None	Baseline	RHAIIS 3.2.5 RHOAI 3.0
NVIDIA-Nemotron-3-Nano-30B-A3B	FP8	FP8	RHAIIS 3.2.5 RHOAI 3.0

Chapter 4. NVFP4 Models

The following models, available from Red Hat AI on Hugging Face, are validated for use with Red Hat AI Inference Server.

Table 4.1. NVFP4 Models collection
Model	Quantized variants	Hugging Face model cards	Validated on
Mistral-Large-3-675B-Instruct-2512-NVFP4	None	Baseline	RHAIIS 3.2.5 RHOAI 3.0
Qwen3-VL-235B-A22B-Instruct-NVFP4	None	Baseline	RHAIIS 3.3 RHOAI 3.3

Chapter 5. Red Hat AI validated models - October 2025 collection

The following models, available from Red Hat AI on Hugging Face, are validated for use with Red Hat AI Inference Server.

Table 5.1. Red Hat AI validated models - October 2025 collection
Model	Quantized variants	Hugging Face model card	Validated on
gpt-oss-120b	None	Baseline	RHAIIS 3.2.2 RHOAI 2.25
gpt-oss-20b	None	Baseline	RHAIIS 3.2.2 RHOAI 2.25
NVIDIA-Nemotron-Nano-9B-v2	INT4, FP8	INT4 FP8	RHAIIS 3.2.2 RHOAI 2.25
Qwen3-Coder-480B-A35B-Instruct	FP8	FP8	RHAIIS 3.2.2 RHOAI 2.25
Voxtral-Mini-3B-2507	FP8	FP8	RHAIIS 3.2.2 RHOAI 2.25
whisper-large-v3-turbo	INT4	INT4	RHAIIS 3.2.2 RHOAI 2.25

Chapter 6. Validated models on Hugging Face - September 2025 collection

The following models, available from Red Hat AI on Hugging Face, are validated for use with Red Hat AI Inference Server.

Table 6.1. Red Hat AI validated models - September 2025 collection
Model	Quantized variants	Hugging Face model card	Validated on
DeepSeek-R1-0528	INT4	INT4	RHAIIS 3.2.1 RHOAI 2.24
gemma-3n-E4B-it	FP8	FP8	RHAIIS 3.2.1 RHOAI 2.24
Kimi-K2-Instruct	INT4	INT4	RHAIIS 3.2.1 RHOAI 2.24
Qwen3-8B	FP8	FP8	RHAIIS 3.2.1 RHOAI 2.24

Chapter 7. Validated models on Hugging Face - May 2025 collection

The following models, available from Red Hat AI on Hugging Face, are validated for use with Red Hat AI Inference Server.

Table 7.1. Red Hat AI validated models - May 2025 collection
Model	Quantized variants	Hugging Face model card	Validated on
gemma-2-9b-it	FP8	Baseline FP8	RHAIIS 3.0 RHELAI 1.5 RHOAI 2.20
granite-3.1-8b-base	INT4	INT4	RHAIIS 3.0 RHELAI 1.5 RHOAI 2.20
granite-3.1-8b-instruct	INT4, INT8, FP8	Baseline INT4 INT8 FP8	RHAIIS 3.0 RHELAI 1.5 RHOAI 2.20
Llama-3.1-8B-Instruct	None	Baseline	RHAIIS 3.0 RHELAI 1.5 RHOAI 2.20
Llama-3.1-Nemotron-70B-Instruct-HF	FP8	Baseline FP8	RHAIIS 3.0 RHELAI 1.5 RHOAI 2.20
Llama-3.3-70B-Instruct	INT4, INT8, FP8	Baseline INT4 INT8 FP8	RHAIIS 3.0 RHELAI 1.5 RHOAI 2.20
Llama-4-Maverick-17B-128E-Instruct	FP8	Baseline FP8	RHAIIS 3.0 RHELAI 1.5 RHOAI 2.20
Llama-4-Scout-17B-16E-Instruct	INT4, FP8	Baseline INT4 FP8	RHAIIS 3.0 RHELAI 1.5 RHOAI 2.20
Meta-Llama-3.1-8B-Instruct	INT4, INT8, FP8	INT4 INT8 FP8	RHAIIS 3.0 RHELAI 1.5 RHOAI 2.20
Mistral-Small-24B-Instruct-2501	INT4, INT8, FP8	Baseline INT4 INT8 FP8	RHAIIS 3.0 RHELAI 1.5 RHOAI 2.20
Mistral-Small-3.1-24B-Instruct-2503	INT4, INT8, FP8	Baseline INT4 INT8 FP8	RHAIIS 3.0 RHELAI 1.5 RHOAI 2.20
Mixtral-8x7B-Instruct-v0.1	None	Baseline	RHAIIS 3.0 RHELAI 1.5 RHOAI 2.20
phi-4	INT4, INT8, FP8	Baseline INT4 INT8 FP8	RHAIIS 3.0 RHELAI 1.5 RHOAI 2.20
Qwen2.5-7B-Instruct	INT4, INT8, FP8	Baseline INT4 INT8 FP8	RHAIIS 3.0 RHELAI 1.5 RHOAI 2.20
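
The "Validated on" column in these tables packs product and version pairs into a single cell. If you track the tables in a script, a cell can be split as sketched below. This is an illustrative helper, not part of any Red Hat tooling; the function names are assumptions, and the product abbreviations are taken as they appear in the tables.

```python
def parse_validated_on(cell: str) -> dict[str, str]:
    """Split a 'Validated on' cell into {product: version} pairs."""
    tokens = cell.split()
    if len(tokens) % 2 != 0:
        raise ValueError(f"expected product/version pairs, got: {cell!r}")
    # Versions stay strings: as a float, "2.20" would collapse to 2.2.
    return dict(zip(tokens[0::2], tokens[1::2]))


def version_key(version: str) -> tuple[int, ...]:
    """Turn '3.2.5' into (3, 2, 5) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


row = parse_validated_on("RHAIIS 3.0 RHELAI 1.5 RHOAI 2.20")
print(row)
print(version_key(row["RHOAI"]) > version_key("2.3"))
```

Comparing versions as integer tuples avoids the trap where `2.20` sorts before `2.3` when the values are treated as decimal numbers or plain strings.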

Chapter 8. Validated OCI artifact model container images

The following table lists validated OCI artifact model container images available from the Red Hat container registry, including baseline and quantized variants for each supported model.

Table 8.1. Validated OCI artifact model container images
Model	Quantized variants	OCI artifact images
llama-4-scout-17b-16e-instruct	INT4, FP8	Baseline: `registry.redhat.io/rhelai1/llama-4-scout-17b-16e-instruct:1.5` INT4: `registry.redhat.io/rhelai1/llama-4-scout-17b-16e-instruct-quantized-w4a16:1.5` FP8: `registry.redhat.io/rhelai1/llama-4-scout-17b-16e-instruct-fp8-dynamic:1.5`
llama-4-maverick-17b-128e-instruct	FP8	Baseline: `registry.redhat.io/rhelai1/llama-4-maverick-17b-128e-instruct:1.5` FP8: `registry.redhat.io/rhelai1/llama-4-maverick-17b-128e-instruct-fp8:1.5`
mistral-small-3-1-24b-instruct-2503	INT4, INT8, FP8	Baseline: `registry.redhat.io/rhelai1/mistral-small-3-1-24b-instruct-2503:1.5` INT4: `registry.redhat.io/rhelai1/mistral-small-3-1-24b-instruct-2503-quantized-w4a16:1.5` INT8: `registry.redhat.io/rhelai1/mistral-small-3-1-24b-instruct-2503-quantized-w8a8:1.5` FP8: `registry.redhat.io/rhelai1/mistral-small-3-1-24b-instruct-2503-fp8-dynamic:1.5`
llama-3-3-70b-instruct	INT4, INT8, FP8	Baseline: `registry.redhat.io/rhelai1/llama-3-3-70b-instruct:1.5` INT4: `registry.redhat.io/rhelai1/llama-3-3-70b-instruct-quantized-w4a16:1.5` INT8: `registry.redhat.io/rhelai1/llama-3-3-70b-instruct-quantized-w8a8:1.5` FP8: `registry.redhat.io/rhelai1/llama-3-3-70b-instruct-fp8-dynamic:1.5`
llama-3-1-8b-instruct	INT4, INT8, FP8	Baseline: `registry.redhat.io/rhelai1/llama-3-1-8b-instruct:1.5` INT4: `registry.redhat.io/rhelai1/llama-3-1-8b-instruct-quantized-w4a16:1.5` INT8: `registry.redhat.io/rhelai1/llama-3-1-8b-instruct-quantized-w8a8:1.5` FP8: `registry.redhat.io/rhelai1/llama-3-1-8b-instruct-fp8-dynamic:1.5`
granite-3-1-8b-instruct	INT4, INT8, FP8	Baseline: `registry.redhat.io/rhelai1/granite-3-1-8b-instruct:1.5` INT4: `registry.redhat.io/rhelai1/granite-3-1-8b-instruct-quantized-w4a16:1.5` INT8: `registry.redhat.io/rhelai1/granite-3-1-8b-instruct-quantized-w8a8:1.5` FP8: `registry.redhat.io/rhelai1/granite-3-1-8b-instruct-fp8-dynamic:1.5`
phi-4	INT4, INT8, FP8	Baseline: `registry.redhat.io/rhelai1/phi-4:1.5` INT4: `registry.redhat.io/rhelai1/phi-4-quantized-w4a16:1.5` INT8: `registry.redhat.io/rhelai1/phi-4-quantized-w8a8:1.5` FP8: `registry.redhat.io/rhelai1/phi-4-fp8-dynamic:1.5`
qwen2-5-7b-instruct	INT4, INT8, FP8	Baseline: `registry.redhat.io/rhelai1/qwen2-5-7b-instruct:1.5` INT4: `registry.redhat.io/rhelai1/qwen2-5-7b-instruct-quantized-w4a16:1.5` INT8: `registry.redhat.io/rhelai1/qwen2-5-7b-instruct-quantized-w8a8:1.5` FP8: `registry.redhat.io/rhelai1/qwen2-5-7b-instruct-fp8-dynamic:1.5`
mistral-small-24b-instruct-2501	INT4, INT8, FP8	Baseline: `registry.redhat.io/rhelai1/mistral-small-24b-instruct-2501:1.5` INT4: `registry.redhat.io/rhelai1/mistral-small-24b-instruct-2501-quantized-w4a16:1.5` INT8: `registry.redhat.io/rhelai1/mistral-small-24b-instruct-2501-quantized-w8a8:1.5` FP8: `registry.redhat.io/rhelai1/mistral-small-24b-instruct-2501-fp8-dynamic:1.5`
mixtral-8x7b-instruct-v0-1	None	Baseline: `registry.redhat.io/rhelai1/mixtral-8x7b-instruct-v0-1:1.4`
granite-3-1-8b-base	INT4 (baseline currently unavailable)	INT4: `registry.redhat.io/rhelai1/granite-3-1-8b-base-quantized-w4a16:1.5`
granite-3.1-8b-starter-v2	None	Baseline: `registry.redhat.io/rhelai1/granite-3.1-8b-starter-v2:1.5`
llama-3-1-nemotron-70b-instruct-hf	FP8	Baseline: `registry.redhat.io/rhelai1/llama-3-1-nemotron-70b-instruct-hf:1.5` FP8: `registry.redhat.io/rhelai1/llama-3-1-nemotron-70b-instruct-hf-fp8-dynamic:1.5`
gemma-2-9b-it	FP8	Baseline: `registry.redhat.io/rhelai1/gemma-2-9b-it:1.5` FP8: `registry.redhat.io/rhelai1/gemma-2-9b-it-fp8:1.5`
deepseek-r1-0528	INT4 (baseline currently unavailable)	INT4: `registry.redhat.io/rhelai1/deepseek-r1-0528-quantized-w4a16:1.5`
qwen3-8b	FP8 (baseline currently unavailable)	FP8: `registry.redhat.io/rhelai1/qwen3-8b-fp8-dynamic:1.5`
kimi-k2-instruct	INT4 (baseline currently unavailable)	INT4: `registry.redhat.io/rhelai1/kimi-k2-instruct-quantized-w4a16:1.5`
gemma-3n-e4b-it	FP8 (baseline currently unavailable)	FP8: `registry.redhat.io/rhelai1/gemma-3n-e4b-it-fp8-dynamic:1.5`
gpt-oss-120b	None	Baseline: `registry.redhat.io/rhelai1/gpt-oss-120b:1.5`
gpt-oss-20b	None	Baseline: `registry.redhat.io/rhelai1/gpt-oss-20b:1.5`
qwen3-coder-480b-a35b-instruct	FP8 (baseline currently unavailable)	FP8: `registry.redhat.io/rhelai1/qwen3-coder-480b-a35b-instruct-fp8:1.5`
whisper-large-v3-turbo	INT4 (baseline currently unavailable)	INT4: `registry.redhat.io/rhelai1/whisper-large-v3-turbo-quantized-w4a16:1.5`
voxtral-mini-3b-2507	FP8 (baseline currently unavailable)	FP8: `registry.redhat.io/rhelai1/voxtral-mini-3b-2507-fp8-dynamic:1.5`
nvidia-nemotron-nano-9b-v2	FP8 (baseline currently unavailable)	FP8: `registry.redhat.io/rhelai1/nvidia-nemotron-nano-9b-v2-fp8-dynamic:1.5`
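
The image references in the table follow a `registry/namespace/name:tag` shape, with the quantized variant encoded as a suffix on the image name. The sketch below splits a reference into those parts and infers the variant from the suffix. The suffix list is read off the table rows above and is an assumption rather than a documented naming contract; the function name is hypothetical, and the parser expects a reference with exactly three path segments and a tag.

```python
# Suffixes observed in the image names above, mapped to the variant label.
# More specific suffixes come first so "-fp8-dynamic" is matched before "-fp8".
VARIANT_SUFFIXES = [
    ("-quantized-w4a16", "INT4"),
    ("-quantized-w8a8", "INT8"),
    ("-fp8-dynamic", "FP8"),
    ("-fp8", "FP8"),
]


def parse_image_reference(ref: str) -> dict[str, str]:
    """Split 'registry/namespace/name:tag' and infer the variant from the name."""
    path, _, tag = ref.rpartition(":")
    registry, namespace, name = path.split("/")
    variant = "Baseline"
    for suffix, label in VARIANT_SUFFIXES:
        if name.endswith(suffix):
            variant = label
            break
    return {
        "registry": registry,
        "namespace": namespace,
        "name": name,
        "tag": tag,
        "variant": variant,
    }


info = parse_image_reference("registry.redhat.io/rhelai1/phi-4-quantized-w4a16:1.5")
print(info["name"], info["variant"], info["tag"])
```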

Chapter 9. Validated Red Hat AI ModelCar container images

Table 9.1. Validated Red Hat AI ModelCar container images
Model	Quantized variants	ModelCar images
llama-4-scout-17b-16e-instruct	INT4, FP8	Baseline: `registry.redhat.io/rhelai1/modelcar-llama-4-scout-17b-16e-instruct:1.5` INT4: `registry.redhat.io/rhelai1/modelcar-llama-4-scout-17b-16e-instruct-quantized-w4a16:1.5` FP8: `registry.redhat.io/rhelai1/modelcar-llama-4-scout-17b-16e-instruct-fp8-dynamic:1.5`
llama-4-maverick-17b-128e-instruct	FP8	Baseline: `registry.redhat.io/rhelai1/modelcar-llama-4-maverick-17b-128e-instruct:1.5` FP8: `registry.redhat.io/rhelai1/modelcar-llama-4-maverick-17b-128e-instruct-fp8:1.5`
mistral-small-3-1-24b-instruct-2503	INT4, INT8, FP8	Baseline: `registry.redhat.io/rhelai1/modelcar-mistral-small-3-1-24b-instruct-2503:1.5` INT4: `registry.redhat.io/rhelai1/modelcar-mistral-small-3-1-24b-instruct-2503-quantized-w4a16:1.5` INT8: `registry.redhat.io/rhelai1/modelcar-mistral-small-3-1-24b-instruct-2503-quantized-w8a8:1.5` FP8: `registry.redhat.io/rhelai1/modelcar-mistral-small-3-1-24b-instruct-2503-fp8-dynamic:1.5`
llama-3-3-70b-instruct	INT4, INT8, FP8	Baseline: `registry.redhat.io/rhelai1/modelcar-llama-3-3-70b-instruct:1.5` INT4: `registry.redhat.io/rhelai1/modelcar-llama-3-3-70b-instruct-quantized-w4a16:1.5` INT8: `registry.redhat.io/rhelai1/modelcar-llama-3-3-70b-instruct-quantized-w8a8:1.5` FP8: `registry.redhat.io/rhelai1/modelcar-llama-3-3-70b-instruct-fp8-dynamic:1.5`
llama-3-1-8b-instruct	INT4, INT8, FP8	Baseline: `registry.redhat.io/rhelai1/modelcar-llama-3-1-8b-instruct:1.5` INT4: `registry.redhat.io/rhelai1/modelcar-llama-3-1-8b-instruct-quantized-w4a16:1.5` INT8: `registry.redhat.io/rhelai1/modelcar-llama-3-1-8b-instruct-quantized-w8a8:1.5` FP8: `registry.redhat.io/rhelai1/modelcar-llama-3-1-8b-instruct-fp8-dynamic:1.5`
granite-3-1-8b-instruct	INT4, INT8, FP8	Baseline: `registry.redhat.io/rhelai1/modelcar-granite-3-1-8b-instruct:1.5` INT4: `registry.redhat.io/rhelai1/modelcar-granite-3-1-8b-instruct-quantized-w4a16:1.5` INT8: `registry.redhat.io/rhelai1/modelcar-granite-3-1-8b-instruct-quantized-w8a8:1.5` FP8: `registry.redhat.io/rhelai1/modelcar-granite-3-1-8b-instruct-fp8-dynamic:1.5`
phi-4	INT4, INT8, FP8	Baseline: `registry.redhat.io/rhelai1/modelcar-phi-4:1.5` INT4: `registry.redhat.io/rhelai1/modelcar-phi-4-quantized-w4a16:1.5` INT8: `registry.redhat.io/rhelai1/modelcar-phi-4-quantized-w8a8:1.5` FP8: `registry.redhat.io/rhelai1/modelcar-phi-4-fp8-dynamic:1.5`
qwen2-5-7b-instruct	INT4, INT8, FP8	Baseline: `registry.redhat.io/rhelai1/modelcar-qwen2-5-7b-instruct:1.5` INT4: `registry.redhat.io/rhelai1/modelcar-qwen2-5-7b-instruct-quantized-w4a16:1.5` INT8: `registry.redhat.io/rhelai1/modelcar-qwen2-5-7b-instruct-quantized-w8a8:1.5` FP8: `registry.redhat.io/rhelai1/modelcar-qwen2-5-7b-instruct-fp8-dynamic:1.5`
mistral-small-24b-instruct-2501	INT4, INT8, FP8	Baseline: `registry.redhat.io/rhelai1/modelcar-mistral-small-24b-instruct-2501:1.5` INT4: `registry.redhat.io/rhelai1/modelcar-mistral-small-24b-instruct-2501-quantized-w4a16:1.5` INT8: `registry.redhat.io/rhelai1/modelcar-mistral-small-24b-instruct-2501-quantized-w8a8:1.5` FP8: `registry.redhat.io/rhelai1/modelcar-mistral-small-24b-instruct-2501-fp8-dynamic:1.5`
mixtral-8x7b-instruct-v0-1	None	Baseline: `registry.redhat.io/rhelai1/modelcar-mixtral-8x7b-instruct-v0-1:1.4`
granite-3-1-8b-base	INT4 (baseline currently unavailable)	INT4: `registry.redhat.io/rhelai1/modelcar-granite-3-1-8b-base-quantized-w4a16:1.5`
granite-3-1-8b-starter-v2	None	Baseline: `registry.redhat.io/rhelai1/modelcar-granite-3-1-8b-starter-v2:1.5`
llama-3-1-nemotron-70b-instruct-hf	FP8	Baseline: `registry.redhat.io/rhelai1/modelcar-llama-3-1-nemotron-70b-instruct-hf:1.5` FP8: `registry.redhat.io/rhelai1/modelcar-llama-3-1-nemotron-70b-instruct-hf-fp8-dynamic:1.5`
gemma-2-9b-it	FP8	Baseline: `registry.redhat.io/rhelai1/modelcar-gemma-2-9b-it:1.5` FP8: `registry.redhat.io/rhelai1/modelcar-gemma-2-9b-it-fp8:1.5`
deepseek-r1-0528	INT4 (baseline currently unavailable)	INT4: `registry.redhat.io/rhelai1/modelcar-deepseek-r1-0528-quantized-w4a16:1.5`
qwen3-8b	FP8 (baseline currently unavailable)	FP8: `registry.redhat.io/rhelai1/modelcar-qwen3-8b-fp8-dynamic:1.5`
kimi-k2-instruct	INT4 (baseline currently unavailable)	INT4: `registry.redhat.io/rhelai1/modelcar-kimi-k2-instruct-quantized-w4a16:1.5`
gemma-3n-e4b-it	FP8	Baseline: `registry.redhat.io/rhelai1/modelcar-gemma-3n-e4b-it:1.5` FP8: `registry.redhat.io/rhelai1/modelcar-gemma-3n-e4b-it-fp8-dynamic:1.5`
gpt-oss-120b	None	Baseline: `registry.redhat.io/rhelai1/modelcar-gpt-oss-120b:1.5`
gpt-oss-20b	None	Baseline: `registry.redhat.io/rhelai1/modelcar-gpt-oss-20b:1.5`
qwen3-coder-480b-a35b-instruct	FP8 (baseline currently unavailable)	FP8: `registry.redhat.io/rhelai1/modelcar-qwen3-coder-480b-a35b-instruct-fp8:1.5`
whisper-large-v3-turbo	INT4 (baseline currently unavailable)	INT4: `registry.redhat.io/rhelai1/modelcar-whisper-large-v3-turbo-quantized-w4a16:1.5`
voxtral-mini-3b-2507	FP8 (baseline currently unavailable)	FP8: `registry.redhat.io/rhelai1/modelcar-voxtral-mini-3b-2507-fp8-dynamic:1.5`
nvidia-nemotron-nano-9b-v2	FP8 (baseline currently unavailable)	FP8: `registry.redhat.io/rhelai1/modelcar-nvidia-nemotron-nano-9b-v2-fp8-dynamic:1.5`
phi-4-reasoning	FP8 (baseline currently unavailable)	FP8: `registry.redhat.io/rhai/modelcar-phi-4-reasoning-fp8-dynamic:3.0`
qwen3-vl-235b-a22b-instruct-nvfp4	None	Baseline: `registry.redhat.io/rhai/modelcar-qwen3-vl-235b-a22b-instruct-nvfp4:3.0`
qwen3-next-80b-a3b-instruct	INT4 (baseline currently unavailable)	INT4: `registry.redhat.io/rhai/modelcar-qwen3-next-80b-a3b-instruct-quantized-w4a16:3.0`
granite-4-0-h-tiny	FP8	Baseline: `registry.redhat.io/rhai/modelcar-granite-4-0-h-tiny:3.0` FP8: `registry.redhat.io/rhai/modelcar-granite-4-0-h-tiny-fp8-dynamic:3.0`
granite-4-0-h-small	FP8	Baseline: `registry.redhat.io/rhai/modelcar-granite-4-0-h-small:3.0` FP8: `registry.redhat.io/rhai/modelcar-granite-4-0-h-small-fp8-dynamic:3.0`
mistral-large-3-675b-instruct-2512	None	Baseline: `registry.redhat.io/rhai/modelcar-mistral-large-3-675b-instruct-2512:3.0`
mistral-large-3-675b-instruct-2512-nvfp4	None	Baseline: `registry.redhat.io/rhai/modelcar-mistral-large-3-675b-instruct-2512-nvfp4:3.0`
apertus-8b-instruct-2509	FP8 (baseline currently unavailable)	FP8: `registry.redhat.io/rhai/modelcar-apertus-8b-instruct-2509-fp8-dynamic:3.0`
nvidia-nemotron-3-nano-30b-a3b	FP8 (baseline currently unavailable)	FP8: `registry.redhat.io/rhai/modelcar-nvidia-nemotron-3-nano-30b-a3b-fp8:3.0`
ministral-3-14b-instruct-2512	None	Baseline: `registry.redhat.io/rhai/modelcar-ministral-3-14b-instruct-2512:3.0`
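
Model serving on OpenShift AI is built on KServe, which typically references a ModelCar image through a storage URI with the `oci://` scheme. The following sketch converts an image reference from the table into that form. The helper name is hypothetical, and the check that the image name starts with `modelcar-` is an assumption based on the naming pattern in the table; consult the OpenShift AI documentation for the exact deployment fields.

```python
def modelcar_storage_uri(image_ref: str) -> str:
    """Prefix a ModelCar image reference with the oci:// scheme."""
    if "://" in image_ref:
        raise ValueError(f"expected a bare image reference, got: {image_ref!r}")
    # The last path segment is 'name:tag'; table entries all start with 'modelcar-'.
    name = image_ref.rsplit("/", 1)[-1]
    if not name.startswith("modelcar-"):
        raise ValueError(f"not a ModelCar image name: {name!r}")
    return f"oci://{image_ref}"


print(modelcar_storage_uri("registry.redhat.io/rhai/modelcar-granite-4-0-h-tiny:3.0"))
```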

Chapter 10. Validated models for x86_64 CPU inference serving

The following large language models have been validated for use with Red Hat AI Inference Server on x86_64 CPUs with AVX2 instruction set support. CPU inference is optimized for smaller models that can run efficiently without GPU acceleration.

Note

x86_64 CPU inference is best suited for smaller models, typically under 3 billion parameters. Performance depends on your CPU specifications, available system RAM, and model size. For larger models or production workloads requiring high throughput, consider using GPU acceleration.

Important

x86_64 CPU inference serving is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

Table 10.1. Validated models for inferencing with x86_64 CPU
Model	Hugging Face model card	Number of parameters
TinyLlama-1.1B-Chat-v1.0	TinyLlama/TinyLlama-1.1B-Chat-v1.0	1.1B
Llama-3.2-1B-Instruct	meta-llama/Llama-3.2-1B-Instruct	1B
granite-3.2-2b-instruct	ibm-granite/granite-3.2-2b-instruct	2B
TinyLlama-1.1B-Chat-v1.0-pruned2.4	RedHatAI/TinyLlama-1.1B-Chat-v1.0-pruned2.4	1.1B (pruned)
TinyLlama-1.1B-Chat-v0.4-pruned50-quant-ds	RedHatAI/TinyLlama-1.1B-Chat-v0.4-pruned50-quant-ds	1.1B (pruned + quantized)
opt-125m	facebook/opt-125m	125M
Qwen2-0.5B-Instruct-AWQ	Qwen/Qwen2-0.5B-Instruct-AWQ	0.5B

Important

Quantization formats that require GPU-specific kernels, such as Marlin format, are not supported for CPU inference. Use AWQ or GPTQ quantization formats that are compatible with CPU execution.

The following table provides general guidance for approximate system RAM requirements based on model size:

Table 10.2. Memory requirements for inference serving with x86_64 CPU
Model size	Minimum RAM	Recommended RAM
125M - 500M	8 GB	16 GB
500M - 1B	16 GB	32 GB
1B - 3B	32 GB	64 GB

Note

Actual memory usage depends on the model architecture, context length, and batch size. Increase the VLLM_CPU_KVCACHE_SPACE environment variable to allocate more memory for the key-value cache when using longer context lengths.
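
The RAM guidance in Table 10.2 can be expressed as a simple tier lookup. This is a rough sketch under stated assumptions: the table's ranges share their boundary values (500M and 1B), and this helper assigns a boundary value to the smaller tier; it also treats models below 125M parameters as belonging to the smallest tier. The function and constant names are illustrative.

```python
# (upper bound in parameters, minimum RAM in GB, recommended RAM in GB), from Table 10.2.
RAM_TIERS = [
    (500_000_000, 8, 16),
    (1_000_000_000, 16, 32),
    (3_000_000_000, 32, 64),
]


def ram_guidance(num_parameters: int) -> tuple[int, int]:
    """Return (minimum_gb, recommended_gb) for a model with the given parameter count."""
    if num_parameters <= 0:
        raise ValueError("parameter count must be positive")
    for upper_bound, minimum_gb, recommended_gb in RAM_TIERS:
        # Assumption: a size exactly on a boundary falls into the smaller tier.
        if num_parameters <= upper_bound:
            return minimum_gb, recommended_gb
    raise ValueError("model is larger than the sizes covered by the table")


print(ram_guidance(125_000_000))    # opt-125m
print(ram_guidance(1_100_000_000))  # TinyLlama-1.1B-Chat-v1.0
```

Treat the result as a starting point only, because context length and batch size also affect actual memory use.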

Chapter 11. Validated models for use with IBM Power and IBM Spyre AI accelerators

The following large language models are supported for IBM Power systems with IBM Spyre AI accelerators.

Note

IBM Spyre AI accelerator cards support FP16 format model weights only. For compatible models, the Red Hat AI Inference Server inference engine automatically converts weights to FP16 at startup. No additional configuration is needed.

Table 11.1. IBM Granite models for use with IBM Spyre AI accelerators
Model	Hugging Face model card
granite-3.3-8b-instruct	ibm-granite/granite-3.3-8b-instruct
granite-embedding-30m-english	ibm-granite/granite-embedding-30m-english
granite-embedding-107m-multilingual	ibm-granite/granite-embedding-107m-multilingual
granite-embedding-125m-english	ibm-granite/granite-embedding-125m-english
granite-embedding-278m-multilingual	ibm-granite/granite-embedding-278m-multilingual

Table 11.2. Reranker models for use with IBM Spyre AI accelerators
Model	Hugging Face model card
bge-reranker-v2-m3	BAAI/bge-reranker-v2-m3

Important

Pre-built IBM Granite models run with the specific Python packages that are included in the Red Hat AI Inference Server Spyre container image. The models are tied to fixed configurations for Spyre card count, batch size, and input/output context sizes.

Updating or replacing Python packages in the Red Hat AI Inference Server Spyre container image is not supported.

Chapter 12. Validated models for use with IBM Z and IBM Spyre AI accelerators

The following large language models are supported for IBM Z systems with IBM Spyre AI accelerators.

Note

IBM Spyre AI accelerator cards support FP16 format model weights only. For compatible models, the Red Hat AI Inference Server inference engine automatically converts weights to FP16 at startup. No additional configuration is needed.

Table 12.1. Decoder models for use with IBM Spyre AI accelerators
Model	Hugging Face model card
granite-3.3-8b-instruct	ibm-granite/granite-3.3-8b-instruct

Important

Pre-built IBM Granite models run with the specific Python packages that are included in the Red Hat AI Inference Server Spyre container image. The models are tied to fixed configurations for Spyre card count, batch size, and input/output context sizes.

Updating or replacing Python packages in the Red Hat AI Inference Server Spyre container image is not supported.

Chapter 13. Validated models for geospatial inference with TerraTorch

The following IBM and NASA Prithvi geospatial foundation models are validated for use with AI Inference Server and TerraTorch.

Note

Prithvi-EO-2.0 models use the Vision Transformer (ViT) architecture and require TerraTorch as the model implementation backend. These models accept GeoTIFF imagery as input and return segmentation predictions.

Table 13.1. Prithvi geospatial models for use with TerraTorch
Model	Use case	Hugging Face model card	Validated on
Prithvi-EO-2.0-300M-TL-Sen1Floods11	Flood detection and mapping	Prithvi-EO-2.0-300M-TL-Sen1Floods11	RHAIIS 3.3
Prithvi-EO-2.0-300M-BurnScars	Burn scar detection	Prithvi-EO-2.0-300M-BurnScars	RHAIIS 3.3

Explore the IBM and NASA geospatial models collection on Hugging Face.

Important

Prithvi geospatial models are validated for use with NVIDIA CUDA AI accelerators only.

These models require specific vLLM server arguments to function correctly. You must include --skip-tokenizer-init, --enforce-eager, and --enable-mm-embeds when starting the inference server.

For the complete list of required server arguments, see TerraTorch configuration options for geospatial model serving and Serving TerraTorch Models with vLLM.
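
As a sketch of how the three flags named above fit into a launch command, the following helper builds an argument list, assuming the server is started through the `vllm serve` command. The helper name is hypothetical, the model argument is a placeholder taken from the table rather than a full Hugging Face repository path, and the command is deliberately incomplete: it contains only the flags listed in this chapter, not the full set from the TerraTorch configuration documentation.

```python
import shlex
from typing import Optional

# Flags named in this chapter. The full set of required arguments is longer;
# see the TerraTorch configuration documentation referenced above.
REQUIRED_FLAGS = ["--skip-tokenizer-init", "--enforce-eager", "--enable-mm-embeds"]


def build_serve_command(model: str, extra_args: Optional[list[str]] = None) -> list[str]:
    """Build an argv list for `vllm serve` that always includes the required flags."""
    command = ["vllm", "serve", model, *REQUIRED_FLAGS]
    command.extend(extra_args or [])
    return command


# Placeholder model name from Table 13.1; replace it with the full model path you use.
command = build_serve_command("Prithvi-EO-2.0-300M-BurnScars")
print(shlex.join(command))
```

Building the command as a list keeps the required flags in one place, so a script cannot start the server without them.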

Legal Notice

Copyright © Red Hat.

Except as otherwise noted below, the text of and illustrations in this documentation are licensed by Red Hat under the Creative Commons Attribution–Share Alike 3.0 Unported license. If you distribute this document or an adaptation of it, you must provide the URL for the original version.

Red Hat, as the licensor of this document, waives the right to enforce, and agrees not to assert, Section 4d of CC-BY-SA to the fullest extent permitted by applicable law.

Red Hat, the Red Hat logo, JBoss, Hibernate, and RHCE are trademarks or registered trademarks of Red Hat, LLC. or its subsidiaries in the United States and other countries.

Linux® is the registered trademark of Linus Torvalds in the United States and other countries.

XFS is a trademark or registered trademark of Hewlett Packard Enterprise Development LP or its subsidiaries in the United States and other countries.

The OpenStack® Word Mark and OpenStack logo are trademarks or registered trademarks of the Linux Foundation, used under license.

All other trademarks are the property of their respective owners.
