Chapter 1. Red Hat AI validated models
Red Hat AI models are validated by using open source tools. The model image format that you use depends on how you want to deploy the model.
If you are using AI Inference Server as part of a RHEL AI deployment, use OCI artifact images.
If you are using AI Inference Server as part of an OpenShift AI deployment, use ModelCar images.
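In either case, the deployed model is served through the OpenAI-compatible API of AI Inference Server. The following is a minimal sketch of querying a deployed validated model with the `openai` Python client; the endpoint URL and model name are placeholders for your own deployment, not values taken from this document.

```python
# A minimal sketch, assuming an AI Inference Server endpoint is already
# running. The base URL and model name below are placeholders; substitute
# the values from your own deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # OpenAI-compatible endpoint (placeholder)
    api_key="EMPTY",                      # a local server typically needs no key
)

response = client.chat.completions.create(
    model="RedHatAI/Llama-3.1-8B-Instruct",  # example validated model (placeholder)
    messages=[{"role": "user", "content": "What is a validated model?"}],
)
print(response.choices[0].message.content)
```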
Red Hat uses GuideLLM for performance benchmarking and the Language Model Evaluation Harness for accuracy evaluations.
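As an illustration, an accuracy evaluation with the Language Model Evaluation Harness might look like the following sketch, assuming the `lm-eval` Python package is installed. The model ID and benchmark task are illustrative, not the exact configuration Red Hat uses for validation; GuideLLM is driven similarly, from the command line against a running endpoint.

```python
# A minimal sketch using the Language Model Evaluation Harness
# (pip install lm-eval). The model ID and task below are illustrative,
# not Red Hat's validation configuration.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # evaluate a Hugging Face transformers model
    model_args="pretrained=RedHatAI/Qwen3-8B",  # example model (placeholder)
    tasks=["gsm8k"],  # one benchmark task, for illustration
    num_fewshot=5,
)
print(results["results"])
```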
Explore the Red Hat AI validated models collections on Hugging Face.
AMD GPUs support FP8 (W8A8) and GGUF quantization variant models only. For more information, see Supported hardware.
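The following sketch shows one way to browse the collections and download a validated model programmatically with the `huggingface_hub` library; the `RedHatAI` organization handle and the example repository ID are assumptions that you should verify against the collection pages.

```python
# A minimal sketch using huggingface_hub (pip install huggingface_hub).
# "RedHatAI" is assumed to be the organization handle on Hugging Face;
# verify it against the collection pages.
from huggingface_hub import HfApi, snapshot_download

api = HfApi()
# List models published under the organization.
for model in api.list_models(author="RedHatAI", limit=10):
    print(model.id)

# Download one validated model locally. The repository ID is an example;
# substitute a model from the tables below.
local_dir = snapshot_download(repo_id="RedHatAI/Qwen3-8B")
print(local_dir)
```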
1.1. Red Hat AI validated models - October 2025 collection
The following models, available from Red Hat AI on Hugging Face, are validated for use with Red Hat AI Inference Server.
| Model | Quantized variants | Hugging Face model cards | Validated on |
|---|---|---|---|
| gpt-oss-120b | None | | |
| gpt-oss-20b | None | | |
| NVIDIA-Nemotron-Nano-9B-v2 | INT4, FP8 | | |
| Qwen3-Coder-480B-A35B-Instruct | FP8 | | |
| Voxtral-Mini-3B-2507 | FP8 | | |
| whisper-large-v3-turbo | INT4 | | |
1.2. Validated models on Hugging Face - September 2025 collection
The following models, available from Red Hat AI on Hugging Face, are validated for use with Red Hat AI Inference Server.
| Model | Quantized variants | Hugging Face model cards | Validated on |
|---|---|---|---|
| DeepSeek-R1-0528 | INT4 | | |
| gemma-3n-E4B-it | FP8 | | |
| Kimi-K2-Instruct | INT4 | | |
| Qwen3-8B | FP8 | | |
1.3. Validated models on Hugging Face - May 2025 collection
The following models, available from Red Hat AI on Hugging Face, are validated for use with Red Hat AI Inference Server.
| Model | Quantized variants | Hugging Face model cards | Validated on |
|---|---|---|---|
| gemma-2-9b-it | FP8 | | |
| granite-3.1-8b-base | INT4 | | |
| granite-3.1-8b-instruct | INT4, INT8, FP8 | | |
| Llama-3.1-8B-Instruct | None | | |
| Llama-3.1-Nemotron-70B-Instruct-HF | FP8 | | |
| Llama-3.3-70B-Instruct | INT4, INT8, FP8 | | |
| Llama-4-Maverick-17B-128E-Instruct | FP8 | | |
| Llama-4-Scout-17B-16E-Instruct | INT4, FP8 | | |
| Meta-Llama-3.1-8B-Instruct | INT4, INT8, FP8 | | |
| Mistral-Small-24B-Instruct-2501 | INT4, INT8, FP8 | | |
| Mistral-Small-3.1-24B-Instruct-2503 | INT4, INT8, FP8 | | |
| Mixtral-8x7B-Instruct-v0.1 | None | | |
| phi-4 | INT4, INT8, FP8 | | |
| Qwen2.5-7B-Instruct | INT4, INT8, FP8 | | |
1.4. Validated OCI artifact model container images
| Model | Quantized variants | OCI artifact images |
|---|---|---|
| llama-4-scout-17b-16e-instruct | INT4, FP8 | |
| llama-4-maverick-17b-128e-instruct | FP8 | |
| mistral-small-3-1-24b-instruct-2503 | INT4, INT8, FP8 | |
| llama-3-3-70b-instruct | INT4, INT8, FP8 | |
| llama-3-1-8b-instruct | INT4, INT8, FP8 | |
| granite-3-1-8b-instruct | INT4, INT8, FP8 | |
| phi-4 | INT4, INT8, FP8 | |
| qwen2-5-7b-instruct | INT4, INT8, FP8 | |
| mistral-small-24b-instruct-2501 | INT4, INT8, FP8 | |
| mixtral-8x7b-instruct-v0-1 | None | |
| granite-3-1-8b-base | INT4 (baseline currently unavailable) | |
| granite-3-1-8b-starter-v2 | None | |
| llama-3-1-nemotron-70b-instruct-hf | FP8 | |
| gemma-2-9b-it | FP8 | |
| deepseek-r1-0528 | INT4 (baseline currently unavailable) | |
| qwen3-8b | FP8 (baseline currently unavailable) | |
| kimi-k2-instruct | INT4 (baseline currently unavailable) | |
| gemma-3n-e4b-it | FP8 (baseline currently unavailable) | |
| gpt-oss-120b | None | |
| gpt-oss-20b | None | |
| qwen3-coder-480b-a35b-instruct | FP8 (baseline currently unavailable) | |
| whisper-large-v3-turbo | INT4 (baseline currently unavailable) | |
| voxtral-mini-3b-2507 | FP8 (baseline currently unavailable) | |
| nvidia-nemotron-nano-9b-v2 | FP8 (baseline currently unavailable) | |
1.5. ModelCar container images
| Model | Quantized variants | ModelCar images |
|---|---|---|
| llama-4-scout-17b-16e-instruct | INT4, FP8 | |
| llama-4-maverick-17b-128e-instruct | FP8 | |
| mistral-small-3-1-24b-instruct-2503 | INT4, INT8, FP8 | |
| llama-3-3-70b-instruct | INT4, INT8, FP8 | |
| llama-3-1-8b-instruct | INT4, INT8, FP8 | |
| granite-3-1-8b-instruct | INT4, INT8, FP8 | |
| phi-4 | INT4, INT8, FP8 | |
| qwen2-5-7b-instruct | INT4, INT8, FP8 | |
| mistral-small-24b-instruct-2501 | INT4, INT8, FP8 | |
| mixtral-8x7b-instruct-v0-1 | None | |
| granite-3-1-8b-base | INT4 (baseline currently unavailable) | |
| granite-3-1-8b-starter-v2 | None | |
| llama-3-1-nemotron-70b-instruct-hf | FP8 | |
| gemma-2-9b-it | FP8 | |
| deepseek-r1-0528 | INT4 (baseline currently unavailable) | |
| qwen3-8b | FP8 (baseline currently unavailable) | |
| kimi-k2-instruct | INT4 (baseline currently unavailable) | |
| gemma-3n-e4b-it | FP8 (baseline currently unavailable) | |
| gpt-oss-120b | None | |
| gpt-oss-20b | None | |
| qwen3-coder-480b-a35b-instruct | FP8 (baseline currently unavailable) | |
| whisper-large-v3-turbo | INT4 (baseline currently unavailable) | |
| voxtral-mini-3b-2507 | FP8 (baseline currently unavailable) | |
| nvidia-nemotron-nano-9b-v2 | FP8 (baseline currently unavailable) | |
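On OpenShift AI, a ModelCar image is typically referenced from a KServe InferenceService through an `oci://` storage URI. The following sketch creates such a resource with the Kubernetes Python client; the namespace, model format name, and image reference are placeholders rather than values from this document, and they assume a cluster where KServe and a vLLM serving runtime are already configured.

```python
# A minimal sketch, assuming an OpenShift AI cluster with KServe and a
# vLLM serving runtime already configured. The namespace and the OCI
# image reference are placeholders; use the ModelCar image reference for
# the model that you chose from the table above.
from kubernetes import client, config

config.load_kube_config()

inference_service = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "granite-3-1-8b-instruct"},
    "spec": {
        "predictor": {
            "model": {
                "modelFormat": {"name": "vLLM"},
                # KServe mounts a ModelCar image through an oci:// URI.
                "storageUri": "oci://registry.example.com/modelcar-granite-3-1-8b-instruct:latest",
            }
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="my-project",  # placeholder namespace
    plural="inferenceservices",
    body=inference_service,
)
```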