Chapter 1. About Mistral 3 large language models
The Mistral 3 model family includes Mistral Large 3 and the Ministral 3 series, providing enterprise-ready large language models optimized for diverse deployment scenarios from single-node edge devices to multi-GPU clusters.
All Mistral 3 models are released under the Apache 2.0 license with open weights, making them suitable for on-premises and hybrid-cloud deployments. The models are fully compatible with upstream vLLM and require no custom forks for deployment with Red Hat AI Inference Server.
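Because the models run on upstream vLLM, serving a checkpoint typically needs only the standard `vllm serve` entry point. A minimal sketch follows; the model identifier and parallelism values are illustrative assumptions, not documented defaults:

```shell
# Serve a Mistral 3 checkpoint with upstream vLLM (model name is illustrative).
# --tensor-parallel-size shards the model across accelerators; a dense
# Ministral 3 model may fit on one GPU, in which case the flag can be omitted.
vllm serve mistralai/Mistral-Large-3 \
    --tensor-parallel-size 8 \
    --max-model-len 32768
```

The server exposes an OpenAI-compatible API endpoint, so existing client tooling can point at it without code changes.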
| Architecture | Models | Characteristics |
|---|---|---|
| Sparse MoE | Mistral Large 3 | Activates a subset of experts per token, is efficient at scale, and requires multi-accelerator deployments. |
| Dense | Ministral 3B, 8B, 14B | All parameters are active per token. Suitable for single-accelerator deployments on smaller capacity AI accelerators. |
- Mistral Large 3
Mistral Large 3 is designed for demanding enterprise workloads, delivering strong performance on advanced reasoning and analytical tasks. It supports multi-turn dialogue and vision–language use cases, including document understanding.
Architecturally, it follows a DeepSeekV3-style mixture-of-experts design, but with fewer, larger experts. It applies top-4 expert selection with softmax-based routing to balance efficiency and capability during inference. The softmax router and Llama 4 RoPE scaling are further points of divergence from DeepSeekV3.
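The routing step described above can be sketched as follows. The expert count, hidden size, and random expert functions are arbitrary assumptions for illustration, not Mistral Large 3's actual dimensions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_route(hidden, router_weights, experts, top_k=4):
    """Top-k expert selection with softmax-based routing (toy sketch).

    hidden:         (d,) token hidden state
    router_weights: (num_experts, d) router projection
    experts:        list of callables, one per expert
    """
    logits = router_weights @ hidden        # one affinity score per expert
    probs = softmax(logits)
    top = np.argsort(probs)[-top_k:]        # indices of the top-k experts
    gate = probs[top] / probs[top].sum()    # renormalize the selected weights
    # Only the selected experts execute, which is the source of MoE efficiency.
    return sum(g * experts[i](hidden) for g, i in zip(gate, top))

# Toy usage: 8 experts over a 16-dimensional hidden state.
rng = np.random.default_rng(0)
d, n_experts = 16, 8
router = rng.normal(size=(n_experts, d))
experts = [lambda h, W=rng.normal(size=(d, d)): W @ h for _ in range(n_experts)]
out = moe_route(rng.normal(size=d), router, experts)
```

With top-4 selection, only half of this toy model's experts run per token; at real model scale the saving is far larger, which is why sparse MoE is efficient at scale yet still needs multi-accelerator deployments to hold all expert weights.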
- Ministral 3 models
Ministral 3 models are suited for edge deployments with limited GPU resources, latency-sensitive applications, and mobile or embedded AI workloads. Each model includes a built-in vision encoder for multimodal input processing. All models support a 256K context window and include multilingual capabilities. Ministral 3 dense models are released in the following variants:
- Ministral 3 14B is the highest-capability dense model, suitable for complex reasoning tasks.
- Ministral 3 8B has separate embedding and output layers, balancing performance and resource usage.
- Ministral 3 3B uses tied embeddings with shared embedding and output layers for reduced memory footprint.
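The memory effect of tied embeddings can be illustrated with a toy sketch. The vocabulary and hidden sizes here are arbitrary assumptions, not Ministral 3's real dimensions:

```python
import numpy as np

vocab, d = 1000, 64
rng = np.random.default_rng(0)

# Untied (8B-style): separate input-embedding and output-projection matrices.
embed_in = rng.normal(size=(vocab, d))
lm_head = rng.normal(size=(vocab, d))
untied_params = embed_in.size + lm_head.size

# Tied (3B-style): one matrix serves both roles, halving these parameters.
tied = rng.normal(size=(vocab, d))
tied_params = tied.size

hidden = rng.normal(size=d)   # final hidden state for one token position
logits = tied @ hidden        # output projection reuses the tied matrix
token_vec = tied[42]          # input lookup reuses the same matrix
```

Here the tied layout stores exactly half the embedding-related parameters of the untied one, which is the memory-footprint reduction the 3B variant trades for the 8B variant's independent output layer.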