Chapter 1. About Mistral 3 large language models


The Mistral 3 model family includes Mistral Large 3 and the Ministral 3 series, providing enterprise-ready large language models optimized for diverse deployment scenarios from single-node edge devices to multi-GPU clusters.

All Mistral 3 models are released under the Apache 2.0 license with open weights, making them suitable for on-premises and hybrid-cloud deployments. The models are fully compatible with upstream vLLM and require no custom forks for deployment with Red Hat AI Inference Server.
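Because the models work with upstream vLLM, serving one typically reduces to a single command against vLLM's OpenAI-compatible server. The sketch below is illustrative only: the checkpoint name is an assumption, so substitute the actual Mistral 3 model ID from its model card, and size `--tensor-parallel-size` to your accelerator count (Mistral Large 3, as a sparse MoE model, requires a multi-accelerator setting).

```shell
# Sketch: serve a Ministral-class model with upstream vLLM's
# OpenAI-compatible server. The model ID below is a placeholder;
# use the real Mistral 3 checkpoint name from its model card.
vllm serve mistralai/Ministral-8B-Instruct-2410 \
  --tensor-parallel-size 1 \
  --port 8000
```

Once the server is up, any OpenAI-compatible client can send chat completions to `http://localhost:8000/v1`.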

Table 1.1. Mistral 3 model architectures

  Architecture | Models                | Characteristics
  Sparse MoE   | Mistral Large 3       | Activates a subset of experts per token; efficient at scale; requires multi-accelerator deployments.
  Dense        | Ministral 3B, 8B, 14B | All parameters are active per token; suitable for single-accelerator deployments on smaller-capacity AI accelerators.

Mistral Large 3

Mistral Large 3 is designed for demanding enterprise workloads, delivering strong performance on advanced reasoning and analytical tasks. It supports multi-turn dialogue and vision–language use cases, including document understanding.

Architecturally, it follows a DeepSeekV3-style mixture-of-experts design, but with fewer, larger experts. It applies top-4 expert selection with softmax-based routing to balance efficiency and capability during inference; the softmax routing and Llama 4 RoPE scaling are its main points of departure from DeepSeekV3.
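The top-4 softmax routing described above can be sketched in a few lines. This is a minimal illustration of the general technique, not Mistral's implementation: for each token, the router scores every expert, keeps the four highest-scoring ones, and normalizes only those scores with a softmax so the selected experts' weights sum to 1.

```python
import math

def top_k_softmax_routing(logits, k=4):
    """Pick the top-k experts for one token and weight them with a softmax.

    `logits` holds one router score per expert. Only the k selected
    experts are activated, so the softmax runs over just their scores.
    (Sketch of top-k softmax routing in general, not Mistral's code.)
    """
    # Indices of the k largest router logits.
    topk = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Numerically stable softmax over only the selected logits.
    m = max(logits[i] for i in topk)
    exps = {i: math.exp(logits[i] - m) for i in topk}
    z = sum(exps.values())
    # Expert index -> mixing weight; the k weights sum to 1.
    return {i: exps[i] / z for i in topk}
```

With, say, 8 experts and k=4, each token's output is a weighted sum of only the 4 selected experts' outputs, which is what makes sparse MoE inference cheaper than running every expert.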

Ministral 3 models

Ministral 3 models are suited to edge deployments with limited GPU resources, latency-sensitive applications, and mobile and embedded AI use cases. Each model includes a built-in vision encoder for multimodal input processing. All models support a 256K context window and include multilingual capabilities. The Ministral 3 dense models are released in the following variants:

  • Ministral 3 14B is the highest-capability dense model, suitable for complex reasoning tasks.
  • Ministral 3 8B has separate embedding and output layers, with balanced performance and resource usage.
  • Ministral 3 3B uses tied embeddings with shared embedding and output layers for reduced memory footprint.
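The memory effect of tied embeddings is easy to quantify: the input embedding and the output (LM head) projection are both vocabulary-size-by-hidden-size matrices, so sharing one matrix for both roles pays the vocabulary cost once. The sketch below uses illustrative sizes, not the actual Ministral vocabulary or hidden dimension.

```python
def embedding_params(vocab_size, d_model, tied):
    """Parameter count for the input embedding plus the output (LM head)
    projection. With tied embeddings, one vocab_size x d_model matrix
    serves both roles, halving the vocabulary-related parameter cost.
    """
    input_embedding = vocab_size * d_model
    output_head = 0 if tied else vocab_size * d_model
    return input_embedding + output_head

# Illustrative sizes only (not the real Ministral configuration):
untied = embedding_params(vocab_size=131_072, d_model=4096, tied=False)
tied = embedding_params(vocab_size=131_072, d_model=4096, tied=True)
```

For small models like the 3B variant, where embeddings are a large fraction of total parameters, this halving is a meaningful share of the memory footprint; for larger models it matters less, which is consistent with only the 3B variant using tied embeddings.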