Chapter 3. Supported model compression workflows


LLM Compressor supports post-training quantization, a conversion technique that reduces model size and improves inference latency on CPUs and hardware accelerators, with minimal impact on model accuracy. A streamlined API applies quantization or sparsity based on a calibration data set that you provide.
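As a sketch of the streamlined API described above, the following applies a weight-only INT4 (W4A16) GPTQ recipe to a Hugging Face model with a built-in calibration data set. Exact module paths, parameter names, and supported schemes can vary between LLM Compressor versions, so treat this as an illustration rather than a definitive invocation; the model ID and output directory are placeholders.

```python
# Illustrative post-training quantization call; verify names against the
# LLM Compressor version you have installed.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # placeholder model ID
    dataset="open_platypus",                      # calibration data set
    recipe=GPTQModifier(
        targets="Linear",        # quantize all Linear layers ...
        scheme="W4A16",          # ... to INT4 weights, FP16 activations
        ignore=["lm_head"],      # keep the output head in full precision
    ),
    output_dir="TinyLlama-1.1B-W4A16",  # placeholder output path
    max_seq_length=2048,
    num_calibration_samples=512,
)
```

The saved directory can then be loaded directly by vLLM for inference, because the quantization parameters are stored alongside the model in Hugging Face format.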

The following advanced model types and deployment workflows are supported:

  • Multimodal models: Includes vision-language models
  • Mixture of experts (MoE) models: Supports models such as DeepSeek and Mixtral, including calibration support for NVFP4 quantization
  • Large model support: Uses the Hugging Face accelerate library for multi-GPU execution and CPU offloading
  • Multiple quantization schemes applied to a single model: Supports non-uniform quantization, such as combining NVFP4 and FP8 quantization in one model

All workflows are Hugging Face–compatible, enabling models to be quantized, compressed, and deployed with vLLM for efficient inference. LLM Compressor supports several compression algorithms:

  • AWQ: Weight-only INT4 quantization
  • GPTQ: Weight-only INT4 quantization
  • FP8: Dynamic per-token quantization and DeepSeekV3-style block quantization
  • SparseGPT: Post-training sparsity
  • SmoothQuant: Activation quantization
  • QuIP transforms: Weight and activation quantization
  • SpinQuant transforms: Weight and activation quantization

Each of these compression methods computes optimal scales and zero-points for weights and activations. Scales can be computed per tensor, per channel, per group, or per token. The final result is a compressed model saved together with all of its applied quantization parameters.
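To make the idea of scales and zero-points concrete, here is a minimal, library-free sketch of asymmetric per-tensor INT8 quantization: the scale maps the observed value range onto the integer range, and the zero-point is the integer that represents real zero exactly. The function names are illustrative and are not LLM Compressor APIs.

```python
# Asymmetric per-tensor quantization sketch (illustrative, not the
# LLM Compressor implementation). Maps floats onto unsigned INT8.

def quant_params(values, num_bits=8):
    """Compute a scale and zero-point covering [min, max] of the tensor."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    lo, hi = min(lo, 0.0), max(hi, 0.0)   # range must include real zero
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = round(qmin - lo / scale)  # integer that encodes 0.0
    return scale, zero_point

def quantize(values, scale, zero_point, num_bits=8):
    """Round each value to the nearest representable integer, clamped."""
    qmin, qmax = 0, 2 ** num_bits - 1
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]

def dequantize(quants, scale, zero_point):
    """Recover approximate float values from the integer representation."""
    return [(q - zero_point) * scale for q in quants]

weights = [-1.2, -0.4, 0.0, 0.7, 2.1]
scale, zp = quant_params(weights)
recovered = dequantize(quantize(weights, scale, zp), scale, zp)
```

Per-channel, per-group, and per-token schemes follow the same recipe, but compute a separate (scale, zero-point) pair for each channel, weight group, or token instead of one pair for the whole tensor, which reduces quantization error when value ranges differ across those slices.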
