LLM Compressor
Red Hat AI Inference Server 3.0
Compressing large language models with the LLM Compressor library
Abstract
Describes the LLM Compressor library and how you can use it to optimize and compress large language models before serving them for inference.
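
The following minimal sketch illustrates the kind of workflow the guide covers, using the upstream llm-compressor Python API (a oneshot run with a quantization recipe). The model name, dataset, and parameter values here are illustrative assumptions, not values taken from this document.

    from llmcompressor import oneshot
    from llmcompressor.modifiers.quantization import GPTQModifier

    # Illustrative recipe: apply INT8 weight-and-activation (W8A8)
    # quantization to all Linear layers, leaving the output head intact.
    recipe = GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"])

    # Illustrative one-shot compression run: model, dataset, and
    # calibration settings are placeholder example values.
    oneshot(
        model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        dataset="open_platypus",
        recipe=recipe,
        output_dir="TinyLlama-1.1B-Chat-v1.0-W8A8",
        max_seq_length=2048,
        num_calibration_samples=512,
    )

The resulting compressed model directory can then be loaded by an inference server such as vLLM for serving.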