Chapter 1. Version 3.4 release notes
Red Hat Enterprise Linux AI is a generative AI inference platform for Linux environments. It uses Red Hat AI Inference Server to run and optimize models, and it includes Red Hat AI Model Optimization Toolkit, which provides model quantization, sparsity, and general compression for supported AI accelerators. Red Hat AI Model Optimization Toolkit has native Hugging Face and vLLM support, so you can integrate optimized models into deployment pipelines for faster, cost-saving inference at scale, powered by the compressed-tensors model format.
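The following is a minimal sketch of that workflow, written against the upstream LLM Compressor library, which produces checkpoints in the compressed-tensors format; treating it as representative of the toolkit's workflow is an assumption, and the model name, scheme, and output directory are illustrative placeholders, not values from this release.

```python
# Minimal sketch, assuming upstream LLM Compressor APIs: quantize a model to
# FP8 and save it as a compressed-tensors checkpoint that vLLM can serve.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# FP8 dynamic quantization needs no calibration dataset; lm_head stays in full precision.
recipe = QuantizationModifier(targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])

oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",    # placeholder model
    recipe=recipe,
    output_dir="TinyLlama-1.1B-Chat-FP8-dynamic",  # placeholder output path
)
```

The saved directory can then be passed directly to vLLM or Red Hat AI Inference Server as the model path.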
Red Hat Enterprise Linux AI 3.4.0-ea.1 is an Early Access release. Early Access releases are not supported by Red Hat in any way and are not functionally complete or production-ready. Do not use Early Access releases for production or business-critical workloads. Use Early Access releases to test upcoming product features in advance of their possible inclusion in a Red Hat product offering, and to test functionality and provide feedback during the development process. These features might not have any documentation, are subject to change or removal at any time, and testing is limited. Red Hat might provide ways to submit feedback on Early Access features without an associated SLA.
Red Hat Enterprise Linux AI is packaged as a bootc container image for easy deployment on a Linux server appliance with NVIDIA CUDA or AMD ROCm AI accelerators installed. The following container images are available as early access releases from registry.redhat.io:
- registry.redhat.io/rhelai-early-access/bootc-cuda-rhel9:3.4.0-ea.1
- registry.redhat.io/rhelai-early-access/bootc-rocm-rhel9:3.4.0-ea.1
There is no direct upgrade path from Red Hat Enterprise Linux AI 1.5 to Red Hat Enterprise Linux AI 3.0. You can upgrade to Red Hat Enterprise Linux AI 3.4 from version 3.0 or from any intermediate 3.x version.
The registry.redhat.io/rhelai-early-access/bootc-rocm-rhel9:3.4.0-ea.1 image does not include Red Hat AI Model Optimization Toolkit, because the toolkit is not supported on AMD ROCm AI accelerators.
1.1. New features
Red Hat Enterprise Linux AI 3.4 packages Red Hat AI Inference Server 3.4, which includes the following highlights:
- Upgraded vLLM to v0.14.1
  Red Hat AI Inference Server 3.4 packages the upstream vLLM v0.14.1 release with asynchronous scheduling enabled by default, a new gRPC server entrypoint, automatic context length fitting, and security fixes, including token leak prevention in crash logs. A minimal client sketch follows this list.
- New model support
  Red Hat AI Inference Server 3.4 adds support for Grok-2, Mistral 3, MiMo-V2-Flash, Nemotron Parse 1.1, and other model architectures. LoRA multimodal support has been expanded to cover the LLaVA, BLIP2, PaliGemma, Pixtral, and GLM4-V models. Tool calling enhancements include FunctionGemma and GLM-4.7 parsers.
- Performance improvements
  Asynchronous scheduling now overlaps engine core scheduling with GPU execution, improving throughput without manual configuration. CUTLASS MoE optimizations deliver up to a 5.3% throughput gain and up to a 10.8% improvement in time to first token. Fused RoPE and MLA KV-cache write optimizations improve performance for DeepSeek-style models.
- New AI accelerator support
  Red Hat AI Inference Server 3.4 adds RTX PRO 4500 Blackwell Server Edition GPU support for NVIDIA, AITER RMSNorm fusion for AMD, and chunked prefill and prefix caching for IBM Spyre accelerators. The CPU backend adds support for head sizes 80 and 112.
- Quantization advances
  Marlin support extends to the Turing (sm75) architecture. New Quark int4-fp8 w4a8 MoE support, MXFP4 W4A16 support for dense models, and ModelOpt FP8 variants are now available. A sketch of serving a pre-quantized checkpoint follows this list.
- Large-scale serving updates
  Extended Dual-Batch Overlap (XBO) implementation, NVIDIA Inference Xfer Library (NIXL) asymmetric tensor parallelism, and LMCache KV cache registration improve large-scale serving capabilities.
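As referenced in the vLLM item above, the following is a minimal client sketch against the OpenAI-compatible endpoint that vLLM-based servers expose; the host, port, model name, and placeholder API key are assumptions about a locally running instance, not defaults documented for this release.

```python
# Minimal client sketch, assuming a vLLM-based server (such as Red Hat AI
# Inference Server) already listening on localhost:8000 with one model loaded.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # key is unused locally

response = client.chat.completions.create(
    model="my-served-model",  # placeholder: use the name the server actually reports
    messages=[{"role": "user", "content": "Explain asynchronous scheduling in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```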
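For the quantization advances above, vLLM reads the quantization scheme from a checkpoint's configuration, so serving a pre-quantized model usually needs no extra flags. A minimal offline sketch follows; the repository name is an illustrative assumption.

```python
# Minimal sketch, assuming a pre-quantized compressed-tensors checkpoint;
# vLLM detects the W4A16 quantization config from the model files.
from vllm import LLM, SamplingParams

llm = LLM(model="RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w4a16")  # illustrative repo
outputs = llm.generate(["What does W4A16 mean?"], SamplingParams(max_tokens=48))
print(outputs[0].outputs[0].text)
```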
1.2. Known issues
There are no known issues for Red Hat Enterprise Linux AI 3.4.