Chapter 2. Version 3.3.3 release notes


Red Hat AI Inference Server 3.3.3 is a maintenance release containing security fixes and enhancements.

Note

Version 3.3.2 was skipped to align version numbering across Red Hat AI Inference Server components.

The following container images are available from registry.redhat.io:

  • registry.redhat.io/rhaiis/vllm-cuda-rhel9:3.3.3
  • registry.redhat.io/rhaiis/model-opt-cuda-rhel9:3.3.3
  • registry.redhat.io/rhaiis/vllm-rocm-rhel9:3.3.3
  • registry.redhat.io/rhaiis/vllm-spyre-rhel9:3.3.3
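
As a minimal sketch, the image references above can be assembled from a registry path, an image name, and the release tag. The short keys (`cuda`, `model-opt`, `rocm`, `spyre`) and the helper names are hypothetical labels chosen for illustration. The use of `podman pull` is an assumption about the container tooling on the host, and pulling from registry.redhat.io normally requires logging in to the registry first.

```python
import shlex

REGISTRY = "registry.redhat.io/rhaiis"
TAG = "3.3.3"

# Image names as listed in these release notes.
# The dictionary keys are hypothetical short labels, not official identifiers.
IMAGES = {
    "cuda": "vllm-cuda-rhel9",
    "model-opt": "model-opt-cuda-rhel9",
    "rocm": "vllm-rocm-rhel9",
    "spyre": "vllm-spyre-rhel9",
}


def image_ref(kind: str) -> str:
    """Return the full image reference for one of the labels in IMAGES."""
    return f"{REGISTRY}/{IMAGES[kind]}:{TAG}"


def pull_command(kind: str) -> str:
    """Return a shell-safe `podman pull` command line (assumes podman is in use)."""
    return shlex.join(["podman", "pull", image_ref(kind)])


if __name__ == "__main__":
    # Print one pull command per image; nothing is executed.
    for kind in IMAGES:
        print(pull_command(kind))
```

The script only prints the commands, so it can be reviewed before anything is pulled.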

Red Hat AI Inference Server 3.3.3 packages vLLM v0.13.0 for CUDA, ROCm, and CPU container images, and vLLM v0.11.0 for Spyre container images (Power, Z, x86). These are the same vLLM versions as Red Hat AI Inference Server 3.3.1.

Red Hat AI Model Optimization Toolkit 3.3.3 packages LLM Compressor v0.9.0.3 in the model-opt container image. The Spyre container image packages LLM Compressor v0.7.1.2.

This release provides security updates. For a complete list of updates, see the following errata advisories:

2.2. Enhancements

TranslateGemma model support for CUDA AI accelerators

The google/translategemma-12b-it multimodal translation model is supported for inference serving on CUDA AI accelerators. TranslateGemma supports text and image translation across 100+ languages.

Note

The model requires the --chat-template-content-format openai flag when starting the server.
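
The following sketch shows one way to assemble the server arguments with the required flag. It assumes the upstream `vllm serve` command-line entry point; when the server runs from a Red Hat AI Inference Server container image, the way arguments reach the server depends on the image entrypoint, so treat the leading `vllm serve` portion as an assumption. The function name `serve_args` is hypothetical.

```python
import shlex

MODEL = "google/translategemma-12b-it"


def serve_args(model: str = MODEL) -> list[str]:
    """Build an argument list that includes the flag required by the note above.

    Assumes the upstream `vllm serve` CLI; adjust the leading arguments to
    match how the server is actually launched in your environment.
    """
    return [
        "vllm", "serve", model,
        # Required for TranslateGemma according to these release notes.
        "--chat-template-content-format", "openai",
    ]


if __name__ == "__main__":
    # Print the command line for review; nothing is executed.
    print(shlex.join(serve_args()))
```

Keeping the arguments in a list and joining them with `shlex.join` avoids quoting mistakes if further options are appended later.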
