Chapter 3. New features and enhancements
This section describes new features and enhancements in Red Hat OpenShift AI 3.2.
3.1. Enhancements
- PostgreSQL mandated as the production persistence layer for Llama Stack
PostgreSQL is now the only supported database for production Llama Stack deployments in OpenShift AI. The default configuration in run.yaml uses PostgreSQL for both core persistence and the RAG file provider. This enhancement ensures that production deployments use a scalable, production-ready persistence layer that meets enterprise performance and scalability requirements. SQLite remains available for local development and testing scenarios where appropriate.
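As an illustration, a run.yaml metadata store section targeting PostgreSQL might look similar to the following sketch. This is not the shipped default configuration: the exact key names depend on your Llama Stack release, and the host, database, and credential values are placeholders.

```yaml
# Illustrative sketch only: point the Llama Stack metadata store at PostgreSQL
# instead of SQLite. Verify key names against the run.yaml in your distribution.
metadata_store:
  type: postgres
  host: postgres.example.svc   # placeholder: your PostgreSQL service hostname
  port: 5432
  db: llamastack               # placeholder database name
  user: llamastack             # placeholder credentials; source these from a Secret
  password: changeme           # or environment variables in real deployments
```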
- Enhanced TLS security for llm-d components
This release delivers a foundational security enhancement by enforcing strict TLS validation for all internal llm-d component communication. All insecure skip TLS verify settings have been removed from the llm-d stack. All internal services, including the Gateway, Scheduler, and vLLM backends, now use TLS certificates automatically signed by the OpenShift Service CA. Clients are configured to trust this CA, ensuring all connections are fully encrypted and validated, which prevents man-in-the-middle (MITM) attacks and enforces a zero-trust security posture.
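For context, the OpenShift Service CA signs a serving certificate for a Service when the Service carries the service.beta.openshift.io/serving-cert-secret-name annotation, and clients can trust the CA bundle that the operator injects into a ConfigMap annotated with service.beta.openshift.io/inject-cabundle: "true". The following is a generic sketch of that mechanism with placeholder names; it is not the actual llm-d manifest.

```yaml
# Generic sketch of OpenShift Service CA usage (placeholder names, not the llm-d manifests).
apiVersion: v1
kind: Service
metadata:
  name: example-vllm-backend
  annotations:
    # The service-ca operator generates a TLS key pair signed by the Service CA
    # and stores it in this Secret for the pod to mount.
    service.beta.openshift.io/serving-cert-secret-name: example-vllm-backend-tls
spec:
  selector:
    app: example-vllm-backend
  ports:
    - name: https
      port: 8443
      targetPort: 8443
```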
- Model deployment wizard available from the model catalog
The OpenShift AI user interface for configuring and deploying large language models from the model catalog has been updated to use a new deployment wizard.
This streamlined interface simplifies common deployment scenarios by providing essential configuration options with sensible defaults when deploying models from the model catalog. The deployment wizard reduces setup complexity and helps users to deploy models from the model catalog more efficiently.
- Model deployment wizard available from a model registry
The OpenShift AI user interface for configuring and deploying large language models from a model registry has been updated to use a new deployment wizard.
This streamlined interface simplifies common deployment scenarios by providing essential configuration options with sensible defaults when deploying models from a model registry. The deployment wizard reduces setup complexity and helps users to deploy models from a model registry more efficiently.
3.2. New features
- Support added to run Red Hat OpenShift AI on OpenShift Kubernetes Engine (OKE)
You can now install and run Red Hat OpenShift AI on OpenShift Kubernetes Engine (OKE). Red Hat provides a specific licensing exception for OpenShift AI users, so the dependent Operators that Red Hat OpenShift AI requires can be installed on OKE.
Note: This exception applies exclusively to Operators used to support Red Hat OpenShift AI workloads. Installing or using these Operators for purposes unrelated to Red Hat OpenShift AI is a violation of the OKE service agreement.
To learn more about OKE, see About OpenShift Kubernetes Engine.
- Deployment strategy selection for model serving
You can now configure the deployment strategy for model deployments from the OpenShift AI dashboard. You can choose between Rolling update and Recreate strategies.
- Rolling update (Default): Maintains availability by gradually replacing old pods with new ones.
- Recreate: Terminates the existing pod before starting the new pod. This strategy is critical for managing large language models (LLMs) that consume significant GPU resources, because it prevents the resource contention that occurs when two instances run simultaneously during an update. A sketch of the equivalent Kubernetes Deployment setting follows this list.
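These options correspond to the standard Kubernetes Deployment strategies. The following sketch shows what the Recreate choice amounts to at the Deployment level; the dashboard manages this for you, and the names and image below are placeholders rather than OpenShift AI defaults.

```yaml
# Sketch of a Deployment using the Recreate strategy, so the old model server
# pod releases its GPU before the replacement pod is scheduled (placeholder names).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-llm-server
spec:
  replicas: 1
  strategy:
    type: Recreate            # Rolling update corresponds to type: RollingUpdate
  selector:
    matchLabels:
      app: example-llm-server
  template:
    metadata:
      labels:
        app: example-llm-server
    spec:
      containers:
        - name: server
          image: example.registry/llm-server:latest   # placeholder image
          resources:
            limits:
              nvidia.com/gpu: 1   # the resource that Recreate avoids double-allocating
```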
- New chat functionality in Generative AI Studio
You can now clear the chat history and start a new conversation in the Playground by clicking the New Chat button. The chat interface clears the chat history while preserving your Playground configuration settings.
- Enhanced filtering for serving runtime selection
Red Hat OpenShift AI now includes improved filtering and distinct recommendations for selecting a serving runtime. You can choose how the serving runtime is determined by using the following options:
- Auto-select the best runtime for your model based on model type, model format, and hardware profile: This option automatically selects a serving runtime if there is exactly one match. It also includes hardware profile matching based on the accelerator. For example, if you have a hardware profile with the NVIDIA GPU accelerator, the system suggests the vLLM NVIDIA GPU ServingRuntime for KServe runtime (see the illustrative runtime fragment after this list).
Note: If a cluster administrator enables the Use distributed inference with llm-d by default when deploying generative models option in the administrator settings, the system suggests the Distributed inference with llm-d runtime.
- Select from a list of serving runtimes, including custom ones: This option displays all global and project-scoped serving runtime templates available to you.
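Auto-selection is driven by the model format information that each serving runtime declares. The following trimmed fragment is only an illustration of that declaration, using placeholder names and images rather than the templates shipped with OpenShift AI.

```yaml
# Illustrative ServingRuntime fragment (placeholder names, not a shipped template).
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: example-vllm-runtime
spec:
  supportedModelFormats:
    - name: vLLM            # the model format this runtime serves
      autoSelect: true      # allows the runtime to be matched automatically
  containers:
    - name: kserve-container
      image: example.registry/vllm:latest   # placeholder image
```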
- Feature Store integration with workbenches
Feature Store now fully integrates with data science projects and workbenches. Capabilities such as centrally managed role-based access control (RBAC) and feature lifecycle and lineage visibility are now production-ready and fully supported. You can use Feature Store to standardize feature reuse and governance across projects, allowing data scientists to work within workbenches while platform teams maintain centralized control, security, and scalability.
Feature Store now supports the Ray and Apache Spark AI computing frameworks. These tools enable scalable, distributed feature engineering for machine learning (ML) and generative AI workloads.
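Feature Store configuration follows a Feast-style feature_store.yaml. As a rough, hypothetical sketch of pointing the offline store at Spark for distributed feature engineering (the store types, keys, and values here are placeholders; check the Feature Store documentation for the options supported in your release):

```yaml
# Hypothetical feature_store.yaml sketch with a Spark offline store.
# All values are placeholders.
project: example_project
registry: s3://example-bucket/registry.db   # placeholder registry location
provider: local
online_store:
  type: sqlite                              # placeholder; use a production store in practice
offline_store:
  type: spark
  spark_conf:
    spark.master: "local[*]"                # placeholder Spark configuration
    spark.sql.session.timeZone: "UTC"
```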