Chapter 2. New features and enhancements


This section describes new features and enhancements in Red Hat OpenShift AI 3.4 GA.

2.1. New features

2.1.1. 3.4 GA new features

Migration guide available for transitioning from vLLM-based InferenceService to LLMInferenceService
Platform Operators deploying Distributed Inference with llm-d on OpenShift AI can follow a step-by-step guide covering LLMInferenceServiceConfig, YAML examples, and migration from vLLM-based InferenceService deployments. The guide explains how LLMInferenceServiceConfig replaces custom ServingRuntime definitions. For more information, see https://access.redhat.com/articles/7141739.
Prometheus metrics for Distributed Inference with llm-d
You can now monitor Distributed Inference with llm-d deployments by using documented Prometheus metrics and PromQL query examples. All llm-d components export metrics, including the Endpoint Picker (EPP), vLLM engine pods, and the prefix cache. The documentation also provides example queries for building custom dashboards and for configuring ServiceMonitors with OpenShift User Workload Monitoring.
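
As a minimal sketch, a ServiceMonitor similar to the following can scrape llm-d metrics into OpenShift User Workload Monitoring; the namespace, label selector, and port name shown here are assumptions for illustration and must be matched to your actual llm-d Services.

    # Hypothetical ServiceMonitor for collecting llm-d metrics with
    # OpenShift User Workload Monitoring. The namespace, label selector,
    # and port name are illustrative assumptions; adjust them to your deployment.
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: llm-d-metrics
      namespace: my-llm-namespace            # namespace of the llm-d deployment (assumption)
    spec:
      selector:
        matchLabels:
          app.kubernetes.io/part-of: llm-d   # label on the llm-d Services (assumption)
      endpoints:
        - port: metrics                      # name of the metrics port (assumption)
          interval: 30s
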
MLflow SDK pre-installed in workbench and runtime images
The MLflow SDK is now pre-installed in the datascience, tensorflow (CUDA and ROCm), pytorch (CUDA and ROCm), and codeserver workbench and runtime images.
MLflow Operator is now a managed component in the DataScienceCluster CR
Starting with Red Hat OpenShift AI 3.4, the MLflow Operator is a managed component in the DataScienceCluster custom resource (CR). You can enable the mlflowoperator component by setting managementState to Managed in the DataScienceCluster CR. MLflow availability in the dashboard is now determined by the component state, and the deprecated mlflow dashboard feature flag is no longer required. For more information, see Enable the MLflow operator component.
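
As a minimal sketch, the component can be enabled with a DataScienceCluster excerpt similar to the following; the apiVersion and the surrounding fields follow the usual DataScienceCluster layout and are shown here as assumptions.

    # Sketch: enabling the MLflow Operator component in the DataScienceCluster CR.
    # The apiVersion and the surrounding structure follow the typical
    # DataScienceCluster layout and are assumptions; only the mlflowoperator
    # block and its managementState value come from the release note above.
    apiVersion: datasciencecluster.opendatahub.io/v1
    kind: DataScienceCluster
    metadata:
      name: default-dsc
    spec:
      components:
        mlflowoperator:
          managementState: Managed
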
MLServer for KServe now generally available
MLServer is now Generally Available (GA) in the OpenShift AI 3.4 release, providing a production-ready and fully supported model serving solution. MLServer offers out-of-the-box support for four widely used model formats: Scikit-learn (sklearn), LightGBM, XGBoost, and ONNX. You can deploy and serve models in these formats without additional configuration. This GA release improves stability, reliability, and integration with the OpenShift AI ecosystem, so you can confidently run inference workloads in production.
NeMo Guardrails to enable AI safety

NeMo Guardrails, introduced in Red Hat OpenShift AI 3.3 as a Technology Preview, is fully supported with this release. You can add guardrails and safety controls to your deployed models. NeMo Guardrails provides a framework for controlling conversations with large language models, enabling you to define a variety of rails, such as sensitive data detection, content filtering, or custom validation rules. NeMo Guardrails introduces the following capabilities:

  • /v1/guardrails/checks endpoint for standalone querying of guardrail policies
  • Full OpenAI compatibility, with a new /v1/models/ endpoint
  • New regex rails for regex-based guardrail logic
  • Support for multiple replicas of the NeMo Guardrails pod to improve scalability
  • Out-of-the-box OpenTelemetry support
  • Automatic redeployment after configuration changes, for zero-downtime guardrail tuning
Models-as-a-Service now Generally Available

Models-as-a-Service (MaaS) is now Generally Available in Red Hat OpenShift AI 3.4. Previously introduced as a Technology Preview feature in version 3.3, MaaS provides an enterprise platform for centralized governance, consumption tracking, and self-service access to large language models. MaaS addresses resource consumption and governance challenges by exposing models through managed API endpoints with subscription-based controls. Administrators can define subscriptions that grant groups access to specific models with configurable token limits, while users can generate and manage their own API keys for programmatic access.

Key capabilities include:

  • Subscription-based model access with configurable token quotas and rate limiting
  • Self-service API key generation and management
  • Centralized authentication and authorization policies (external OIDC authentication available as Technology Preview)
  • Support for distributed inference with llm-d (vLLM runtime support available as Technology Preview)
  • Usage tracking and showback reporting through an observability dashboard (Technology Preview)
  • Routing to external model providers such as OpenAI or Anthropic (Technology Preview)

For more information, see Governing LLM access with Models-as-a-Service.

MaaS subscription model redesign

The Models-as-a-Service subscription model has been redesigned to replace the tier-based model introduced in version 3.3. The new subscription model provides administrators with flexible access control and resource management for large language models, while enabling data scientists to select from available subscriptions when accessing models.

Key enhancements include:

  • Priority-based subscription assignment: As a cluster administrator, you can assign priority levels to subscriptions. As a user, when you belong to multiple groups with different subscriptions, the system automatically assigns you to the highest-priority subscription by default, while allowing you to manually select from your available subscriptions when generating API keys.
  • Group-based access control: As a cluster administrator, you can define subscriptions that grant groups access to specific models. Subscriptions support integration with OpenShift groups, OIDC group claims, and API key group snapshots.
  • Configurable token quotas: As a cluster administrator, you can define distinct token limits per model for each subscription, enabling cost control and resource allocation aligned with organizational policies. As a user, your token consumption is tracked against your active subscription’s limits.
  • Authorization policy integration: Subscriptions work in combination with authorization policies to control API gateway access and token consumption limits.

The subscription model redesign enables organizations to enforce consumption policies across teams while maintaining self-service access for data scientists and developers.

For more information, see Governing LLM access with Models-as-a-Service.

Self-service API key management for Models-as-a-Service

You can now create and manage your own API keys for programmatic access to large language models through Models-as-a-Service. This self-service capability streamlines access to large language models while maintaining centralized governance through subscriptions and authorization policies.

Key capabilities include:

  • User-managed API key lifecycle: Create API keys with a configurable expiration (1-365 days, or keys that never expire), view active keys, and revoke keys when they are no longer needed
  • Temporary API key generation: Generate short-lived API keys directly from the model endpoint dialog for quick testing and prototyping
  • Subscription-scoped authentication: API keys are scoped to specific subscriptions, inheriting the subscription’s model access and token limits
  • Group membership snapshot: User group membership is captured at API key creation time, ensuring consistent access control even when group assignments change

Self-service API key management empowers you to access models programmatically through OpenAI-compatible APIs without administrative bottlenecks, while administrators retain control through subscription-based governance.

For more information, see Governing LLM access with Models-as-a-Service.

Support for Llama Stack and KubeRay on IBM Power
Red Hat OpenShift AI 3.4 GA introduces official support for both Llama Stack and KubeRay on the IBM Power architecture.
OCI-compliant storage layer for model registry

You can now use the OpenShift AI dashboard to register a model from an S3-compatible source or URI, transform it into an OCI ModelCar image, and store it in an OCI registry. The ModelCar target format enables fast deployment with KServe.

The model transfer job runs as a background Kubernetes Job that you can monitor from the dashboard. This feature provides the following capabilities:

  • Register and store models from object storage (S3, MinIO) or URLs in a single operation.
  • Models are automatically converted to ModelCar OCI images, ensuring compatibility with KServe ModelCar for model serving.
  • Track model transfer jobs in real time with detailed status information, Kubernetes event logs, and automatic polling.
  • Retry failed jobs or delete completed jobs directly from the dashboard.
  • ConfigMaps and Secrets are automatically garbage-collected when jobs are deleted.
MLServer ServingRuntime for KServe is now generally available

The MLServer serving runtime for KServe is now generally available in Red Hat OpenShift AI. You can use this runtime to deploy models trained on structured data, such as classical machine learning models. You can deploy models directly in their native format, simplifying the deployment process.

Supported model frameworks include:

  • Scikit-learn
  • XGBoost
  • LightGBM
  • ONNX

For models with well-known file names, MLServer automatically configures all required environment variables during deployment through the Deploy a model wizard.

For more information, see Deploying models using the MLServer runtime.
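
As an illustration, deploying a scikit-learn model through an MLServer-based runtime could look similar to the following InferenceService sketch; the runtime name, namespace, and storage URI are assumptions, and the Deploy a model wizard generates the equivalent resource for you.

    # Illustrative InferenceService that serves a scikit-learn model in its
    # native format through an MLServer-based serving runtime. The runtime name,
    # namespace, and storageUri are assumptions; the Deploy a model wizard
    # creates the equivalent resource for you.
    apiVersion: serving.kserve.io/v1beta1
    kind: InferenceService
    metadata:
      name: sklearn-iris
      namespace: my-project
    spec:
      predictor:
        model:
          modelFormat:
            name: sklearn
          runtime: mlserver-runtime                                         # assumed runtime name
          storageUri: oci://registry.example.com/models/sklearn-iris:1.0    # assumed model location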

2.1.2. 3.4 EA2 new features

Multi-architecture support for the model catalog on IBM Power

The model catalog now includes support for the IBM Power (ppc64le) architecture. This enhancement enables users on IBM Power platforms to discover and deploy models directly from the OpenShift AI dashboard. Support is available for the following validated models:

  • registry.redhat.io/rhai/modelcar-granite-3-3-8b-instruct
  • registry.redhat.io/rhai/modelcar-granite-4-0-h-small:3.0
  • registry.redhat.io/rhai/modelcar-granite-4-0-h-tiny:3.0
Multi-architecture support for the model catalog on IBM Z

The model catalog now includes support for the IBM Z architecture. This enhancement enables users on IBM Z platforms to discover and deploy models directly from the OpenShift AI dashboard. Currently, support is available for the following model:

  • registry.redhat.io/rhai/modelcar-granite-3-3-8b-instruct
Just-In-Time Checkpointing and S3 Storage for Kubeflow Trainer

Kubeflow Trainer now provides Just-In-Time (JIT) and periodic checkpointing for distributed training jobs on OpenShift AI. This enhancement automatically saves the training state, including model weights, optimizer state, and training step, at regular intervals and immediately before interruptions such as preemption, eviction, or maintenance. Interrupted jobs automatically resume from the latest valid checkpoint, significantly reducing wasted GPU compute and improving overall training efficiency.

Checkpoints can be stored on PersistentVolumeClaims (PVCs) or S3-compatible object storage. With S3, checkpoints are uploaded in the background without pausing training, enabling low-overhead, continuous protection of progress. S3-backed storage also provides a cost-efficient, portable alternative to PVCs, allowing checkpoints to be retained, shared, and reused across clusters.

2.1.3. 3.4 EA1 new features

Note

Model deployments are not visible under the model registry deployments tab on IBM Power (ppc64le) in Red Hat OpenShift AI 3.4 EA1.

Workbench and runtime images default to Red Hat Python index
Workbench and runtime images now default to the Red Hat Python index. When you install or update Python packages, they are pulled from the Red Hat Python index rather than from PyPI, providing you with Python packages that are built and supported by Red Hat.
Garak evaluation provider available in Llama Stack distribution
The Garak evaluation provider is available in the Llama Stack distribution. Garak provides security scanning capabilities for large language models to help identify potential vulnerabilities and safety issues. The provider is available in two versions: an inline version that runs scans in the same process as the Llama Stack server, and a remote version that runs scans by using Kubeflow Pipelines.
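
As a loosely sketched example, an evaluation provider appears in a Llama Stack distribution's run.yaml under the providers section; the provider IDs and type strings below are hypothetical placeholders, so check the shipped distribution for the exact identifiers.

    # Hypothetical sketch of eval provider entries in a Llama Stack run.yaml.
    # The provider_id and provider_type values are placeholders, not the exact
    # identifiers shipped with the distribution.
    providers:
      eval:
        - provider_id: garak-inline          # placeholder: inline scans in the Llama Stack server process
          provider_type: inline::garak       # hypothetical type string
          config: {}
        - provider_id: garak-remote          # placeholder: remote scans run through Kubeflow Pipelines
          provider_type: remote::garak       # hypothetical type string
          config: {}
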
PostgreSQL database support for Model Registry
You can configure a PostgreSQL database as the backend for Model Registry from the OpenShift AI dashboard.
Default database solution for Model Registry

Model Registry includes a default database solution for testing. Use this solution to start using Model Registry without configuring an external database.

Note

The default database is not intended for production workloads.

2.2. Enhancements

2.2.1. 3.4 GA enhancements

vLLM uvicorn access logs are disabled by default in Distributed Inference with llm-d
vLLM uvicorn access logs are disabled by default in LLMInferenceServiceConfig, including logs generated by polling of the router-scheduler /metrics endpoint. This reduces the excessive logging caused by the EndpointPicker scraping metrics every 200 milliseconds. Operators who need access logs for debugging can re-enable them explicitly.

2.2.2. 3.4 EA2 enhancements

Simplified configuration for Distributed Inference with llm-d scheduler settings
You can configure Distributed Inference with llm-d scheduler settings by using the endpointPickerConfig field in the LLMInferenceService specification. You can specify the configuration inline or reference a ConfigMap. This approach replaces the previous method of specifying the full EndpointPicker configuration in the scheduler's --configText argument.
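
As a minimal sketch of the ConfigMap-reference form, an LLMInferenceService could point the scheduler at an EndpointPicker configuration as shown below; the placement of endpointPickerConfig within the spec and the reference field name are assumptions for illustration, and the inline form embeds the same configuration directly in the resource.

    # Sketch: referencing an EndpointPicker configuration from a ConfigMap.
    # The placement of endpointPickerConfig within the spec and the reference
    # field name are assumptions; the configuration can also be specified inline.
    apiVersion: serving.kserve.io/v1alpha1
    kind: LLMInferenceService
    metadata:
      name: my-llm
    spec:
      router:
        scheduler:
          endpointPickerConfig:
            configMap:
              name: my-epp-config        # ConfigMap holding the scheduler configuration (assumption)
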
Configure vLLM runtime arguments using Kubernetes container args field
You can configure vLLM runtime arguments using the standard Kubernetes container args field in LLMInferenceService resources. User-specified arguments are merged with system defaults, allowing you to add new arguments or override specific defaults without replacing the entire argument list.

The previous VLLM_ADDITIONAL_ARGS environment variable method continues to work for backward compatibility.
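
For example, adding or overriding vLLM arguments through the container args field could look like the following sketch; the container location within the spec, the container name, and the argument values are assumptions for illustration.

    # Sketch: merging extra vLLM arguments through the standard Kubernetes
    # container args field. The container path within the spec, the container
    # name, and the argument values are assumptions; user-specified args are
    # merged with the system defaults rather than replacing them.
    apiVersion: serving.kserve.io/v1alpha1
    kind: LLMInferenceService
    metadata:
      name: my-llm
    spec:
      template:
        containers:
          - name: main                      # vLLM engine container name (assumption)
            args:
              - --max-model-len=8192        # example override of a default value (assumption)
              - --enable-prefix-caching     # example additional argument (assumption)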

2.2.3. 3.4 EA1 enhancements

Hybrid search support for Qdrant remote vector database provider
Vector Store Search supports hybrid and keyword search for the Qdrant Vector IO provider.