Chapter 2. New features and enhancements


This section describes new features and enhancements in Red Hat OpenShift AI 3.4 GA.

2.1. New features

2.1.1. 3.4 GA new features

Migration guide available for transitioning from vLLM-based InferenceService to LLMInferenceService
Platform Operators deploying Distributed Inference with llm-d on OpenShift AI can follow a step-by-step guide covering LLMInferenceServiceConfig, YAML examples, and migration from vLLM-based InferenceService deployments. The guide explains how LLMInferenceServiceConfig replaces custom ServingRuntime definitions. For more information, see https://access.redhat.com/articles/7141739.
Prometheus metrics for Distributed Inference with llm-d
You can now monitor Distributed Inference with llm-d deployments by using documented Prometheus metrics and PromQL query examples. All llm-d components export metrics, including the Endpoint Picker (EPP), vLLM engine pods, and the prefix cache. The documentation also provides example queries for building custom dashboards and for configuring ServiceMonitors with OpenShift User Workload Monitoring.
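
As a minimal sketch, a ServiceMonitor similar to the following can scrape llm-d metrics into OpenShift User Workload Monitoring; the namespace, label selector, and port name shown here are assumptions for illustration and must be matched to your actual llm-d Services.

    # Hypothetical ServiceMonitor for collecting llm-d metrics with
    # OpenShift User Workload Monitoring. The namespace, label selector,
    # and port name are illustrative assumptions; adjust them to your deployment.
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: llm-d-metrics
      namespace: my-llm-namespace            # namespace of the llm-d deployment (assumption)
    spec:
      selector:
        matchLabels:
          app.kubernetes.io/part-of: llm-d   # label on the llm-d Services (assumption)
      endpoints:
        - port: metrics                      # name of the metrics port (assumption)
          interval: 30s
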
MLflow SDK pre-installed in workbench and runtime images
The MLflow SDK is now pre-installed in the datascience, tensorflow (CUDA and ROCm), pytorch (CUDA and ROCm), and codeserver workbench and runtime images.
MLflow Operator is now a managed component in the DataScienceCluster CR
Starting with Red Hat OpenShift AI 3.4, the MLflow Operator is a managed component in the DataScienceCluster custom resource (CR). You can enable the mlflowoperator component by setting managementState to Managed in the DataScienceCluster CR. MLflow availability in the dashboard is now determined by the component state, and the deprecated mlflow dashboard feature flag is no longer required. For more information, see Enable the MLflow operator component.
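
As a minimal sketch, the component can be enabled with a DataScienceCluster excerpt similar to the following; the apiVersion and the surrounding fields follow the usual DataScienceCluster layout and are shown here as assumptions.

    # Sketch: enabling the MLflow Operator component in the DataScienceCluster CR.
    # The apiVersion and the surrounding structure follow the typical
    # DataScienceCluster layout and are assumptions; only the mlflowoperator
    # block and its managementState value come from the release note above.
    apiVersion: datasciencecluster.opendatahub.io/v1
    kind: DataScienceCluster
    metadata:
      name: default-dsc
    spec:
      components:
        mlflowoperator:
          managementState: Managed
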
MLServer for KServe now generally available
MLServer is now Generally Available (GA) in the OpenShift AI 3.4 release, providing a production-ready and fully supported model serving solution. MLServer offers out-of-the-box support for four widely used model formats: Scikit-learn (sklearn), LightGBM, XGBoost, and ONNX. You can deploy and serve models in these formats without additional configuration. This GA release improves stability, reliability, and integration with the OpenShift AI ecosystem, so you can confidently run inference workloads in production.
NeMo Guardrails to enable AI safety

NeMo Guardrails, introduced in Red Hat OpenShift AI 3.3 as a Technology Preview, is fully supported with this release. You can add guardrails and safety controls to your deployed models. NeMo Guardrails provides a framework for controlling conversations with large language models, enabling you to define a variety of rails, such as sensitive data detection, content filtering, or custom validation rules. NeMo Guardrails introduces the following capabilities:

  • /v1/guardrails/checks endpoint for standalone querying of guardrail policies
  • Full OpenAI compatibility, with a new /v1/models/ endpoint
  • New regex rails for regex-based guardrail logic
  • Support for multiple replicas of the NeMo Guardrails pod to improve scalability
  • Out-of-the-box OpenTelemetry support
  • Automatic redeployment after configuration changes, for zero-downtime guardrail tuning
Models-as-a-Service now Generally Available

Models-as-a-Service (MaaS) is now Generally Available in Red Hat OpenShift AI 3.4. Previously introduced as a Technology Preview feature in version 3.3, MaaS provides an enterprise platform for centralized governance, consumption tracking, and self-service access to large language models. MaaS addresses resource consumption and governance challenges by exposing models through managed API endpoints with subscription-based controls. Administrators can define subscriptions that grant groups access to specific models with configurable token limits, while users can generate and manage their own API keys for programmatic access.

Key capabilities include:

  • Subscription-based model access with configurable token quotas and rate limiting
  • Self-service API key generation and management
  • Centralized authentication and authorization policies (external OIDC authentication available as Technology Preview)
  • Support for distributed inference with llm-d (vLLM runtime support available as Technology Preview)
  • Usage tracking and showback reporting through an observability dashboard (Technology Preview)
  • Routing to external model providers such as OpenAI or Anthropic (Technology Preview)

For more information, see Governing LLM access with Models-as-a-Service.

MaaS subscription model redesign

The Models-as-a-Service subscription model has been redesigned to replace the tier-based model introduced in version 3.3. The new subscription model provides administrators with flexible access control and resource management for large language models, while enabling data scientists to select from available subscriptions when accessing models.

Key enhancements include:

  • Priority-based subscription assignment: As a cluster administrator, you can assign priority levels to subscriptions. As a user, when you belong to multiple groups with different subscriptions, the system automatically assigns you to the highest-priority subscription by default, while allowing you to manually select from your available subscriptions when generating API keys.
  • Group-based access control: As a cluster administrator, you can define subscriptions that grant groups access to specific models. Subscriptions support integration with OpenShift groups, OIDC group claims, and API key group snapshots.
  • Configurable token quotas: As a cluster administrator, you can define distinct token limits per model for each subscription, enabling cost control and resource allocation aligned with organizational policies. As a user, your token consumption is tracked against your active subscription’s limits.
  • Authorization policy integration: Subscriptions work in combination with authorization policies to control API gateway access and token consumption limits.

The subscription model redesign enables organizations to enforce consumption policies across teams while maintaining self-service access for data scientists and developers.

For more information, see Governing LLM access with Models-as-a-Service.

Self-service API key management for Models-as-a-Service

You can now create and manage your own API keys for programmatic access to large language models through Models-as-a-Service. This self-service capability streamlines access to large language models while maintaining centralized governance through subscriptions and authorization policies.

Key capabilities include:

  • User-managed API key lifecycle: Create API keys with a configurable expiration (1-365 days, or keys that never expire), view active keys, and revoke keys when they are no longer needed
  • Temporary API key generation: Generate short-lived API keys directly from the model endpoint dialog for quick testing and prototyping
  • Subscription-scoped authentication: API keys are scoped to specific subscriptions, inheriting the subscription’s model access and token limits
  • Group membership snapshot: User group membership is captured at API key creation time, ensuring consistent access control even when group assignments change

Self-service API key management empowers you to access models programmatically through OpenAI-compatible APIs without administrative bottlenecks, while administrators retain control through subscription-based governance.

For more information, see Governing LLM access with Models-as-a-Service.

Support for Llama Stack and KubeRay on IBM Power
Red Hat OpenShift AI 3.4 GA introduces official support for both Llama Stack and KubeRay on the IBM Power architecture.
OCI-compliant storage layer for model registry

You can now use the OpenShift AI dashboard to register a model from an S3-compatible source or URI, transform it into an OCI ModelCar image, and store it in an OCI registry. The ModelCar target format enables fast deployment with KServe.

The model transfer job runs as a background Kubernetes Job that you can monitor from the dashboard. This feature provides the following capabilities:

  • Register and store models from object storage (S3, MinIO) or URLs in a single operation.
  • Models are automatically converted to ModelCar OCI images, ensuring compatibility with KServe ModelCar for model serving.
  • Track model transfer jobs in real time with detailed status information, Kubernetes event logs, and automatic polling.
  • Retry failed jobs or delete completed jobs directly from the dashboard.
  • ConfigMaps and Secrets are automatically garbage-collected when jobs are deleted.
MLServer ServingRuntime for KServe is now generally available

The MLServer serving runtime for KServe is now generally available in Red Hat OpenShift AI. You can use this runtime to deploy models trained on structured data, such as classical machine learning models. You can deploy models directly in their native format, simplifying the deployment process.

Supported model frameworks include:

  • Scikit-learn
  • XGBoost
  • LightGBM
  • ONNX

For models with well-known file names, MLServer automatically configures all required environment variables during deployment through the Deploy a model wizard.

For more information, see Deploying models using the MLServer runtime.
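
As an illustration, deploying a scikit-learn model through an MLServer-based runtime could look similar to the following InferenceService sketch; the runtime name, namespace, and storage URI are assumptions, and the Deploy a model wizard generates the equivalent resource for you.

    # Illustrative InferenceService that serves a scikit-learn model in its
    # native format through an MLServer-based serving runtime. The runtime name,
    # namespace, and storageUri are assumptions; the Deploy a model wizard
    # creates the equivalent resource for you.
    apiVersion: serving.kserve.io/v1beta1
    kind: InferenceService
    metadata:
      name: sklearn-iris
      namespace: my-project
    spec:
      predictor:
        model:
          modelFormat:
            name: sklearn
          runtime: mlserver-runtime                                         # assumed runtime name
          storageUri: oci://registry.example.com/models/sklearn-iris:1.0    # assumed model location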

2.1.2. 3.4 EA2 new features

Multi-architecture support for the model catalog on IBM Power

The model catalog now includes support for the IBM Power (ppc64le) architecture. This enhancement enables users on IBM Power platforms to discover and deploy models directly from the OpenShift AI dashboard. Support is available for the following validated models:

  • registry.redhat.io/rhai/modelcar-granite-3-3-8b-instruct
  • registry.redhat.io/rhai/modelcar-granite-4-0-h-small:3.0
  • registry.redhat.io/rhai/modelcar-granite-4-0-h-tiny:3.0
Multi-architecture support for the model catalog on IBM Z

The model catalog now includes support for the IBM Z architecture. This enhancement enables users on IBM Z platforms to discover and deploy models directly from the OpenShift AI dashboard. Currently, support is available for the following model:

  • registry.redhat.io/rhai/modelcar-granite-3-3-8b-instruct
Just-In-Time Checkpointing and S3 Storage for Kubeflow Trainer

Kubeflow Trainer now provides Just-In-Time (JIT) and periodic checkpointing for distributed training jobs on OpenShift AI. This enhancement automatically saves the training state, including model weights, optimizer state, and training step, at regular intervals and immediately before interruptions such as preemption, eviction, or maintenance. Interrupted jobs automatically resume from the latest valid checkpoint, significantly reducing wasted GPU compute and improving overall training efficiency.

Checkpoints can be stored on PersistentVolumeClaims (PVCs) or S3-compatible object storage. With S3, checkpoints are uploaded in the background without pausing training, enabling low-overhead, continuous protection of progress. S3-backed storage also provides a cost-efficient, portable alternative to PVCs, allowing checkpoints to be retained, shared, and reused across clusters.

2.1.3. 3.4 EA1 new features

Note

Model deployments are not visible under the model registry deployments tab on IBM Power (ppc64le) in Red Hat OpenShift AI 3.4 EA1.

Workbench and runtime images default to Red Hat Python index
Workbench and runtime images now default to the Red Hat Python index. When you install or update Python packages, they are pulled from the Red Hat Python index rather than from PyPI, providing you with Python packages that are built and supported by Red Hat.
Garak evaluation provider available in Llama Stack distribution
The Garak evaluation provider is available in the Llama Stack distribution. Garak provides security scanning capabilities for large language models to help identify potential vulnerabilities and safety issues. The provider is available in two versions: an inline version that runs scans in the same process as the Llama Stack server, and a remote version that runs scans by using Kubeflow Pipelines.
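
As a loosely sketched example, an evaluation provider appears in a Llama Stack distribution's run.yaml under the providers section; the provider IDs and type strings below are hypothetical placeholders, so check the shipped distribution for the exact identifiers.

    # Hypothetical sketch of eval provider entries in a Llama Stack run.yaml.
    # The provider_id and provider_type values are placeholders, not the exact
    # identifiers shipped with the distribution.
    providers:
      eval:
        - provider_id: garak-inline          # placeholder: inline scans in the Llama Stack server process
          provider_type: inline::garak       # hypothetical type string
          config: {}
        - provider_id: garak-remote          # placeholder: remote scans run through Kubeflow Pipelines
          provider_type: remote::garak       # hypothetical type string
          config: {}
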
PostgreSQL database support for Model Registry
You can configure a PostgreSQL database as the backend for Model Registry from the OpenShift AI dashboard.
Default database solution for Model Registry

Model Registry includes a default database solution for testing. Use this solution to start using Model Registry without configuring an external database.

Note

The default database is not intended for production workloads.

2.2. Enhancements

2.2.1. 3.4 GA enhancements

vLLM uvicorn access logs are disabled by default in Distributed Inference with llm-d
vLLM uvicorn access logs are disabled by default in LLMInferenceServiceConfig, including logs generated by polling of the router-scheduler /metrics endpoint. This reduces the excessive logging caused by the EndpointPicker scraping metrics every 200 milliseconds. Operators who need access logs for debugging can re-enable them explicitly.

2.2.2. 3.4 EA2 enhancements

Simplified configuration for Distributed Inference with llm-d scheduler settings
You can configure Distributed Inference with llm-d scheduler settings by using the endpointPickerConfig field in the LLMInferenceService specification. You can specify the configuration inline or reference a ConfigMap. This approach replaces the previous method of specifying the full EndpointPicker configuration in the scheduler's --configText argument.
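
As a minimal sketch of the ConfigMap-reference form, an LLMInferenceService could point the scheduler at an EndpointPicker configuration as shown below; the placement of endpointPickerConfig within the spec and the reference field name are assumptions for illustration, and the inline form embeds the same configuration directly in the resource.

    # Sketch: referencing an EndpointPicker configuration from a ConfigMap.
    # The placement of endpointPickerConfig within the spec and the reference
    # field name are assumptions; the configuration can also be specified inline.
    apiVersion: serving.kserve.io/v1alpha1
    kind: LLMInferenceService
    metadata:
      name: my-llm
    spec:
      router:
        scheduler:
          endpointPickerConfig:
            configMap:
              name: my-epp-config        # ConfigMap holding the scheduler configuration (assumption)
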
Configure vLLM runtime arguments using Kubernetes container args field
You can configure vLLM runtime arguments using the standard Kubernetes container args field in LLMInferenceService resources. User-specified arguments are merged with system defaults, allowing you to add new arguments or override specific defaults without replacing the entire argument list.

The previous VLLM_ADDITIONAL_ARGS environment variable method continues to work for backward compatibility.
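
For example, adding or overriding vLLM arguments through the container args field could look like the following sketch; the container location within the spec, the container name, and the argument values are assumptions for illustration.

    # Sketch: merging extra vLLM arguments through the standard Kubernetes
    # container args field. The container path within the spec, the container
    # name, and the argument values are assumptions; user-specified args are
    # merged with the system defaults rather than replacing them.
    apiVersion: serving.kserve.io/v1alpha1
    kind: LLMInferenceService
    metadata:
      name: my-llm
    spec:
      template:
        containers:
          - name: main                      # vLLM engine container name (assumption)
            args:
              - --max-model-len=8192        # example override of a default value (assumption)
              - --enable-prefix-caching     # example additional argument (assumption)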

2.2.3. 3.4 EA1 enhancements

Hybrid search support for Qdrant remote vector database provider
Vector Store Search supports hybrid and keyword search for the Qdrant Vector IO provider.