Chapter 3. Technology Preview features


Important

This section describes Technology Preview features in Red Hat OpenShift AI 2.25. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

IBM Spyre AI Accelerator model serving support on x86 platforms
Model serving with the IBM Spyre AI Accelerator is now available as a Technology Preview feature for x86 platforms. The IBM Spyre Operator automates installation and integrates the device plugin, secondary scheduler, and monitoring. For more information, see the IBM Spyre Operator catalog entry.
Distributed Inference with llm-d
Distributed Inference with llm-d is currently available as a Technology Preview feature. Distributed Inference with llm-d supports multi-model serving, intelligent inference scheduling, and disaggregated serving for improved GPU utilization on GenAI models. For more information, see Deploying models by using Distributed Inference with llm-d.
Build Generative AI Apps with Llama Stack on OpenShift AI

With this release, the Llama Stack Technology Preview feature enables Retrieval-Augmented Generation (RAG) and agentic workflows for building next-generation generative AI applications. It supports remote inference, built-in embeddings, and vector database operations. It also integrates with providers such as the TrustyAI provider for safety and the TrustyAI LM-Eval provider for evaluation.

This preview includes tools, components, and guidance for enabling the Llama Stack Operator, interacting with the RAG Tool, and automating PDF ingestion and keyword search capabilities to enhance document discovery.

Centralized platform observability

Centralized platform observability, including metrics, traces, and built-in alerts, is available as a Technology Preview feature. This solution introduces a dedicated, pre-configured observability stack for OpenShift AI that allows cluster administrators to perform the following actions:

  • View platform metrics (Prometheus) and distributed traces (Tempo) for OpenShift AI components and workloads.
  • Manage a set of built-in alerts (Alertmanager) that cover critical component health and performance issues.
  • Export platform and workload metrics to external third-party observability tools by editing the DSCInitialization (DSCI) custom resource, as shown in the sketch after this list.

    You can enable this feature by integrating with the Cluster Observability Operator, the Red Hat build of OpenTelemetry, and the Tempo Operator. For more information, see Monitoring and observability and Managing observability.
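
    The following is a minimal sketch of the kind of DSCInitialization (DSCI) edit involved in enabling the observability stack. The resource name is an assumption, and the exporter fields for sending metrics to third-party tools are not shown here; see Managing observability for the supported settings.

apiVersion: dscinitialization.opendatahub.io/v1
kind: DSCInitialization
metadata:
  name: default-dsci              # typical name; verify the DSCI resource on your cluster
spec:
  monitoring:
    managementState: Managed      # enable the pre-configured observability stack
    namespace: redhat-ods-monitoring
    # Exporter settings for third-party observability tools are also configured
    # under spec.monitoring; see Managing observability for the supported fields.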

Support for Llama Stack Distribution version 0.2.17

The Llama Stack Distribution now includes Llama Stack version 0.2.17 as a Technology Preview feature. This update introduces a number of capabilities, including:

  • Model providers: Self-hosted providers like vLLM are now automatically registered, so you no longer need to manually set INFERENCE_MODEL variables.
  • Infrastructure and backends: Improved OpenAI inference and added support for the Vector Store API.
  • Error handling: Errors are now standardized, and library client initialization has been improved.
  • Access control: The Vector Store and File APIs now enforce access control, and telemetry read APIs are gated by user roles.
  • Bug fixes.
Support for IBM Power accelerated Triton Inference Server

You can now enable IBM Power architecture support for the Triton Inference Server (CPU only) with Python and ONNX backends. You can deploy the Triton Inference Server as a custom model serving runtime on IBM Power architecture as a Technology Preview feature in Red Hat OpenShift AI.

For details, see Triton Inference Server image.
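
As a rough sketch, deploying Triton as a custom runtime means registering a ServingRuntime resource that points at the Triton image. The runtime name and image reference below are placeholders; the linked documentation is the authority on the supported image and arguments.

apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: triton-cpu-runtime                     # placeholder name
spec:
  supportedModelFormats:
    - name: onnx
      version: "1"
      autoSelect: true
    - name: python
      version: "1"
  protocolVersions:
    - v2
  containers:
    - name: kserve-container
      image: <triton-inference-server-image>   # placeholder: use the image from the linked documentation
      args:
        - tritonserver
        - --model-repository=/mnt/models
      ports:
        - containerPort: 8000
          name: http1
          protocol: TCP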

Support for IBM Z accelerated Triton Inference Server

You can now enable IBM Z architecture support for the Triton Inference Server (Telum I/Telum II) with multiple backend options, including ONNX-MLIR, Snap ML (C++), and PyTorch. You can deploy the Triton Inference Server as a custom model serving runtime on IBM Z architecture as a Technology Preview feature in Red Hat OpenShift AI.

For details, see IBM Z accelerated Triton Inference Server.

Support for Kubernetes Event-driven Autoscaling (KEDA)

OpenShift AI now supports Kubernetes Event-driven Autoscaling (KEDA) in its KServe RawDeployment mode. This Technology Preview feature enables metrics-based autoscaling for inference services, allowing for more efficient management of accelerator resources, reduced operational costs, and improved performance for your inference services.

To set up autoscaling for your inference service in KServe RawDeployment mode, you need to install and configure the OpenShift Custom Metrics Autoscaler (CMA), which is based on KEDA.

For more information about this feature, see Configuring metrics-based autoscaling.
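
For illustration only, the underlying KEDA mechanism scales the inference service's predictor Deployment from an external metric through a ScaledObject. The Deployment name, Prometheus query, and threshold below are hypothetical; Configuring metrics-based autoscaling describes the supported setup through CMA.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-model-autoscaler                 # hypothetical name
spec:
  scaleTargetRef:
    name: my-model-predictor                # Deployment created for the InferenceService in RawDeployment mode
  minReplicaCount: 1
  maxReplicaCount: 4
  triggers:
    - type: prometheus
      metadata:
        serverAddress: https://thanos-querier.openshift-monitoring.svc.cluster.local:9092
        query: sum(rate(http_requests_total{namespace="my-project"}[2m]))   # hypothetical metric query
        threshold: "10"
      # In practice, a TriggerAuthentication that supplies a bearer token is
      # typically also required to query Thanos on OpenShift.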

LM-Eval model evaluation UI feature
TrustyAI now offers a user-friendly UI for LM-Eval model evaluations as a Technology Preview feature. This feature allows you to enter evaluation parameters for a given model and then view the evaluation results page, all from the UI.
Use Guardrails Orchestrator with LlamaStack

As a Technology Preview feature, you can now run detections with the Guardrails Orchestrator tool from TrustyAI and Llama Stack by using the built-in detection component. To use this feature, ensure that TrustyAI is enabled, that the FMS Orchestrator and detectors are set up, and, if needed for full compatibility, that KServe RawDeployment mode is in use. No additional manual setup is required. Then, in the DataScienceCluster custom resource for the Red Hat OpenShift AI Operator, set the spec.llamastackoperator.managementState field to Managed.
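
For example, the Llama Stack Operator can be enabled with an edit like the following. The resource name is an assumption, and depending on your operator version the field may sit under spec.components; check the DataScienceCluster schema on your cluster.

apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: default-dsc               # typical name; verify the DataScienceCluster on your cluster
spec:
  llamastackoperator:
    managementState: Managed      # enables the Llama Stack Operator, as described above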

For more information, see TrustyAI FMS Provider on GitHub.

New Feature Store component

You can now install and manage Feature Store as a configurable component in OpenShift AI. Based on the open-source Feast project, Feature Store acts as a bridge between ML models and data, enabling consistent and scalable feature management across the ML lifecycle.

This Technology Preview release introduces the following capabilities:

  • Centralized feature repository for consistent feature reuse
  • Python SDK and CLI for programmatic and command-line interactions to define, manage, and retrieve features for ML models
  • Feature definition and management
  • Support for a wide range of data sources
  • Data ingestion via feature materialization
  • Feature retrieval for both online model inference and offline model training
  • Role-Based Access Control (RBAC) to protect sensitive features
  • Extensibility and integration with third-party data and compute providers
  • Scalability to meet enterprise ML needs
  • Searchable feature catalog
  • Data lineage tracking for enhanced observability

    For configuration details, see Configuring Feature Store.
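
    As a minimal sketch of what the SDK and CLI consume, a Feature Store repository is driven by a feature_store.yaml configuration similar to the following. The project name and the local store types are illustrative upstream Feast defaults; see Configuring Feature Store for the OpenShift AI-specific setup.

project: credit_scoring            # hypothetical project name
registry: data/registry.db         # where feature definitions are registered
provider: local
online_store:
  type: sqlite                     # serves features for online model inference
  path: data/online_store.db
offline_store:
  type: file                       # serves historical features for offline training
entity_key_serialization_version: 2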

IBM Power and IBM Z architecture support
IBM Power (ppc64le) and IBM Z (s390x) architectures are now supported as a Technology Preview feature. Currently, you can only deploy models in KServe RawDeployment mode on these architectures.
Support for vLLM in IBM Power and IBM Z architectures
vLLM runtime templates are available for use on IBM Power and IBM Z architectures as a Technology Preview feature.
Enable targeted deployment of workbenches to specific worker nodes in the Red Hat OpenShift AI dashboard by using node selectors

Hardware profiles are now available as a Technology Preview feature. With hardware profiles, users can target specific worker nodes for workbenches or model-serving workloads, including nodes with specific accelerator types or CPU-only nodes.

This feature replaces the current accelerator profiles feature and container size selector field, offering a broader set of capabilities for targeting different hardware configurations. While accelerator profiles, taints, and tolerations provide some capabilities for matching workloads to hardware, they do not ensure that workloads land on specific nodes, especially if some nodes lack the appropriate taints.

The hardware profiles feature supports both accelerator and CPU-only configurations, along with node selectors, to enhance targeting capabilities for specific worker nodes. Administrators can configure hardware profiles in the settings menu. Users can select the enabled profiles using the UI for workbenches, model serving, and Data Science Pipelines where applicable.
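
Under the hood, a hardware profile resolves to standard Kubernetes scheduling constraints on the workload pod. The fragment below illustrates that mechanism with hypothetical label and taint keys; it is not the hardware profile resource itself.

spec:
  nodeSelector:
    node-role.kubernetes.io/gpu-worker: ""    # hypothetical label: only matching worker nodes are eligible
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule                      # tolerate the taint applied to the targeted nodes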

RStudio Server workbench image

With the RStudio Server workbench image, you can access the RStudio IDE, an integrated development environment for R. The R programming language is used for statistical computing and graphics to support data analysis and predictions.

To use the RStudio Server workbench image, you must first build it by creating a secret and triggering the BuildConfig, and then enable it in the OpenShift AI UI by editing the rstudio-rhel9 image stream. For more information, see Building the RStudio Server workbench images.

Important

Disclaimer: Red Hat supports managing workbenches in OpenShift AI. However, Red Hat does not provide support for the RStudio software. RStudio Server is available through rstudio.org and is subject to their licensing terms. You should review their licensing terms before you use this sample workbench.

CUDA - RStudio Server workbench image

With the CUDA - RStudio Server workbench image, you can access the RStudio IDE and NVIDIA CUDA Toolkit. The RStudio IDE is an integrated development environment for the R programming language for statistical computing and graphics. With the NVIDIA CUDA toolkit, you can enhance your work by using GPU-accelerated libraries and optimization tools.

To use the CUDA - RStudio Server workbench image, you must first build it by creating a secret and triggering the BuildConfig, and then enable it in the OpenShift AI UI by editing the rstudio-rhel9 image stream. For more information, see Building the RStudio Server workbench images.

Important

Disclaimer: Red Hat supports managing workbenches in OpenShift AI. However, Red Hat does not provide support for the RStudio software. RStudio Server is available through rstudio.org and is subject to their licensing terms. You should review their licensing terms before you use this sample workbench.

The CUDA - RStudio Server workbench image contains NVIDIA CUDA technology. CUDA licensing information is available in the CUDA Toolkit documentation. You should review their licensing terms before you use this sample workbench.

Support for multinode deployment of very large models
Serving models over multiple graphics processing unit (GPU) nodes when using a single-model serving runtime is now available as a Technology Preview feature. Deploy your models across multiple GPU nodes to improve efficiency when deploying large models such as large language models (LLMs). For more information, see Deploying models by using multiple GPU nodes.
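
The exact schema is documented in Deploying models by using multiple GPU nodes. As a heavily hedged sketch, a multi-node deployment is expressed on the InferenceService roughly as follows; the runtime name, storage URI, and parallelism values are hypothetical, and the workerSpec field names may vary by version.

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-llm
spec:
  predictor:
    model:
      modelFormat:
        name: vLLM
      runtime: vllm-multinode-runtime       # hypothetical runtime name
      storageUri: oci://registry.example.com/models/my-llm:latest   # hypothetical model location
      resources:
        limits:
          nvidia.com/gpu: "1"               # GPUs requested per node
    workerSpec:
      pipelineParallelSize: 2               # number of nodes participating in serving
      tensorParallelSize: 1                 # GPUs used per node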