Chapter 3. Technology Preview features
This section describes Technology Preview features in Red Hat OpenShift AI 2.23. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
- Centralized platform metrics and alerts
Centralized platform metrics are now available as a Technology Preview feature in OpenShift AI. This feature enables integration with the Cluster Observability Operator (COO), Red Hat build of OpenTelemetry, and Red Hat build of Tempo, and provides optional out-of-the-box observability configurations for OpenShift AI.
This feature also introduces a dedicated observability stack for OpenShift AI. Future releases will collect infrastructure and customer workload signals in the dedicated observability stack.
- Enhanced Kueue integration for workloads
Enhanced Kueue integration is now available as a Technology Preview feature in OpenShift AI. Kueue is a job queuing system that provides resource-aware scheduling for batch workloads. This enhancement uses the Red Hat build of Kueue Operator to extend Kueue capabilities beyond Ray distributed training workloads to also include PyTorchJobs, workbenches, and model serving (InferenceService). It enables quota-based scheduling and improves GPU utilization across workload types in OpenShift AI.
When using Kueue integration, workloads in Kueue-managed namespaces (labeled with kueue.openshift.io/managed=true) must include the kueue.x-k8s.io/queue-name label to specify their target LocalQueue, as shown in the example after the following list. Unlike the previous enforcement through a Validating Admission Policy, this requirement is now enforced by validation webhooks. The following workload types are supported:
- Notebooks
- InferenceServices
- PyTorchJobs
- RayJobs and RayClusters
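For illustration, the following is a minimal sketch of an InferenceService that satisfies the webhook validation. The namespace, queue name, and model details are placeholders, not values defined by this release:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-model                                 # illustrative name
  namespace: my-kueue-project                         # namespace labeled kueue.openshift.io/managed=true
  labels:
    kueue.x-k8s.io/queue-name: my-local-queue         # target LocalQueue in this namespace
spec:
  predictor:
    model:
      modelFormat:
        name: onnx                                    # illustrative model format
      storageUri: 's3://example-bucket/example-model' # illustrative model location
```

With webhook enforcement, a supported workload that omits the kueue.x-k8s.io/queue-name label in a Kueue-managed namespace is rejected at creation time.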
- Support for IBM Power accelerated Triton Inference Server
You can now enable IBM Power architecture support for the Triton Inference Server (CPU only) with the Python and ONNX backends. You can deploy the Triton Inference Server as a custom model serving runtime on IBM Power architecture as a Technology Preview feature in Red Hat OpenShift AI.
For details, see Triton Inference Server image.
- Support for Kubernetes Event-driven Autoscaling (KEDA)
OpenShift AI now supports Kubernetes Event-driven Autoscaling (KEDA) in its standard deployment mode. This Technology Preview feature enables metrics-based autoscaling for inference services, allowing for more efficient management of accelerator resources, reduced operational costs, and improved performance for your inference services.
To set up autoscaling for your inference service in standard deployments, you need to install and configure the OpenShift Custom Metrics Autoscaler (CMA), which is based on KEDA.
For more information about this feature, see Configuring metrics-based autoscaling.
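As a rough sketch of the CMA prerequisite (not a complete autoscaling configuration), the Custom Metrics Autoscaler Operator is typically activated by creating a KedaController resource after you install the Operator from OperatorHub. The spec values shown here are assumptions:

```yaml
apiVersion: keda.sh/v1alpha1
kind: KedaController
metadata:
  name: keda                  # the CMA Operator reconciles only this resource name
  namespace: openshift-keda   # namespace used by the CMA Operator
spec:
  watchNamespace: ''          # empty value watches all namespaces
```

The autoscaling thresholds for the inference service itself are then defined as described in Configuring metrics-based autoscaling.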
- LM-Eval model evaluation UI feature
- TrustyAI now offers a user-friendly UI for LM-Eval model evaluations as Technology Preview. This feature allows you to input evaluation parameters for a given model and returns an evaluation-results page, all from the UI.
- Use Guardrails Orchestrator with LlamaStack
You can now run detections using the Guardrails Orchestrator tool from TrustyAI with Llama Stack as a Technology Preview feature, using the built-in detection component. To use this feature, ensure that TrustyAI is enabled, the FMS Orchestrator and detectors are set up, and, if needed, KServe RawDeployment mode is in use for full compatibility. No additional manual setup is required.
Then, in the DataScienceCluster custom resource for the Red Hat OpenShift AI Operator, set the spec.llamastackoperator.managementState field to Managed. For more information, see the related resources on GitHub.
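A minimal sketch of that change, assuming the component follows the usual spec.components nesting of the DataScienceCluster custom resource and that your resource uses the default name:

```yaml
apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: default-dsc              # default name; yours may differ
spec:
  components:
    llamastackoperator:          # nesting under spec.components is assumed here
      managementState: Managed   # enables the Llama Stack Operator
```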
- Build Generative AI Apps with OpenShift AI
With this release, the Llama Stack Technology Preview feature on OpenShift AI enables Retrieval-Augmented Generation (RAG) and agentic workflows for building next-generation generative AI applications. It supports remote inference, built-in embeddings, and vector database operations. It also integrates with providers such as the TrustyAI provider for safety and the TrustyAI LM-Eval provider for evaluation.
This preview includes tools, components, and guidance for enabling the Llama Stack Operator, interacting with the RAG Tool, and automating PDF ingestion and keyword search capabilities to enhance document discovery.
- New option to disable caching for all pipelines in a project
- Cluster administrators can now disable caching for all data science pipelines in the pipeline server. This global setting is useful for scenarios such as debugging, development, or cases that require deterministic re-execution. To apply this setting, set the spec.apiServer.cacheEnabled field to false in the DataSciencePipelinesApplication (DSPA) custom resource, as shown in the example below. For more information, see Overview of data science pipelines caching.
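A minimal sketch of the relevant DSPA fields; the resource name, namespace, and API version shown here are assumptions and may differ in your cluster:

```yaml
apiVersion: datasciencepipelinesapplications.opendatahub.io/v1
kind: DataSciencePipelinesApplication
metadata:
  name: dspa                # illustrative name
  namespace: my-project     # your data science project namespace
spec:
  apiServer:
    cacheEnabled: false     # disables caching for all pipelines served by this pipeline server
```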
- Define and manage pipelines with Kubernetes API
- You can now define and manage data science pipelines and pipeline versions by using the Kubernetes API, which stores them as custom resources in the cluster instead of the internal database. This Technology Preview feature makes it easier to use OpenShift GitOps (Argo CD) or similar tools to manage pipelines, while still allowing you to manage them through the OpenShift AI user interface, API, and kfp SDK. To enable this feature, set the spec.apiServer.pipelineStore field to kubernetes in the DataSciencePipelinesApplication (DSPA) custom resource, as shown in the example below. For more information, see Defining a pipeline by using the Kubernetes API.
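As with the caching setting above, a minimal sketch of the DSPA change; the names and API version are illustrative assumptions:

```yaml
apiVersion: datasciencepipelinesapplications.opendatahub.io/v1
kind: DataSciencePipelinesApplication
metadata:
  name: dspa                    # illustrative name
  namespace: my-project         # your data science project namespace
spec:
  apiServer:
    pipelineStore: kubernetes   # store pipelines as custom resources instead of in the internal database
```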
- Model customization with LAB-tuning
LAB-tuning is now available as a Technology Preview feature, enabling data scientists to run an end-to-end workflow for customizing large language models (LLMs). The LAB (Large-scale Alignment for chatBots) method offers a more efficient alternative to traditional fine-tuning by leveraging taxonomy-guided synthetic data generation (SDG) and a multi-phase training approach.
Data scientists can run LAB-tuning workflows directly from the OpenShift AI dashboard by using the new preconfigured InstructLab pipeline, which simplifies the tuning process. For details on enabling and using LAB-tuning, see Enabling LAB-tuning and Customizing models with LAB-tuning.
Important: The LAB-tuning feature is not currently supported in disconnected environments.
- Red Hat OpenShift AI Model Catalog
The Red Hat OpenShift AI Model Catalog is now available as a Technology Preview feature. This functionality starts with connecting users with the Granite family of models, as well as the teacher and judge models used in LAB-tuning.
Note: The model catalog feature is not currently supported in disconnected environments.
- New Feature Store component
You can now install and manage Feature Store as a configurable component in the Red Hat OpenShift AI Operator. Based on the open-source Feast project, Feature Store acts as a bridge between ML models and data, enabling consistent and scalable feature management across the ML lifecycle.
This Technology Preview release introduces the following capabilities:
- Centralized feature repository for consistent feature reuse
- Python SDK and CLI for programmatic and command-line interactions to define, manage, and retrieve features for ML models
- Feature definition and management
- Support for a wide range of data sources
- Data ingestion via feature materialization
- Feature retrieval for both online model inference and offline model training
- Role-Based Access Control (RBAC) to protect sensitive features
- Extensibility and integration with third-party data and compute providers
- Scalability to meet enterprise ML needs
- Searchable feature catalog
- Data lineage tracking for enhanced observability
For configuration details, see Configuring Feature Store.
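A minimal sketch of enabling the component, assuming it is exposed as feastoperator under spec.components in the DataScienceCluster custom resource (the component key and resource name are assumptions):

```yaml
apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: default-dsc              # default name; yours may differ
spec:
  components:
    feastoperator:               # assumed component key for Feature Store
      managementState: Managed   # installs and manages the Feature Store component
```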
- IBM Power and IBM Z architecture support
- IBM Power (ppc64le) and IBM Z (s390x) architectures are now supported as a Technology Preview feature. Currently, you can only deploy models in standard mode on these architectures.
- Support for vLLM in IBM Power and IBM Z architectures
- vLLM runtime templates are available for use on IBM Power and IBM Z architectures as a Technology Preview feature.
- Enable targeted deployment of workbenches to specific worker nodes in Red Hat OpenShift AI Dashboard using node selectors
Hardware profiles are now available as a Technology Preview. The hardware profiles feature enables users to target specific worker nodes for workbenches or model-serving workloads. It allows users to target specific accelerator types or CPU-only nodes.
This feature replaces the current accelerator profiles feature and container size selector field, offering a broader set of capabilities for targeting different hardware configurations. While accelerator profiles, taints, and tolerations provide some capabilities for matching workloads to hardware, they do not ensure that workloads land on specific nodes, especially if some nodes lack the appropriate taints.
The hardware profiles feature supports both accelerator and CPU-only configurations, along with node selectors, to enhance targeting capabilities for specific worker nodes. Administrators can configure hardware profiles in the settings menu. Users can select the enabled profiles using the UI for workbenches, model serving, and Data Science Pipelines where applicable.
- Mandatory Kueue local-queue labeling policy for Ray cluster and PyTorchJob creation
Cluster administrators can use the Validating Admission Policy feature to enforce the mandatory labeling of Ray cluster and PyTorchJob resources with Kueue local-queue identifiers. This labeling ensures that workloads are properly categorized and routed based on queue management policies, which prevents resource contention and enhances operational efficiency.
When the local-queue labeling policy is enforced, Ray clusters and PyTorchJobs are created only if they are configured to use a local queue, and the Ray cluster and PyTorchJob resources are then managed by Kueue. The local-queue labeling policy is enforced for all projects by default, but can be disabled for some or all projects. For more information about the local-queue labeling policy, see Enforcing the use of local queues.
Note: This feature might introduce a breaking change for users who did not previously use Kueue local queues to manage their Ray cluster and PyTorchJob resources.
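For illustration, a PyTorchJob that complies with the policy carries the local-queue label at creation time; the names and image below are placeholders:

```yaml
apiVersion: kubeflow.org/v1
kind: PyTorchJob
metadata:
  name: example-pytorchjob                          # illustrative name
  labels:
    kueue.x-k8s.io/queue-name: my-local-queue       # LocalQueue that Kueue uses to admit the job
spec:
  pytorchReplicaSpecs:
    Master:
      replicas: 1
      template:
        spec:
          containers:
            - name: pytorch                         # PyTorchJob expects this container name
              image: quay.io/example/train:latest   # illustrative training image
```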
- RStudio Server workbench image
With the RStudio Server workbench image, you can access the RStudio IDE, an integrated development environment for R. The R programming language is used for statistical computing and graphics to support data analysis and predictions.
To use the RStudio Server workbench image, you must first build it by creating a secret and triggering the BuildConfig, and then enable it in the OpenShift AI UI by editing the rstudio-rhel9 image stream. For more information, see Building the RStudio Server workbench images.
Important: Disclaimer: Red Hat supports managing workbenches in OpenShift AI. However, Red Hat does not provide support for the RStudio software. RStudio Server is available through rstudio.org and is subject to their licensing terms. You should review their licensing terms before you use this sample workbench.
- CUDA - RStudio Server workbench image
With the CUDA - RStudio Server workbench image, you can access the RStudio IDE and NVIDIA CUDA Toolkit. The RStudio IDE is an integrated development environment for the R programming language for statistical computing and graphics. With the NVIDIA CUDA toolkit, you can enhance your work by using GPU-accelerated libraries and optimization tools.
To use the CUDA - RStudio Server workbench image, you must first build it by creating a secret and triggering the BuildConfig, and then enable it in the OpenShift AI UI by editing the rstudio-rhel9 image stream. For more information, see Building the RStudio Server workbench images.
Important: Disclaimer: Red Hat supports managing workbenches in OpenShift AI. However, Red Hat does not provide support for the RStudio software. RStudio Server is available through rstudio.org and is subject to their licensing terms. You should review their licensing terms before you use this sample workbench.
The CUDA - RStudio Server workbench image contains NVIDIA CUDA technology. CUDA licensing information is available in the CUDA Toolkit documentation. You should review their licensing terms before you use this sample workbench.
- Model Registry
- OpenShift AI now supports the Model Registry Operator. The Model Registry Operator is not installed by default in Technology Preview mode. The model registry is a central repository that contains metadata related to machine learning models from inception to deployment.
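If you want to try it, the Operator component can be enabled through the DataScienceCluster custom resource, following the same component pattern shown earlier; the component key modelregistry and the resource name are assumptions:

```yaml
apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: default-dsc              # default name; yours may differ
spec:
  components:
    modelregistry:               # assumed component key for the model registry
      managementState: Managed   # installs the Model Registry Operator component
```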
- Support for multinode deployment of very large models
- Serving models over multiple graphical processing unit (GPU) nodes when using a single-model serving runtime is now available as a Technology Preview feature. Deploy your models across multiple GPU nodes to improve efficiency when deploying large models such as large language models (LLMs). For more information, see Deploying models across multiple GPU nodes.