Chapter 2. New features and enhancements


This section describes new features and enhancements in Red Hat OpenShift AI.

2.1. New features

Guardrails Orchestrator Framework

The Guardrails Orchestrator Framework is now generally available.

The Guardrails Orchestrator is a TrustyAI service that adds safety and policy checks (guardrails) to Large Language Models (LLMs). Managed by the TrustyAI Operator, it lets you define rules (detectors) to filter LLM input/output.
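
The following minimal Python sketch illustrates the general detector pattern that such guardrails apply: rules screen both the prompt sent to the LLM and the text it returns. It is a conceptual illustration only, it does not use the TrustyAI or Guardrails Orchestrator APIs, and every name in it is hypothetical.

    import re

    # Hypothetical detectors: each rule maps a name to a regular expression
    # that flags disallowed content in prompts or completions.
    DETECTORS = {
        "email_address": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
        "profanity": re.compile(r"\b(badword1|badword2)\b", re.IGNORECASE),
    }

    def run_detectors(text):
        """Return the names of all detectors that match the given text."""
        return [name for name, pattern in DETECTORS.items() if pattern.search(text)]

    def guarded_generate(prompt, generate):
        """Screen the prompt, call the model, then screen the completion."""
        violations = run_detectors(prompt)
        if violations:
            return f"Input rejected by detectors: {violations}"
        completion = generate(prompt)  # call into the LLM
        violations = run_detectors(completion)
        if violations:
            return f"Output withheld by detectors: {violations}"
        return completion

    # Example usage with a stand-in "model" that simply echoes the prompt.
    print(guarded_generate("Contact me at user@example.com", lambda p: p))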

Why does it matter?

  • LLMs can generate harmful, biased, or inaccurate content. Guardrails Orchestrator mitigates these risks, preventing reputational damage, ethical issues, and legal liabilities.
  • It helps ensure your LLM applications are safe, reliable, and policy-compliant. Key benefits include detecting harmful content, enforcing policies, and improving security and quality.
Delivery of Kubeflow Training Operator (KFTO) for Distributed PyTorch Jobs in OpenShift
This feature enables users to run distributed training jobs with PyTorch by using the Kubeflow Training Operator (KFTO), with support for CUDA and ROCm accelerators.
View installed components and versions
You can now view a list of the installed OpenShift AI components, their corresponding upstream components, and the installed versions. To access the list, click Help > About in the Red Hat OpenShift AI dashboard.
OCI containers for model storage

You can use OCI storage as an alternative to cloud storage services for model serving. First, create an OCI container image that contains the model, and upload the image to an OCI-compatible registry, such as Quay. When you deploy the model, the model serving platform references the repository of the containerized model.
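
To illustrate the deployment step, the following Python sketch uses the Kubernetes API to create a KServe InferenceService whose storage URI points at a model packaged as an OCI image. It is a minimal sketch: the namespace, model format, and image reference are hypothetical placeholders, and the exact fields that your deployment needs are described in the documentation linked at the end of this section.

    from kubernetes import client, config

    # Hypothetical InferenceService that references a model packaged as an OCI image.
    inference_service = {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": "example-model", "namespace": "example-project"},
        "spec": {
            "predictor": {
                "model": {
                    "modelFormat": {"name": "onnx"},
                    # The model serving platform pulls the model from this OCI
                    # image instead of downloading it from cloud object storage.
                    "storageUri": "oci://quay.io/example-org/example-model:1.0",
                }
            }
        },
    }

    config.load_kube_config()  # use load_incluster_config() when running in the cluster
    client.CustomObjectsApi().create_namespaced_custom_object(
        group="serving.kserve.io",
        version="v1beta1",
        namespace="example-project",
        plural="inferenceservices",
        body=inference_service,
    )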

Using an OCI container can provide the following advantages:

  • Reduced startup times, because the cluster keeps a cache of downloaded images. Restarting the model pod does not download the model again.
  • Lower disk space usage, because the model is not downloaded on each pod replica, assuming pods are scheduled on the same node.
  • Enhanced performance when pre-fetching images or loading models asynchronously.
  • Compatibility and integration, because it can be easily integrated with KServe. No additional dependencies are required and the infrastructure might already be available.

For more information, see Using OCI containers for model storage.

Multi-node PyTorch distributed training with KFTO

Distributed PyTorch training across multiple nodes and GPUs using the Kubeflow Training Operator is now supported. This feature enables the following functionality:

  • Configuration of single or multiple GPUs per node by using the PyTorchJob API
  • Support for the kubeflow-training SDK (see the sketch after this list)
  • Support for NCCL, RCCL, and Gloo backends for GPU and CPU workloads, with configurable resource allocation
  • Training scripts mounted by using ConfigMaps or included in custom container images
  • Support for both DDP and FSDP distributed training approaches
  • Job scheduling through distributed workloads capabilities or Kueue
  • Runtime metrics accessible through OpenShift monitoring
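
As one illustration of the SDK path mentioned above, the following hedged Python sketch submits a two-worker PyTorchJob with one GPU per worker through the kubeflow-training SDK. The training function, image, and resource values are placeholders, and the exact client parameters can vary between SDK versions.

    from kubeflow.training import TrainingClient

    def train_func():
        # Placeholder training function; a real job would set up
        # torch.distributed (DDP or FSDP) and run the training loop here.
        import torch
        print(f"CUDA available: {torch.cuda.is_available()}")

    training_client = TrainingClient()
    training_client.create_job(
        name="example-pytorch-job",
        job_kind="PyTorchJob",
        train_func=train_func,
        num_workers=2,                                  # total worker replicas
        resources_per_worker={"gpu": 1},                # one GPU per worker
        base_image="quay.io/example/training:latest",   # hypothetical image
    )

    # Stream the aggregated training logs for the job.
    training_client.get_job_logs(name="example-pytorch-job", follow=True)
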
NVIDIA GPUDirect RDMA support for distributed model training

NVIDIA GPUDirect RDMA, which uses Remote Direct Memory Access (RDMA) to provide direct GPU interconnect, is now supported for distributed model training with KFTO. This feature enables NCCL-based collective communication with RDMA over Converged Ethernet (RoCE) and InfiniBand on compatible NVIDIA accelerated networking platforms.

The Kubeflow Training image for CUDA has been updated to include RDMA userspace libraries.
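
The following minimal sketch, assuming a PyTorchJob worker in which the Training Operator has already injected the rendezvous environment variables (MASTER_ADDR, MASTER_PORT, RANK, WORLD_SIZE), shows a training script initializing the NCCL backend and enabling NCCL debug output so that you can confirm from the pod logs which transport (for example, InfiniBand or RoCE) NCCL selected. The environment variable values are examples only.

    import os
    import torch
    import torch.distributed as dist

    # Ask NCCL to log transport selection details so you can verify from the pod
    # logs whether the RDMA (InfiniBand/RoCE) path is in use. Example value only.
    os.environ.setdefault("NCCL_DEBUG", "INFO")

    # init_process_group reads MASTER_ADDR, MASTER_PORT, RANK, and WORLD_SIZE
    # from the environment that the Training Operator sets for each replica.
    dist.init_process_group(backend="nccl")

    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)

    # A trivial all-reduce exercises the collective communication path.
    tensor = torch.ones(1, device="cuda")
    dist.all_reduce(tensor)
    print(f"rank {dist.get_rank()}: all_reduce result {tensor.item()}")

    dist.destroy_process_group()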

2.2. Enhancements

Deploying models in standard deployment mode

You can deploy models in either advanced or standard deployment mode. Standard deployment mode uses KServe RawDeployment mode and does not require the Red Hat OpenShift Serverless Operator, Red Hat OpenShift Service Mesh, or Authorino.

Benefits to standard deployment mode include:

  • Enables deployment with Kubernetes resources, such as Deployment, Service, Route, and Horizontal Pod Autoscaler. The resulting model deployment has a smaller resource footprint compared to advanced mode.
  • Enables traditional Deployment/Pod configurations, such as mounting multiple volumes, which is not available with Knative. This is beneficial for applications requiring complex configurations or multiple storage mounts.

    For more information, see About KServe deployment modes.
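
As a minimal sketch of how standard mode maps to upstream KServe, the following InferenceService fragment requests raw deployment mode through the serving.kserve.io/deploymentMode annotation. All names and values are placeholders, and OpenShift AI can set the deployment mode for you when you choose standard mode, so treat this as an illustration rather than a required procedure.

    # Minimal InferenceService sketch that requests KServe RawDeployment (standard) mode.
    inference_service = {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {
            "name": "example-standard-model",
            "namespace": "example-project",
            "annotations": {
                # Upstream KServe annotation for raw deployment mode; the resulting
                # workload is a plain Deployment with a Service and optional HPA.
                "serving.kserve.io/deploymentMode": "RawDeployment",
            },
        },
        "spec": {
            "predictor": {
                "model": {
                    "modelFormat": {"name": "onnx"},
                    "storageUri": "oci://quay.io/example-org/example-model:1.0",
                }
            }
        },
    }

    # Create it with the same CustomObjectsApi call shown in the OCI storage example.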

Updated naming for vLLM serving runtime templates

Template naming has been updated to better distinguish the vLLM templates based on the accelerator supported. All vLLM templates now reflect the accelerator name in the title and description of the template:

  • NVIDIA GPU
  • AMD GPU
  • Gaudi accelerators
  • CPU for IBM Power and IBM Z (Technology Preview only)
Support for vLLM Inference Server with Intel Gaudi 1.20
Support for vLLM Inference Server with Intel Gaudi 1.20 accelerators is now available.
Upgraded OpenVINO Model Server
The OpenVINO Model Server has been upgraded to version 2025.0. For information on the changes and enhancements, see OpenVINO™ Model Server 2025.0.
Updated workbench images
A new set of workbench images (2025.1) is now available. This update includes major version upgrades for most pre-built Python packages and updated IDEs for RStudio and code-server.
Support for Kubeflow Pipelines 2.4.0 in data science pipelines
To keep Red Hat OpenShift AI updated with the latest features, data science pipelines have been upgraded to Kubeflow Pipelines (KFP) version 2.4.0. For more information, see the Kubeflow Pipelines documentation.
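
If you author pipelines with the KFP Python SDK, the following minimal sketch shows the KFP v2 style of defining a pipeline and compiling it to the intermediate representation that a data science pipelines server can run. The component logic, pipeline name, and output file name are placeholders, and your installed kfp SDK version should be compatible with the 2.4.0 backend.

    from kfp import compiler, dsl

    @dsl.component(base_image="python:3.11")
    def add(a: float, b: float) -> float:
        # Placeholder component logic.
        return a + b

    @dsl.pipeline(name="example-addition-pipeline")
    def addition_pipeline(x: float = 1.0, y: float = 2.0):
        first = add(a=x, b=y)
        add(a=first.output, b=3.0)

    # Compile to the KFP v2 intermediate representation, which you can import
    # into a data science pipelines server from the OpenShift AI dashboard.
    compiler.Compiler().compile(addition_pipeline, package_path="addition_pipeline.yaml")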