Chapter 4. Technology Preview features
This section describes Technology Preview features in Red Hat OpenShift AI 3.3. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
- OpenAI-compatible annotations for search and responses in Llama Stack
Starting with OpenShift AI 3.3, Llama Stack provides OpenAI-compatible grounding and citation annotations for search-backed responses as a Technology Preview feature.
This enhancement enables retrieval-augmented generation (RAG) applications to trace generated responses back to source documents by using the same annotation schemas returned by OpenAI Search and Responses APIs. The feature supports document source attribution and preserves citation metadata in API responses, allowing existing OpenAI client applications to consume citation information without code changes.
This capability improves transparency, auditability, and explainability for enterprise RAG workloads, and serves as a foundation for future advanced tracing and observability features in Llama Stack. For more information, see OpenAI API annotations for search and responses.
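Because the annotations follow the OpenAI schema, a client can read citations from a response without any Llama Stack-specific code. The following sketch is illustrative only: the payload shape mirrors the OpenAI annotation format (`type`, span indices, source reference), and the field values and helper function are assumptions, not part of the product API.

```python
# Illustrative sketch: reading OpenAI-style citation annotations from a
# search-backed response. The sample payload mimics the OpenAI annotation
# schema; exact fields in your deployment may differ.

sample_response = {
    "output_text": "RAG grounds answers in retrieved documents.",
    "annotations": [
        {
            "type": "file_citation",      # OpenAI schema also defines url_citation
            "start_index": 0,
            "end_index": 43,
            "file_id": "doc-123",         # hypothetical source document ID
            "filename": "rag-overview.md",
        }
    ],
}

def extract_citations(response: dict) -> list[dict]:
    """Collect source attributions from OpenAI-style response annotations."""
    citations = []
    for ann in response.get("annotations", []):
        if ann.get("type") in ("file_citation", "url_citation"):
            citations.append(
                {
                    "source": ann.get("filename") or ann.get("url"),
                    "span": (ann.get("start_index"), ann.get("end_index")),
                }
            )
    return citations

print(extract_citations(sample_response))
# [{'source': 'rag-overview.md', 'span': (0, 43)}]
```

An existing OpenAI client application that already walks the `annotations` array in this way would consume Llama Stack citations unchanged, which is the point of the compatibility guarantee.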
- The Llama Stack Operator available on multi-architecture clusters
The Llama Stack Operator is now deployable on multi-architecture clusters in OpenShift AI version 3.3 and is available by default.
- Llama Stack versions in OpenShift AI 3.3
OpenShift AI 3.3.0 includes Open Data Hub Llama Stack version 0.4.2.1+rhai0, which is based on upstream Llama Stack version 0.4.2.
- The Llama Stack Operator with ConfigMap-driven image updates
The Llama Stack Operator in OpenShift AI 3.3 now offers ConfigMap-driven image updates for LlamaStackDistribution resources. This allows you to apply security and bug fixes without installing a new Operator version. To enable this feature, update your ConfigMap with the following parameters:
```yaml
image-overrides: |
  starter-gpu: registry.redhat.io/rhoai/odh-llama-stack-core-rhel9:v3.3
  starter: registry.redhat.io/rhoai/odh-llama-stack-core-rhel9:v3.3
```

Using the starter-gpu and starter distribution names as the keys allows the Operator to apply these overrides automatically.

To update the Llama Stack distribution image for all starter distributions, run the following command:

```shell
$ kubectl patch configmap llama-stack-operator-config -n llama-stack-k8s-operator-system --type merge -p '{"data":{"image-overrides":"starter: quay.io/opendatahub/llama-stack:latest"}}'
```

The LlamaStackDistribution resources then restart with the new image.
- Model-as-a-Service (MaaS) integration
This feature is available as a Technology Preview.
OpenShift AI now includes Model-as-a-Service (MaaS) to address resource consumption and governance challenges associated with serving large language models (LLMs).
MaaS provides centralized control over model access and resource usage by exposing models through managed API endpoints, allowing administrators to enforce consumption policies across teams.
This Technology Preview introduces the following capabilities:
- Policy and quota management
- Authentication and authorization
- Usage tracking
- User management
- Zero-touch setup through the Red Hat OpenShift AI Operator
For more information, see Governing LLM access with models-as-a-service.
- MLServer ServingRuntime for KServe
The MLServer serving runtime for KServe is now available as a Technology Preview feature in Red Hat OpenShift AI. You can use this runtime to deploy models trained on structured data, such as classical machine learning models. You can deploy these models directly, without converting them to ONNX format, which simplifies the deployment process and improves performance.
This feature provides support for the following common machine learning frameworks:
- scikit-learn
- XGBoost
- LightGBM
For more information, see Deploying models using the MLServer runtime and Supported configurations.
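A deployment with this runtime might look like the following InferenceService sketch. This is an illustrative assumption, not a verbatim example from the product documentation: the runtime name, model name, and storage URI are placeholders you would replace with values from your own cluster.

```yaml
# Illustrative sketch: deploying a scikit-learn model with the MLServer
# runtime on KServe. The runtime name and storageUri are assumptions;
# check the ServingRuntime resources on your cluster for exact values.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-example          # hypothetical model name
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn            # also supported: xgboost, lightgbm
      runtime: kserve-mlserver   # assumed ServingRuntime name
      storageUri: s3://example-bucket/models/sklearn-example  # illustrative path
```

Because MLServer loads the serialized model in its native format, no ONNX conversion step appears anywhere in this manifest.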