Chapter 5. Developer Preview features
This section describes Developer Preview features in Red Hat OpenShift AI 3.3. Developer Preview features are not supported by Red Hat in any way and are not functionally complete or production-ready. Do not use Developer Preview features for production or business-critical workloads. Developer Preview features provide early access to functionality in advance of possible inclusion in a Red Hat product offering. Customers can use these features to test functionality and provide feedback during the development process. Developer Preview features might not have any documentation, are subject to change or removal at any time, and have received limited testing. Red Hat might provide ways to submit feedback on Developer Preview features without an associated SLA.
For more information about the support scope of Red Hat Developer Preview features, see Developer Preview Support Scope.
- Run evaluations for TrustyAI-Llama Stack using LM-Eval
You can now run evaluations using LM-Eval on Llama Stack with TrustyAI as a Developer Preview feature, using the built-in LM-Eval component and advanced content moderation tools. To use this feature, ensure that TrustyAI is enabled, that the FMS Orchestrator and detectors are set up, and, if needed for full compatibility, that KServe RawDeployment mode is in use. No other manual setup is required.
Then, in the `DataScienceCluster` custom resource for the Red Hat OpenShift AI Operator, set the `spec.llamastackoperator.managementState` field to `Managed`. For more information, see the related resources on GitHub.
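As a sketch, the edit to the `DataScienceCluster` custom resource looks like the following; the resource name `default-dsc` is a placeholder for the name used in your cluster:

```yaml
apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: default-dsc   # placeholder; use the name of your DataScienceCluster resource
spec:
  llamastackoperator:
    managementState: Managed   # enables the built-in Llama Stack operator
```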
- LLM Compressor integration
LLM Compressor capabilities are now available in Red Hat OpenShift AI as a Developer Preview feature. A new workbench image with the `llm-compressor` library and a corresponding data science pipelines runtime image make it easier to compress and optimize your large language models (LLMs) for efficient deployment with vLLM. For more information, see `llm-compressor` in GitHub. You can use LLM Compressor capabilities in two ways:
- Use a Jupyter notebook with the workbench image available at Red Hat Quay.io: `opendatahub/llmcompressor-workbench`. For an example Jupyter notebook, see `examples/llmcompressor/workbench_example.ipynb` in the `red-hat-ai-examples` repository.
- Run a data science pipeline that executes model compression as a batch process with the runtime image available at Red Hat Quay.io: `opendatahub/llmcompressor-pipeline-runtime`. For an example pipeline, see `examples/llmcompressor/oneshot_pipeline.py` in the `red-hat-ai-examples` repository.
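A compression run like the ones in the example notebook and pipeline is typically driven by a recipe. The fragment below is a rough sketch following the upstream `llm-compressor` recipe format; the modifier name, scheme, and layer names are assumptions to verify against your installed version:

```yaml
# Hypothetical one-shot quantization recipe for llm-compressor;
# check modifier names and schemes against the version in your image.
quant_stage:
  quant_modifiers:
    GPTQModifier:
      targets: ["Linear"]   # quantize all Linear layers
      ignore: ["lm_head"]   # keep the output head in full precision
      scheme: W4A16         # 4-bit weights, 16-bit activations
```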
- MLflow integration
- OpenShift AI now includes a Developer Preview of MLflow. MLflow uses Kubernetes namespaces (OpenShift projects) as workspaces to provide logical isolation of experiments, registered models, and prompts. MLflow uses Kubernetes role-based access control (RBAC) to authorize API requests. For more information about enabling and using MLflow in OpenShift AI, see the Configuring MLflow in OpenShift AI (Developer Preview) Knowledgebase article.
- AI Available Assets integration with Model-as-a-Service (MaaS)
This feature is available as a Developer Preview.
You can now access and consume Model-as-a-Service (MaaS) models directly from the AI Available Assets page in the GenAI Studio.
Administrators can configure a model as a service (MaaS) by enabling the toggle on the Model Deployments page. When a model is marked as a service, it becomes global and visible across all projects in the cluster.
- Additional fields added to Model Deployments for AI Available Assets integration
This feature is available as a Developer Preview.
Administrators can now add metadata to models during deployment so that they are automatically listed on the AI Available Assets page.
The following table describes the new metadata fields that streamline the process of making models discoverable and consumable by other teams:
| Field name | Field type | Description |
|---|---|---|
| Use Case | Free-form text | Describes the model’s primary purpose, for example, "Customer Churn Prediction" or "Image Classification for Product Catalog." |
| Description | Free-form text | Provides more detailed context and functionality notes for the model. |
| Add to AI Assets | Checkbox | When enabled, automatically publishes the model and its metadata to the AI Available Assets page. |
- Compatibility of Llama Stack remote providers and SDK with MCP HTTP streaming protocol
This feature is available as a Developer Preview.
Llama Stack remote providers and the SDK are now compatible with the Model Context Protocol (MCP) HTTP streaming protocol.
This enhancement enables developers to build fully stateless MCP servers, simplify deployment on standard Llama Stack infrastructure (including serverless environments), and improve scalability. It also prepares for future enhancements such as connection resumption and provides a smooth transition away from Server-Sent Events (SSE).
- Packaging of ITS Hub dependencies to the Red Hat–maintained Python index
This feature is available as a Developer Preview.
All Inference Time Scaling (ITS) runtime dependencies are now packaged in the Red Hat-maintained Python index, allowing Red Hat AI and OpenShift AI customers to install `its_hub` and its dependencies directly by using `pip`. This enhancement enables users to build custom inference images with ITS algorithms that improve model accuracy at inference time without requiring model retraining, such as:
- Particle filtering
- Best-of-N
- Beam search
- Self-consistency
- Verifier or PRM-guided search
For more information, see the ITS Hub on GitHub.
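Of the algorithms listed above, Best-of-N is the simplest to illustrate. The sketch below is a toy stand-in, not the `its_hub` API: it draws `n` candidate answers from a generator and keeps the one a verifier scores highest.

```python
def best_of_n(generate, score, n):
    """Toy Best-of-N: draw n candidates, return the highest-scoring one.

    generate: callable returning one candidate answer (stands in for an LLM sample)
    score: callable mapping a candidate to a number (stands in for a verifier/PRM)
    """
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=score)


# Deterministic toy run: the "verifier" here is just answer length.
drafts = iter(["short", "a much longer draft", "medium one"])
best = best_of_n(lambda: next(drafts), len, n=3)
print(best)  # → a much longer draft
```

Real ITS implementations replace the generator with repeated model sampling and the scorer with a trained verifier or process reward model.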
- Dynamic hardware-aware continual training strategy
Static hardware profile support is now available to help users select training methods, models, and hyperparameters based on VRAM requirements and reference benchmarks. This approach ensures predictable and reliable training workflows without dynamic hardware discovery.
The following components are included:
- API Memory Estimator: Accepts model, training method, dataset metadata, and assumed hyperparameters as input and returns an estimated VRAM requirement for the training job. Delivered as an API within Training Hub.
- Reference Profiles and Benchmarks: Provides end-to-end training time benchmarks for OpenShift AI Innovation (OSFT) and Performance Team (LAB SFT) baselines, delivered as static tables and documentation in Training Hub.
- Hyperparameter Guidance: Publishes safe starting ranges for key hyperparameters such as learning rate, batch size, epochs, and LoRA rank. Integrated into example notebooks maintained by the AI Innovation team.
Important: Hardware discovery is not included in this release. Only static reference tables and guidance are provided; automated GPU or CPU detection is not yet supported.
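As a back-of-envelope illustration of the kind of calculation such a memory estimator performs (this is not the Training Hub API; the per-parameter byte counts are common rules of thumb, not Red Hat figures):

```python
def estimate_vram_gb(num_params_b, method="full", activation_budget_gb=2.0):
    """Rough VRAM estimate in GB for a training job (toy illustration only).

    num_params_b: model size in billions of parameters
    method: "full" (AdamW fine-tuning) or "lora"
    activation_budget_gb: assumed activation/overhead budget in GB
    """
    params = num_params_b * 1e9
    if method == "full":
        # fp16 weights (2 B) + fp16 grads (2 B) + fp32 Adam states (8 B) per parameter
        per_param_bytes = 2 + 2 + 8
    else:
        # LoRA: frozen fp16 weights dominate; adapter states are comparatively tiny
        per_param_bytes = 2
    return params * per_param_bytes / 1e9 + activation_budget_gb


print(round(estimate_vram_gb(7, "full")))  # → 86
```

The real estimator also accounts for dataset metadata and hyperparameters such as batch size and sequence length, which this sketch folds into a flat activation budget.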
- Human-in-the-Loop (HIL) functionality in the Llama Stack agent
Human-in-the-Loop (HIL) functionality has been added to the Llama Stack agent to allow users to approve unread tool calls before execution.
This enhancement includes the following capabilities:
- Users can approve or reject unread tool calls through the responses API.
- Configuration options specify which tool calls require HIL approval.
- Tool calls pause until user approval is received for HIL-enabled tools.
- Tool calls that do not require HIL continue to run without interruption.
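The approval gating described above can be sketched as follows. This is a toy model, not the Llama Stack responses API; the tool names and the approval callback are hypothetical:

```python
def run_tool_calls(calls, hil_tools, approve, execute):
    """Execute tool calls, pausing HIL-enabled ones for user approval.

    calls: list of (tool_name, args) tuples emitted by the agent
    hil_tools: set of tool names configured to require human approval
    approve: callable(name, args) -> bool, stands in for the user's decision
    execute: callable(name, args) -> result, stands in for the tool runtime
    """
    results = []
    for name, args in calls:
        if name in hil_tools and not approve(name, args):
            results.append((name, "rejected"))  # user declined; tool never runs
            continue
        # Either the tool does not require HIL, or the user approved it.
        results.append((name, execute(name, args)))
    return results


decisions = run_tool_calls(
    [("web_search", {"q": "docs"}), ("delete_file", {"path": "/tmp/x"})],
    hil_tools={"delete_file"},          # only delete_file requires approval
    approve=lambda name, args: False,   # simulate the user rejecting the call
    execute=lambda name, args: "done",
)
print(decisions)  # → [('web_search', 'done'), ('delete_file', 'rejected')]
```

In the real agent, the pause is asynchronous: the HIL-enabled call is surfaced through the responses API and execution resumes only after the approval arrives.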