Chapter 4. Developer Preview features

Important

This section describes Developer Preview features in Red Hat OpenShift AI 2.24. Developer Preview features are not supported by Red Hat in any way and are not functionally complete or production-ready. Do not use Developer Preview features for production or business-critical workloads. Developer Preview features provide early access to functionality in advance of possible inclusion in a Red Hat product offering. Customers can use these features to test functionality and provide feedback during the development process. Developer Preview features might not have any documentation, are subject to change or removal at any time, and have received limited testing. Red Hat might provide ways to submit feedback on Developer Preview features without an associated SLA.

For more information about the support scope of Red Hat Developer Preview features, see Developer Preview Support Scope.

Distributed Inference Server for LLMs

Distributed Inference Server (vLLM with Distributed Routing) is now available as a Developer Preview feature. Distributed Inference Server supports multi-model serving, intelligent inference scheduling, and disaggregated serving for improved GPU utilization on GenAI models.

For more information, see Deploying a model by using the LLM Inference Service (LLM-D).

Run evaluations for TrustyAI-Llama Stack using LM-Eval

You can now run evaluations by using LM-Eval on Llama Stack with TrustyAI as a Developer Preview feature, using the built-in LM-Eval component and advanced content moderation tools. To use this feature, ensure that TrustyAI is enabled, that the FMS Orchestrator and detectors are set up, and, if you need full compatibility, that KServe RawDeployment mode is in use. No additional manual setup is required.

Then, in the DataScienceCluster custom resource for the Red Hat OpenShift AI Operator, set the spec.llamastackoperator.managementState field to Managed.
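
The field change described above can be expressed as a JSON merge patch. The following is a minimal sketch, not an official procedure: the helper name `build_merge_patch` is hypothetical, and the field path is copied verbatim from the text above, so check it against the DataScienceCluster custom resource definition installed on your cluster before applying anything.

```python
import json
from typing import Any


def build_merge_patch(dotted_path: str, value: Any) -> dict:
    """Turn a dotted field path into a nested dict usable as a JSON merge patch.

    Hypothetical helper for illustration; it is not part of any Red Hat tooling.
    """
    keys = dotted_path.split(".")
    if not all(keys):
        raise ValueError(f"invalid field path: {dotted_path!r}")
    patch: Any = value
    # Wrap the value from the innermost key outwards.
    for key in reversed(keys):
        patch = {key: patch}
    return patch


# Assumption: field path taken as written in the text above.
patch = build_merge_patch("spec.llamastackoperator.managementState", "Managed")
patch_json = json.dumps(patch)
print(patch_json)
```

The printed JSON could then be supplied to a merge-type patch, for example `oc patch datasciencecluster <name> --type=merge -p '<json>'`, where `<name>` is a placeholder for the name of your DataScienceCluster resource.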

For more information, see the following resources on GitHub:

LLM Compressor integration

LLM Compressor capabilities are now available in Red Hat OpenShift AI as a Developer Preview feature. A new workbench image with the llm-compressor library and a corresponding data science pipelines runtime image make it easier to compress and optimize your large language models (LLMs) for efficient deployment with vLLM. For more information, see llm-compressor on GitHub.

You can use LLM Compressor capabilities in two ways:

Use a Jupyter notebook with the workbench image available at Red Hat Quay.io: opendatahub/llmcompressor-workbench.
For an example Jupyter notebook, see examples/llmcompressor/workbench_example.ipynb in the red-hat-ai-examples repository.
Run a data science pipeline that executes model compression as a batch process with the runtime image available at Red Hat Quay.io: opendatahub/llmcompressor-pipeline-runtime.
For an example pipeline, see examples/llmcompressor/oneshot_pipeline.py in the red-hat-ai-examples repository.

Support for AppWrapper in Kueue

AppWrapper support in Kueue is available as a Developer Preview feature. The experimental API enables the use of AppWrapper-based workloads with the distributed workloads feature.
