
Chapter 4. Developer Preview features

Important

This section describes Developer Preview features in Red Hat OpenShift AI 2.24. Developer Preview features are not supported by Red Hat in any way and are not functionally complete or production-ready. Do not use Developer Preview features for production or business-critical workloads. Developer Preview features provide early access to functionality in advance of possible inclusion in a Red Hat product offering. Customers can use these features to test functionality and provide feedback during the development process. Developer Preview features might not have any documentation, are subject to change or removal at any time, and have received limited testing. Red Hat might provide ways to submit feedback on Developer Preview features without an associated SLA.

For more information about the support scope of Red Hat Developer Preview features, see Developer Preview Support Scope.

Distributed Inference Server for LLMs

Distributed Inference Server (vLLM with Distributed Routing) is now available as a Developer Preview feature. Distributed Inference Server supports multi-model serving, intelligent inference scheduling, and disaggregated serving for improved GPU utilization on GenAI models.

For more information, see Deploying a model by using the LLM Inference Service (LLM-D).

Run evaluations for TrustyAI-Llama Stack using LM-Eval

You can now run evaluations on Llama Stack with TrustyAI by using LM-Eval as a Developer Preview feature. The feature uses the built-in LM-Eval component and advanced content moderation tools. To use this feature, ensure that TrustyAI is enabled, that the FMS Orchestrator and detectors are set up, and, if needed for full compatibility, that KServe RawDeployment mode is in use. No other manual setup is required.

Then, in the DataScienceCluster custom resource for the Red Hat OpenShift AI Operator, set the spec.llamastackoperator.managementState field to Managed.
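As an illustration of the setting above, the following minimal Python sketch builds a JSON merge patch that sets the managementState field to Managed. The field path is copied from the text as written; verify it against the DataScienceCluster custom resource definition on your cluster before applying it. The helper name build_merge_patch is hypothetical and is not part of any Red Hat tooling.

```python
import json


def build_merge_patch(field_path: str, value: str) -> dict:
    """Turn a dotted field path into a nested dict usable as a JSON merge patch.

    Hypothetical helper for illustration only.
    """
    patch: dict = {}
    node = patch
    keys = field_path.split(".")
    # Walk down the path, creating one nested dict per intermediate key.
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return patch


# Field path taken from the text above (assumption: it matches your CRD version).
patch = build_merge_patch("spec.llamastackoperator.managementState", "Managed")
patch_json = json.dumps(patch)
print(patch_json)
```

You could pass the printed JSON to a command such as oc patch datasciencecluster <name> --type=merge -p '<patch_json>', where <name> is a placeholder for the name of your DataScienceCluster resource.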

For more information, see the following resources on GitHub:

LLM Compressor integration

LLM Compressor capabilities are now available in Red Hat OpenShift AI as a Developer Preview feature. A new workbench image with the llm-compressor library and a corresponding data science pipelines runtime image make it easier to compress and optimize your large language models (LLMs) for efficient deployment with vLLM. For more information, see llm-compressor on GitHub.

You can use LLM Compressor capabilities in two ways:

Use a Jupyter notebook with the workbench image available at Red Hat Quay.io: opendatahub / llmcompressor-workbench.
For an example Jupyter notebook, see examples/llmcompressor/workbench_example.ipynb in the red-hat-ai-examples repository.
Run a data science pipeline that executes model compression as a batch process with the runtime image available at Red Hat Quay.io: opendatahub / llmcompressor-pipeline-runtime.
For an example pipeline, see examples/llmcompressor/oneshot_pipeline.py in the red-hat-ai-examples repository.

Support for AppWrapper in Kueue

AppWrapper support in Kueue is available as a Developer Preview feature. The experimental API enables the use of AppWrapper-based workloads with the distributed workloads feature.
