Chapter 1. Enterprise-grade inference serving


The Distributed Inference with llm-d framework provides enterprise-grade large language model (LLM) inference serving on OpenShift Container Platform and on managed Kubernetes clusters on public clouds such as Azure Kubernetes Service (AKS) and CoreWeave Kubernetes Service (CKS).

Enterprise platform engineering and infrastructure teams can use Distributed Inference with llm-d to build generative AI model services for internal and external use cases. Cloud service providers can also use it to build Models-as-a-Service (MaaS) offerings. Common use cases include:

Enterprise-wide Models-as-a-Service (MaaS) for generative AI
A central platform team provides generative AI and LLM capabilities as a managed service to business units across the organization. Rather than each team provisioning its own inference infrastructure, the platform team uses Distributed Inference with llm-d to offer standardized model serving with consistent performance, cost control, and security. A minimal client sketch of this pattern follows the use case descriptions below.
Production-ready inference at scale
An organization deploys a generative AI application in a limited production environment, such as A/B testing with a small user group or a soft launch. The deployment must be production-ready with reliable performance and security, while remaining flexible enough to scale to provider-grade inference as demand grows.