
Chapter 1. Enterprise-grade inference serving


The Distributed Inference with llm-d framework provides enterprise-grade large language model (LLM) inference serving on OpenShift Container Platform and on managed Kubernetes clusters on public clouds, such as Azure Kubernetes Service (AKS) and CoreWeave Kubernetes Service (CKS).

Enterprise platform engineering and infrastructure teams can use Distributed Inference with llm-d to build generative AI model services for internal and external use cases. Cloud service providers can also use it to build Models-as-a-Service (MaaS) offerings. Common use cases include:

Enterprise-wide Models-as-a-Service (MaaS) for generative AI
A central platform team provides generative AI and LLM capabilities as a managed service to business units across the organization. Rather than each team provisioning its own inference infrastructure, the platform team uses Distributed Inference with llm-d to offer standardized model serving with consistent performance, cost control, and security. A consumer-side sketch of this pattern follows the use case list.
Production-ready inference at scale
An organization deploys a generative AI application in a limited production environment, such as A/B testing with a small user group or a soft launch. The deployment must be production-ready with reliable performance and security, while remaining flexible enough to scale to provider-grade inference as demand grows.
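The following Python sketch illustrates the consumer side of the Models-as-a-Service pattern, assuming the platform team exposes an OpenAI-compatible endpoint, as vLLM-based serving stacks such as llm-d commonly do. The endpoint URL, token, and model name are hypothetical placeholders, not values defined by llm-d.

    from openai import OpenAI

    # Hypothetical values: the gateway URL, access token, and model
    # name stand in for whatever the platform team publishes.
    client = OpenAI(
        base_url="https://models.example.com/v1",
        api_key="PLATFORM_ISSUED_TOKEN",
    )

    # Standard OpenAI-compatible chat completion request; the serving
    # infrastructure behind the endpoint is opaque to the business unit.
    response = client.chat.completions.create(
        model="granite-3-8b-instruct",
        messages=[{"role": "user", "content": "Summarize this incident report."}],
        max_tokens=256,
    )
    print(response.choices[0].message.content)

Because the API surface is a widely adopted standard, business units can change models or scale tiers without client-side changes, while the platform team retains control of authentication, quotas, and model placement behind the endpoint.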