Chapter 1. Enterprise-grade inference serving


The Distributed Inference with llm-d framework provides enterprise-grade large language model (LLM) inference serving on OpenShift Container Platform and on managed Kubernetes services in public clouds, such as Azure Kubernetes Service (AKS) and CoreWeave Kubernetes Service (CKS).

Enterprise platform engineering and infrastructure teams can use Distributed Inference with llm-d to build generative AI model services for internal and external use cases. Cloud service providers can also use it to build Models-as-a-Service (MaaS) offerings. Common use cases include:

Enterprise-wide Models-as-a-Service (MaaS) for generative AI
A central platform team provides generative AI and LLM capabilities as a managed service to business units across the organization. Rather than each team provisioning its own inference infrastructure, the platform team uses Distributed Inference with llm-d to offer standardized model serving with consistent performance, cost control, and security.
Production-ready inference at scale
An organization deploys a generative AI application in a limited production environment, such as A/B testing with a small user group or a soft launch. The deployment must be production-ready with reliable performance and security, while remaining flexible enough to scale to provider-grade inference as demand grows.