
Chapter 1. Enterprise-grade inference serving


The Distributed Inference with llm-d framework provides enterprise-grade large language model (LLM) inference serving on OpenShift Container Platform and on managed Kubernetes clusters in public clouds, such as Azure Kubernetes Service (AKS) and CoreWeave Kubernetes Service (CKS).

Enterprise platform engineering and infrastructure teams can use Distributed Inference with llm-d to build generative AI model services for internal and external use cases. Cloud service providers can also use it to build Models-as-a-Service (MaaS) offerings. Common use cases include:

Enterprise-wide Models-as-a-Service (MaaS) for generative AI
A central platform team provides generative AI and LLM capabilities as a managed service to business units across the organization. Rather than each team provisioning its own inference infrastructure, the platform team uses Distributed Inference with llm-d to offer standardized model serving with consistent performance, cost control, and security.
Production-ready inference at scale
An organization deploys a generative AI application in a limited production environment, such as A/B testing with a small user group or a soft launch. The deployment must be production-ready with reliable performance and security, while remaining flexible enough to scale to provider-grade inference as demand grows.

© 2026 Red Hat