Chapter 1. Introducing Red Hat AI

Red Hat AI is a trusted, comprehensive portfolio that accelerates time to market and reduces operational costs for delivering artificial intelligence (AI) solutions across hybrid cloud and edge environments. The portfolio enables infrastructure sovereignty and economic efficiency by supporting multiple hardware accelerators, OEMs, and cloud providers, ensuring a stable, optimized, and high-performance platform across diverse infrastructures.

With Red Hat AI, you can efficiently tune and deploy both predictive and generative AI (gen AI) models at scale—from single-server deployments to highly distributed platforms. The portfolio supports each stage of the AI adoption journey, from initial experimentation to large-scale production workloads. Access to innovations is complemented by Red Hat’s partner ecosystem, which offers tested, supported, and certified products and services that integrate with Red Hat technologies to help you solve business and technical challenges.

Red Hat AI includes:

Red Hat Enterprise Linux AI 3: An integrated Red Hat Enterprise Linux server appliance for LLM inference serving and model deployment in bare metal and cloud environments. Includes inference optimization capabilities and access to Red Hat’s curated model repository.

Red Hat OpenShift AI: An integrated MLOps platform that helps you manage your artificial intelligence and machine learning (AI/ML) lifecycle across hybrid cloud and edge environments. It provides tools for collaborative development, training, fine-tuning, and deployment of predictive and generative AI models.

Red Hat AI Inference: An inference runtime that optimizes serving of large language models. It reduces latency and increases throughput through continuous batching and tensor parallelism, and lowers cost through memory-efficient paged attention.

Distributed Inference with llm-d: Kubernetes-native distributed inference framework that separates the model serving control plane from the inference data plane. Provides intelligent scheduling, prefill-decode disaggregation, and key-value (KV) cache management for cost-effective model serving at scale.

1.1. Understanding Red Hat Enterprise Linux AI 3

Use Red Hat Enterprise Linux AI 3 to deploy large language models (LLMs) and other AI models with support for multiple hardware accelerators in Linux bare metal and cloud environments. RHEL AI 3 provides bootable images for deployment on bare metal, Amazon Web Services (AWS), Azure, IBM Cloud, and Google Cloud Platform (GCP). You can purchase RHEL AI 3 from the IBM Cloud, AWS, and Azure marketplaces and deploy it on GPU-enabled instances.

Red Hat Enterprise Linux AI 3 has the following features:

Inference serving appliance for bare metal and hybrid cloud: Run your choice of models across Linux environments. Deploy RHEL AI 3 as a bootable Red Hat Enterprise Linux (RHEL) image configured for specific hardware accelerators, including NVIDIA CUDA, AMD Instinct, Intel, and Google TPU.
Red Hat AI Model Optimization Toolkit: Compress models to optimize accelerator usage and reduce compute costs while maintaining high model accuracy.
Curated model repository: Access a collection of validated and optimized models from Red Hat ready for inference serving deployments. Models are tested for performance across various hardware accelerators.
Multi-hardware support: Support for NVIDIA, AMD, Intel, Google TPU, and AWS Inferentia accelerators, with additional hardware support planned.
Certified for all Red Hat products: RHEL AI 3 is available as part of Red Hat AI and integrates with OpenShift AI.

1.2. Understanding Red Hat OpenShift AI

Red Hat OpenShift AI is a comprehensive MLOps platform designed to streamline artificial intelligence and machine learning (AI/ML) development and operations across hybrid cloud environments and the edge. It fosters collaboration between data scientists and developers while ensuring IT oversight and governance, enabling organizations to efficiently build, train, fine-tune, and deploy both predictive and generative AI models at scale.

Offered as a self-managed solution or as a managed service, OpenShift AI builds on the robust foundation of Red Hat OpenShift, providing a trusted, secure platform for deploying AI-enabled applications and ML models across public clouds, on-premises, and edge environments.

By using a broad open source technology ecosystem, Red Hat OpenShift AI accelerates AI/ML innovation, ensures operational consistency across deployments, enhances hybrid cloud flexibility, and upholds transparency, choice, and responsible AI practices.

1.2.1. Key benefits of OpenShift AI

Simplified AI adoption: Reduces the complexities of building and delivering AI models and applications that are accurate, reliable, and secure.
Enterprise-ready open source tools: Provides a fully supported, secure enterprise version of open source AI tools, ensuring seamless integration and interoperability.
Accelerated innovation: Gives organizations access to the latest AI technologies, helping them stay competitive in a rapidly evolving market.
Extensive partner ecosystem: Enables organizations to select best-in-class technologies from a certified AI ecosystem, increasing flexibility and choice.

1.2.2. Features for data scientists, developers, and MLOps engineers

Integrated development environments (IDEs): Provides access to IDEs including JupyterLab, with pre-configured libraries such as TensorFlow, PyTorch, and Scikit-learn.
AI pipelines: Supports end-to-end ML workflows by using containerized pipeline orchestration.
Accelerated computing: Provides integrated support for GPUs and Intel Gaudi AI accelerators to speed up model training and inference.
Model deployment and serving: Deploy models in a variety of environments and integrate them into applications by using APIs.

1.2.3. Features for OpenShift cluster administrators

Seamless OpenShift integration: Leverages OpenShift identity providers and resource allocation tools for secure and efficient user management.
Accelerator management: Enables efficient resource scheduling for GPU and AI accelerator usage.
Flexible deployment: Available as a self-managed solution or as a managed service in Red Hat OpenShift Dedicated and Red Hat OpenShift Service on AWS (ROSA).
Scalability and security: Provides enterprise-grade security features and governance controls for AI workloads.

1.3. Understanding Red Hat AI Inference

Red Hat AI Inference is an enterprise-grade inference platform for serving large language models (LLMs) at scale, built on the open source vLLM project and hardened by Red Hat. Red Hat AI Inference provides two deployment models: a standalone single-container vLLM runtime for optimized inference on any Kubernetes or Linux environment, and Distributed Inference with llm-d for distributed inference across Kubernetes clusters.

1.3.1. Standalone inference

The standalone vLLM runtime is a single-container deployment that delivers optimized inference for LLMs on any Kubernetes or Linux environment, from bare metal to public cloud. The standalone runtime uses the following techniques to deliver reduced latency and higher throughput:

Continuous batching: Processes model requests as they arrive instead of waiting for a full batch to accumulate, reducing idle time and improving throughput.
Tensor parallelism: Distributes LLM workloads across multiple GPUs to enable serving of larger models and faster inference.
Paged attention: Reduces memory waste by allocating memory for the key-value (KV) cache in small blocks on demand, similar to how operating systems page virtual memory, which lowers inference costs.
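
The effect of continuous batching and paged memory allocation can be illustrated with a small, self-contained Python sketch. This is a toy model for intuition only, not how vLLM or Red Hat AI Inference is implemented: the function names, the one-token-per-step cost model, and the block size are all assumptions made for this example.

```python
import math
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Decode steps when each batch must finish before the next one starts."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        # A static batch runs until its longest request is done.
        steps += max(lengths[i:i + batch_size])
    return steps


def continuous_batching_steps(lengths, batch_size):
    """Decode steps when a finished request's slot is refilled immediately."""
    waiting = deque(lengths)
    running = []  # tokens still to generate for each in-flight request
    steps = 0
    while waiting or running:
        # Admit waiting requests into any free slots before each step.
        while waiting and len(running) < batch_size:
            running.append(waiting.popleft())
        steps += 1
        # Every in-flight request generates one token per step;
        # requests that just produced their last token leave the batch.
        running = [r - 1 for r in running if r > 1]
    return steps


def contiguous_slots(lengths, max_len):
    """Token slots reserved when every request preallocates max_len slots."""
    return len(lengths) * max_len


def paged_slots(lengths, block_size):
    """Token slots reserved when memory is granted in fixed-size blocks."""
    return sum(math.ceil(n / block_size) * block_size for n in lengths)


if __name__ == "__main__":
    # Hypothetical workload: tokens to generate per request (each >= 1).
    lengths = [8, 2, 2, 2]
    print("static steps:", static_batching_steps(lengths, batch_size=2))
    print("continuous steps:", continuous_batching_steps(lengths, batch_size=2))
    print("contiguous slots:", contiguous_slots(lengths, max_len=16))
    print("paged slots:", paged_slots(lengths, block_size=4))
```

With this hypothetical workload, static batching needs 10 steps and continuous batching needs 8, because short requests fill the slot next to the long request instead of waiting for it. Block-based allocation reserves 20 token slots where preallocating 16 slots per request reserves 64.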

1.3.2. Shared capabilities

The following capabilities are available for both standalone and distributed deployments:

Flexible deployment across infrastructure: Run your choice of models across multiple accelerators, Kubernetes clusters, and Linux environments—from bare metal to public cloud.
Model optimization: Use Red Hat AI Model Optimization Toolkit to compress models and optimize accelerator and compute usage while maintaining high model accuracy.
Curated model catalog: Access Red Hat’s collection of validated and optimized models ready for inference deployment, with support for NVIDIA, AMD, Intel, and other accelerators.
Multi-hardware acceleration: Support for NVIDIA GPUs, AMD Instinct accelerators, Intel processors, Google TPU, AWS Inferentia, and additional hardware platforms.
Enterprise integration: Certified for use with RHEL AI 3, OpenShift AI, and other Red Hat products for seamless deployment across your AI infrastructure.

1.3.3. Distributed Inference with llm-d

Distributed Inference with llm-d extends Red Hat AI Inference with Kubernetes-native distributed inference, enabling enterprise-grade LLM serving at scale on OpenShift Container Platform and managed Kubernetes clusters such as Azure Kubernetes Service (AKS) and CoreWeave Kubernetes Service (CKS).

Distributed Inference with llm-d separates the model serving control plane from the inference data plane. KServe manages model lifecycle, scaling, and API exposure, while the llm-d inference scheduler handles runtime-aware scheduling, cache locality optimization, and intelligent request distribution across pods and nodes.

Distributed Inference with llm-d provides the following capabilities:

Intelligent inference scheduling: Replaces basic request distribution with prefix-cache aware routing that directs each request to the replica most likely to have relevant KV cache entries already populated, maximizing GPU utilization.
Prefill-decode disaggregation: Separates the compute-intensive prefill phase from the latency-sensitive decode phase, allowing each phase to be assigned to appropriately optimized resources and scaled independently.
KV cache management: Manages key-value cache efficiently across distributed inference servers, reducing memory requirements and enabling longer context windows.
Wide expert parallelism: Supports efficient distributed inference of mixture of experts (MoE) models across many GPU nodes, enabling cost-effective scaling of large models.
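
The idea behind prefix-cache aware routing can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the llm-d inference scheduler: the `PrefixAwareRouter` class, the replica names, and the scoring rule (longest shared token prefix first, then lowest request count) are hypothetical and chosen only to show the principle.

```python
def shared_prefix_len(a, b):
    """Number of leading tokens that two token sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class PrefixAwareRouter:
    """Toy router that prefers the replica with the longest cached prefix."""

    def __init__(self, replicas):
        # Replica name -> token sequences assumed to be in its KV cache.
        self.cached = {name: [] for name in replicas}
        # Replica name -> number of requests routed to it so far.
        self.load = {name: 0 for name in replicas}

    def route(self, tokens):
        def score(name):
            best = max(
                (shared_prefix_len(tokens, c) for c in self.cached[name]),
                default=0,
            )
            # Longer cached prefix wins; ties go to the less loaded replica.
            return (best, -self.load[name])

        target = max(self.cached, key=score)
        self.cached[target].append(list(tokens))
        self.load[target] += 1
        return target


if __name__ == "__main__":
    router = PrefixAwareRouter(["pod-a", "pod-b"])
    print(router.route([1, 2, 3, 4]))  # no cache anywhere yet
    print(router.route([9, 9, 9]))     # no shared prefix, so load decides
    print(router.route([1, 2, 3, 7]))  # shares a 3-token prefix with the first request
```

In this sketch the third request goes to the replica that served the first one, because that replica is assumed to hold the KV cache entries for the shared prefix. A production scheduler would also account for cache eviction, queue depth, and hardware state, none of which is modeled here.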
