Chapter 1. Introducing Red Hat AI


Red Hat AI is a trusted, comprehensive portfolio that accelerates time to market and reduces operational costs for delivering artificial intelligence (AI) solutions across hybrid cloud and edge environments. The portfolio enables infrastructure sovereignty and economic efficiency by supporting multiple hardware accelerators, OEMs, and cloud providers, ensuring a stable, optimized, and high-performance platform across diverse infrastructures.

With Red Hat AI, you can efficiently tune and deploy both predictive and generative AI (gen AI) models at scale—from single-server deployments to highly distributed platforms. The portfolio supports each stage of the AI adoption journey, from initial experimentation to large-scale production workloads. Access to innovations is complemented by Red Hat’s partner ecosystem, which offers tested, supported, and certified products and services that integrate with Red Hat technologies to help you solve business and technical challenges.

Red Hat AI includes:

Red Hat Enterprise Linux AI 3: An integrated Red Hat Enterprise Linux server appliance for LLM inference serving and model deployment in bare metal and cloud environments. Includes inference optimization capabilities and access to Red Hat’s curated model repository.

Red Hat OpenShift AI: An integrated MLOps platform that helps you manage your artificial intelligence and machine learning (AI/ML) lifecycle across hybrid cloud and edge environments. It provides tools for collaborative development, training, fine-tuning, and deployment of predictive and generative AI models.

Red Hat AI Inference: Advanced inference runtime that optimizes serving and inferencing with large language models. Offers reduced latency and higher throughput through continuous batching and tensor parallelism, with support for cost reduction through memory-efficient paged attention mechanisms.

Distributed Inference with llm-d: Kubernetes-native distributed inference framework that separates the model serving control plane from the inference data plane. Provides intelligent scheduling, prefill-decode disaggregation, and key-value (KV) cache management for cost-effective model serving at scale.

1.1. Understanding Red Hat Enterprise Linux AI 3

Use Red Hat Enterprise Linux AI 3 to deploy large language models (LLMs) and other AI models with support for multiple hardware accelerators in Linux bare metal and cloud environments. RHEL AI 3 provides bootable images for deployment on bare metal, Amazon Web Services (AWS), Azure, IBM Cloud, and Google Cloud Platform (GCP). You can purchase RHEL AI 3 from the IBM Cloud, AWS, and Azure marketplaces and deploy it on GPU-enabled instances.

Red Hat Enterprise Linux AI 3 has the following features:

  • Inference serving appliance for bare metal and hybrid cloud: Run your choice of models across Linux environments. Deploy RHEL AI 3 as a bootable Red Hat Enterprise Linux (RHEL) image configured for specific hardware accelerators, including NVIDIA CUDA, AMD Instinct, Intel, and Google TPU.
  • Red Hat AI Model Optimization Toolkit: Compress models to optimize accelerator usage and reduce compute costs while maintaining high model accuracy.
  • Curated model repository: Access a collection of validated and optimized models from Red Hat ready for inference serving deployments. Models are tested for performance across various hardware accelerators.
  • Multi-hardware support: Support for NVIDIA, AMD, Intel, Google TPU, and AWS Inferentia accelerators, with additional hardware support planned.
  • Certified for all Red Hat products: RHEL AI 3 is available as part of Red Hat AI and integrates with OpenShift AI.

1.2. Understanding Red Hat OpenShift AI

Red Hat OpenShift AI is a comprehensive MLOps platform designed to streamline artificial intelligence and machine learning (AI/ML) development and operations across hybrid cloud environments and the edge. It fosters collaboration between data scientists and developers while ensuring IT oversight and governance, enabling organizations to efficiently build, train, fine-tune, and deploy both predictive and generative AI models at scale.

Offered as a self-managed solution or as a managed service, OpenShift AI builds on the robust foundation of Red Hat OpenShift, providing a trusted, secure platform for deploying AI-enabled applications and ML models across public clouds, on-premises, and edge environments.

By using a broad open source technology ecosystem, Red Hat OpenShift AI accelerates AI/ML innovation, ensures operational consistency across deployments, enhances hybrid cloud flexibility, and upholds transparency, choice, and responsible AI practices.

1.2.1. Key benefits of OpenShift AI

  • Simplified AI adoption: Reduces the complexities of building and delivering AI models and applications that are accurate, reliable, and secure.
  • Enterprise-ready open source tools: Provides a fully supported, secure enterprise version of open source AI tools, ensuring seamless integration and interoperability.
  • Accelerated innovation: Gives organizations access to the latest AI technologies, helping them stay competitive in a rapidly evolving market.
  • Extensive partner ecosystem: Enables organizations to select best-in-class technologies from a certified AI ecosystem, increasing flexibility and choice.
  • Integrated development environments (IDEs): Provides access to IDEs including JupyterLab, with pre-configured libraries such as TensorFlow, PyTorch, and Scikit-learn.
  • AI pipelines: Supports end-to-end ML workflows by using containerized pipeline orchestration.
  • Accelerated computing: Integrated support for GPUs and Intel Gaudi AI accelerators to speed up model training and inference.
  • Model deployment and serving: Deploy models in a variety of environments and integrate them into applications by using APIs.
  • Seamless OpenShift integration: Leverages OpenShift identity providers and resource allocation tools for secure and efficient user management.
  • Accelerator management: Enables efficient resource scheduling for GPU and AI accelerator usage.
  • Flexible deployment: Available as a self-managed solution or as a managed service in Red Hat OpenShift Dedicated and Red Hat OpenShift Service on AWS (ROSA).
  • Scalability and security: Provides enterprise-grade security features and governance controls for AI workloads.

1.3. Understanding Red Hat AI Inference

Red Hat AI Inference is an enterprise-grade inference platform for serving large language models (LLMs) at scale, built on the open source vLLM project and hardened by Red Hat. Red Hat AI Inference provides two deployment models: a standalone single-container vLLM runtime for optimized inference on any Kubernetes or Linux environment, and Distributed Inference with llm-d for distributed inference across Kubernetes clusters.

1.3.1. Standalone inference

The standalone vLLM runtime is a single-container deployment that delivers optimized inference for LLMs on any Kubernetes or Linux environment, from bare metal to public cloud. The standalone runtime uses the following techniques to deliver reduced latency and higher throughput:

  • Continuous batching: Processes model requests as they arrive instead of waiting for a full batch to accumulate, reducing idle time and improving throughput.
  • Tensor parallelism: Distributes LLM workloads across multiple GPUs to enable serving of larger models and faster inference.
  • Paged attention: Reduces memory waste by provisioning memory for LLMs similar to virtual memory in operating systems, significantly lowering inference costs.

1.3.2. Shared capabilities

The following capabilities are available for both standalone and distributed deployments:

  • Flexible deployment across infrastructure: Run your choice of models across multiple accelerators, Kubernetes clusters, and Linux environments—from bare metal to public cloud.
  • Model optimization: Use Red Hat AI Model Optimization Toolkit to compress models and optimize accelerator and compute usage while maintaining high model accuracy.
  • Curated model catalog: Access Red Hat’s collection of validated and optimized models ready for inference deployment, with support for NVIDIA, AMD, Intel, and other accelerators.
  • Multi-hardware acceleration: Support for NVIDIA GPUs, AMD Instinct accelerators, Intel processors, Google TPU, AWS Inferentia, and additional hardware platforms.
  • Enterprise integration: Certified for use with RHEL AI 3, OpenShift AI, and other Red Hat products for seamless deployment across your AI infrastructure.

1.3.3. Distributed Inference with llm-d

Distributed Inference with llm-d extends AI Inference with Kubernetes-native distributed inference, enabling enterprise-grade LLM serving at scale on Openshift Container Platform and managed Kubernetes clusters such as Azure Kubernetes Service (AKS) and CoreWeave Kubernetes Service (CKS).

Distributed Inference with llm-d separates the model serving control plane from the inference data plane. KServe manages model lifecycle, scaling, and API exposure, while the llm-d inference scheduler handles runtime-aware scheduling, cache locality optimization, and intelligent request distribution across pods and nodes.

Distributed Inference with llm-d provides the following capabilities:

  • Intelligent inference scheduling: Replaces basic request distribution with prefix-cache aware routing that directs each request to the replica most likely to have relevant KV cache entries already populated, maximizing GPU utilization.
  • Prefill-decode disaggregation: Separates the compute-intensive prefill phase from the latency-sensitive decode phase, allowing each phase to be assigned to appropriately optimized resources and scaled independently.
  • KV cache management: Manages key-value cache efficiently across distributed inference servers, reducing memory requirements and enabling longer context windows.
  • Wide expert parallelism: Supports efficient distributed inference of mixture of experts (MoE) models across many GPU nodes, enabling cost-effective scaling of large models.
Red Hat logoGithubredditYoutubeTwitter

Learn

Try, buy, & sell

Communities

About Red Hat

We deliver hardened solutions that make it easier for enterprises to work across platforms and environments, from the core datacenter to the network edge.

Making open source more inclusive

Red Hat is committed to replacing problematic language in our code, documentation, and web properties. For more details, see the Red Hat Blog.

About Red Hat Documentation

Legal Notice

Theme

© 2026 Red Hat
Back to top