Chapter 2. Working with Llama Stack


Llama Stack is a unified AI runtime environment designed to simplify the deployment and management of generative AI workloads on OpenShift AI. It integrates LLM inference servers, vector databases, and retrieval services into a single stack optimized for Retrieval-Augmented Generation (RAG) and agent-based AI workflows. In OpenShift, the Llama Stack Operator manages the deployment lifecycle of these components, ensuring scalability, consistency, and integration with OpenShift AI projects.

Important

Llama Stack integration is currently available in Red Hat OpenShift AI 2.23 as a Technology Preview feature. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

Llama Stack includes the following components:

  • Inference model servers such as vLLM, designed to efficiently serve large language models.
  • Vector storage solutions, primarily Milvus, to store embeddings generated from your domain data.
  • Retrieval and embedding management workflows using integrated tools, such as Docling, to handle continuous data ingestion and synchronization.
  • Integration with OpenShift AI by using the LlamaStackDistribution custom resource, simplifying configuration and deployment.
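As a rough illustration of the last point, a LlamaStackDistribution custom resource might look like the following sketch. The kind is named in this chapter, but the apiVersion, field names, and values shown here are illustrative assumptions and may differ in your OpenShift AI release; consult the deployment documentation referenced below for the authoritative schema.

```yaml
# Illustrative sketch only: apart from the LlamaStackDistribution kind,
# the apiVersion, fields, and values below are assumptions.
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: my-llama-stack               # hypothetical resource name
  namespace: my-data-science-project # hypothetical project namespace
spec:
  replicas: 1
  server:
    containerSpec:
      port: 8321                     # assumed Llama Stack server port
      env:
        - name: VLLM_URL             # assumed: points at the vLLM inference server
          value: "https://my-vllm-service:8000/v1"
```

Applying a resource of this kind (for example, with `oc apply -f`) instructs the Llama Stack Operator to create and manage the stack components in the target project.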

For information about how to deploy Llama Stack in OpenShift AI, see Deploying a RAG stack in a Data Science Project.
