Chapter 3. Deploying a Llama Stack server


Llama Stack allows you to create and deploy a server that enables various APIs for accessing AI services in your OpenShift AI cluster. You can create a LlamaStackDistribution custom resource for your desired use cases.

The included procedure provides an example LlamaStackDistribution CR that deploys a Llama Stack server that enables the following setup:

  • A connection to a vLLM inference service with a llama32-3b model.
  • A connection to a remote vector database.
  • Allocated persistent storage.
  • Orchestration endpoints.

Prerequisites

  • You have installed OpenShift 4.19 or newer.
  • You have logged in to Red Hat OpenShift AI.
  • You have cluster administrator privileges for your OpenShift cluster.
  • You have activated the Llama Stack Operator in your cluster.
  • You have installed the PostgreSQL Operator version 14 or later in your cluster.
  • You have installed the OpenShift CLI (oc) as described in the appropriate documentation for your cluster.

Procedure

  1. In the Administrator perspective of the OpenShift web console, click Quick Create (quick create icon) → Import YAML, and create a CR similar to the following example llamastack-custom-distribution.yaml file:

    Example llamastack-custom-distribution.yaml

    apiVersion: llamastack.io/v1alpha1
    kind: LlamaStackDistribution
    metadata:
      name: llamastack-custom-distribution
      namespace: llamastack
    spec:
      replicas: 1
      server:
        containerSpec:
          env:
            - name: VLLM_URL
              value: 'https://llama32-3b.llamastack.svc.cluster.local/v1'
            - name: INFERENCE_MODEL
              value: llama32-3b
            - name: VLLM_TLS_VERIFY
              value: 'false'
            - name: POSTGRES_HOST
              value: <postgres-host>
            - name: POSTGRES_PORT
              value: '5432'
            - name: POSTGRES_DB
              value: llamastack
            - name: POSTGRES_USER
              value: llamastack
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  key: password
                  name: postgres-secret 1

          name: llama-stack
          port: 8321
        distribution:
          name: 'rh-dev'
        storage:
          size: 20Gi
          mountPath: <custom-mount-path> ## Defaults to /opt/app-root/src/.llama/distributions/rh/

1
You can create this secret by running oc create secret generic postgres-secret --from-literal=password=<custom-password> in your terminal.
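For example, the secret-creation step can be scripted as follows. This is a minimal sketch: the llamastack namespace and the password key name match the example CR above, and the namespace-creation line is an addition for cases where the namespace does not yet exist.

```shell
# Create the llamastack namespace if it does not already exist.
oc create namespace llamastack --dry-run=client -o yaml | oc apply -f -

# Create the secret referenced by the POSTGRES_PASSWORD valueFrom entry.
# Replace <custom-password> with the password you assign to the llamastack role.
oc create secret generic postgres-secret \
  --namespace llamastack \
  --from-literal=password=<custom-password>

# Confirm that the secret exists and contains the expected key.
oc get secret postgres-secret -n llamastack -o jsonpath='{.data.password}' | base64 -d
```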
  2. As the cluster administrator provisioning the Llama Stack server, create a Llama Stack PostgreSQL database and grant full permissions to a user:

    1. Open a terminal that has network access to the PostgreSQL instance and the PostgreSQL client installed.
    2. Start the PostgreSQL shell with the following command:

      $ psql
    3. Create the database with the following command:

      CREATE DATABASE llamastack;
    4. Create a user role called llamastack accessible with a custom password:

      CREATE ROLE llamastack WITH LOGIN PASSWORD '<password-for-user-access>';
    5. Grant full permissions on the database to the user with the following command:

      GRANT ALL PRIVILEGES ON DATABASE llamastack TO llamastack;
    6. Connect to the new database by running the following command:

      \c llamastack
    7. Grant table usage and creation privileges to the public schema:

      GRANT USAGE, CREATE ON SCHEMA public TO llamastack;
    8. Ensure that all future tables are automatically accessible to the user by running the following command:

      ALTER DEFAULT PRIVILEGES IN SCHEMA public
      GRANT ALL PRIVILEGES ON TABLES TO llamastack;
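The database steps above can also be run non-interactively as a single psql invocation. This is a sketch under the same assumptions as the interactive steps: psql can reach the PostgreSQL instance with your current credentials, and you substitute a real password for the placeholder.

```shell
# Run the database setup as one batch; ON_ERROR_STOP aborts on the first failure.
psql -v ON_ERROR_STOP=1 <<'SQL'
CREATE DATABASE llamastack;
CREATE ROLE llamastack WITH LOGIN PASSWORD '<password-for-user-access>';
GRANT ALL PRIVILEGES ON DATABASE llamastack TO llamastack;
-- Reconnect to the new database before granting schema privileges.
\c llamastack
GRANT USAGE, CREATE ON SCHEMA public TO llamastack;
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT ALL PRIVILEGES ON TABLES TO llamastack;
SQL
```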

Verification

  1. Check that the custom resource was created with the following command:

    $ oc get llamastackdistribution -n llamastack
  2. Check the running pods with the following command:

    $ oc get pods -n llamastack | grep llamastack-custom-distribution
  3. Check the logs with the following command:

    $ oc logs -n llamastack -l app=llama-stack

    Example output

    INFO: Started server process
    INFO: Waiting for application startup.
    INFO: Application startup complete.
    INFO: Uvicorn running on http://['::', '0.0.0.0']:8321
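To confirm that the server is answering requests, you can port-forward to it and query its health endpoint. This is a sketch: the deployment name llamastack-custom-distribution is assumed to match the CR name, and the /v1/health path is an assumption based on upstream Llama Stack, so verify both against the release you deployed.

```shell
# Forward local port 8321 to the Llama Stack server port defined in the CR.
oc port-forward -n llamastack deploy/llamastack-custom-distribution 8321:8321 &

# Query the health endpoint; a healthy server returns an OK status.
# The /v1/health path is an assumption based on upstream Llama Stack.
curl -s http://localhost:8321/v1/health
```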
