Chapter 3. Deploying a Llama Stack server
Llama Stack allows you to create and deploy a server that enables various APIs for accessing AI services in your OpenShift AI cluster. You can create a LlamaStackDistribution custom resource for your desired use cases.
The procedure in this chapter provides an example LlamaStackDistribution CR that deploys a Llama Stack server with the following setup:
- A connection to a vLLM inference service with a llama32-3b model.
- A connection to a remote vector database.
- Allocated persistent storage.
- Orchestration endpoints.
Prerequisites
- You have installed OpenShift 4.19 or newer.
- You have logged in to Red Hat OpenShift AI.
- You have cluster administrator privileges for your OpenShift cluster.
- You have activated the Llama Stack Operator in your cluster.
- You have installed the PostgreSQL Operator version 14 or later in your cluster.
- You have installed the OpenShift CLI (oc) as described in the appropriate documentation for your cluster:
  - Installing the OpenShift CLI for OpenShift Container Platform
  - Installing the OpenShift CLI for Red Hat OpenShift Service on AWS
Procedure
- In the OpenShift web console, select Administrator → Quick Create → Import YAML, and create a CR similar to the following example llamastack-custom-distribution.yaml file:
  Example llamastack-custom-distribution.yaml
  1: You can create this secret by running the following command in your terminal:
  $ oc create secret generic postgres-secret --from-literal=password=<custom-password>
As the cluster administrator provisioning the Llama Stack server, you then need to create a Llama Stack PostgreSQL database and grant full permissions to a user:
- Open a terminal on a machine that has network access to the PostgreSQL instance and has the PostgreSQL client installed.
- Start the PostgreSQL shell with the following command:
  $ psql
- Create the database with the following command:
  CREATE DATABASE llamastack;
- Create a user role called llamastack that can log in with a custom password:
  CREATE ROLE llamastack WITH LOGIN PASSWORD '<password-for-user-access>';
- Grant full permissions on the database to the user with the following command:
  GRANT ALL PRIVILEGES ON DATABASE llamastack TO llamastack;
- Connect to the new database by running the following command:
  \c llamastack
- Grant usage and creation privileges on the public schema:
  GRANT USAGE, CREATE ON SCHEMA public TO llamastack;
- Ensure that all future tables are automatically accessible by running the following command:
  ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL PRIVILEGES ON TABLES TO llamastack;
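The database steps above can also be collected into one script and piped to psql instead of being typed interactively. The following is a minimal sketch, not part of the official procedure: the helper name llamastack_sql and the ROLE_PASSWORD variable are hypothetical, and the sketch assumes that psql can already connect to the instance with a role that is allowed to create databases and roles. The password is emitted as a single-quoted SQL string literal, with any embedded single quotes doubled.

```shell
#!/bin/sh
# Sketch: print the SQL statements from the procedure as one script.
# llamastack_sql is a hypothetical helper name, not part of any product.
llamastack_sql() {
  # Double any single quotes so the password is a valid SQL string literal.
  escaped=$(printf '%s' "$1" | sed "s/'/''/g")
  # In an unquoted here-document, \\c is written out as the psql
  # meta-command \c, which reconnects to the new database.
  cat <<EOF
CREATE DATABASE llamastack;
CREATE ROLE llamastack WITH LOGIN PASSWORD '${escaped}';
GRANT ALL PRIVILEGES ON DATABASE llamastack TO llamastack;
\\c llamastack
GRANT USAGE, CREATE ON SCHEMA public TO llamastack;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL PRIVILEGES ON TABLES TO llamastack;
EOF
}

# Example (assumption: ROLE_PASSWORD holds the password you chose):
#   llamastack_sql "$ROLE_PASSWORD" | psql -v ON_ERROR_STOP=1
```

Setting ON_ERROR_STOP makes psql stop at the first failing statement, so a failure to create the database does not silently cascade into the later grants.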
Verification
- Check that the custom resource was created with the following command:
  $ oc get llamastackdistribution -n llamastack
- Check the running pods with the following command:
  $ oc get pods -n llamastack | grep llamastack-custom-distribution
- Check the logs with the following command:
  $ oc logs -n llamastack -l app=llama-stack
  Example output
  INFO: Started server process
  INFO: Waiting for application startup.
  INFO: Application startup complete.
  INFO: Uvicorn running on http://['::', '0.0.0.0']:8321
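If you want to script the log check rather than read the output by eye, one option is to search the logs for the startup message shown in the example output. This is a minimal sketch under stated assumptions: startup_complete is a hypothetical helper name, and the check relies only on the "Application startup complete" line appearing in the logs.

```shell
#!/bin/sh
# Sketch: succeed only if the log text on stdin reports a completed startup.
# startup_complete is a hypothetical helper name.
startup_complete() {
  grep -q 'Application startup complete'
}

# Example, using the log command from the verification steps:
#   oc logs -n llamastack -l app=llama-stack | startup_complete && echo "server is up"
```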