Chapter 4. Using Llama Stack with TrustyAI
This section contains tutorials for working with Llama Stack in TrustyAI. These tutorials demonstrate how to use various Llama Stack components and providers to evaluate and work with language models.
The following sections describe how to work with Llama Stack and provide example use cases:
- Using the Llama Stack external evaluation provider with lm-evaluation-harness in TrustyAI
- Running custom evaluations with LM-Eval and Llama Stack
- Detecting personally identifiable information (PII) by using Guardrails with Llama Stack
4.1. Using Llama Stack external evaluation provider with lm-evaluation-harness in TrustyAI
This example demonstrates how to evaluate a language model in Red Hat OpenShift AI using the LMEval Llama Stack external eval provider in a Python workbench. To do this, configure a Llama Stack server to use the LMEval eval provider, register a benchmark dataset, and run a benchmark evaluation job on a language model.
Prerequisites
- You have installed Red Hat OpenShift AI, version 2.20 or later.
- You have cluster administrator privileges for your OpenShift AI cluster.
- You have installed the OpenShift CLI (oc) as described in the appropriate documentation for your cluster:
- Installing the OpenShift CLI for OpenShift Container Platform
- Installing the OpenShift CLI for Red Hat OpenShift Service on AWS
- You have a large language model (LLM) for chat generation or text classification, or both, deployed in your namespace.
- You have installed the TrustyAI Operator in your OpenShift AI cluster.
- You have set KServe to Raw Deployment mode in your cluster.
Procedure
Create and activate a Python virtual environment for this tutorial on your local machine:
python3 -m venv .venv
source .venv/bin/activate
Install the required packages from the Python Package Index (PyPI):
pip install \
  llama-stack \
  llama-stack-client \
  llama-stack-provider-lmeval
Create the model route:
oc create route edge vllm --service=<VLLM_SERVICE> --port=<VLLM_PORT> -n <MODEL_NAMESPACE>
Configure the Llama Stack server. Set the variables that configure the runtime endpoint and namespace. The VLLM_URL value should be the v1/completions endpoint of your model route, and TRUSTYAI_LM_EVAL_NAMESPACE should be the namespace where your model is deployed. For example:
export TRUSTYAI_LM_EVAL_NAMESPACE=<MODEL_NAMESPACE>
export MODEL_ROUTE=$(oc get route -n "$TRUSTYAI_LM_EVAL_NAMESPACE" | awk '/predictor/{print $2; exit}')
export VLLM_URL="https://${MODEL_ROUTE}/v1/completions"
Download the providers.d provider configuration directory and the run.yaml execution file:
curl --create-dirs --output providers.d/remote/eval/trustyai_lmeval.yaml https://raw.githubusercontent.com/trustyai-explainability/llama-stack-provider-lmeval/refs/heads/main/providers.d/remote/eval/trustyai_lmeval.yaml
curl --create-dirs --output run.yaml https://raw.githubusercontent.com/trustyai-explainability/llama-stack-provider-lmeval/refs/heads/main/run.yaml
Start the Llama Stack server in a virtual environment. The server uses port 8321 by default:
llama stack run run.yaml --image-type venv
Create a Python script in a Jupyter workbench and import the following libraries and modules to interact with the server and run an evaluation:
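A minimal sketch of the imports used in the rest of this tutorial, assuming the llama-stack-client Python package installed earlier; adjust to your own script as needed:
import pprint
import time

from llama_stack_client import LlamaStackClient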
Start the Llama Stack Python client to interact with the running Llama Stack server:
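For example, a sketch that assumes the server is reachable on the default port 8321 on localhost:
# Connect to the locally running Llama Stack server
client = LlamaStackClient(base_url="http://localhost:8321")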
Print a list of the currently available benchmarks:
benchmarks = client.benchmarks.list()
pprint.pprint(f"Available benchmarks: {benchmarks}")
LMEval provides access to over 100 preconfigured evaluation datasets. Register the ARC-Easy benchmark, a dataset of grade-school level, multiple-choice science questions:
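A sketch of what this registration can look like, assuming the benchmark ID trustyai_lmeval::arc_easy used later in this tutorial; the placeholder scoring fields and the tokenizer value are illustrative assumptions that may differ in your provider version:
client.benchmarks.register(
    benchmark_id="trustyai_lmeval::arc_easy",
    dataset_id="trustyai_lmeval::arc_easy",
    scoring_functions=["string"],      # placeholder value
    provider_benchmark_id="string",    # placeholder value
    provider_id="trustyai_lmeval",
    metadata={
        "tokenizer": "google/flan-t5-small",  # illustrative tokenizer used for token counting
        "tokenized_requests": False,
    },
)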
Verify that the benchmark has been registered successfully:
benchmarks = client.benchmarks.list()
pprint.pprint(f"Available benchmarks: {benchmarks}")
Run a benchmark evaluation job on your deployed model using the following input. Replace phi-3 with the name of your deployed model:
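A sketch of the evaluation call, assuming the client.eval.run_eval API and the benchmark registered above; the sampling parameters and example count are illustrative, and the exact benchmark_config keys may vary with your Llama Stack version:
job = client.eval.run_eval(
    benchmark_id="trustyai_lmeval::arc_easy",
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": "phi-3",  # replace with the name of your deployed model
            "sampling_params": {
                "temperature": 0.7,
                "top_p": 0.9,
                "max_tokens": 256,
            },
        },
        "num_examples": 1000,  # optional; illustrative subset size
    },
)
print(f"Starting job '{job.job_id}'")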
Monitor the status of the evaluation job using the following code. The job runs asynchronously, so you can check its status periodically:
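A polling sketch, assuming client.eval.jobs.status and terminal states of completed or failed; the return shape of the status call may vary between client versions:
while True:
    status = client.eval.jobs.status(job_id=job.job_id, benchmark_id="trustyai_lmeval::arc_easy")
    # status may be a plain string or an object with a .status attribute, depending on the client version
    state = getattr(status, "status", status)
    print(f"Current status: {state}")
    if state in ("completed", "failed"):
        break
    time.sleep(20)  # wait before polling again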
Retrieve the evaluation job results once the job status reports back as completed:
pprint.pprint(client.eval.jobs.retrieve(job_id=job.job_id, benchmark_id="trustyai_lmeval::arc_easy").scores)
4.2. Running custom evaluations with LM-Eval and Llama Stack
This example demonstrates how to use the LM-Eval Llama Stack external eval provider to evaluate a language model with a custom benchmark. Creating a custom benchmark is useful for evaluating specific model knowledge and behavior.
The process involves three steps:
- Uploading the task dataset to your OpenShift AI cluster
- Registering it as a custom benchmark dataset with Llama Stack
- Running a benchmark evaluation job on a language model
Prerequisites
- You have installed Red Hat OpenShift AI, version 2.20 or later.
- You have cluster administrator privileges for your OpenShift AI cluster.
- You have installed the OpenShift CLI (oc) as described in the appropriate documentation for your cluster:
- Installing the OpenShift CLI for OpenShift Container Platform
- Installing the OpenShift CLI for Red Hat OpenShift Service on AWS
- You have a large language model (LLM) for chat generation or text classification, or both, deployed on vLLM Serving Runtime in your OpenShift AI cluster.
- You have installed the TrustyAI Operator in your OpenShift AI cluster.
- You have set KServe to Raw Deployment mode in your cluster.
Procedure
Upload your custom benchmark dataset to your OpenShift cluster using a PersistentVolumeClaim (PVC) and a temporary pod. Create a PVC named my-pvc to store your dataset. Run the following command in your CLI, replacing <MODEL_NAMESPACE> with the namespace of your language model:
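A sketch of the PVC definition applied with oc; the storage size and access mode are assumptions, so adjust them to your cluster:
oc apply -n <MODEL_NAMESPACE> -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
EOF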
Create a pod object named dataset-storage-pod to download the task dataset into the PVC. This pod is used to copy your dataset from your local machine to the OpenShift AI cluster:
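A sketch of the helper pod, assuming a minimal utility image that simply sleeps so that files can be copied into the mounted PVC; the image name is illustrative:
oc apply -n <MODEL_NAMESPACE> -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: dataset-storage-pod
spec:
  containers:
    - name: dataset-container
      image: quay.io/prometheus/busybox:latest  # illustrative utility image
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: dataset-volume
          mountPath: /data/upload_files
  volumes:
    - name: dataset-volume
      persistentVolumeClaim:
        claimName: my-pvc
EOF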
Copy your locally stored task dataset to the pod to place it within the PVC. In this example, the dataset is named example-dk-bench-input-bmo.jsonl locally, and it is copied to the dataset-storage-pod under the path /data/upload_files/.
oc cp example-dk-bench-input-bmo.jsonl dataset-storage-pod:/data/upload_files/example-dk-bench-input-bmo.jsonl -n <MODEL_NAMESPACE>
Once the custom dataset is uploaded to the PVC, register it as a benchmark for evaluations. At a minimum, provide the following metadata, and replace the DK_BENCH_DATASET_PATH and any other metadata fields to match your specific configuration:
- The TrustyAI LM-Eval Tasks GitHub web address
- Your branch
- The commit hash and path of the custom task
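A sketch of what the registration call can look like, assuming the same client.benchmarks.register API used in the previous tutorial; the benchmark ID, repository details, dataset path, metadata keys, and tokenizer are illustrative placeholders that you must replace with your own values:
from llama_stack_client import LlamaStackClient

# Connect to the running Llama Stack server (default port 8321), as in the previous tutorial
client = LlamaStackClient(base_url="http://localhost:8321")

client.benchmarks.register(
    benchmark_id="trustyai_lmeval::dk_bench",
    dataset_id="trustyai_lmeval::dk_bench",
    scoring_functions=["string"],      # placeholder value
    provider_benchmark_id="string",    # placeholder value
    provider_id="trustyai_lmeval",
    metadata={
        "custom_task": {
            "git": {
                "url": "https://github.com/trustyai-explainability/lm-eval-tasks.git",  # TrustyAI LM-Eval Tasks repository
                "branch": "main",           # your branch
                "commit": "<COMMIT_HASH>",  # commit hash of the custom task
                "path": "tasks/",           # path of the custom task
            }
        },
        "env": {
            # Path where the dataset is available inside the evaluation pod
            "DK_BENCH_DATASET_PATH": "<DK_BENCH_DATASET_PATH>",
        },
        "tokenizer": "google/flan-t5-small",      # illustrative tokenizer
        "tokenized_requests": False,
        "input": {"storage": {"pvc": "my-pvc"}},  # PVC that holds the uploaded dataset
    },
)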
Run a benchmark evaluation on your model:
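A sketch of the evaluation call against the custom benchmark, assuming the client.eval.run_eval API; the model name and sampling parameters are illustrative:
job = client.eval.run_eval(
    benchmark_id="trustyai_lmeval::dk_bench",
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": "phi-3",  # replace with the name of your deployed model
            "sampling_params": {
                "temperature": 0.7,
                "top_p": 0.9,
                "max_tokens": 256,
            },
        },
    },
)
print(f"Starting job '{job.job_id}'")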
Monitor the status of the evaluation job. The job runs asynchronously, so you can check its status periodically:
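The same polling sketch as in the previous tutorial, assuming client.eval.jobs.status and terminal states of completed or failed:
import time

while True:
    status = client.eval.jobs.status(job_id=job.job_id, benchmark_id="trustyai_lmeval::dk_bench")
    # status may be a plain string or an object with a .status attribute, depending on the client version
    state = getattr(status, "status", status)
    print(f"Current status: {state}")
    if state in ("completed", "failed"):
        break
    time.sleep(20)  # wait before polling again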
4.3. Detecting personally identifiable information (PII) by using Guardrails with Llama Stack
The trustyai_fms Orchestrator server is an external provider for Llama Stack that allows you to configure and use the Guardrails Orchestrator and compatible detection models through the Llama Stack API. This implementation of Llama Stack combines Guardrails Orchestrator with a suite of community-developed detectors to provide robust content filtering and safety monitoring.
This example demonstrates how to use the built-in Guardrails Regex Detector to detect personally identifiable information (PII) with Guardrails Orchestrator as Llama Stack safety guardrails, using the LlamaStack Operator to deploy a distribution in your Red Hat OpenShift AI namespace.
Guardrails Orchestrator with Llama Stack is not supported on s390x, as it requires the LlamaStack Operator, which is currently unavailable for this architecture.
Prerequisites
- You have cluster administrator privileges for your OpenShift cluster.
- You have installed the OpenShift CLI (oc) as described in the appropriate documentation for your cluster:
- Installing the OpenShift CLI for OpenShift Container Platform
- Installing the OpenShift CLI for Red Hat OpenShift Service on AWS
- You have a large language model (LLM) for chat generation or text classification, or both, deployed in your namespace.
- A cluster administrator has installed the following Operators in OpenShift:
- Red Hat Authorino Operator, version 1.2.1 or later
- Red Hat OpenShift Service Mesh, version 2.6.7-0 or later
Procedure
Configure your OpenShift AI environment with the following configuration in the DataScienceCluster. Note that you must manually update the spec.llamastack.managementState field to Managed:
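A sketch of the relevant DataScienceCluster components, assuming the default-dsc resource name; only the fields shown here are relevant to this example, and your cluster may manage additional components:
apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: default-dsc
spec:
  components:
    kserve:
      managementState: Managed
      defaultDeploymentMode: RawDeployment   # KServe in Raw Deployment mode
    trustyai:
      managementState: Managed
    llamastack:
      managementState: Managed               # must be updated manually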
Create a project in your OpenShift AI namespace:
PROJECT_NAME="lls-minimal-example"
oc new-project $PROJECT_NAME
Deploy the Guardrails Orchestrator with regex detectors by applying the Orchestrator configuration for regex-based PII detection:
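A sketch of what this configuration might look like, assuming the GuardrailsOrchestrator custom resource from the TrustyAI Operator with its built-in regex detectors enabled; the ConfigMap name, detector settings, and field values are assumptions to adapt to your environment:
kind: ConfigMap
apiVersion: v1
metadata:
  name: fms-orchestr8-config-nlp
data:
  config.yaml: |
    detectors:
      regex:
        type: text_contents
        service:
          hostname: "127.0.0.1"   # built-in detector runs alongside the orchestrator
          port: 8080
        chunker_id: whole_doc_chunker
        default_threshold: 0.5
---
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: GuardrailsOrchestrator
metadata:
  name: guardrails-orchestrator
spec:
  orchestratorConfig: "fms-orchestr8-config-nlp"
  enableBuiltInDetectors: true    # enables the built-in regex detectors
  enableGuardrailsGateway: false
  replicas: 1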
In the same namespace, create a Llama Stack distribution:
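A sketch of a LlamaStackDistribution resource, assuming the llamastack.io/v1alpha1 API group installed by the LlamaStack Operator; the distribution name, environment variable names, and placeholder values are assumptions, so consult the LlamaStack Operator documentation for the exact schema:
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: lls-minimal
spec:
  replicas: 1
  server:
    distribution:
      name: rh-dev                                  # illustrative distribution name
    containerSpec:
      port: 8321
      env:
        - name: VLLM_URL
          value: "<YOUR_MODEL_ENDPOINT>/v1"          # endpoint of your deployed chat model
        - name: FMS_ORCHESTRATOR_URL
          value: "<GUARDRAILS_ORCHESTRATOR_ROUTE>"   # Guardrails Orchestrator service or route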
After deploying the LlamaStackDistribution CR, a new pod is created in the same namespace. This pod runs the Llama Stack server for your distribution.
Once the Llama Stack server is running, open a port-forward to access it locally:
oc -n $PROJECT_NAME port-forward svc/llama-stack 8321:8321
Use the /v1/shields endpoint to dynamically register a shield. For example, register a shield that uses regex patterns to detect personally identifiable information (PII):
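A sketch of the registration request, assuming a shield_id of regex_detector and the trustyai_fms provider; the detector parameters are illustrative and depend on how your regex detectors are configured:
curl -X POST http://localhost:8321/v1/shields \
  -H "Content-Type: application/json" \
  -d '{
    "shield_id": "regex_detector",
    "provider_shield_id": "regex_detector",
    "provider_id": "trustyai_fms",
    "params": {
      "type": "content",
      "confidence_threshold": 0.5,
      "message_types": ["system", "user"],
      "detectors": {
        "regex": {
          "detector_params": {
            "regex": ["email", "ssn", "credit-card"]
          }
        }
      }
    }
  }'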
Verify that the shield was registered:
curl -s http://localhost:8321/v1/shields | jq '.'
If the returned list contains your shield, it has been registered successfully.
Once the shield has been registered, verify that it is working by sending a message containing PII to the /v1/safety/run-shield endpoint.
Email detection example:
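A sketch of the request, reusing the regex_detector shield_id from the registration sketch above; the message content is illustrative:
curl -X POST http://localhost:8321/v1/safety/run-shield \
  -H "Content-Type: application/json" \
  -d '{
    "shield_id": "regex_detector",
    "messages": [
      {"role": "user", "content": "My email address is test@example.com"}
    ]
  }'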
This should return a response indicating that the email address was detected.
Social security number (SSN) detection example:
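The same request pattern with an SSN-like string; the message content is illustrative:
curl -X POST http://localhost:8321/v1/safety/run-shield \
  -H "Content-Type: application/json" \
  -d '{
    "shield_id": "regex_detector",
    "messages": [
      {"role": "user", "content": "My social security number is 123-45-6789"}
    ]
  }'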
This should return a response indicating that the SSN was detected.
Credit card detection example:
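The same request pattern with a test credit card number; the message content is illustrative:
curl -X POST http://localhost:8321/v1/safety/run-shield \
  -H "Content-Type: application/json" \
  -d '{
    "shield_id": "regex_detector",
    "messages": [
      {"role": "user", "content": "My credit card number is 4111-1111-1111-1111"}
    ]
  }'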
This should return a response indicating that the credit card number was detected.