Chapter 8. Using Llama Stack with TrustyAI
This section contains tutorials for working with Llama Stack in TrustyAI. These tutorials demonstrate how to use various Llama Stack components and providers to evaluate and work with language models.
The following sections describe how to work with Llama Stack and provide example use cases:
- Using the Llama Stack external evaluation provider with lm-evaluation-harness in TrustyAI
- Running custom evaluations with LM-Eval and Llama Stack
- Using Guardrails Orchestrator with Llama Stack
8.1. Using Llama Stack external evaluation provider with lm-evaluation-harness in TrustyAI
This example demonstrates how to evaluate a language model in Red Hat OpenShift AI using the LMEval Llama Stack external eval provider in a Python workbench. To do this, configure a Llama Stack server to use the LMEval eval provider, register a benchmark dataset, and run a benchmark evaluation job on a language model.
Prerequisites
- You have installed Red Hat OpenShift AI, version 2.20 or later.
- You have cluster administrator privileges for your OpenShift AI cluster.
- You have installed the OpenShift CLI (`oc`) as described in the appropriate documentation for your cluster:
  - Installing the OpenShift CLI for OpenShift Dedicated
  - Installing the OpenShift CLI for Red Hat OpenShift Service on AWS (classic architecture)
- You have a large language model (LLM) for chat generation or text classification, or both, deployed in your namespace.
- You have installed TrustyAI Operator in your OpenShift AI cluster.
- You have set KServe to Raw Deployment mode in your cluster.
Procedure
Create and activate a Python virtual environment for this tutorial on your local machine:

```shell
python3 -m venv .venv
source .venv/bin/activate
```

Install the required packages from the Python Package Index (PyPI):

```shell
pip install \
    llama-stack \
    llama-stack-client \
    llama-stack-provider-lmeval
```

Create the model route:

```shell
oc create route edge vllm --service=<VLLM_SERVICE> --port=<VLLM_PORT> -n <MODEL_NAMESPACE>
```

Configure the Llama Stack server. Set the variables that configure the runtime endpoint and namespace. The `VLLM_URL` value should be the `v1/completions` endpoint of your model route, and `TRUSTYAI_LM_EVAL_NAMESPACE` should be the namespace where your model is deployed. For example:

```shell
export TRUSTYAI_LM_EVAL_NAMESPACE=<MODEL_NAMESPACE>
export MODEL_ROUTE=$(oc get route -n "$TRUSTYAI_LM_EVAL_NAMESPACE" | awk '/predictor/{print $2; exit}')
export VLLM_URL="https://${MODEL_ROUTE}/v1/completions"
```

Download the `providers.d` provider configuration directory and the `run.yaml` execution file:

```shell
curl --create-dirs --output providers.d/remote/eval/trustyai_lmeval.yaml \
    https://raw.githubusercontent.com/trustyai-explainability/llama-stack-provider-lmeval/refs/heads/main/providers.d/remote/eval/trustyai_lmeval.yaml
curl --create-dirs --output run.yaml \
    https://raw.githubusercontent.com/trustyai-explainability/llama-stack-provider-lmeval/refs/heads/main/run.yaml
```

Start the Llama Stack server in a virtual environment, which uses port `8321` by default:

```shell
llama stack run run.yaml --image-type venv
```

Create a Python script in a Jupyter workbench and import the following libraries and modules to interact with the server and run an evaluation:
```python
import os
import subprocess
import logging
import time
import pprint
```

Start the Llama Stack Python client to interact with the running Llama Stack server:

```python
BASE_URL = "http://localhost:8321"

def create_http_client():
    from llama_stack_client import LlamaStackClient
    return LlamaStackClient(base_url=BASE_URL)

client = create_http_client()
```

Print a list of the currently available benchmarks:

```python
benchmarks = client.benchmarks.list()
pprint.pprint(f"Available benchmarks: {benchmarks}")
```

LMEval provides access to over 100 preconfigured evaluation datasets. Register the ARC-Easy benchmark, a dataset of grade-school level, multiple-choice science questions:

```python
client.benchmarks.register(
    benchmark_id="trustyai_lmeval::arc_easy",
    dataset_id="trustyai_lmeval::arc_easy",
    scoring_functions=["string"],
    provider_benchmark_id="string",
    provider_id="trustyai_lmeval",
    metadata={
        "tokenizer": "google/flan-t5-small",
        "tokenized_requests": False,
    },
)
```

Verify that the benchmark has been registered successfully:

```python
benchmarks = client.benchmarks.list()
pprint.pprint(f"Available benchmarks: {benchmarks}")
```

Run a benchmark evaluation job on your deployed model using the following input. Replace `phi-3` with the name of your deployed model:
```python
job = client.eval.run_eval(
    benchmark_id="trustyai_lmeval::arc_easy",
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": "phi-3",
            "provider_id": "trustyai_lmeval",
            "sampling_params": {
                "temperature": 0.7,
                "top_p": 0.9,
                "max_tokens": 256,
            },
        },
        "num_examples": 1000,
    },
)
print(f"Starting job '{job.job_id}'")
```

Monitor the status of the evaluation job using the following code. The job runs asynchronously, so you can check its status periodically:

```python
def get_job_status(job_id, benchmark_id):
    return client.eval.jobs.status(job_id=job_id, benchmark_id=benchmark_id)

while True:
    job = get_job_status(job_id=job.job_id, benchmark_id="trustyai_lmeval::arc_easy")
    print(job)

    if job.status in ['failed', 'completed']:
        print(f"Job ended with status: {job.status}")
        break

    time.sleep(20)
```
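The loop above polls until the job reaches a terminal state, but it will spin forever if the job never finishes. As a minimal sketch, a bounded variant might look like this; the `poll_until_done` helper and its timeout values are illustrative, not part of the Llama Stack client API:

```python
import time

def poll_until_done(fetch_status, timeout_s=1800, interval_s=20):
    """Poll fetch_status() until it returns 'failed' or 'completed',
    raising TimeoutError once timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = fetch_status()
        if status in ("failed", "completed"):
            return status
        time.sleep(interval_s)
    raise TimeoutError(f"Job still running after {timeout_s} seconds")
```

With the client from the earlier steps, you could call it as `poll_until_done(lambda: get_job_status(job.job_id, "trustyai_lmeval::arc_easy").status)`.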
Retrieve the evaluation job results once the job status reports back as `completed`:

```python
pprint.pprint(client.eval.jobs.retrieve(job_id=job.job_id, benchmark_id="trustyai_lmeval::arc_easy").scores)
```
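The exact shape of the returned scores varies by benchmark and provider version. When exploring results interactively, a small generic flattener can make nested score dictionaries easier to scan; this is a hypothetical helper, not part of the client API, and it assumes the scores can be represented as nested dicts:

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted-key/value pairs for easy scanning."""
    items = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            items.update(flatten(value, f"{prefix}{key}."))
    else:
        items[prefix.rstrip(".")] = obj
    return items

# Example with an illustrative scores dict:
# flatten({"arc_easy": {"acc": 0.81, "acc_stderr": 0.01}})
# -> {"arc_easy.acc": 0.81, "arc_easy.acc_stderr": 0.01}
```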
8.2. Running custom evaluations with LM-Eval and Llama Stack
This example demonstrates how to use the LM-Eval Llama Stack external eval provider to evaluate a language model with a custom benchmark. Creating a custom benchmark is useful for evaluating specific model knowledge and behavior.
The process involves three steps:
- Uploading the task dataset to your OpenShift AI cluster
- Registering it as a custom benchmark dataset with Llama Stack
- Running a benchmark evaluation job on a language model
Prerequisites
- You have installed Red Hat OpenShift AI, version 2.20 or later.
- You have cluster administrator privileges for your OpenShift AI cluster.
- You have installed the OpenShift CLI (`oc`) as described in the appropriate documentation for your cluster:
  - Installing the OpenShift CLI for OpenShift Dedicated
  - Installing the OpenShift CLI for Red Hat OpenShift Service on AWS (classic architecture)
- You have a large language model (LLM) for chat generation or text classification, or both, deployed on vLLM Serving Runtime in your OpenShift AI cluster.
- You have installed TrustyAI Operator in your OpenShift AI cluster.
- You have set KServe to Raw Deployment mode in your cluster.
Procedure
Upload your custom benchmark dataset to your OpenShift cluster using a PersistentVolumeClaim (PVC) and a temporary pod. Create a PVC named `my-pvc` to store your dataset. Run the following command in your CLI, replacing `<MODEL_NAMESPACE>` with the namespace of your language model:

```shell
oc apply -n <MODEL_NAMESPACE> -f - << EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
EOF
```

Create a pod object named `dataset-storage-pod` to download the task dataset into the PVC. This pod is used to copy your dataset from your local machine to the OpenShift AI cluster:

```shell
oc apply -n <MODEL_NAMESPACE> -f - << EOF
apiVersion: v1
kind: Pod
metadata:
  name: dataset-storage-pod
spec:
  containers:
    - name: dataset-container
      image: 'quay.io/prometheus/busybox:latest'
      command: ["/bin/sh", "-c", "sleep 3600"]
      volumeMounts:
        - mountPath: "/data/upload_files"
          name: dataset-storage
  volumes:
    - name: dataset-storage
      persistentVolumeClaim:
        claimName: my-pvc
EOF
```

Copy your locally stored task dataset to the pod to place it within the PVC. In this example, the dataset is named `example-dk-bench-input-bmo.jsonl` locally, and it is copied to the `dataset-storage-pod` under the path `/data/upload_files/`:

```shell
oc cp example-dk-bench-input-bmo.jsonl dataset-storage-pod:/data/upload_files/example-dk-bench-input-bmo.jsonl -n <MODEL_NAMESPACE>
```

Once the custom dataset is uploaded to the PVC, register it as a benchmark for evaluations. At a minimum, provide the following metadata, replacing `DK_BENCH_DATASET_PATH` and any other metadata fields to match your specific configuration:

- The TrustyAI LM-Eval Tasks GitHub web address
- Your branch
- The commit hash and path of the custom task
```python
client.benchmarks.register(
    benchmark_id="trustyai_lmeval::dk-bench",
    dataset_id="trustyai_lmeval::dk-bench",
    scoring_functions=["accuracy"],
    provider_benchmark_id="dk-bench",
    provider_id="trustyai_lmeval",
    metadata={
        "custom_task": {
            "git": {
                "url": "https://github.com/trustyai-explainability/lm-eval-tasks.git",
                "branch": "main",
                "commit": "8220e2d73c187471acbe71659c98bccecfe77958",
                "path": "tasks/",
            }
        },
        "env": {
            # Path of the dataset inside the PVC
            "DK_BENCH_DATASET_PATH": "/opt/app-root/src/hf_home/example-dk-bench-input-bmo.jsonl",
            "JUDGE_MODEL_URL": "http://phi-3-predictor:8080/v1/chat/completions",
            # For simplicity, we use the same model as the one being evaluated
            "JUDGE_MODEL_NAME": "phi-3",
            "JUDGE_API_KEY": "",
        },
        "tokenized_requests": False,
        "tokenizer": "google/flan-t5-small",
        "input": {"storage": {"pvc": "my-pvc"}},
    },
)
```
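A registration that references a missing Git field or dataset path typically fails only once the evaluation job starts. As a minimal sketch, a pre-flight check over the metadata dict can catch such omissions earlier; the required-key list below is an assumption drawn from the example above, not a documented contract:

```python
def check_custom_task_metadata(metadata):
    """Return a list of problems found in a custom-task metadata dict."""
    problems = []
    git = metadata.get("custom_task", {}).get("git", {})
    for key in ("url", "branch", "commit", "path"):
        if not git.get(key):
            problems.append(f"custom_task.git.{key} is missing or empty")
    env = metadata.get("env", {})
    if not env.get("DK_BENCH_DATASET_PATH"):
        problems.append("env.DK_BENCH_DATASET_PATH is missing or empty")
    return problems
```

Calling this before `client.benchmarks.register(...)` and aborting when the returned list is non-empty gives a faster failure than waiting for the job to start.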
Run a benchmark evaluation on your model:
```python
job = client.eval.run_eval(
    benchmark_id="trustyai_lmeval::dk-bench",
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": "phi-3",
            "provider_id": "trustyai_lmeval",
            "sampling_params": {
                "temperature": 0.7,
                "top_p": 0.9,
                "max_tokens": 256,
            },
        },
        "num_examples": 1000,
    },
)
print(f"Starting job '{job.job_id}'")
```

Monitor the status of the evaluation job. The job runs asynchronously, so you can check its status periodically:

```python
import time

def get_job_status(job_id, benchmark_id):
    return client.eval.jobs.status(job_id=job_id, benchmark_id=benchmark_id)

while True:
    job = get_job_status(job_id=job.job_id, benchmark_id="trustyai_lmeval::dk-bench")
    print(job)

    if job.status in ['failed', 'completed']:
        print(f"Job ended with status: {job.status}")
        break

    time.sleep(20)
```
8.3. Using Guardrails Orchestrator with Llama Stack
The trustyai_fms orchestrator server is an external provider for Llama Stack that allows you to configure and use the Guardrails Orchestrator and compatible detection models through the Llama Stack API. This implementation of Llama Stack combines Guardrails Orchestrator with a suite of community-developed detectors to provide robust content filtering and safety monitoring.
This example demonstrates how to use the built-in Guardrails Regex Detector with Guardrails Orchestrator as Llama Stack safety guardrails, using the LlamaStack Operator to deploy a distribution in your Red Hat OpenShift AI namespace.
Prerequisites
- You have cluster administrator privileges for your OpenShift cluster.
- You have installed the OpenShift CLI (`oc`) as described in the appropriate documentation for your cluster:
  - Installing the OpenShift CLI for OpenShift Dedicated
  - Installing the OpenShift CLI for Red Hat OpenShift Service on AWS (classic architecture)
- You have a large language model (LLM) for chat generation or text classification, or both, deployed in your namespace.
- A cluster administrator has installed the following Operators in OpenShift:
  - Red Hat OpenShift Service Mesh, version 2.6.7-0 or later.
  - Red Hat OpenShift Serverless, version 1.35.1 or later.
  - Red Hat Authorino Operator, version 1.2.1 or later.
Procedure
Configure your OpenShift AI environment with the following configurations in the `DataScienceCluster`. Note that you must manually update the `spec.llamastack.managementState` field to `Managed`:

```yaml
spec:
  trustyai:
    managementState: Managed
  llamastack:
    managementState: Managed
  kserve:
    defaultDeploymentMode: RawDeployment
    managementState: Managed
    nim:
      managementState: Managed
    rawDeploymentServiceConfig: Headless
    serving:
      ingressGateway:
        certificate:
          type: OpenshiftDefaultIngress
      managementState: Removed
      name: knative-serving
  serviceMesh:
    managementState: Removed
```

Create a project in your OpenShift AI namespace:
```shell
PROJECT_NAME="lls-minimal-example"
oc new-project $PROJECT_NAME
```

Deploy the Guardrails Orchestrator with regex detectors by applying the orchestrator configuration for regex-based PII detection:
```shell
cat <<EOF | oc apply -f -
kind: ConfigMap
apiVersion: v1
metadata:
  name: fms-orchestr8-config-nlp
data:
  config.yaml: |
    detectors:
      regex:
        type: text_contents
        service:
          hostname: "127.0.0.1"
          port: 8080
        chunker_id: whole_doc_chunker
        default_threshold: 0.5
---
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: GuardrailsOrchestrator
metadata:
  name: guardrails-orchestrator
spec:
  orchestratorConfig: "fms-orchestr8-config-nlp"
  enableBuiltInDetectors: true
  enableGuardrailsGateway: false
  replicas: 1
EOF
```

In the same namespace, create a Llama Stack distribution:
```yaml
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: llamastackdistribution-sample
  namespace: <PROJECT_NAMESPACE>
spec:
  replicas: 1
  server:
    containerSpec:
      env:
        - name: VLLM_URL
          value: '${VLLM_URL}'
        - name: INFERENCE_MODEL
          value: '${INFERENCE_MODEL}'
        - name: MILVUS_DB_PATH
          value: '~/.llama/milvus.db'
        - name: VLLM_TLS_VERIFY
          value: 'false'
        - name: FMS_ORCHESTRATOR_URL
          value: '${FMS_ORCHESTRATOR_URL}'
      name: llama-stack
      port: 8321
    distribution:
      name: rh-dev
    storage:
      size: 20Gi
```
After deploying the LlamaStackDistribution CR, a new pod is created in the same namespace. This pod runs the Llama Stack server for your distribution.
Once the Llama Stack server is running, open a port-forward to access it locally:

```shell
oc -n $PROJECT_NAME port-forward svc/llama-stack 8321:8321
```

Use the `/v1/shields` endpoint to dynamically register a shield. For example, register a shield that uses regex patterns to detect personally identifiable information (PII):

```shell
curl -X POST http://localhost:8321/v1/shields \
  -H 'Content-Type: application/json' \
  -d '{
    "shield_id": "regex_detector",
    "provider_shield_id": "regex_detector",
    "provider_id": "trustyai_fms",
    "params": {
      "type": "content",
      "confidence_threshold": 0.5,
      "message_types": ["system", "user"],
      "detectors": {
        "regex": {
          "detector_params": {
            "regex": ["email", "us-social-security-number", "credit-card"]
          }
        }
      }
    }
  }'
```

Verify that the shield was registered:
```shell
curl -s http://localhost:8321/v1/shields | jq '.'
```

The following output indicates that the shield has been registered successfully:
```json
{
  "data": [
    {
      "identifier": "regex_detector",
      "provider_resource_id": "regex_detector",
      "provider_id": "trustyai_fms",
      "type": "shield",
      "params": {
        "type": "content",
        "confidence_threshold": 0.5,
        "message_types": [
          "system",
          "user"
        ],
        "detectors": {
          "regex": {
            "detector_params": {
              "regex": [
                "email",
                "us-social-security-number",
                "credit-card"
              ]
            }
          }
        }
      }
    }
  ]
}
```

Once the shield has been registered, verify that it is working by sending a message containing PII to the `/v1/safety/run-shield` endpoint.

Email detection example:
```shell
curl -X POST http://localhost:8321/v1/safety/run-shield \
  -H "Content-Type: application/json" \
  -d '{
    "shield_id": "regex_detector",
    "messages": [
      {
        "content": "My email is test@example.com",
        "role": "user"
      }
    ]
  }' | jq '.'
```

This should return a response indicating that the email was detected:
```json
{
  "violation": {
    "violation_level": "error",
    "user_message": "Content violation detected by shield regex_detector (confidence: 1.00, 1/1 processed messages violated)",
    "metadata": {
      "status": "violation",
      "shield_id": "regex_detector",
      "confidence_threshold": 0.5,
      "summary": {
        "total_messages": 1,
        "processed_messages": 1,
        "skipped_messages": 0,
        "messages_with_violations": 1,
        "messages_passed": 0,
        "message_fail_rate": 1.0,
        "message_pass_rate": 0.0,
        "total_detections": 1,
        "detector_breakdown": {
          "active_detectors": 1,
          "total_checks_performed": 1,
          "total_violations_found": 1,
          "violations_per_message": 1.0
        }
      },
      "results": [
        {
          "message_index": 0,
          "text": "My email is test@example.com",
          "status": "violation",
          "score": 1.0,
          "detection_type": "pii",
          "individual_detector_results": [
            {
              "detector_id": "regex",
              "status": "violation",
              "score": 1.0,
              "detection_type": "pii"
            }
          ]
        }
      ]
    }
  }
}
```

Social security number (SSN) detection example:
```shell
curl -X POST http://localhost:8321/v1/safety/run-shield \
  -H "Content-Type: application/json" \
  -d '{
    "shield_id": "regex_detector",
    "messages": [
      {
        "content": "My SSN is 123-45-6789",
        "role": "user"
      }
    ]
  }' | jq '.'
```

This should return a response indicating that the SSN was detected:
```json
{
  "violation": {
    "violation_level": "error",
    "user_message": "Content violation detected by shield regex_detector (confidence: 1.00, 1/1 processed messages violated)",
    "metadata": {
      "status": "violation",
      "shield_id": "regex_detector",
      "confidence_threshold": 0.5,
      "summary": {
        "total_messages": 1,
        "processed_messages": 1,
        "skipped_messages": 0,
        "messages_with_violations": 1,
        "messages_passed": 0,
        "message_fail_rate": 1.0,
        "message_pass_rate": 0.0,
        "total_detections": 1,
        "detector_breakdown": {
          "active_detectors": 1,
          "total_checks_performed": 1,
          "total_violations_found": 1,
          "violations_per_message": 1.0
        }
      },
      "results": [
        {
          "message_index": 0,
          "text": "My SSN is 123-45-6789",
          "status": "violation",
          "score": 1.0,
          "detection_type": "pii",
          "individual_detector_results": [
            {
              "detector_id": "regex",
              "status": "violation",
              "score": 1.0,
              "detection_type": "pii"
            }
          ]
        }
      ]
    }
  }
}
```

Credit card detection example:
```shell
curl -X POST http://localhost:8321/v1/safety/run-shield \
  -H "Content-Type: application/json" \
  -d '{
    "shield_id": "regex_detector",
    "messages": [
      {
        "content": "My credit card number is 4111-1111-1111-1111",
        "role": "user"
      }
    ]
  }' | jq '.'
```

This should return a response indicating that the credit card number was detected:
```json
{
  "violation": {
    "violation_level": "error",
    "user_message": "Content violation detected by shield regex_detector (confidence: 1.00, 1/1 processed messages violated)",
    "metadata": {
      "status": "violation",
      "shield_id": "regex_detector",
      "confidence_threshold": 0.5,
      "summary": {
        "total_messages": 1,
        "processed_messages": 1,
        "skipped_messages": 0,
        "messages_with_violations": 1,
        "messages_passed": 0,
        "message_fail_rate": 1.0,
        "message_pass_rate": 0.0,
        "total_detections": 1,
        "detector_breakdown": {
          "active_detectors": 1,
          "total_checks_performed": 1,
          "total_violations_found": 1,
          "violations_per_message": 1.0
        }
      },
      "results": [
        {
          "message_index": 0,
          "text": "My credit card number is 4111-1111-1111-1111",
          "status": "violation",
          "score": 1.0,
          "detection_type": "pii",
          "individual_detector_results": [
            {
              "detector_id": "regex",
              "status": "violation",
              "score": 1.0,
              "detection_type": "pii"
            }
          ]
        }
      ]
    }
  }
}
```
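Applications usually need to act on these responses programmatically, for example by blocking a request when a shield reports a violation. The helpers below parse the response shape shown in the examples above; they are a sketch that assumes this structure, which may change between releases:

```python
def is_blocked(response):
    """Return True if a run-shield response reports a content violation."""
    violation = response.get("violation")
    return bool(violation) and violation.get("metadata", {}).get("status") == "violation"

def violation_texts(response):
    """Collect the offending message texts from a violation response."""
    results = response.get("violation", {}).get("metadata", {}).get("results", [])
    return [r["text"] for r in results if r.get("status") == "violation"]
```

A response with no `violation` key (the shield passed all messages) yields `is_blocked(...) == False`, so the same check works for both outcomes.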