Chapter 4. Using Llama Stack with TrustyAI


This section contains tutorials for working with Llama Stack in TrustyAI. These tutorials demonstrate how to use various Llama Stack components and providers to evaluate and work with language models.

The following sections describe how to work with Llama Stack and provide example use cases:

  • Using the Llama Stack external evaluation provider with lm-evaluation-harness in TrustyAI
  • Running custom evaluations with LM-Eval Llama Stack external evaluation provider
  • Using the trustyai-fms Guardrails Orchestrator with Llama Stack

Using the Llama Stack external evaluation provider with lm-evaluation-harness in TrustyAI

This example demonstrates how to evaluate a language model in Red Hat OpenShift AI using the LMEval Llama Stack external eval provider in a Python workbench. To do this, configure a Llama Stack server to use the LMEval eval provider, register a benchmark dataset, and run a benchmark evaluation job on a language model.

Prerequisites

  • You have installed Red Hat OpenShift AI, version 2.20 or later.
  • You have cluster administrator privileges for your OpenShift AI cluster.
  • You have installed the OpenShift CLI (oc) as described in the appropriate documentation for your cluster.

  • You have a large language model (LLM) for chat generation or text classification, or both, deployed in your namespace.
  • You have installed the TrustyAI Operator in your OpenShift AI cluster.
  • You have set KServe to Raw Deployment mode in your cluster.

Procedure

  1. Create and activate a Python virtual environment for this tutorial on your local machine:

    python3 -m venv .venv
    source .venv/bin/activate
  2. Install the required packages from the Python Package Index (PyPI):

    pip install \
        llama-stack \
        llama-stack-client \
        llama-stack-provider-lmeval
  3. Create the model route:

    oc create route edge vllm --service=<VLLM_SERVICE> --port=<VLLM_PORT> -n <MODEL_NAMESPACE>
  4. Configure the Llama Stack server. Set the variables to configure the runtime endpoint and namespace. The VLLM_URL value should be the v1/completions endpoint of your model route and the TRUSTYAI_LM_EVAL_NAMESPACE should be the namespace where your model is deployed. For example:

    export TRUSTYAI_LM_EVAL_NAMESPACE=<MODEL_NAMESPACE>
    export MODEL_ROUTE=$(oc get route -n "$TRUSTYAI_LM_EVAL_NAMESPACE" | awk '/predictor/{print $2; exit}')
    export VLLM_URL="https://${MODEL_ROUTE}/v1/completions"
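    Optionally, perform a quick sanity check that the route serves the model's API. This sketch assumes the model is served by vLLM, which exposes an OpenAI-compatible /v1/models endpoint on the same route:

    # Optional sanity check (assumes a vLLM OpenAI-compatible endpoint); -sk skips TLS verification for the edge route
    curl -sk "https://${MODEL_ROUTE}/v1/models"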
  5. Download the providers.d provider configuration directory and the run.yaml execution file:

    curl --create-dirs --output providers.d/remote/eval/trustyai_lmeval.yaml https://raw.githubusercontent.com/trustyai-explainability/llama-stack-provider-lmeval/refs/heads/main/providers.d/remote/eval/trustyai_lmeval.yaml
    
    curl --create-dirs --output run.yaml https://raw.githubusercontent.com/trustyai-explainability/llama-stack-provider-lmeval/refs/heads/main/run.yaml
  6. Start the Llama Stack server in a virtual environment. The server listens on port 8321 by default:

    llama stack run run.yaml --image-type venv
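    Optionally, from a separate terminal, confirm that the server is responding before you continue. This is a sketch that assumes the standard Llama Stack health endpoint:

    # Optional: returns a small JSON status document if the server is up
    curl -s http://localhost:8321/v1/health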
  7. Create a Python script in a Jupyter workbench and import the following libraries and modules to interact with the server and run an evaluation:

    import os
    import subprocess
    import logging
    import time
    import pprint
  8. Start the Llama Stack Python client to interact with the running Llama Stack server:

    BASE_URL = "http://localhost:8321"
    
    def create_http_client():
        from llama_stack_client import LlamaStackClient
        return LlamaStackClient(base_url=BASE_URL)
    
    client = create_http_client()
  9. Print a list of the currently available benchmarks:

    benchmarks = client.benchmarks.list()
    
    pprint.pprint(f"Available benchmarks: {benchmarks}")
  10. LMEval provides access to over 100 preconfigured evaluation datasets. Register the ARC-Easy benchmark, a dataset of grade-school-level multiple-choice science questions:

    client.benchmarks.register(
        benchmark_id="trustyai_lmeval::arc_easy",
        dataset_id="trustyai_lmeval::arc_easy",
        scoring_functions=["string"],
        provider_benchmark_id="string",
        provider_id="trustyai_lmeval",
        metadata={
            "tokenizer": "google/flan-t5-small",
            "tokenized_requests": False,
        }
    )
  11. Verify that the benchmark has been registered successfully:

    benchmarks = client.benchmarks.list()
    pprint.pprint(f"Available benchmarks: {benchmarks}")
  12. Run a benchmark evaluation job on your deployed model using the following input. Replace phi-3 with the name of your deployed model:

    job = client.eval.run_eval(
        benchmark_id="trustyai_lmeval::arc_easy",
        benchmark_config={
            "eval_candidate": {
                "type": "model",
                "model": "phi-3",
                "provider_id": "trustyai_lmeval",
                "sampling_params": {
                    "temperature": 0.7,
                    "top_p": 0.9,
                    "max_tokens": 256
                },
            },
            "num_examples": 1000,
        },
    )
    
    print(f"Starting job '{job.job_id}'")
  13. Monitor the status of the evaluation job using the following code. The job runs asynchronously, so you can check its status periodically:

    def get_job_status(job_id, benchmark_id):
        return client.eval.jobs.status(job_id=job_id, benchmark_id=benchmark_id)

    while True:
        job = get_job_status(job_id=job.job_id, benchmark_id="trustyai_lmeval::arc_easy")
        print(job)

        if job.status in ['failed', 'completed']:
            print(f"Job ended with status: {job.status}")
            break

        time.sleep(20)
  14. Retrieve the evaluation job results once the job status reports back as completed:

    pprint.pprint(client.eval.jobs.retrieve(job_id=job.job_id, benchmark_id="trustyai_lmeval::arc_easy").scores)
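    To inspect individual metrics, you can iterate over the returned scores. The exact layout of each entry depends on the provider; this sketch assumes each value exposes an aggregated_results field and falls back to printing the raw object otherwise:

    result = client.eval.jobs.retrieve(job_id=job.job_id, benchmark_id="trustyai_lmeval::arc_easy")
    for metric, score in result.scores.items():
        # aggregated_results is assumed; print the raw score object if the field is absent
        print(metric, getattr(score, "aggregated_results", score))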

Running custom evaluations with LM-Eval Llama Stack external evaluation provider

This example demonstrates how to use the LM-Eval Llama Stack external eval provider to evaluate a language model with a custom benchmark. Creating a custom benchmark is useful for evaluating specific model knowledge and behavior.

The process involves three steps:

  • Uploading the task dataset to your OpenShift AI cluster
  • Registering it as a custom benchmark dataset with Llama Stack
  • Running a benchmark evaluation job on a language model

Prerequisites

  • You have installed Red Hat OpenShift AI, version 2.20 or later.
  • You have cluster administrator privileges for your OpenShift AI cluster.
  • You have installed the OpenShift CLI (oc) as described in the appropriate documentation for your cluster.

  • You have a large language model (LLM) for chat generation or text classification, or both, deployed on vLLM Serving Runtime in your OpenShift AI cluster.
  • You have installed the TrustyAI Operator in your OpenShift AI cluster.
  • You have set KServe to Raw Deployment mode in your cluster.

Procedure

  1. Upload your custom benchmark dataset to your OpenShift cluster using a PersistentVolumeClaim (PVC) and a temporary pod. Create a PVC named my-pvc to store your dataset. Run the following command in your CLI, replacing <MODEL_NAMESPACE> with the namespace of your language model:

    oc apply -n <MODEL_NAMESPACE> -f - << EOF
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: my-pvc
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
    EOF
  2. Create a pod object named dataset-storage-pod that mounts the PVC. This pod is used to copy your dataset from your local machine to the OpenShift AI cluster:

    oc apply -n <MODEL_NAMESPACE> -f - << EOF
    apiVersion: v1
    kind: Pod
    metadata:
      name: dataset-storage-pod
    spec:
      containers:
      - name: dataset-container
        image: 'quay.io/prometheus/busybox:latest'
        command: ["/bin/sh", "-c", "sleep 3600"]
        volumeMounts:
        - mountPath: "/data/upload_files"
          name: dataset-storage
      volumes:
      - name: dataset-storage
        persistentVolumeClaim:
          claimName: my-pvc
    EOF
  3. Copy your locally stored task dataset to the pod to place it within the PVC. In this example, the dataset is named example-dk-bench-input-bmo.jsonl locally, and it is copied to the dataset-storage-pod under the path /data/upload_files/:

    oc cp example-dk-bench-input-bmo.jsonl dataset-storage-pod:/data/upload_files/example-dk-bench-input-bmo.jsonl -n <MODEL_NAMESPACE>
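    Optionally, verify that the file is present in the PVC by listing the mount path inside the pod:

    oc exec -n <MODEL_NAMESPACE> dataset-storage-pod -- ls -l /data/upload_files/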
  4. Once the custom dataset is uploaded to the PVC, register it as a benchmark for evaluations. At a minimum, provide the following metadata, and replace DK_BENCH_DATASET_PATH and any other metadata fields to match your specific configuration:

    1. The TrustyAI LM-Eval Tasks GitHub repository URL
    2. Your branch
    3. The commit hash and the path of the custom task

      client.benchmarks.register(
          benchmark_id="trustyai_lmeval::dk-bench",
          dataset_id="trustyai_lmeval::dk-bench",
          scoring_functions=["accuracy"],
          provider_benchmark_id="dk-bench",
          provider_id="trustyai_lmeval",
          metadata={
              "custom_task": {
                  "git": {
                      "url": "https://github.com/trustyai-explainability/lm-eval-tasks.git",
                      "branch": "main",
                      "commit": "8220e2d73c187471acbe71659c98bccecfe77958",
                      "path": "tasks/",
                  }
              },
              "env": {
                  # Path of the dataset inside the PVC
                  "DK_BENCH_DATASET_PATH": "/opt/app-root/src/hf_home/example-dk-bench-input-bmo.jsonl",
                  "JUDGE_MODEL_URL": "http://phi-3-predictor:8080/v1/chat/completions",
                  # For simplicity, we use the same model as the one being evaluated
                  "JUDGE_MODEL_NAME": "phi-3",
                  "JUDGE_API_KEY": "",
              },
              "tokenized_requests": False,
              "tokenizer": "google/flan-t5-small",
              "input": {"storage": {"pvc": "my-pvc"}}
          },
      )
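      Optionally, confirm that the benchmark has been registered, as in the previous example:

      benchmarks = client.benchmarks.list()
      pprint.pprint(f"Available benchmarks: {benchmarks}")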
  5. Run a benchmark evaluation on your model:

    job = client.eval.run_eval(
        benchmark_id="trustyai_lmeval::dk-bench",
        benchmark_config={
            "eval_candidate": {
                "type": "model",
                "model": "phi-3",
                "provider_id": "trustyai_lmeval",
                "sampling_params": {
                    "temperature": 0.7,
                    "top_p": 0.9,
                    "max_tokens": 256
                },
            },
            "num_examples": 1000,
        },
    )
    
    print(f"Starting job '{job.job_id}'")
  6. Monitor the status of the evaluation job. The job runs asynchronously, so you can check its status periodically:

    import time
    def get_job_status(job_id, benchmark_id):
        return client.eval.jobs.status(job_id=job_id, benchmark_id=benchmark_id)
    
    while True:
        job = get_job_status(job_id=job.job_id, benchmark_id="trustyai_lmeval::dk-bench")
        print(job)
    
        if job.status in ['failed', 'completed']:
            print(f"Job ended with status: {job.status}")
            break
    
        time.sleep(20)
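  7. Retrieve the evaluation job results once the job status reports back as completed, as in the previous example:

    pprint.pprint(client.eval.jobs.retrieve(job_id=job.job_id, benchmark_id="trustyai_lmeval::dk-bench").scores)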

Using the trustyai-fms Guardrails Orchestrator with Llama Stack

The trustyai_fms Orchestrator server is an external provider for Llama Stack that allows you to configure and use the Guardrails Orchestrator and compatible detection models through the Llama Stack API. This implementation of Llama Stack combines the Guardrails Orchestrator with a suite of community-developed detectors to provide robust content filtering and safety monitoring.

This example demonstrates how to use the built-in Guardrails regex detector with the Guardrails Orchestrator as a Llama Stack safety guardrail to detect personally identifiable information (PII), using the LlamaStack Operator to deploy a distribution in your Red Hat OpenShift AI namespace.

Note

Guardrails Orchestrator with Llama Stack is not supported on s390x, as it requires the LlamaStack Operator, which is currently unavailable for this architecture.

Prerequisites

  • You have cluster administrator privileges for your OpenShift cluster.
  • You have installed the OpenShift CLI (oc) as described in the appropriate documentation for your cluster.

  • You have a large language model (LLM) for chat generation or text classification, or both, deployed in your namespace.
  • A cluster administrator has installed the following Operators in OpenShift:

    • Red Hat Authorino Operator, version 1.2.1 or later
    • Red Hat OpenShift Service Mesh, version 2.6.7-0 or later

Procedure

  1. Configure your OpenShift AI environment by applying the following configuration to the DataScienceCluster custom resource. Note that you must manually set the spec.llamastack.managementState field to Managed:

    spec:
      trustyai:
        managementState: Managed
      llamastack:
        managementState: Managed
      kserve:
        defaultDeploymentMode: RawDeployment
        managementState: Managed
        nim:
          managementState: Managed
        rawDeploymentServiceConfig: Headless
      serving:
        ingressGateway:
          certificate:
            type: OpenshiftDefaultIngress
        managementState: Removed
        name: knative-serving
      serviceMesh:
        managementState: Removed
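    One way to apply the required change from the CLI is to patch the DataScienceCluster directly. This is a sketch; it assumes the default DataScienceCluster is named default-dsc, so adjust the name to match your cluster:

    oc patch datasciencecluster default-dsc --type merge \
      -p '{"spec": {"llamastack": {"managementState": "Managed"}}}'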
  2. Create a project in your OpenShift cluster:

    PROJECT_NAME="lls-minimal-example"
    oc new-project $PROJECT_NAME
  3. Deploy the Guardrails Orchestrator with regex detectors by applying the Orchestrator configuration for regex-based PII detection:

    cat <<EOF | oc apply -f -
    kind: ConfigMap
    apiVersion: v1
    metadata:
      name: fms-orchestr8-config-nlp
    data:
      config.yaml: |
        detectors:
          regex:
            type: text_contents
            service:
              hostname: "127.0.0.1"
              port: 8080
            chunker_id: whole_doc_chunker
            default_threshold: 0.5
    ---
    apiVersion: trustyai.opendatahub.io/v1alpha1
    kind: GuardrailsOrchestrator
    metadata:
      name: guardrails-orchestrator
    spec:
      orchestratorConfig: "fms-orchestr8-config-nlp"
      enableBuiltInDetectors: true
      enableGuardrailsGateway: false
      replicas: 1
    EOF
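    Optionally, confirm that the orchestrator pod is running before you continue:

    oc get pods -n $PROJECT_NAME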
  4. In the same namespace, create a Llama Stack distribution:

    apiVersion: llamastack.io/v1alpha1
    kind: LlamaStackDistribution
    metadata:
      name: llamastackdistribution-sample
      namespace: <PROJECT_NAMESPACE>
    spec:
      replicas: 1
      server:
        containerSpec:
          env:
            - name: VLLM_URL
              value: '${VLLM_URL}'
            - name: INFERENCE_MODEL
              value: '${INFERENCE_MODEL}'
            - name: MILVUS_DB_PATH
              value: '~/.llama/milvus.db'
            - name: VLLM_TLS_VERIFY
              value: 'false'
            - name: FMS_ORCHESTRATOR_URL
              value: '${FMS_ORCHESTRATOR_URL}'
          name: llama-stack
          port: 8321
        distribution:
          name: rh-dev
        storage:
          size: 20Gi
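    The CR above references VLLM_URL, INFERENCE_MODEL, and FMS_ORCHESTRATOR_URL as shell-style variables. One way to substitute and apply it, assuming you saved the manifest as llamastack-distribution.yaml and replaced <PROJECT_NAMESPACE> with your project name, is:

    # Placeholder values; replace them with your model route, model name, and orchestrator URL
    export VLLM_URL="<VLLM_URL>"
    export INFERENCE_MODEL="<INFERENCE_MODEL>"
    export FMS_ORCHESTRATOR_URL="<FMS_ORCHESTRATOR_URL>"
    envsubst < llamastack-distribution.yaml | oc apply -f -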
Note

After deploying the LlamaStackDistribution CR, a new pod is created in the same namespace. This pod runs the Llama Stack server for your distribution.

  5. Once the Llama Stack server is running, open a port-forward to access it locally:

    oc -n $PROJECT_NAME port-forward svc/llama-stack 8321:8321
  6. Use the /v1/shields endpoint to dynamically register a shield. For example, register a shield that uses regex patterns to detect personally identifiable information (PII):

    curl -X POST http://localhost:8321/v1/shields \
      -H 'Content-Type: application/json' \
      -d '{
        "shield_id": "regex_detector",
        "provider_shield_id": "regex_detector",
        "provider_id": "trustyai_fms",
        "params": {
          "type": "content",
          "confidence_threshold": 0.5,
          "message_types": ["system", "user"],
          "detectors": {
            "regex": {
              "detector_params": {
                "regex": ["email", "us-social-security-number", "credit-card"]
              }
            }
          }
        }
      }'
  7. Verify that the shield was registered:

    curl -s http://localhost:8321/v1/shields | jq '.'
  8. The following output indicates that the shield has been registered successfully:

    {
      "data": [
        {
          "identifier": "regex_detector",
          "provider_resource_id": "regex_detector",
          "provider_id": "trustyai_fms",
          "type": "shield",
          "params": {
            "type": "content",
            "confidence_threshold": 0.5,
            "message_types": [
              "system",
              "user"
            ],
            "detectors": {
              "regex": {
                "detector_params": {
                  "regex": [
                    "email",
                    "us-social-security-number",
                    "credit-card"
                  ]
                }
              }
            }
          }
        }
      ]
    }
  9. Once the shield has been registered, verify that it is working by sending a message containing PII to the /v1/safety/run-shield endpoint:

    1. Email detection example:

      curl -X POST http://localhost:8321/v1/safety/run-shield \
      -H "Content-Type: application/json" \
      -d '{
        "shield_id": "regex_detector",
        "messages": [
          {
            "content": "My email is test@example.com",
            "role": "user"
          }
        ]
      }' | jq '.'

      This should return a response indicating that the email was detected:

      {
        "violation": {
          "violation_level": "error",
          "user_message": "Content violation detected by shield regex_detector (confidence: 1.00, 1/1 processed messages violated)",
          "metadata": {
            "status": "violation",
            "shield_id": "regex_detector",
            "confidence_threshold": 0.5,
            "summary": {
              "total_messages": 1,
              "processed_messages": 1,
              "skipped_messages": 0,
              "messages_with_violations": 1,
              "messages_passed": 0,
              "message_fail_rate": 1.0,
              "message_pass_rate": 0.0,
              "total_detections": 1,
              "detector_breakdown": {
                "active_detectors": 1,
                "total_checks_performed": 1,
                "total_violations_found": 1,
                "violations_per_message": 1.0
              }
            },
            "results": [
              {
                "message_index": 0,
                "text": "My email is test@example.com",
                "status": "violation",
                "score": 1.0,
                "detection_type": "pii",
                "individual_detector_results": [
                  {
                    "detector_id": "regex",
                    "status": "violation",
                    "score": 1.0,
                    "detection_type": "pii"
                  }
                ]
              }
            ]
          }
        }
      }
    2. Social security number (SSN) detection example:

      curl -X POST http://localhost:8321/v1/safety/run-shield \
      -H "Content-Type: application/json" \
      -d '{
          "shield_id": "regex_detector",
          "messages": [
            {
              "content": "My SSN is 123-45-6789",
              "role": "user"
            }
          ]
      }' | jq '.'

      This should return a response indicating that the SSN was detected:

      {
        "violation": {
          "violation_level": "error",
          "user_message": "Content violation detected by shield regex_detector (confidence: 1.00, 1/1 processed messages violated)",
          "metadata": {
            "status": "violation",
            "shield_id": "regex_detector",
            "confidence_threshold": 0.5,
            "summary": {
              "total_messages": 1,
              "processed_messages": 1,
              "skipped_messages": 0,
              "messages_with_violations": 1,
              "messages_passed": 0,
              "message_fail_rate": 1.0,
              "message_pass_rate": 0.0,
              "total_detections": 1,
              "detector_breakdown": {
                "active_detectors": 1,
                "total_checks_performed": 1,
                "total_violations_found": 1,
                "violations_per_message": 1.0
              }
            },
            "results": [
              {
                "message_index": 0,
                "text": "My SSN is 123-45-6789",
                "status": "violation",
                "score": 1.0,
                "detection_type": "pii",
                "individual_detector_results": [
                  {
                    "detector_id": "regex",
                    "status": "violation",
                    "score": 1.0,
                    "detection_type": "pii"
                  }
                ]
              }
            ]
          }
        }
      }
    3. Credit card detection example:

      curl -X POST http://localhost:8321/v1/safety/run-shield \
      -H "Content-Type: application/json" \
      -d '{
          "shield_id": "regex_detector",
          "messages": [
            {
              "content": "My credit card number is 4111-1111-1111-1111",
              "role": "user"
            }
          ]
      }' | jq '.'

      This should return a response indicating that the credit card number was detected:

      {
        "violation": {
          "violation_level": "error",
          "user_message": "Content violation detected by shield regex_detector (confidence: 1.00, 1/1 processed messages violated)",
          "metadata": {
            "status": "violation",
            "shield_id": "regex_detector",
            "confidence_threshold": 0.5,
            "summary": {
              "total_messages": 1,
              "processed_messages": 1,
              "skipped_messages": 0,
              "messages_with_violations": 1,
              "messages_passed": 0,
              "message_fail_rate": 1.0,
              "message_pass_rate": 0.0,
              "total_detections": 1,
              "detector_breakdown": {
                "active_detectors": 1,
                "total_checks_performed": 1,
                "total_violations_found": 1,
                "violations_per_message": 1.0
              }
            },
            "results": [
              {
                "message_index": 0,
                "text": "My credit card number is 4111-1111-1111-1111",
                "status": "violation",
                "score": 1.0,
                "detection_type": "pii",
                "individual_detector_results": [
                  {
                    "detector_id": "regex",
                    "status": "violation",
                    "score": 1.0,
                    "detection_type": "pii"
                  }
                ]
              }
            ]
          }
        }
      }
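    4. Clean input example. As a negative test, a message without PII should not trigger the shield. The exact response body depends on the provider, but it should not contain a violation for this input:

      curl -X POST http://localhost:8321/v1/safety/run-shield \
      -H "Content-Type: application/json" \
      -d '{
          "shield_id": "regex_detector",
          "messages": [
            {
              "content": "What is the capital of France?",
              "role": "user"
            }
          ]
      }' | jq '.'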