Chapter 8. Using Llama Stack with TrustyAI
This section contains tutorials for working with Llama Stack in TrustyAI. These tutorials demonstrate how to use various Llama Stack components and providers to evaluate and work with language models.
The following sections describe how to work with Llama Stack and provide example use cases:
- Using the Llama Stack external evaluation provider with lm-evaluation-harness in TrustyAI
- Running custom evaluations with LM-Eval and Llama Stack
- Using Guardrails Orchestrator with Llama Stack
8.1. Using Llama Stack external evaluation provider with lm-evaluation-harness in TrustyAI
This example demonstrates how to evaluate a language model in Red Hat OpenShift AI using the LMEval Llama Stack external eval provider in a Python workbench. To do this, configure a Llama Stack server to use the LMEval eval provider, register a benchmark dataset, and run a benchmark evaluation job on a language model.
Prerequisites
- You have installed Red Hat OpenShift AI, version 2.20 or later.
- You have cluster administrator privileges for your OpenShift AI cluster.
- You have installed the OpenShift CLI (`oc`) as described in the appropriate documentation for your cluster:
  - Installing the OpenShift CLI for OpenShift Dedicated
  - Installing the OpenShift CLI for Red Hat OpenShift Service on AWS (classic architecture)
- You have a large language model (LLM) for chat generation or text classification, or both, deployed in your namespace.
- You have installed TrustyAI Operator in your OpenShift AI cluster.
- You have set KServe to Raw Deployment mode in your cluster.
Procedure
Create and activate a Python virtual environment for this tutorial on your local machine:

```shell
python3 -m venv .venv
source .venv/bin/activate
```

Install the required packages from the Python Package Index (PyPI):

```shell
pip install \
    llama-stack \
    llama-stack-client \
    llama-stack-provider-lmeval
```

Create the model route:

```shell
oc create route edge vllm --service=<VLLM_SERVICE> --port=<VLLM_PORT> -n <MODEL_NAMESPACE>
```

Configure the Llama Stack server. Set the variables that configure the runtime endpoint and namespace. The `VLLM_URL` value should be the `v1/completions` endpoint of your model route, and `TRUSTYAI_LM_EVAL_NAMESPACE` should be the namespace where your model is deployed. For example:

```shell
export TRUSTYAI_LM_EVAL_NAMESPACE=<MODEL_NAMESPACE>
export MODEL_ROUTE=$(oc get route -n "$TRUSTYAI_LM_EVAL_NAMESPACE" | awk '/predictor/{print $2; exit}')
export VLLM_URL="https://${MODEL_ROUTE}/v1/completions"
```

Download the `providers.d` provider configuration directory and the `run.yaml` execution file:

```shell
curl --create-dirs --output providers.d/remote/eval/trustyai_lmeval.yaml \
    https://raw.githubusercontent.com/trustyai-explainability/llama-stack-provider-lmeval/refs/heads/main/providers.d/remote/eval/trustyai_lmeval.yaml
curl --create-dirs --output run.yaml \
    https://raw.githubusercontent.com/trustyai-explainability/llama-stack-provider-lmeval/refs/heads/main/run.yaml
```

Start the Llama Stack server in a virtual environment, which uses port `8321` by default:

```shell
llama stack run run.yaml --image-type venv
```

Create a Python script in a Jupyter workbench and import the following libraries and modules to interact with the server and run an evaluation:
```python
import os
import subprocess
import logging
import time
import pprint
```

Start the Llama Stack Python client to interact with the running Llama Stack server:

```python
BASE_URL = "http://localhost:8321"

def create_http_client():
    from llama_stack_client import LlamaStackClient
    return LlamaStackClient(base_url=BASE_URL)

client = create_http_client()
```

Print a list of the currently available benchmarks:

```python
benchmarks = client.benchmarks.list()
pprint.pprint(f"Available benchmarks: {benchmarks}")
```

LMEval provides access to over 100 preconfigured evaluation datasets. Register the ARC-Easy benchmark, a dataset of grade-school level, multiple-choice science questions:

```python
client.benchmarks.register(
    benchmark_id="trustyai_lmeval::arc_easy",
    dataset_id="trustyai_lmeval::arc_easy",
    scoring_functions=["string"],
    provider_benchmark_id="string",
    provider_id="trustyai_lmeval",
    metadata={
        "tokenizer": "google/flan-t5-small",
        "tokenized_requests": False,
    },
)
```

Verify that the benchmark has been registered successfully:

```python
benchmarks = client.benchmarks.list()
pprint.pprint(f"Available benchmarks: {benchmarks}")
```

Run a benchmark evaluation job on your deployed model using the following input. Replace `phi-3` with the name of your deployed model:
```python
job = client.eval.run_eval(
    benchmark_id="trustyai_lmeval::arc_easy",
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": "phi-3",
            "provider_id": "trustyai_lmeval",
            "sampling_params": {
                "temperature": 0.7,
                "top_p": 0.9,
                "max_tokens": 256,
            },
        },
        "num_examples": 1000,
    },
)
print(f"Starting job '{job.job_id}'")
```

Monitor the status of the evaluation job using the following code. The job runs asynchronously, so you can check its status periodically:

```python
def get_job_status(job_id, benchmark_id):
    return client.eval.jobs.status(job_id=job_id, benchmark_id=benchmark_id)

while True:
    job = get_job_status(job_id=job.job_id, benchmark_id="trustyai_lmeval::arc_easy")
    print(job)

    if job.status in ['failed', 'completed']:
        print(f"Job ended with status: {job.status}")
        break

    time.sleep(20)
```
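The loop above polls until the job reaches a terminal state, but it will spin forever if the job never finishes. As a minimal sketch, a bounded variant might look like this; the `poll_until_done` helper and its timeout values are illustrative, not part of the Llama Stack client API:

```python
import time

def poll_until_done(fetch_status, timeout_s=1800, interval_s=20):
    """Poll fetch_status() until it returns 'failed' or 'completed',
    raising TimeoutError once timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = fetch_status()
        if status in ("failed", "completed"):
            return status
        time.sleep(interval_s)
    raise TimeoutError(f"Job still running after {timeout_s} seconds")
```

With the client from the earlier steps, you could call it as `poll_until_done(lambda: get_job_status(job.job_id, "trustyai_lmeval::arc_easy").status)`.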
Retrieve the evaluation job results once the job status reports back as `completed`:

```python
pprint.pprint(client.eval.jobs.retrieve(job_id=job.job_id, benchmark_id="trustyai_lmeval::arc_easy").scores)
```
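The exact shape of the returned scores varies by benchmark and provider version. When exploring results interactively, a small generic flattener can make nested score dictionaries easier to scan; this is a hypothetical helper, not part of the client API, and it assumes the scores can be represented as nested dicts:

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted-key/value pairs for easy scanning."""
    items = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            items.update(flatten(value, f"{prefix}{key}."))
    else:
        items[prefix.rstrip(".")] = obj
    return items

# Example with an illustrative scores dict:
# flatten({"arc_easy": {"acc": 0.81, "acc_stderr": 0.01}})
# -> {"arc_easy.acc": 0.81, "arc_easy.acc_stderr": 0.01}
```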
8.2. Running custom evaluations with LM-Eval and Llama Stack
This example demonstrates how to use the LM-Eval Llama Stack external eval provider to evaluate a language model with a custom benchmark. Creating a custom benchmark is useful for evaluating specific model knowledge and behavior.
The process involves three steps:
- Uploading the task dataset to your OpenShift AI cluster
- Registering it as a custom benchmark dataset with Llama Stack
- Running a benchmark evaluation job on a language model
Prerequisites
- You have installed Red Hat OpenShift AI, version 2.20 or later.
- You have cluster administrator privileges for your OpenShift AI cluster.
- You have installed the OpenShift CLI (`oc`) as described in the appropriate documentation for your cluster:
  - Installing the OpenShift CLI for OpenShift Dedicated
  - Installing the OpenShift CLI for Red Hat OpenShift Service on AWS (classic architecture)
- You have a large language model (LLM) for chat generation or text classification, or both, deployed on vLLM Serving Runtime in your OpenShift AI cluster.
- You have installed TrustyAI Operator in your OpenShift AI cluster.
- You have set KServe to Raw Deployment mode in your cluster.
Procedure
Upload your custom benchmark dataset to your OpenShift cluster using a PersistentVolumeClaim (PVC) and a temporary pod. Create a PVC named `my-pvc` to store your dataset. Run the following command in your CLI, replacing `<MODEL_NAMESPACE>` with the namespace of your language model:

```shell
oc apply -n <MODEL_NAMESPACE> -f - << EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
EOF
```

Create a pod object named `dataset-storage-pod` to download the task dataset into the PVC. This pod is used to copy your dataset from your local machine to the OpenShift AI cluster:

```shell
oc apply -n <MODEL_NAMESPACE> -f - << EOF
apiVersion: v1
kind: Pod
metadata:
  name: dataset-storage-pod
spec:
  containers:
    - name: dataset-container
      image: 'quay.io/prometheus/busybox:latest'
      command: ["/bin/sh", "-c", "sleep 3600"]
      volumeMounts:
        - mountPath: "/data/upload_files"
          name: dataset-storage
  volumes:
    - name: dataset-storage
      persistentVolumeClaim:
        claimName: my-pvc
EOF
```

Copy your locally stored task dataset to the pod to place it within the PVC. In this example, the dataset is named `example-dk-bench-input-bmo.jsonl` locally, and it is copied to the `dataset-storage-pod` under the path `/data/upload_files/`:

```shell
oc cp example-dk-bench-input-bmo.jsonl dataset-storage-pod:/data/upload_files/example-dk-bench-input-bmo.jsonl -n <MODEL_NAMESPACE>
```

Once the custom dataset is uploaded to the PVC, register it as a benchmark for evaluations. At a minimum, provide the following metadata, replacing `DK_BENCH_DATASET_PATH` and any other metadata fields to match your specific configuration:

- The TrustyAI LM-Eval Tasks GitHub web address
- Your branch
- The commit hash and path of the custom task
```python
client.benchmarks.register(
    benchmark_id="trustyai_lmeval::dk-bench",
    dataset_id="trustyai_lmeval::dk-bench",
    scoring_functions=["accuracy"],
    provider_benchmark_id="dk-bench",
    provider_id="trustyai_lmeval",
    metadata={
        "custom_task": {
            "git": {
                "url": "https://github.com/trustyai-explainability/lm-eval-tasks.git",
                "branch": "main",
                "commit": "8220e2d73c187471acbe71659c98bccecfe77958",
                "path": "tasks/",
            }
        },
        "env": {
            # Path of the dataset inside the PVC
            "DK_BENCH_DATASET_PATH": "/opt/app-root/src/hf_home/example-dk-bench-input-bmo.jsonl",
            "JUDGE_MODEL_URL": "http://phi-3-predictor:8080/v1/chat/completions",
            # For simplicity, we use the same model as the one being evaluated
            "JUDGE_MODEL_NAME": "phi-3",
            "JUDGE_API_KEY": "",
        },
        "tokenized_requests": False,
        "tokenizer": "google/flan-t5-small",
        "input": {"storage": {"pvc": "my-pvc"}},
    },
)
```
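A registration that references a missing Git field or dataset path typically fails only once the evaluation job starts. As a minimal sketch, a pre-flight check over the metadata dict can catch such omissions earlier; the required-key list below is an assumption drawn from the example above, not a documented contract:

```python
def check_custom_task_metadata(metadata):
    """Return a list of problems found in a custom-task metadata dict."""
    problems = []
    git = metadata.get("custom_task", {}).get("git", {})
    for key in ("url", "branch", "commit", "path"):
        if not git.get(key):
            problems.append(f"custom_task.git.{key} is missing or empty")
    env = metadata.get("env", {})
    if not env.get("DK_BENCH_DATASET_PATH"):
        problems.append("env.DK_BENCH_DATASET_PATH is missing or empty")
    return problems
```

Calling this before `client.benchmarks.register(...)` and aborting when the returned list is non-empty gives a faster failure than waiting for the job to start.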
Run a benchmark evaluation on your model:
```python
job = client.eval.run_eval(
    benchmark_id="trustyai_lmeval::dk-bench",
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": "phi-3",
            "provider_id": "trustyai_lmeval",
            "sampling_params": {
                "temperature": 0.7,
                "top_p": 0.9,
                "max_tokens": 256,
            },
        },
        "num_examples": 1000,
    },
)
print(f"Starting job '{job.job_id}'")
```

Monitor the status of the evaluation job. The job runs asynchronously, so you can check its status periodically:

```python
import time

def get_job_status(job_id, benchmark_id):
    return client.eval.jobs.status(job_id=job_id, benchmark_id=benchmark_id)

while True:
    job = get_job_status(job_id=job.job_id, benchmark_id="trustyai_lmeval::dk-bench")
    print(job)

    if job.status in ['failed', 'completed']:
        print(f"Job ended with status: {job.status}")
        break

    time.sleep(20)
```
8.3. Using Guardrails Orchestrator with Llama Stack
The trustyai_fms orchestrator server is an external provider for Llama Stack that allows you to configure and use the Guardrails Orchestrator and compatible detection models through the Llama Stack API. This implementation of Llama Stack combines Guardrails Orchestrator with a suite of community-developed detectors to provide robust content filtering and safety monitoring.
This example demonstrates how to use the built-in Guardrails Regex Detector with Guardrails Orchestrator as Llama Stack safety guardrails, using the LlamaStack Operator to deploy a distribution in your Red Hat OpenShift AI namespace.
Prerequisites
- You have cluster administrator privileges for your OpenShift cluster.
- You have installed the OpenShift CLI (`oc`) as described in the appropriate documentation for your cluster:
  - Installing the OpenShift CLI for OpenShift Dedicated
  - Installing the OpenShift CLI for Red Hat OpenShift Service on AWS (classic architecture)
- You have a large language model (LLM) for chat generation or text classification, or both, deployed in your namespace.
- A cluster administrator has installed the following Operators in OpenShift:
  - Red Hat OpenShift Service Mesh, version 2.6.7-0 or later.
  - Red Hat OpenShift Serverless, version 1.35.1 or later.
  - Red Hat Authorino Operator, version 1.2.1 or later.
Procedure
Configure your OpenShift AI environment with the following configurations in the `DataScienceCluster`. Note that you must manually update the `spec.llamastack.managementState` field to `Managed`:

```yaml
spec:
  trustyai:
    managementState: Managed
  llamastack:
    managementState: Managed
  kserve:
    defaultDeploymentMode: RawDeployment
    managementState: Managed
    nim:
      managementState: Managed
    rawDeploymentServiceConfig: Headless
    serving:
      ingressGateway:
        certificate:
          type: OpenshiftDefaultIngress
      managementState: Removed
      name: knative-serving
  serviceMesh:
    managementState: Removed
```

Create a project in your OpenShift AI namespace:
```shell
PROJECT_NAME="lls-minimal-example"
oc new-project $PROJECT_NAME
```

Deploy the Guardrails Orchestrator with regex detectors by applying the orchestrator configuration for regex-based PII detection:
```shell
cat <<EOF | oc apply -f -
kind: ConfigMap
apiVersion: v1
metadata:
  name: fms-orchestr8-config-nlp
data:
  config.yaml: |
    detectors:
      regex:
        type: text_contents
        service:
          hostname: "127.0.0.1"
          port: 8080
        chunker_id: whole_doc_chunker
        default_threshold: 0.5
---
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: GuardrailsOrchestrator
metadata:
  name: guardrails-orchestrator
spec:
  orchestratorConfig: "fms-orchestr8-config-nlp"
  enableBuiltInDetectors: true
  enableGuardrailsGateway: false
  replicas: 1
EOF
```

In the same namespace, create a Llama Stack distribution:
```yaml
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: llamastackdistribution-sample
  namespace: <PROJECT_NAMESPACE>
spec:
  replicas: 1
  server:
    containerSpec:
      env:
        - name: VLLM_URL
          value: '${VLLM_URL}'
        - name: INFERENCE_MODEL
          value: '${INFERENCE_MODEL}'
        - name: MILVUS_DB_PATH
          value: '~/.llama/milvus.db'
        - name: VLLM_TLS_VERIFY
          value: 'false'
        - name: FMS_ORCHESTRATOR_URL
          value: '${FMS_ORCHESTRATOR_URL}'
      name: llama-stack
      port: 8321
    distribution:
      name: rh-dev
    storage:
      size: 20Gi
```
After deploying the LlamaStackDistribution CR, a new pod is created in the same namespace. This pod runs the Llama Stack server for your distribution.
Once the Llama Stack server is running, open a port-forward to access it locally:

```shell
oc -n $PROJECT_NAME port-forward svc/llama-stack 8321:8321
```

Use the `/v1/shields` endpoint to dynamically register a shield. For example, register a shield that uses regex patterns to detect personally identifiable information (PII):

```shell
curl -X POST http://localhost:8321/v1/shields \
  -H 'Content-Type: application/json' \
  -d '{
    "shield_id": "regex_detector",
    "provider_shield_id": "regex_detector",
    "provider_id": "trustyai_fms",
    "params": {
      "type": "content",
      "confidence_threshold": 0.5,
      "message_types": ["system", "user"],
      "detectors": {
        "regex": {
          "detector_params": {
            "regex": ["email", "us-social-security-number", "credit-card"]
          }
        }
      }
    }
  }'
```

Verify that the shield was registered:
```shell
curl -s http://localhost:8321/v1/shields | jq '.'
```

The following output indicates that the shield has been registered successfully:
```json
{
  "data": [
    {
      "identifier": "regex_detector",
      "provider_resource_id": "regex_detector",
      "provider_id": "trustyai_fms",
      "type": "shield",
      "params": {
        "type": "content",
        "confidence_threshold": 0.5,
        "message_types": [
          "system",
          "user"
        ],
        "detectors": {
          "regex": {
            "detector_params": {
              "regex": [
                "email",
                "us-social-security-number",
                "credit-card"
              ]
            }
          }
        }
      }
    }
  ]
}
```

Once the shield has been registered, verify that it is working by sending a message containing PII to the `/v1/safety/run-shield` endpoint.

Email detection example:
```shell
curl -X POST http://localhost:8321/v1/safety/run-shield \
  -H "Content-Type: application/json" \
  -d '{
    "shield_id": "regex_detector",
    "messages": [
      {
        "content": "My email is test@example.com",
        "role": "user"
      }
    ]
  }' | jq '.'
```

This should return a response indicating that the email was detected:
```json
{
  "violation": {
    "violation_level": "error",
    "user_message": "Content violation detected by shield regex_detector (confidence: 1.00, 1/1 processed messages violated)",
    "metadata": {
      "status": "violation",
      "shield_id": "regex_detector",
      "confidence_threshold": 0.5,
      "summary": {
        "total_messages": 1,
        "processed_messages": 1,
        "skipped_messages": 0,
        "messages_with_violations": 1,
        "messages_passed": 0,
        "message_fail_rate": 1.0,
        "message_pass_rate": 0.0,
        "total_detections": 1,
        "detector_breakdown": {
          "active_detectors": 1,
          "total_checks_performed": 1,
          "total_violations_found": 1,
          "violations_per_message": 1.0
        }
      },
      "results": [
        {
          "message_index": 0,
          "text": "My email is test@example.com",
          "status": "violation",
          "score": 1.0,
          "detection_type": "pii",
          "individual_detector_results": [
            {
              "detector_id": "regex",
              "status": "violation",
              "score": 1.0,
              "detection_type": "pii"
            }
          ]
        }
      ]
    }
  }
}
```

Social security number (SSN) detection example:
```shell
curl -X POST http://localhost:8321/v1/safety/run-shield \
  -H "Content-Type: application/json" \
  -d '{
    "shield_id": "regex_detector",
    "messages": [
      {
        "content": "My SSN is 123-45-6789",
        "role": "user"
      }
    ]
  }' | jq '.'
```

This should return a response indicating that the SSN was detected:
```json
{
  "violation": {
    "violation_level": "error",
    "user_message": "Content violation detected by shield regex_detector (confidence: 1.00, 1/1 processed messages violated)",
    "metadata": {
      "status": "violation",
      "shield_id": "regex_detector",
      "confidence_threshold": 0.5,
      "summary": {
        "total_messages": 1,
        "processed_messages": 1,
        "skipped_messages": 0,
        "messages_with_violations": 1,
        "messages_passed": 0,
        "message_fail_rate": 1.0,
        "message_pass_rate": 0.0,
        "total_detections": 1,
        "detector_breakdown": {
          "active_detectors": 1,
          "total_checks_performed": 1,
          "total_violations_found": 1,
          "violations_per_message": 1.0
        }
      },
      "results": [
        {
          "message_index": 0,
          "text": "My SSN is 123-45-6789",
          "status": "violation",
          "score": 1.0,
          "detection_type": "pii",
          "individual_detector_results": [
            {
              "detector_id": "regex",
              "status": "violation",
              "score": 1.0,
              "detection_type": "pii"
            }
          ]
        }
      ]
    }
  }
}
```

Credit card detection example:
```shell
curl -X POST http://localhost:8321/v1/safety/run-shield \
  -H "Content-Type: application/json" \
  -d '{
    "shield_id": "regex_detector",
    "messages": [
      {
        "content": "My credit card number is 4111-1111-1111-1111",
        "role": "user"
      }
    ]
  }' | jq '.'
```

This should return a response indicating that the credit card number was detected:
```json
{
  "violation": {
    "violation_level": "error",
    "user_message": "Content violation detected by shield regex_detector (confidence: 1.00, 1/1 processed messages violated)",
    "metadata": {
      "status": "violation",
      "shield_id": "regex_detector",
      "confidence_threshold": 0.5,
      "summary": {
        "total_messages": 1,
        "processed_messages": 1,
        "skipped_messages": 0,
        "messages_with_violations": 1,
        "messages_passed": 0,
        "message_fail_rate": 1.0,
        "message_pass_rate": 0.0,
        "total_detections": 1,
        "detector_breakdown": {
          "active_detectors": 1,
          "total_checks_performed": 1,
          "total_violations_found": 1,
          "violations_per_message": 1.0
        }
      },
      "results": [
        {
          "message_index": 0,
          "text": "My credit card number is 4111-1111-1111-1111",
          "status": "violation",
          "score": 1.0,
          "detection_type": "pii",
          "individual_detector_results": [
            {
              "detector_id": "regex",
              "status": "violation",
              "score": 1.0,
              "detection_type": "pii"
            }
          ]
        }
      ]
    }
  }
}
```
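Applications usually need to act on these responses programmatically, for example by blocking a request when a shield reports a violation. The helpers below parse the response shape shown in the examples above; they are a sketch that assumes this structure, which may change between releases:

```python
def is_blocked(response):
    """Return True if a run-shield response reports a content violation."""
    violation = response.get("violation")
    return bool(violation) and violation.get("metadata", {}).get("status") == "violation"

def violation_texts(response):
    """Collect the offending message texts from a violation response."""
    results = response.get("violation", {}).get("metadata", {}).get("results", [])
    return [r["text"] for r in results if r.get("status") == "violation"]
```

A response with no `violation` key (the shield passed all messages) yields `is_blocked(...) == False`, so the same check works for both outcomes.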