
Chapter 3. Evaluating RAG systems with Ragas


Important

Retrieval-Augmented Generation Assessment (Ragas) is currently available in Red Hat OpenShift AI as a Technology Preview feature. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

As an AI engineer, you can use Retrieval-Augmented Generation Assessment (Ragas) to measure and improve the quality of your RAG systems in OpenShift AI. Ragas provides objective metrics that assess retrieval quality, answer relevance, and factual consistency, enabling you to identify issues, optimize configurations, and establish automated quality gates in your development workflows.

Ragas is integrated with OpenShift AI through the Llama Stack evaluation API and supports two deployment modes: an inline provider for development and testing, and a remote provider for production-scale evaluations using OpenShift AI pipelines.

3.1. About Ragas evaluation

Ragas addresses the unique challenges of evaluating RAG systems by providing metrics that assess both the retrieval and generation components of your application. Unlike traditional language model evaluation that focuses solely on output quality, Ragas evaluates how well your system retrieves relevant context and generates responses grounded in that context.

3.1.1. Key Ragas metrics

Ragas provides multiple metrics for evaluating RAG systems, including the following (a brief example of computing these metrics with the Ragas Python library appears after the list):

Faithfulness
Measures whether the generated answer is factually consistent with the retrieved context. A high faithfulness score indicates that the answer is well-grounded in the source documents, reducing the risk of hallucinations. This is critical for enterprise and regulated environments where accuracy and trustworthiness are paramount.
Answer Relevancy
Evaluates how well the generated answer addresses the input question. This metric ensures that your RAG system provides pertinent responses rather than generic or off-topic information.
Context Precision
Measures the precision of the retrieval component by evaluating whether the retrieved context chunks contain information relevant to answering the question. High precision indicates that your retrieval system is returning focused, relevant documents rather than irrelevant noise.
Context Recall
Measures the recall of the retrieval component by evaluating whether all necessary information for answering the question is present in the retrieved contexts. High recall ensures that your retrieval system is not missing important information.
Answer Correctness
Compares the generated answer with a ground truth reference answer to measure accuracy. This metric is useful when you have labeled evaluation datasets with known correct answers.
Answer Similarity
Measures the semantic similarity between the generated answer and a reference answer, providing a more nuanced assessment than exact string matching.
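The following example is a minimal sketch of how these metrics are typically computed with the open source Ragas Python library. The exact column names, metric imports, and judge-model configuration vary between Ragas releases, and the sample question, answer, and contexts are illustrative only.

    # Minimal sketch: scoring a single RAG interaction with the Ragas library.
    # Column names and APIs differ between Ragas releases; the judge LLM and
    # embedding model must be configured separately before running evaluate().
    from datasets import Dataset
    from ragas import evaluate
    from ragas.metrics import (
        answer_relevancy,
        context_precision,
        context_recall,
        faithfulness,
    )

    # Illustrative evaluation record: question, generated answer,
    # retrieved contexts, and a ground-truth reference answer.
    eval_data = Dataset.from_dict({
        "question": ["What is the Technology Preview support scope?"],
        "answer": ["Technology Preview features are not supported with production SLAs."],
        "contexts": [[
            "Technology Preview features are not supported with Red Hat production "
            "service level agreements (SLAs) and might not be functionally complete."
        ]],
        "ground_truth": ["Technology Preview features are not covered by production SLAs."],
    })

    result = evaluate(
        eval_data,
        metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
    )
    print(result)  # per-metric scores between 0.0 and 1.0

When Ragas runs through the Llama Stack evaluation API in OpenShift AI, the provider performs equivalent calculations; you interact with it through registered benchmarks and evaluation jobs rather than by calling the library directly.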

3.1.2. Use cases for Ragas in AI engineering workflows

Ragas enables AI engineers to accomplish the following tasks:

Automate quality checks
Create reproducible, objective evaluation jobs that run automatically after every code commit or model update. Automated quality checks establish quality gates that prevent regressions and ensure that you deploy only high-quality RAG configurations to production. A minimal threshold-check sketch follows this list.
Enable evaluation-driven development (EDD)
Use Ragas metrics to guide iterative optimization. For example, test different chunking strategies, embedding models, or retrieval algorithms against a defined benchmark to discover the RAG configuration that maximizes performance, such as maximizing faithfulness while minimizing computational cost.
Ensure factual consistency and trustworthiness
Measure the reliability of your RAG system by setting thresholds on metrics such as faithfulness. Metric thresholds ensure that responses are consistently grounded in source documents, which is critical for enterprise applications where hallucinations or factual errors are unacceptable.
Achieve production scalability
Leverage the remote provider pattern with OpenShift AI pipelines to execute evaluations as distributed jobs. The remote provider pattern allows you to run large-scale benchmarks across thousands of data points without blocking development or consuming excessive local resources.
Compare model and configuration variants
Run comparative evaluations across different models, retrieval strategies, or system configurations to make data-driven decisions about your RAG architecture. For example, compare the impact of different chunk sizes (512 vs 1024 tokens) or different embedding models on retrieval quality metrics.
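As an illustration of such a quality gate, the following sketch compares hypothetical evaluation scores against minimum thresholds so that a CI job can fail a change that degrades RAG quality. The score values, threshold numbers, and the way scores are loaded are all assumptions for the example.

    # Hypothetical quality gate for a CI pipeline: fail the build if any
    # Ragas metric drops below its threshold. The scores dict is a placeholder
    # for results loaded from a completed evaluation run.
    import sys

    THRESHOLDS = {
        "faithfulness": 0.80,      # answers must stay grounded in retrieved context
        "answer_relevancy": 0.75,  # answers must address the question
        "context_recall": 0.70,    # retrieval must not miss needed information
    }

    scores = {  # placeholder values; load these from your evaluation results
        "faithfulness": 0.86,
        "answer_relevancy": 0.81,
        "context_recall": 0.74,
    }

    failures = [
        f"{metric}: {scores.get(metric, 0.0):.2f} < {threshold:.2f}"
        for metric, threshold in THRESHOLDS.items()
        if scores.get(metric, 0.0) < threshold
    ]

    if failures:
        print("Quality gate failed:\n" + "\n".join(failures))
        sys.exit(1)
    print("Quality gate passed.")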

3.1.3. Ragas provider deployment modes

OpenShift AI supports two deployment modes for Ragas evaluation:

Inline provider

The inline provider mode runs Ragas evaluation in the same process as the Llama Stack server. Use the inline provider for development and rapid prototyping. It offers the following advantages:

  • Fast processing with in-memory operations
  • Minimal configuration overhead
  • Local development and testing
  • Evaluation of small to medium-sized datasets
Remote provider

The remote provider mode runs Ragas evaluation as distributed jobs using OpenShift AI pipelines (powered by Kubeflow Pipelines). Use the remote provider for production environments. It offers the following capabilities:

  • Running evaluations in parallel across thousands of data points
  • Providing resource isolation and management
  • Integrating with CI/CD pipelines for automated quality gates
  • Storing results in S3-compatible object storage
  • Tracking evaluation history and metrics over time
  • Supporting large-scale batch evaluations without impacting the Llama Stack server
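Whichever mode you deploy, you can confirm which evaluation providers are active on a running Llama Stack server. The following sketch assumes the llama-stack-client Python package and a server URL that is reachable from where you run the code, for example through a port-forward; the URL is a placeholder.

    # Sketch: list the providers registered on a running Llama Stack server
    # to confirm whether the inline or remote Ragas eval provider is active.
    # The base_url is a placeholder; adjust it for your deployment.
    from llama_stack_client import LlamaStackClient

    client = LlamaStackClient(base_url="http://localhost:8321")

    for provider in client.providers.list():
        if provider.api == "eval":
            print(provider.provider_id, provider.provider_type)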

3.2. Setting up the Ragas inline provider for development

You can set up the Ragas inline provider to run evaluations directly within the Llama Stack server process. The inline provider is ideal for development environments, rapid prototyping, and lightweight evaluation workloads where simplicity and quick iteration are priorities.

Prerequisites

  • You have cluster administrator privileges for your OpenShift cluster.
  • You have installed the OpenShift CLI (oc) as described in the appropriate documentation for your cluster.

  • You have activated the Llama Stack Operator in OpenShift AI.
  • You have deployed a Llama model with KServe. For more information, see Deploying a Llama model with KServe.
  • You have created a project.

Procedure

  1. In a terminal window, if you are not already logged in to your OpenShift cluster, log in to the OpenShift CLI (oc) as shown in the following example:

    $ oc login <openshift_cluster_url> -u <username> -p <password>
  2. Navigate to your project:

    $ oc project <project_name>
  3. Create a ConfigMap for the Ragas inline provider configuration. For example, create a ragas-inline-config.yaml file as follows:

    Example ragas-inline-config.yaml

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: ragas-inline-config
      namespace: <project_name>
    data:
      EMBEDDING_MODEL: "all-MiniLM-L6-v2"

    • EMBEDDING_MODEL: Used by Ragas for semantic similarity calculations. The all-MiniLM-L6-v2 model is a lightweight, efficient option suitable for most use cases.
  4. Apply the ConfigMap:

    $ oc apply -f ragas-inline-config.yaml
  5. Create a Llama Stack distribution configuration file with the Ragas inline provider. For example, create a llama-stack-ragas-inline.yaml file as follows:

    Example llama-stack-ragas-inline.yaml

    apiVersion: llamastack.trustyai.opendatahub.io/v1alpha1
    kind: LlamaStackDistribution
    metadata:
      name: llama-stack-ragas-inline
      namespace: <project_name>
    spec:
      replicas: 1
      server:
        containerSpec:
          env:
    # ...
          - name: VLLM_URL
            value: <model_url>
          - name: VLLM_API_TOKEN
            value: <model_api_token (if necessary)>
          - name: INFERENCE_MODEL
            value: <model_name>
          - name: MILVUS_DB_PATH
            value: ~/.llama/milvus.db
          - name: VLLM_TLS_VERIFY
            value: "false"
          - name: FMS_ORCHESTRATOR_URL
            value: http://localhost:123
          - name: EMBEDDING_MODEL
            value: granite-embedding-125m
    # ...

  6. Deploy the Llama Stack distribution:

    $ oc apply -f llama-stack-ragas-inline.yaml
  7. Wait for the deployment to complete:

    $ oc get pods -w

    Wait until the llama-stack-ragas-inline pod status shows Running.
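Optionally, verify that the server responds by connecting with the Llama Stack Python client and listing the models that it serves. The following sketch assumes the llama-stack-client package and that you have forwarded the service port locally; the service name and port are assumptions based on the example distribution name.

    # Optional verification sketch: confirm that the Llama Stack server with
    # the inline Ragas provider is responding. Run after a port-forward, for
    # example: oc port-forward svc/llama-stack-ragas-inline-service 8321:8321
    # (the service name and port are assumptions based on the example above).
    from llama_stack_client import LlamaStackClient

    client = LlamaStackClient(base_url="http://localhost:8321")

    for model in client.models.list():
        print(model.identifier)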

3.3. Configuring the Ragas remote provider for production


You can configure the Ragas remote provider to run evaluations as distributed jobs using OpenShift AI pipelines. The remote provider enables production-scale evaluations by running Ragas in a separate Kubeflow Pipelines environment, providing resource isolation, improved scalability, and integration with CI/CD workflows.

Prerequisites

  • You have cluster administrator privileges for your OpenShift cluster.
  • You have installed the OpenShift AI Operator.
  • You have a DataScienceCluster custom resource in your environment, and in its spec.components section, llamastackoperator.managementState is set to Managed.
  • You have installed the OpenShift CLI (oc) as described in the appropriate documentation for your cluster.

  • You have configured a pipeline server in your project. For more information, see Configuring a pipeline server.
  • You have activated the Llama Stack Operator in OpenShift AI.
  • You have deployed a large language model (LLM) with KServe. For more information, see Deploying a Llama model with KServe.
  • You have configured S3-compatible object storage for storing evaluation results and you know your S3 credentials: AWS access key, AWS secret access key, and AWS default region. For more information, see Adding a connection to your project.
  • You have created a project.

Procedure

  1. In a terminal window, if you are not already logged in to your OpenShift cluster, log in to the OpenShift CLI (oc) as shown in the following example:

    $ oc login <openshift_cluster_url> -u <username> -p <password>
  2. Navigate to your project:

    $ oc project <project_name>
  3. Create a secret for storing S3 credentials:

    $ oc create secret generic "<ragas_s3_credentials>" \
      --from-literal=AWS_ACCESS_KEY_ID=<your_access_key> \
      --from-literal=AWS_SECRET_ACCESS_KEY=<your_secret_key> \
      --from-literal=AWS_DEFAULT_REGION=<your_region>
    Important

    Replace the placeholder values with your actual S3 credentials. These AWS credentials are required in two locations:

    • In the Llama Stack server pod (as environment variables) - to access S3 when creating pipeline runs.
    • In the Kubeflow Pipeline pods (via the secret) - to store evaluation results to S3 during pipeline execution.

    The LlamaStackDistribution configuration loads these credentials from the "<ragas_s3_credentials>" secret and makes them available to both locations.

  4. Create a secret for the Kubeflow Pipelines API token:

    1. Get your token by running the following command:

      $ export KUBEFLOW_PIPELINES_TOKEN=$(oc whoami -t)
    2. Create the secret by running the following command:

      $ oc create secret generic kubeflow-pipelines-token \
        --from-literal=KUBEFLOW_PIPELINES_TOKEN="$KUBEFLOW_PIPELINES_TOKEN"
      Important

      The Llama Stack distribution service account does not have privileges to create pipeline runs. This secret provides the necessary authentication token for creating and managing pipeline runs.

  5. Verify that the Kubeflow Pipelines endpoint is accessible:

    $ curl -k -H "Authorization: Bearer $KUBEFLOW_PIPELINES_TOKEN" \
     https://$KUBEFLOW_PIPELINES_ENDPOINT/apis/v1beta1/healthz
  6. Create a secret for storing your inference model information:

    $ export INFERENCE_MODEL="llama-3-2-3b"
    $ export VLLM_URL="https://llama-32-3b-instruct-predictor:8443/v1"
    $ export VLLM_TLS_VERIFY="false"  # Use "true" in production
    $ export VLLM_API_TOKEN="<token_identifier>"
    
    $ oc create secret generic llama-stack-inference-model-secret \
      --from-literal INFERENCE_MODEL="$INFERENCE_MODEL" \
      --from-literal VLLM_URL="$VLLM_URL" \
      --from-literal VLLM_TLS_VERIFY="$VLLM_TLS_VERIFY" \
      --from-literal VLLM_API_TOKEN="$VLLM_API_TOKEN"
  7. Get the Kubeflow Pipelines endpoint by running the following command and searching for "pipeline" in the routes. You use this endpoint in a later step when you create the ConfigMap for the Ragas remote provider configuration:

    $ oc get routes -A | grep -i pipeline

    The output lists the routes in the namespace that you specified for KUBEFLOW_NAMESPACE, including the pipeline server endpoint and an associated metadata endpoint. Use the endpoint for the route named ds-pipeline-dspa.

  8. Create a ConfigMap for the Ragas remote provider configuration. For example, create a kubeflow-ragas-config.yaml file as follows:

    Example kubeflow-ragas-config.yaml

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: kubeflow-ragas-config
      namespace: <project_name>
    data:
      EMBEDDING_MODEL: "all-MiniLM-L6-v2"
      KUBEFLOW_LLAMA_STACK_URL: "http://<distribution_name>-service.<namespace>.svc.cluster.local:<port>"
      KUBEFLOW_PIPELINES_ENDPOINT: "https://<kfp_endpoint>"
      KUBEFLOW_NAMESPACE: "<project_name>"
      KUBEFLOW_BASE_IMAGE: "quay.io/rhoai/odh-trustyai-ragas-lls-provider-dsp-rhel9:rhoai-3.0"
      KUBEFLOW_RESULTS_S3_PREFIX: "s3://<bucket_name>/ragas-results"
      KUBEFLOW_S3_CREDENTIALS_SECRET_NAME: "<ragas_s3_credentials>"

    • EMBEDDING_MODEL: Used by Ragas for semantic similarity calculations.
    • KUBEFLOW_LLAMA_STACK_URL: The URL for the Llama Stack server. This URL must be accessible from the Kubeflow Pipeline pods. Replace <distribution_name>, <namespace>, and <port> with the name of the LlamaStackDistribution that you are creating, the namespace where you create it, and the server port. These three values appear in the LlamaStackDistribution YAML.
    • KUBEFLOW_PIPELINES_ENDPOINT: The Kubeflow Pipelines API endpoint URL.
    • KUBEFLOW_NAMESPACE: The namespace where pipeline runs are executed. This should match your current project namespace.
    • KUBEFLOW_BASE_IMAGE: The container image used for Ragas evaluation pipeline components. This image contains the Ragas provider package installed via pip.
    • KUBEFLOW_RESULTS_S3_PREFIX: The S3 path prefix where evaluation results are stored. For example: s3://my-bucket/ragas-evaluation-results.
    • KUBEFLOW_S3_CREDENTIALS_SECRET_NAME: The name of the secret containing S3 credentials.
  9. Apply the ConfigMap:

    $ oc apply -f kubeflow-ragas-config.yaml
  10. Create a Llama Stack distribution configuration file with the Ragas remote provider. For example, create a llama-stack-ragas-remote.yaml as follows:

    Example llama-stack-ragas-remote.yaml

    apiVersion: llamastack.io/v1alpha1
    kind: LlamaStackDistribution
    metadata:
      name: llama-stack-pod
    spec:
      replicas: 1
      server:
        containerSpec:
          resources:
            requests:
              cpu: 4
              memory: "12Gi"
            limits:
              cpu: 6
              memory: "14Gi"
          env:
            - name: INFERENCE_MODEL
              valueFrom:
                secretKeyRef:
                  key: INFERENCE_MODEL
                  name: llama-stack-inference-model-secret
                  optional: true
            - name: VLLM_MAX_TOKENS
              value: "4096"
            - name: VLLM_URL
              valueFrom:
                secretKeyRef:
                  key: VLLM_URL
                  name: llama-stack-inference-model-secret
                  optional: true
            - name: VLLM_TLS_VERIFY
              valueFrom:
                secretKeyRef:
                  key: VLLM_TLS_VERIFY
                  name: llama-stack-inference-model-secret
                  optional: true
            - name: VLLM_API_TOKEN
              valueFrom:
                secretKeyRef:
                  key: VLLM_API_TOKEN
                  name: llama-stack-inference-model-secret
                  optional: true
            - name: MILVUS_DB_PATH
              value: ~/milvus.db
            - name: FMS_ORCHESTRATOR_URL
              value: "http://localhost"
            - name: KUBEFLOW_PIPELINES_ENDPOINT
              valueFrom:
                configMapKeyRef:
                  key: KUBEFLOW_PIPELINES_ENDPOINT
                  name: kubeflow-ragas-config
                  optional: true
            - name: KUBEFLOW_NAMESPACE
              valueFrom:
                configMapKeyRef:
                  key: KUBEFLOW_NAMESPACE
                  name: kubeflow-ragas-config
                  optional: true
            - name: KUBEFLOW_BASE_IMAGE
              valueFrom:
                configMapKeyRef:
                  key: KUBEFLOW_BASE_IMAGE
                  name: kubeflow-ragas-config
                  optional: true
            - name: KUBEFLOW_LLAMA_STACK_URL
              valueFrom:
                configMapKeyRef:
                  key: KUBEFLOW_LLAMA_STACK_URL
                  name: kubeflow-ragas-config
                  optional: true
            - name: KUBEFLOW_RESULTS_S3_PREFIX
              valueFrom:
                configMapKeyRef:
                  key: KUBEFLOW_RESULTS_S3_PREFIX
                  name: kubeflow-ragas-config
                  optional: true
            - name: KUBEFLOW_S3_CREDENTIALS_SECRET_NAME
              valueFrom:
                configMapKeyRef:
                  key: KUBEFLOW_S3_CREDENTIALS_SECRET_NAME
                  name: kubeflow-ragas-config
                  optional: true
            - name: EMBEDDING_MODEL
              valueFrom:
                configMapKeyRef:
                  key: EMBEDDING_MODEL
                  name: kubeflow-ragas-config
                  optional: true
            - name: KUBEFLOW_PIPELINES_TOKEN
              valueFrom:
                secretKeyRef:
                  key: KUBEFLOW_PIPELINES_TOKEN
                  name: kubeflow-pipelines-token
                  optional: true
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  key: AWS_ACCESS_KEY_ID
                  name: "<ragas_s3_credentials>"
                  optional: true
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  key: AWS_SECRET_ACCESS_KEY
                  name: "<ragas_s3_credentials>"
                  optional: true
            - name: AWS_DEFAULT_REGION
              valueFrom:
                secretKeyRef:
                  key: AWS_DEFAULT_REGION
                  name: "<ragas_s3_credentials>"
                  optional: true
          name: llama-stack
          port: 8321
        distribution:
          name: rh-dev

  11. Deploy the Llama Stack distribution:

    $ oc apply -f llama-stack-ragas-remote.yaml
  12. Wait for the deployment to complete:

    $ oc get pods -w

    Wait until the llama-stack-pod pod status shows Running.
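After a remote evaluation job completes, the results are written under the KUBEFLOW_RESULTS_S3_PREFIX path. The following sketch uses boto3 to list those result objects; the bucket name, prefix, and environment-based credentials are assumptions taken from the example ConfigMap and secret in this procedure.

    # Sketch: list Ragas evaluation results stored in S3-compatible object storage.
    # The bucket and prefix mirror the example KUBEFLOW_RESULTS_S3_PREFIX value;
    # credentials are read from the environment (AWS_ACCESS_KEY_ID and related
    # variables). Add endpoint_url=... for non-AWS S3-compatible storage.
    import boto3

    s3 = boto3.client("s3")

    response = s3.list_objects_v2(Bucket="<bucket_name>", Prefix="ragas-results/")
    for obj in response.get("Contents", []):
        print(obj["Key"], obj["Size"])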

3.4. Evaluating RAG system quality with Ragas metrics


Evaluate the quality of your RAG system by testing your setup with the example provided in the demo notebook. The demo outlines the basic steps for evaluating your RAG system with Ragas by using the Python client. You can run the demo notebook steps from a Jupyter environment.

Alternatively, you can submit an evaluation directly by using the HTTP methods of the Llama Stack API.

Important

The Llama Stack pod must be accessible from the Jupyter environment in the cluster, which might not be the case by default. To configure this setup, see Ingesting content into a Llama model.

Prerequisites

  • You have logged in to Red Hat OpenShift AI.
  • You have created a project.
  • You have created a pipeline server.
  • You have created a secret for your AWS credentials in your project namespace.
  • You have deployed a Llama Stack distribution with the Ragas evaluation provider enabled (Inline or Remote). For more information, see Setting up the Ragas inline provider for development.
  • You have access to a workbench or notebook environment where you can run Python code.

Procedure

  1. From the OpenShift AI dashboard, click Projects.
  2. Click the name of the project that contains the workbench.
  3. Click the Workbenches tab.
  4. If the status of the workbench is Running, skip to the next step.

    If the status of the workbench is Stopped, in the Status column for the workbench, click Start.

    The Status column changes from Stopped to Starting when the workbench server is starting, and then to Running when the workbench has successfully started.

  5. Click the open icon next to the workbench.

    Your Jupyter environment window opens.

  6. On the toolbar, click the Git Clone icon and then select Clone a Repository.
  7. In the Clone a repo dialog, enter the following URL: https://github.com/trustyai-explainability/llama-stack-provider-ragas.git
  8. In the file browser, select the newly created /llama-stack-provider-ragas/demos folder.

    You see a Jupyter notebook named basic_demo.ipynb.

  9. Double-click the basic_demo.ipynb file to launch the Jupyter notebook.

    The Jupyter notebook opens. You see code examples for the following tasks:

    • Run your Llama Stack distribution
    • Setup and Imports
    • Llama Stack Client Setup
    • Dataset Preparation
    • Dataset Registration
    • Benchmark Registration
    • Evaluation Execution
    • Inline vs Remote Side-by-side
  10. In the Jupyter notebook, run the code cells sequentially through the Evaluation Execution section.
  11. Return to the OpenShift AI dashboard.
  12. Click Develop & train → Pipelines → Runs. You might need to refresh the page to see the new evaluation job running.
  13. Wait for the job to show Successful.
  14. Return to the workbench and run the Results Display cell.
  15. Inspect the results displayed.
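The following is a heavily condensed, illustrative sketch of the flow that the demo notebook walks through with the Llama Stack Python client: register a benchmark that uses the Ragas provider, start an evaluation job, and check its status. Method names, argument shapes, and identifiers vary between llama-stack-client releases and between the inline and remote providers, so treat the cloned basic_demo.ipynb notebook as the authoritative reference.

    # Condensed, illustrative sketch of the demo notebook flow. The identifiers,
    # argument names, and benchmark_config shape are assumptions; follow
    # basic_demo.ipynb for the exact calls that match your client version.
    from llama_stack_client import LlamaStackClient

    client = LlamaStackClient(base_url="http://localhost:8321")  # placeholder URL

    # Register a benchmark backed by a previously registered evaluation dataset.
    client.benchmarks.register(
        benchmark_id="ragas-demo-benchmark",    # illustrative name
        dataset_id="ragas-demo-dataset",        # registered in an earlier notebook cell
        scoring_functions=["faithfulness", "answer_relevancy"],  # illustrative metric ids
    )

    # Start the evaluation job against the deployed inference model.
    job = client.eval.run_eval(
        benchmark_id="ragas-demo-benchmark",
        benchmark_config={
            "eval_candidate": {
                "type": "model",
                "model": "llama-3-2-3b",        # your INFERENCE_MODEL value
                "sampling_params": {"max_tokens": 512},
            },
        },
    )

    # Check the job status; with the remote provider, the run also appears
    # under the pipeline runs in the OpenShift AI dashboard.
    print(client.eval.jobs.status(job_id=job.job_id, benchmark_id="ragas-demo-benchmark"))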