
Chapter 3. Evaluating RAG systems with Ragas


Important

Retrieval-Augmented Generation Assessment (Ragas) is currently available in Red Hat OpenShift AI as a Technology Preview feature. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

As an AI engineer, you can use Retrieval-Augmented Generation Assessment (Ragas) to measure and improve the quality of your RAG systems in OpenShift AI. Ragas provides objective metrics that assess retrieval quality, answer relevance, and factual consistency, enabling you to identify issues, optimize configurations, and establish automated quality gates in your development workflows.

Ragas is integrated with OpenShift AI through the Llama Stack evaluation API and supports two deployment modes: an inline provider for development and testing, and a remote provider for production-scale evaluations using OpenShift AI pipelines.

3.1. About Ragas evaluation

Ragas addresses the unique challenges of evaluating RAG systems by providing metrics that assess both the retrieval and generation components of your application. Unlike traditional language model evaluation that focuses solely on output quality, Ragas evaluates how well your system retrieves relevant context and generates responses grounded in that context.

3.1.1. Key Ragas metrics

Ragas provides multiple metrics for evaluating RAG systems, including the following (a brief example of computing these metrics with the Ragas Python library appears after the list):

Faithfulness
Measures whether the generated answer is factually consistent with the retrieved context. A high faithfulness score indicates that the answer is well-grounded in the source documents, reducing the risk of hallucinations. This is critical for enterprise and regulated environments where accuracy and trustworthiness are paramount.
Answer Relevancy
Evaluates how well the generated answer addresses the input question. This metric ensures that your RAG system provides pertinent responses rather than generic or off-topic information.
Context Precision
Measures the precision of the retrieval component by evaluating whether the retrieved context chunks contain information relevant to answering the question. High precision indicates that your retrieval system is returning focused, relevant documents rather than irrelevant noise.
Context Recall
Measures the recall of the retrieval component by evaluating whether all necessary information for answering the question is present in the retrieved contexts. High recall ensures that your retrieval system is not missing important information.
Answer Correctness
Compares the generated answer with a ground truth reference answer to measure accuracy. This metric is useful when you have labeled evaluation datasets with known correct answers.
Answer Similarity
Measures the semantic similarity between the generated answer and a reference answer, providing a more nuanced assessment than exact string matching.
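The following example is a minimal sketch of how these metrics are typically computed with the open source Ragas Python library. The exact column names, metric imports, and judge-model configuration vary between Ragas releases, and the sample question, answer, and contexts are illustrative only.

    # Minimal sketch: scoring a single RAG interaction with the Ragas library.
    # Column names and APIs differ between Ragas releases; the judge LLM and
    # embedding model must be configured separately before running evaluate().
    from datasets import Dataset
    from ragas import evaluate
    from ragas.metrics import (
        answer_relevancy,
        context_precision,
        context_recall,
        faithfulness,
    )

    # Illustrative evaluation record: question, generated answer,
    # retrieved contexts, and a ground-truth reference answer.
    eval_data = Dataset.from_dict({
        "question": ["What is the Technology Preview support scope?"],
        "answer": ["Technology Preview features are not supported with production SLAs."],
        "contexts": [[
            "Technology Preview features are not supported with Red Hat production "
            "service level agreements (SLAs) and might not be functionally complete."
        ]],
        "ground_truth": ["Technology Preview features are not covered by production SLAs."],
    })

    result = evaluate(
        eval_data,
        metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
    )
    print(result)  # per-metric scores between 0.0 and 1.0

When Ragas runs through the Llama Stack evaluation API in OpenShift AI, the provider performs equivalent calculations; you interact with it through registered benchmarks and evaluation jobs rather than by calling the library directly.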

3.1.2. Use cases for Ragas in AI engineering workflows

Ragas enables AI engineers to accomplish the following tasks:

Automate quality checks
Create reproducible, objective evaluation jobs that run automatically after every code commit or model update. Automated quality checks establish quality gates that prevent regressions and ensure that you deploy only high-quality RAG configurations to production. A minimal threshold-check sketch follows this list.
Enable evaluation-driven development (EDD)
Use Ragas metrics to guide iterative optimization. For example, test different chunking strategies, embedding models, or retrieval algorithms against a defined benchmark to discover the RAG configuration that maximizes performance, such as maximizing faithfulness while minimizing computational cost.
Ensure factual consistency and trustworthiness
Measure the reliability of your RAG system by setting thresholds on metrics such as faithfulness. Metric thresholds ensure that responses are consistently grounded in source documents, which is critical for enterprise applications where hallucinations or factual errors are unacceptable.
Achieve production scalability
Leverage the remote provider pattern with OpenShift AI pipelines to execute evaluations as distributed jobs. The remote provider pattern allows you to run large-scale benchmarks across thousands of data points without blocking development or consuming excessive local resources.
Compare model and configuration variants
Run comparative evaluations across different models, retrieval strategies, or system configurations to make data-driven decisions about your RAG architecture. For example, compare the impact of different chunk sizes (512 vs 1024 tokens) or different embedding models on retrieval quality metrics.
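As an illustration of such a quality gate, the following sketch compares hypothetical evaluation scores against minimum thresholds so that a CI job can fail a change that degrades RAG quality. The score values, threshold numbers, and the way scores are loaded are all assumptions for the example.

    # Hypothetical quality gate for a CI pipeline: fail the build if any
    # Ragas metric drops below its threshold. The scores dict is a placeholder
    # for results loaded from a completed evaluation run.
    import sys

    THRESHOLDS = {
        "faithfulness": 0.80,      # answers must stay grounded in retrieved context
        "answer_relevancy": 0.75,  # answers must address the question
        "context_recall": 0.70,    # retrieval must not miss needed information
    }

    scores = {  # placeholder values; load these from your evaluation results
        "faithfulness": 0.86,
        "answer_relevancy": 0.81,
        "context_recall": 0.74,
    }

    failures = [
        f"{metric}: {scores.get(metric, 0.0):.2f} < {threshold:.2f}"
        for metric, threshold in THRESHOLDS.items()
        if scores.get(metric, 0.0) < threshold
    ]

    if failures:
        print("Quality gate failed:\n" + "\n".join(failures))
        sys.exit(1)
    print("Quality gate passed.")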

3.1.3. Ragas provider deployment modes

OpenShift AI supports two deployment modes for Ragas evaluation:

Inline provider

The inline provider mode runs Ragas evaluation in the same process as the Llama Stack server. Use the inline provider for development and rapid prototyping. It offers the following advantages:

  • Fast processing with in-memory operations
  • Minimal configuration overhead
  • Local development and testing
  • Evaluation of small to medium-sized datasets
Remote provider

The remote provider mode runs Ragas evaluation as distributed jobs using OpenShift AI pipelines (powered by Kubeflow Pipelines). Use the remote provider for production environments. It offers the following capabilities:

  • Running evaluations in parallel across thousands of data points
  • Providing resource isolation and management
  • Integrating with CI/CD pipelines for automated quality gates
  • Storing results in S3-compatible object storage
  • Tracking evaluation history and metrics over time
  • Supporting large-scale batch evaluations without impacting the Llama Stack server
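Whichever mode you deploy, you can confirm which evaluation providers are active on a running Llama Stack server. The following sketch assumes the llama-stack-client Python package and a server URL that is reachable from where you run the code, for example through a port-forward; the URL is a placeholder.

    # Sketch: list the providers registered on a running Llama Stack server
    # to confirm whether the inline or remote Ragas eval provider is active.
    # The base_url is a placeholder; adjust it for your deployment.
    from llama_stack_client import LlamaStackClient

    client = LlamaStackClient(base_url="http://localhost:8321")

    for provider in client.providers.list():
        if provider.api == "eval":
            print(provider.provider_id, provider.provider_type)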

3.2. Setting up the Ragas inline provider for development

You can set up the Ragas inline provider to run evaluations directly within the Llama Stack server process. The inline provider is ideal for development environments, rapid prototyping, and lightweight evaluation workloads where simplicity and quick iteration are priorities.

Prerequisites

  • You have cluster administrator privileges for your OpenShift cluster.
  • You have installed the OpenShift CLI (oc) as described in the appropriate documentation for your cluster.

  • You have activated the Llama Stack Operator in OpenShift AI.
  • You have deployed a Llama model with KServe. For more information, see Deploying a Llama model with KServe.
  • You have created a project.

Procedure

  1. In a terminal window, if you are not already logged in to your OpenShift cluster, log in to the OpenShift CLI (oc) as shown in the following example:

    $ oc login <openshift_cluster_url> -u <username> -p <password>
  2. Navigate to your project:

    $ oc project <project_name>
  3. Create a ConfigMap for the Ragas inline provider configuration. For example, create a ragas-inline-config.yaml file as follows:

    Example ragas-inline-config.yaml

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: ragas-inline-config
      namespace: <project_name>
    data:
      EMBEDDING_MODEL: "all-MiniLM-L6-v2"

    • EMBEDDING_MODEL: Used by Ragas for semantic similarity calculations. The all-MiniLM-L6-v2 model is a lightweight, efficient option suitable for most use cases.
  4. Apply the ConfigMap:

    $ oc apply -f ragas-inline-config.yaml
  5. Create a Llama Stack distribution configuration file with the Ragas inline provider. For example, create a llama-stack-ragas-inline.yaml file as follows:

    Example llama-stack-ragas-inline.yaml

    apiVersion: llamastack.trustyai.opendatahub.io/v1alpha1
    kind: LlamaStackDistribution
    metadata:
      name: llama-stack-ragas-inline
      namespace: <project_name>
    spec:
      replicas: 1
      server:
        containerSpec:
          env:
    # ...
          - name: VLLM_URL
            value: <model_url>
          - name: VLLM_API_TOKEN
            value: <model_api_token (if necessary)>
          - name: INFERENCE_MODEL
            value: <model_name>
          - name: MILVUS_DB_PATH
            value: ~/.llama/milvus.db
          - name: VLLM_TLS_VERIFY
            value: "false"
          - name: FMS_ORCHESTRATOR_URL
            value: http://localhost:123
          - name: EMBEDDING_MODEL
            value: granite-embedding-125m
    # ...

  6. Deploy the Llama Stack distribution:

    $ oc apply -f llama-stack-ragas-inline.yaml
  7. Wait for the deployment to complete:

    $ oc get pods -w

    Wait until the llama-stack-ragas-inline pod status shows Running.
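Optionally, verify that the server responds by connecting with the Llama Stack Python client and listing the models that it serves. The following sketch assumes the llama-stack-client package and that you have forwarded the service port locally; the service name and port are assumptions based on the example distribution name.

    # Optional verification sketch: confirm that the Llama Stack server with
    # the inline Ragas provider is responding. Run after a port-forward, for
    # example: oc port-forward svc/llama-stack-ragas-inline-service 8321:8321
    # (the service name and port are assumptions based on the example above).
    from llama_stack_client import LlamaStackClient

    client = LlamaStackClient(base_url="http://localhost:8321")

    for model in client.models.list():
        print(model.identifier)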

3.3. Configuring the Ragas remote provider for production


You can configure the Ragas remote provider to run evaluations as distributed jobs using OpenShift AI pipelines. The remote provider enables production-scale evaluations by running Ragas in a separate Kubeflow Pipelines environment, providing resource isolation, improved scalability, and integration with CI/CD workflows.

Prerequisites

  • You have cluster administrator privileges for your OpenShift cluster.
  • You have installed the OpenShift AI Operator.
  • You have a DataScienceCluster custom resource in your environment, and in its spec.components section, llamastackoperator.managementState is set to Managed.
  • You have installed the OpenShift CLI (oc) as described in the appropriate documentation for your cluster.

  • You have configured a pipeline server in your project. For more information, see Configuring a pipeline server.
  • You have activated the Llama Stack Operator in OpenShift AI.
  • You have deployed a large language model (LLM) with KServe. For more information, see Deploying a Llama model with KServe.
  • You have configured S3-compatible object storage for storing evaluation results and you know your S3 credentials: AWS access key, AWS secret access key, and AWS default region. For more information, see Adding a connection to your project.
  • You have created a project.

Procedure

  1. In a terminal window, if you are not already logged in to your OpenShift cluster, log in to the OpenShift CLI (oc) as shown in the following example:

    $ oc login <openshift_cluster_url> -u <username> -p <password>
  2. Navigate to your project:

    $ oc project <project_name>
  3. Create a secret for storing S3 credentials:

    $ oc create secret generic "<ragas_s3_credentials>" \
      --from-literal=AWS_ACCESS_KEY_ID=<your_access_key> \
      --from-literal=AWS_SECRET_ACCESS_KEY=<your_secret_key> \
      --from-literal=AWS_DEFAULT_REGION=<your_region>
    Important

    Replace the placeholder values with your actual S3 credentials. These AWS credentials are required in two locations:

    • In the Llama Stack server pod (as environment variables) - to access S3 when creating pipeline runs.
    • In the Kubeflow Pipeline pods (via the secret) - to store evaluation results to S3 during pipeline execution.

    The LlamaStackDistribution configuration loads these credentials from the "<ragas_s3_credentials>" secret and makes them available to both locations.

  4. Create a secret for the Kubeflow Pipelines API token:

    1. Get your token by running the following command:

      $ export KUBEFLOW_PIPELINES_TOKEN=$(oc whoami -t)
    2. Create the secret by running the following command:

      $ oc create secret generic kubeflow-pipelines-token \
        --from-literal=KUBEFLOW_PIPELINES_TOKEN="$KUBEFLOW_PIPELINES_TOKEN"
      Important

      The Llama Stack distribution service account does not have privileges to create pipeline runs. This secret provides the necessary authentication token for creating and managing pipeline runs.

  5. Verify that the Kubeflow Pipelines endpoint is accessible:

    $ curl -k -H "Authorization: Bearer $KUBEFLOW_PIPELINES_TOKEN" \
     https://$KUBEFLOW_PIPELINES_ENDPOINT/apis/v1beta1/healthz
  6. Create a secret for storing your inference model information:

    $ export INFERENCE_MODEL="llama-3-2-3b"
    $ export VLLM_URL="https://llama-32-3b-instruct-predictor:8443/v1"
    $ export VLLM_TLS_VERIFY="false"  # Use "true" in production
    $ export VLLM_API_TOKEN="<token_identifier>"
    
    $ oc create secret generic llama-stack-inference-model-secret \
      --from-literal INFERENCE_MODEL="$INFERENCE_MODEL" \
      --from-literal VLLM_URL="$VLLM_URL" \
      --from-literal VLLM_TLS_VERIFY="$VLLM_TLS_VERIFY" \
      --from-literal VLLM_API_TOKEN="$VLLM_API_TOKEN"
  7. Get the Kubeflow Pipelines endpoint by running the following command and searching for "pipeline" in the routes. You use this endpoint in a later step when you create the ConfigMap for the Ragas remote provider configuration:

    $ oc get routes -A | grep -i pipeline

    The output lists the routes in the namespace that you specified for KUBEFLOW_NAMESPACE, including the pipeline server endpoint and an associated metadata endpoint. Use the endpoint for the route named ds-pipeline-dspa.

  8. Create a ConfigMap for the Ragas remote provider configuration. For example, create a kubeflow-ragas-config.yaml file as follows:

    Example kubeflow-ragas-config.yaml

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: kubeflow-ragas-config
      namespace: <project_name>
    data:
      EMBEDDING_MODEL: "all-MiniLM-L6-v2"
      KUBEFLOW_LLAMA_STACK_URL: "http://<distribution_name>-service.<namespace>.svc.cluster.local:<port>"
      KUBEFLOW_PIPELINES_ENDPOINT: "https://<kfp_endpoint>"
      KUBEFLOW_NAMESPACE: "<project_name>"
      KUBEFLOW_BASE_IMAGE: "quay.io/rhoai/odh-trustyai-ragas-lls-provider-dsp-rhel9:rhoai-3.0"
      KUBEFLOW_RESULTS_S3_PREFIX: "s3://<bucket_name>/ragas-results"
      KUBEFLOW_S3_CREDENTIALS_SECRET_NAME: "<ragas_s3_credentials>"

    • EMBEDDING_MODEL: Used by Ragas for semantic similarity calculations.
    • KUBEFLOW_LLAMA_STACK_URL: The URL for the Llama Stack server. This URL must be accessible from the Kubeflow Pipeline pods. Replace <distribution_name>, <namespace>, and <port> with the name of the LlamaStackDistribution that you are creating, the namespace where you create it, and the server port. These three values appear in the LlamaStackDistribution YAML.
    • KUBEFLOW_PIPELINES_ENDPOINT: The Kubeflow Pipelines API endpoint URL.
    • KUBEFLOW_NAMESPACE: The namespace where pipeline runs are executed. This should match your current project namespace.
    • KUBEFLOW_BASE_IMAGE: The container image used for Ragas evaluation pipeline components. This image contains the Ragas provider package installed via pip.
    • KUBEFLOW_RESULTS_S3_PREFIX: The S3 path prefix where evaluation results are stored. For example: s3://my-bucket/ragas-evaluation-results.
    • KUBEFLOW_S3_CREDENTIALS_SECRET_NAME: The name of the secret containing S3 credentials.
  9. Apply the ConfigMap:

    $ oc apply -f kubeflow-ragas-config.yaml
  10. Create a Llama Stack distribution configuration file with the Ragas remote provider. For example, create a llama-stack-ragas-remote.yaml as follows:

    Example llama-stack-ragas-remote.yaml

    apiVersion: llamastack.io/v1alpha1
    kind: LlamaStackDistribution
    metadata:
      name: llama-stack-pod
    spec:
      replicas: 1
      server:
        containerSpec:
          resources:
            requests:
              cpu: 4
              memory: "12Gi"
            limits:
              cpu: 6
              memory: "14Gi"
          env:
            - name: INFERENCE_MODEL
              valueFrom:
                secretKeyRef:
                  key: INFERENCE_MODEL
                  name: llama-stack-inference-model-secret
                  optional: true
            - name: VLLM_MAX_TOKENS
              value: "4096"
            - name: VLLM_URL
              valueFrom:
                secretKeyRef:
                  key: VLLM_URL
                  name: llama-stack-inference-model-secret
                  optional: true
            - name: VLLM_TLS_VERIFY
              valueFrom:
                secretKeyRef:
                  key: VLLM_TLS_VERIFY
                  name: llama-stack-inference-model-secret
                  optional: true
            - name: VLLM_API_TOKEN
              valueFrom:
                secretKeyRef:
                  key: VLLM_API_TOKEN
                  name: llama-stack-inference-model-secret
                  optional: true
            - name: MILVUS_DB_PATH
              value: ~/milvus.db
            - name: FMS_ORCHESTRATOR_URL
              value: "http://localhost"
            - name: KUBEFLOW_PIPELINES_ENDPOINT
              valueFrom:
                configMapKeyRef:
                  key: KUBEFLOW_PIPELINES_ENDPOINT
                  name: kubeflow-ragas-config
                  optional: true
            - name: KUBEFLOW_NAMESPACE
              valueFrom:
                configMapKeyRef:
                  key: KUBEFLOW_NAMESPACE
                  name: kubeflow-ragas-config
                  optional: true
            - name: KUBEFLOW_BASE_IMAGE
              valueFrom:
                configMapKeyRef:
                  key: KUBEFLOW_BASE_IMAGE
                  name: kubeflow-ragas-config
                  optional: true
            - name: KUBEFLOW_LLAMA_STACK_URL
              valueFrom:
                configMapKeyRef:
                  key: KUBEFLOW_LLAMA_STACK_URL
                  name: kubeflow-ragas-config
                  optional: true
            - name: KUBEFLOW_RESULTS_S3_PREFIX
              valueFrom:
                configMapKeyRef:
                  key: KUBEFLOW_RESULTS_S3_PREFIX
                  name: kubeflow-ragas-config
                  optional: true
            - name: KUBEFLOW_S3_CREDENTIALS_SECRET_NAME
              valueFrom:
                configMapKeyRef:
                  key: KUBEFLOW_S3_CREDENTIALS_SECRET_NAME
                  name: kubeflow-ragas-config
                  optional: true
            - name: EMBEDDING_MODEL
              valueFrom:
                configMapKeyRef:
                  key: EMBEDDING_MODEL
                  name: kubeflow-ragas-config
                  optional: true
            - name: KUBEFLOW_PIPELINES_TOKEN
              valueFrom:
                secretKeyRef:
                  key: KUBEFLOW_PIPELINES_TOKEN
                  name: kubeflow-pipelines-token
                  optional: true
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  key: AWS_ACCESS_KEY_ID
                  name: "<ragas_s3_credentials>"
                  optional: true
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  key: AWS_SECRET_ACCESS_KEY
                  name: "<ragas_s3_credentials>"
                  optional: true
            - name: AWS_DEFAULT_REGION
              valueFrom:
                secretKeyRef:
                  key: AWS_DEFAULT_REGION
                  name: "<ragas_s3_credentials>"
                  optional: true
          name: llama-stack
          port: 8321
        distribution:
          name: rh-dev

  11. Deploy the Llama Stack distribution:

    $ oc apply -f llama-stack-ragas-remote.yaml
  12. Wait for the deployment to complete:

    $ oc get pods -w

    Wait until the llama-stack-pod pod status shows Running.
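After a remote evaluation job completes, the results are written under the KUBEFLOW_RESULTS_S3_PREFIX path. The following sketch uses boto3 to list those result objects; the bucket name, prefix, and environment-based credentials are assumptions taken from the example ConfigMap and secret in this procedure.

    # Sketch: list Ragas evaluation results stored in S3-compatible object storage.
    # The bucket and prefix mirror the example KUBEFLOW_RESULTS_S3_PREFIX value;
    # credentials are read from the environment (AWS_ACCESS_KEY_ID and related
    # variables). Add endpoint_url=... for non-AWS S3-compatible storage.
    import boto3

    s3 = boto3.client("s3")

    response = s3.list_objects_v2(Bucket="<bucket_name>", Prefix="ragas-results/")
    for obj in response.get("Contents", []):
        print(obj["Key"], obj["Size"])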

3.4. Evaluating RAG system quality with Ragas metrics


Evaluate the quality of your RAG system by testing your setup with the example provided in the demo notebook. The demo outlines the basic steps for evaluating your RAG system with Ragas by using the Python client. You can run the demo notebook steps from a Jupyter environment.

Alternatively, you can submit an evaluation directly by using the HTTP methods of the Llama Stack API.

Important

The Llama Stack pod must be accessible from the Jupyter environment in the cluster, which might not be the case by default. To configure this setup, see Ingesting content into a Llama model.

Prerequisites

  • You have logged in to Red Hat OpenShift AI.
  • You have created a project.
  • You have created a pipeline server.
  • You have created a secret for your AWS credentials in your project namespace.
  • You have deployed a Llama Stack distribution with the Ragas evaluation provider enabled (Inline or Remote). For more information, see Setting up the Ragas inline provider for development.
  • You have access to a workbench or notebook environment where you can run Python code.

Procedure

  1. From the OpenShift AI dashboard, click Projects.
  2. Click the name of the project that contains the workbench.
  3. Click the Workbenches tab.
  4. If the status of the workbench is Running, skip to the next step.

    If the status of the workbench is Stopped, in the Status column for the workbench, click Start.

    The Status column changes from Stopped to Starting when the workbench server is starting, and then to Running when the workbench has successfully started.

  5. Click the open icon next to the workbench.

    Your Jupyter environment window opens.

  6. On the toolbar, click the Git Clone icon and then select Clone a Repository.
  7. In the Clone a repo dialog, enter the following URL: https://github.com/trustyai-explainability/llama-stack-provider-ragas.git
  8. In the file browser, select the newly created /llama-stack-provider-ragas/demos folder.

    You see a Jupyter notebook named basic_demo.ipynb.

  9. Double-click the basic_demo.ipynb file to launch the Jupyter notebook.

    The Jupyter notebook opens. You see code examples for the following tasks:

    • Run your Llama Stack distribution
    • Setup and Imports
    • Llama Stack Client Setup
    • Dataset Preparation
    • Dataset Registration
    • Benchmark Registration
    • Evaluation Execution
    • Inline vs Remote Side-by-side
  10. In the Jupyter notebook, run the code cells sequentially through the Evaluation Execution section.
  11. Return to the OpenShift AI dashboard.
  12. Click Develop & train → Pipelines → Runs. You might need to refresh the page to see the new evaluation job running.
  13. Wait for the job to show Successful.
  14. Return to the workbench and run the Results Display cell.
  15. Inspect the results displayed.
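The following is a heavily condensed, illustrative sketch of the flow that the demo notebook walks through with the Llama Stack Python client: register a benchmark that uses the Ragas provider, start an evaluation job, and check its status. Method names, argument shapes, and identifiers vary between llama-stack-client releases and between the inline and remote providers, so treat the cloned basic_demo.ipynb notebook as the authoritative reference.

    # Condensed, illustrative sketch of the demo notebook flow. The identifiers,
    # argument names, and benchmark_config shape are assumptions; follow
    # basic_demo.ipynb for the exact calls that match your client version.
    from llama_stack_client import LlamaStackClient

    client = LlamaStackClient(base_url="http://localhost:8321")  # placeholder URL

    # Register a benchmark backed by a previously registered evaluation dataset.
    client.benchmarks.register(
        benchmark_id="ragas-demo-benchmark",    # illustrative name
        dataset_id="ragas-demo-dataset",        # registered in an earlier notebook cell
        scoring_functions=["faithfulness", "answer_relevancy"],  # illustrative metric ids
    )

    # Start the evaluation job against the deployed inference model.
    job = client.eval.run_eval(
        benchmark_id="ragas-demo-benchmark",
        benchmark_config={
            "eval_candidate": {
                "type": "model",
                "model": "llama-3-2-3b",        # your INFERENCE_MODEL value
                "sampling_params": {"max_tokens": 512},
            },
        },
    )

    # Check the job status; with the remote provider, the run also appears
    # under the pipeline runs in the OpenShift AI dashboard.
    print(client.eval.jobs.status(job_id=job.job_id, benchmark_id="ragas-demo-benchmark"))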