Chapter 3. Evaluating RAG systems with Ragas
Retrieval-Augmented Generation Assessment (Ragas) is currently available in Red Hat OpenShift AI as a Technology Preview feature. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
As an AI engineer, you can use Retrieval-Augmented Generation Assessment (Ragas) to measure and improve the quality of your RAG systems in OpenShift AI. Ragas provides objective metrics that assess retrieval quality, answer relevance, and factual consistency, enabling you to identify issues, optimize configurations, and establish automated quality gates in your development workflows.
Ragas is integrated with OpenShift AI through the Llama Stack evaluation API and supports two deployment modes: an inline provider for development and testing, and a remote provider for production-scale evaluations using OpenShift AI pipelines.
3.1. About Ragas evaluation
Ragas addresses the unique challenges of evaluating RAG systems by providing metrics that assess both the retrieval and generation components of your application. Unlike traditional language model evaluation that focuses solely on output quality, Ragas evaluates how well your system retrieves relevant context and generates responses grounded in that context.
3.1.1. Key Ragas metrics
Ragas provides multiple metrics for evaluating RAG systems, including the following metrics. A brief scoring example follows the list.
- Faithfulness
- Measures whether the generated answer is factually consistent with the retrieved context. A high faithfulness score indicates that the answer is well-grounded in the source documents, reducing the risk of hallucinations. This is critical for enterprise and regulated environments where accuracy and trustworthiness are paramount.
- Answer Relevancy
- Evaluates how relevant the generated answer is to the input question. This metric ensures that your RAG system provides pertinent responses rather than generic or off-topic information.
- Context Precision
- Measures the precision of the retrieval component by evaluating whether the retrieved context chunks contain information relevant to answering the question. High precision indicates that your retrieval system is returning focused, relevant documents rather than irrelevant noise.
- Context Recall
- Measures the recall of the retrieval component by evaluating whether all necessary information for answering the question is present in the retrieved contexts. High recall ensures that your retrieval system is not missing important information.
- Answer Correctness
- Compares the generated answer with a ground truth reference answer to measure accuracy. This metric is useful when you have labeled evaluation datasets with known correct answers.
- Answer Similarity
- Measures the semantic similarity between the generated answer and a reference answer, providing a more nuanced assessment than exact string matching.
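To illustrate what these metrics measure in practice, the following is a minimal sketch that scores a single question-answer pair with the open source ragas Python package directly, outside of the Llama Stack integration described in this chapter. The sample data is hypothetical, the column names can vary between ragas versions, and the evaluate function needs access to an LLM judge (by default an OpenAI model configured through your environment) and an embedding model to compute most metrics.

  from datasets import Dataset
  from ragas import evaluate
  from ragas.metrics import faithfulness, answer_relevancy, context_precision

  # Hypothetical sample: one question, the retrieved contexts, and the generated answer.
  data = {
      "question": ["What is the capital of France?"],
      "contexts": [["France is a country in Western Europe. Its capital is Paris."]],
      "answer": ["The capital of France is Paris."],
      "ground_truth": ["Paris"],  # reference answer, needed for recall/correctness-style metrics
  }
  dataset = Dataset.from_dict(data)

  # Scores each row with an LLM judge and an embedding model; scores fall in [0, 1].
  result = evaluate(dataset, metrics=[faithfulness, answer_relevancy, context_precision])
  print(result)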
3.1.2. Use cases for Ragas in AI engineering workflows
Ragas enables AI engineers to accomplish the following tasks:
- Automate quality checks
- Create reproducible, objective evaluation jobs that can be automatically triggered after every code commit or model update. Automatic quality checks establish quality gates to prevent regressions and ensure that you deploy only high-quality RAG configurations to production.
- Enable evaluation-driven development (EDD)
- Use Ragas metrics to guide iterative optimization. For example, test different chunking strategies, embedding models, or retrieval algorithms against a defined benchmark. You can discover the optimal RAG configuration that maximizes performance metrics. For example, you can maximize faithfulness while minimizing computational cost.
- Ensure factual consistency and trustworthiness
- Measure the reliability of your RAG system by setting thresholds on metrics like faithfulness. Metrics thresholds ensure that responses are consistently grounded in source documents, which is critical for enterprise applications where hallucinations or factual errors are unacceptable.
- Achieve production scalability
- Leverage the remote provider pattern with OpenShift AI pipelines to execute evaluations as distributed jobs. The remote provider pattern allows you to run large-scale benchmarks across thousands of data points without blocking development or consuming excessive local resources.
- Compare model and configuration variants
- Run comparative evaluations across different models, retrieval strategies, or system configurations to make data-driven decisions about your RAG architecture. For example, compare the impact of different chunk sizes (512 vs 1024 tokens) or different embedding models on retrieval quality metrics.
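As a concrete illustration of the automated quality checks and metric thresholds described above, the following is a minimal sketch of a gate that fails a CI job when aggregate scores drop below chosen thresholds. The threshold values, metric names, and the example scores are assumptions for illustration; in practice, the scores would come from a completed Ragas evaluation job.

  import sys

  # Hypothetical thresholds for the quality gate; tune these for your application.
  THRESHOLDS = {"faithfulness": 0.90, "answer_relevancy": 0.85, "context_recall": 0.80}

  def check_quality_gate(scores: dict[str, float]) -> bool:
      """Return True if every tracked metric meets its threshold."""
      failures = {m: s for m, s in scores.items() if m in THRESHOLDS and s < THRESHOLDS[m]}
      for metric, score in failures.items():
          print(f"FAIL: {metric}={score:.2f} is below threshold {THRESHOLDS[metric]:.2f}")
      return not failures

  if __name__ == "__main__":
      # In a real pipeline these scores would be read from the Ragas evaluation results.
      example_scores = {"faithfulness": 0.93, "answer_relevancy": 0.81, "context_recall": 0.88}
      sys.exit(0 if check_quality_gate(example_scores) else 1)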
3.1.3. Ragas provider deployment modes
OpenShift AI supports two deployment modes for Ragas evaluation:
- Inline provider
The inline provider mode runs Ragas evaluation in the same process as the Llama Stack server. Use the inline provider for development and rapid prototyping. It offers the following advantages:
- Fast processing with in-memory operations
- Minimal configuration overhead
- Local development and testing
- Evaluation of small to medium-sized datasets
- Remote provider
The remote provider mode runs Ragas evaluation as distributed jobs using OpenShift AI pipelines (powered by Kubeflow Pipelines). Use the remote provider for production environments. It offers the following capabilities:
- Running evaluations in parallel across thousands of data points
- Providing resource isolation and management
- Integrating with CI/CD pipelines for automated quality gates
- Storing results in S3-compatible object storage
- Tracking evaluation history and metrics over time
- Supporting large-scale batch evaluations without impacting the Llama Stack server
3.2. Setting up the Ragas inline provider for development
You can set up the Ragas inline provider to run evaluations directly within the Llama Stack server process. The inline provider is ideal for development environments, rapid prototyping, and lightweight evaluation workloads where simplicity and quick iteration are priorities.
Prerequisites
- You have cluster administrator privileges for your OpenShift cluster.
- You have installed the OpenShift CLI (oc) as described in the appropriate documentation for your cluster:
  - Installing the OpenShift CLI for OpenShift Container Platform
  - Installing the OpenShift CLI for Red Hat OpenShift Service on AWS
- You have activated the Llama Stack Operator in OpenShift AI.
- You have deployed a Llama model with KServe. For more information, see Deploying a Llama model with KServe.
- You have created a project.
Procedure
- In a terminal window, if you are not already logged in to your OpenShift cluster, log in to the OpenShift CLI (oc) as shown in the following example:
  $ oc login <openshift_cluster_url> -u <username> -p <password>
- Navigate to your project:
  $ oc project <project_name>
- Create a ConfigMap for the Ragas inline provider configuration. For example, create a ragas-inline-config.yaml file as follows:
  Example ragas-inline-config.yaml
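  A minimal sketch of this ConfigMap might look like the following; only the EMBEDDING_MODEL key is described in the callout below, and the ConfigMap name, namespace, and any additional keys are illustrative assumptions that you must adapt to your environment:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: ragas-inline-config        # assumed name; match the name referenced by your LlamaStackDistribution
      namespace: <project_name>
    data:
      EMBEDDING_MODEL: all-MiniLM-L6-v2   # embedding model used by Ragas for semantic similarity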
  - EMBEDDING_MODEL: Used by Ragas for semantic similarity calculations. The all-MiniLM-L6-v2 model is a lightweight, efficient option suitable for most use cases.
- Apply the ConfigMap:
  $ oc apply -f ragas-inline-config.yaml
- Create a Llama Stack distribution configuration file with the Ragas inline provider. For example, create a llama-stack-ragas-inline.yaml file as follows:
  Example llama-stack-ragas-inline.yaml
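  The following sketch shows the general shape of a LlamaStackDistribution resource for this step; the API version, distribution name, port, and environment wiring shown here are assumptions for illustration and must be adapted to your environment and to the Ragas inline provider configuration you created:

    apiVersion: llamastack.io/v1alpha1      # assumed API group/version for the Llama Stack Operator
    kind: LlamaStackDistribution
    metadata:
      name: llama-stack-ragas-inline
      namespace: <project_name>
    spec:
      replicas: 1
      server:
        distribution:
          name: rh-dev                       # assumed distribution name
        containerSpec:
          port: 8321
          env:
            - name: EMBEDDING_MODEL          # loaded from the ConfigMap created earlier
              valueFrom:
                configMapKeyRef:
                  name: ragas-inline-config
                  key: EMBEDDING_MODEL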
- Deploy the Llama Stack distribution:
  $ oc apply -f llama-stack-ragas-inline.yaml
- Wait for the deployment to complete:
  $ oc get pods -w
  Wait until the llama-stack-ragas-inline pod status shows Running.
3.3. Configuring the Ragas remote provider for production
You can configure the Ragas remote provider to run evaluations as distributed jobs using OpenShift AI pipelines. The remote provider enables production-scale evaluations by running Ragas in a separate Kubeflow Pipelines environment, providing resource isolation, improved scalability, and integration with CI/CD workflows.
Prerequisites
- You have cluster administrator privileges for your OpenShift cluster.
- You have installed the OpenShift AI Operator.
- You have a DataScienceCluster custom resource in your environment, and in the spec.components section, llamastackoperator.managementState is enabled with a value of Managed.
- You have installed the OpenShift CLI (oc) as described in the appropriate documentation for your cluster:
  - Installing the OpenShift CLI for OpenShift Container Platform
  - Installing the OpenShift CLI for Red Hat OpenShift Service on AWS
- You have configured a pipeline server in your project. For more information, see Configuring a pipeline server.
- You have activated the Llama Stack Operator in OpenShift AI.
- You have deployed a Large Language Model with KServe. For more information, see Deploying a Llama model with KServe.
- You have configured S3-compatible object storage for storing evaluation results and you know your S3 credentials: AWS access key, AWS secret access key, and AWS default region. For more information, see Adding a connection to your project.
- You have created a project.
Procedure
- In a terminal window, if you are not already logged in to your OpenShift cluster, log in to the OpenShift CLI (oc) as shown in the following example:
  $ oc login <openshift_cluster_url> -u <username> -p <password>
- Navigate to your project:
  $ oc project <project_name>
- Create a secret for storing S3 credentials:
  $ oc create secret generic "<ragas_s3_credentials>" \
      --from-literal=AWS_ACCESS_KEY_ID=<your_access_key> \
      --from-literal=AWS_SECRET_ACCESS_KEY=<your_secret_key> \
      --from-literal=AWS_DEFAULT_REGION=<your_region>
  Important: Replace the placeholder values with your actual S3 credentials. These AWS credentials are required in two locations:
  - In the Llama Stack server pod (as environment variables), to access S3 when creating pipeline runs.
  - In the Kubeflow Pipeline pods (via the secret), to store evaluation results in S3 during pipeline execution.
  The LlamaStackDistribution configuration loads these credentials from the "<ragas_s3_credentials>" secret and makes them available to both locations.
- Create a secret for the Kubeflow Pipelines API token:
  - Get your token by running the following command:
    $ export KUBEFLOW_PIPELINES_TOKEN=$(oc whoami -t)
  - Create the secret by running the following command:
    $ oc create secret generic kubeflow-pipelines-token \
        --from-literal=KUBEFLOW_PIPELINES_TOKEN="$KUBEFLOW_PIPELINES_TOKEN"
  Important: The Llama Stack distribution service account does not have privileges to create pipeline runs. This secret provides the necessary authentication token for creating and managing pipeline runs.
- Verify that the Kubeflow Pipelines endpoint is accessible:
  $ curl -k -H "Authorization: Bearer $KUBEFLOW_PIPELINES_TOKEN" \
      https://$KUBEFLOW_PIPELINES_ENDPOINT/apis/v1beta1/healthz
- Create a secret for storing your inference model information:
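  The exact contents of this secret depend on how your model is served. The following is an illustrative sketch only; the secret name and key names are assumptions, and you should use the names expected by your deployment:

    $ oc create secret generic ragas-inference-model \
        --from-literal=INFERENCE_MODEL=<model_name> \
        --from-literal=INFERENCE_MODEL_URL=<model_inference_endpoint_url>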
- Get the Kubeflow Pipelines endpoint by running the following command and searching for "pipeline" in the routes. The endpoint is used in a later step when you create a ConfigMap for the Ragas remote provider configuration:
  $ oc get routes -A | grep -i pipeline
  The output shows the routes in the namespace that you specified for KUBEFLOW_NAMESPACE, including the pipeline server endpoint and the associated metadata endpoint. Use the ds-pipeline-dspa route.
- Create a ConfigMap for the Ragas remote provider configuration. For example, create a kubeflow-ragas-config.yaml file as follows:
  Example kubeflow-ragas-config.yaml
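  A minimal sketch of this ConfigMap might look like the following; the keys match the callouts below, while the ConfigMap name, the service URL format, and all values are placeholder assumptions that you must replace with values from your environment:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: kubeflow-ragas-config        # assumed name; match the name referenced by your LlamaStackDistribution
      namespace: <project_name>
    data:
      EMBEDDING_MODEL: all-MiniLM-L6-v2
      KUBEFLOW_LLAMA_STACK_URL: http://<distribution_name>-service.<namespace>.svc.cluster.local:<port>   # assumed service URL format
      KUBEFLOW_PIPELINES_ENDPOINT: https://<ds_pipeline_dspa_route_host>
      KUBEFLOW_NAMESPACE: <project_name>
      KUBEFLOW_BASE_IMAGE: <ragas_provider_image>
      KUBEFLOW_RESULTS_S3_PREFIX: s3://<bucket_name>/ragas-evaluation-results
      KUBEFLOW_S3_CREDENTIALS_SECRET_NAME: <ragas_s3_credentials>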
  - EMBEDDING_MODEL: Used by Ragas for semantic similarity calculations.
  - KUBEFLOW_LLAMA_STACK_URL: The URL for the Llama Stack server. This URL must be accessible from the Kubeflow Pipeline pods. Replace <distribution_name>, <namespace>, and <port> with the name of the LlamaStackDistribution that you are creating, the namespace where you are creating it, and the port. All three values are present in the LlamaStackDistribution YAML.
  - KUBEFLOW_PIPELINES_ENDPOINT: The Kubeflow Pipelines API endpoint URL.
  - KUBEFLOW_NAMESPACE: The namespace where pipeline runs are executed. This namespace should match your current project namespace.
  - KUBEFLOW_BASE_IMAGE: The container image used for Ragas evaluation pipeline components. This image contains the Ragas provider package installed with pip.
  - KUBEFLOW_RESULTS_S3_PREFIX: The S3 path prefix where evaluation results are stored. For example: s3://my-bucket/ragas-evaluation-results.
  - KUBEFLOW_S3_CREDENTIALS_SECRET_NAME: The name of the secret containing S3 credentials.
- Apply the ConfigMap:
  $ oc apply -f kubeflow-ragas-config.yaml
- Create a Llama Stack distribution configuration file with the Ragas remote provider. For example, create a llama-stack-ragas-remote.yaml file as follows:
  Example llama-stack-ragas-remote.yaml
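  The following sketch shows the general shape of a LlamaStackDistribution resource that wires in the ConfigMap and secrets created earlier; the API version, distribution name, port, and environment wiring are assumptions for illustration and must be adapted to your environment:

    apiVersion: llamastack.io/v1alpha1      # assumed API group/version for the Llama Stack Operator
    kind: LlamaStackDistribution
    metadata:
      name: llama-stack-ragas-remote
      namespace: <project_name>
    spec:
      replicas: 1
      server:
        distribution:
          name: rh-dev                       # assumed distribution name
        containerSpec:
          port: 8321
          env:
            # Remote provider settings, loaded from the ConfigMap created earlier;
            # the remaining keys in kubeflow-ragas-config are wired in the same way.
            - name: KUBEFLOW_PIPELINES_ENDPOINT
              valueFrom:
                configMapKeyRef:
                  name: kubeflow-ragas-config
                  key: KUBEFLOW_PIPELINES_ENDPOINT
            # S3 credentials and the pipelines token, loaded from the secrets created earlier.
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: <ragas_s3_credentials>
                  key: AWS_ACCESS_KEY_ID
            - name: KUBEFLOW_PIPELINES_TOKEN
              valueFrom:
                secretKeyRef:
                  name: kubeflow-pipelines-token
                  key: KUBEFLOW_PIPELINES_TOKEN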
- Deploy the Llama Stack distribution:
  $ oc apply -f llama-stack-ragas-remote.yaml
- Wait for the deployment to complete:
  $ oc get pods -w
  Wait until the Llama Stack pod status shows Running.
3.4. Evaluating RAG system quality with Ragas metrics
Evaluate your RAG system quality by testing your setup, using the example provided in the demo notebook. This demo outlines the basic steps for evaluating your RAG system with Ragas using the Python client. You can execute the demo notebook steps from a Jupyter environment.
Alternatively, you can submit an evaluation directly by using the HTTP methods of the Llama Stack API.
The Llama Stack pod must be accessible from the Jupyter environment in the cluster, which might not be the case by default. To configure this setup, see Ingesting content into a Llama model.
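For orientation, the following is a rough, hypothetical sketch of the kind of calls the demo notebook makes with the Llama Stack Python client to register a benchmark and start an evaluation. Client method names and parameters vary between Llama Stack versions, and the base URL, benchmark ID, dataset ID, scoring function names, and model name used here are placeholders, so rely on the demo notebook for the authoritative version.

  from llama_stack_client import LlamaStackClient

  # Point the client at your Llama Stack distribution service (placeholder URL).
  client = LlamaStackClient(
      base_url="http://<distribution_name>-service.<namespace>.svc.cluster.local:<port>"
  )

  # Register a benchmark that ties an evaluation dataset to Ragas scoring functions.
  # The dataset is assumed to have been registered already (see the notebook's
  # Dataset Registration section); all identifiers here are placeholders.
  client.benchmarks.register(
      benchmark_id="ragas-demo-benchmark",
      dataset_id="ragas-demo-dataset",
      scoring_functions=["ragas::faithfulness", "ragas::answer_relevancy"],
  )

  # Start the evaluation job against the deployed model.
  job = client.eval.run_eval(
      benchmark_id="ragas-demo-benchmark",
      benchmark_config={
          "eval_candidate": {
              "type": "model",
              "model": "<model_name>",
              "sampling_params": {"max_tokens": 512},
          },
      },
  )
  print(job)  # includes a job identifier that you can use to poll for results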
Prerequisites
- You have logged in to Red Hat OpenShift AI.
- You have created a project.
- You have created a pipeline server.
- You have created a secret for your AWS credentials in your project namespace.
- You have deployed a Llama Stack distribution with the Ragas evaluation provider enabled (Inline or Remote). For more information, see Setting up the Ragas inline provider for development.
- You have access to a workbench or notebook environment where you can run Python code.
Procedure
- From the OpenShift AI dashboard, click Projects.
- Click the name of the project that contains the workbench.
- Click the Workbenches tab.
If the status of the workbench is Running, skip to the next step.
If the status of the workbench is Stopped, in the Status column for the workbench, click Start.
The Status column changes from Stopped to Starting when the workbench server is starting, and then to Running when the workbench has successfully started.
- Click the open icon next to the workbench.
Your Jupyter environment window opens.
- On the toolbar, click the Git Clone icon and then select Clone a Repository.
- In the Clone a repo dialog, enter the following URL:
  https://github.com/trustyai-explainability/llama-stack-provider-ragas.git
- In the file browser, select the newly created /llama-stack-provider-ragas/demos folder.
  You see a Jupyter notebook named basic_demo.ipynb.
- Double-click the basic_demo.ipynb file to launch the Jupyter notebook.
  The Jupyter notebook opens. You see code examples for the following tasks:
- Run your Llama Stack distribution
- Setup and Imports
- Llama Stack Client Setup
- Dataset Preparation
- Dataset Registration
- Benchmark Registration
- Evaluation Execution
- Inline vs Remote Side-by-side
- In the Jupyter notebook, run the code cells sequentially through the Evaluation Execution section.
- Return to the OpenShift AI dashboard.
- Click Develop & train → Pipelines → Runs. You might need to refresh the page to see the new evaluation job running.
- Wait for the job to show Successful.
- Return to the workbench and run the Results Display cell.
- Inspect the results displayed.