Chapter 5. Testing your model with retrieval augmented generation (RAG)


You can enhance your model’s responses by providing it with contextual information from your own documents using retrieval augmented generation (RAG). You can upload documents to the vector database associated with the playground to provide context for your model’s responses.

Important

The RAG feature of the gen AI playground currently works only with an inline vector database. You cannot configure the playground to connect this RAG feature to an external or remote vector database.

Prerequisites

  • You have configured a playground for your project.
  • You have the document files ready to upload. The supported file formats are PDF, DOC, and CSV. You can upload up to 10 files, with a maximum size of 10 MB per file.

Procedure

  1. From the OpenShift AI dashboard, click Gen AI studio → Playground.
  2. In the Playground interface, click the toggle in the RAG section and then expand the section.
  3. Click Upload.

    The Upload files dialog opens.

  4. Drag and drop your file or click to browse and select a file from your local system.
  5. Optional: Adjust the Maximum chunk length, Chunk overlap, and Delimiter values as needed for your document type. For more information about these settings, see Understanding RAG settings.
  6. Click Upload.

    Wait for the file to finish processing. A Source uploaded notification appears, and the file is listed under Uploaded files.

  7. Repeat these steps to upload additional files if needed.
  8. In the System instructions field, review or edit the text to define the context, persona, or instructions for the model. The playground provides a default prompt.
  9. In the chat input field, ask a question related to your documents that the model would not know otherwise.
  10. Observe the model’s response.

    Tip

    If a model is reluctant to use the RAG feature (its knowledge search tool), you can modify the prompt in the System instructions field to explicitly guide its behavior.

    To ensure that the model actively uses the available RAG documents rather than relying solely on its pre-trained knowledge, refine the system prompt with directives such as the following:

    • To force use: "You MUST use the knowledge_search tool to obtain updated information."
    • To specify context: "Always search the knowledge base before answering questions about company policies, recent events, or specific documentation."

    NOTE: After you send a prompt, the Send button in the chat input field changes to a Stop button. Click it to interrupt the model's response, for example, if the response takes longer than you anticipated or you notice an error in your prompt. The bot posts "You stopped this message" to confirm your stop request.

  11. Optional: To clear the chat history and start a new conversation, click New Chat. Your playground configuration settings are preserved.

Verification

  • The model retrieves information from the uploaded documents to answer the questions.

5.1. Understanding RAG settings

When you upload a document for retrieval augmented generation (RAG), you can configure the following settings to optimize how the text is processed.

Maximum chunk length

The maximum word count for each text section ("chunk") created from your uploaded files.

  • Smaller chunks are recommended for precise data retrieval.
  • Larger chunks are recommended for tasks requiring broader context, such as summarization.

Chunk overlap

The number of words from the end of one text section (chunk) that are repeated at the start of the next one. This overlap helps maintain continuous context across chunks, improving model responses.

For example, the following sentence is chunked differently depending on the chunk overlap: "Chunk overlap can improve the quality of model responses."

Maximum chunk length = 4, Chunk overlap = 1

Chunk overlap can improve
improve the quality of
of model responses.

Maximum chunk length = 4, Chunk overlap = 0

Chunk overlap can improve
the quality of model
responses.
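The word-based chunking shown in the examples above can be sketched in Python. This is a minimal illustration of the splitting logic, not the playground's actual implementation:

```python
def chunk_words(text, max_len, overlap):
    """Split text into chunks of at most max_len words, repeating the
    last `overlap` words of each chunk at the start of the next."""
    words = text.split()
    chunks = []
    i = 0
    while i < len(words):
        chunks.append(" ".join(words[i:i + max_len]))
        if i + max_len >= len(words):
            break  # this chunk reached the end of the text
        i += max_len - overlap  # step forward, keeping `overlap` words
    return chunks

text = "Chunk overlap can improve the quality of model responses."
print(chunk_words(text, max_len=4, overlap=1))
# ['Chunk overlap can improve', 'improve the quality of', 'of model responses.']
print(chunk_words(text, max_len=4, overlap=0))
# ['Chunk overlap can improve', 'the quality of model', 'responses.']
```

With overlap = 1, the last word of each chunk ("improve", "of") is repeated at the start of the next chunk, preserving context across the boundary.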

Delimiter

A character or string that specifies where a text chunk should end. This helps define text boundaries alongside maximum chunk length and overlap, ensuring sentences or paragraphs remain intact.

Examples of delimiters:

  • . (period) — splits at sentence boundaries
  • \n (newline) — splits at paragraph boundaries
  • ; (semicolon) — splits at clause boundaries

    For example, the following text is split differently depending on the delimiter: "This is the first sentence. This is the second sentence."

    Maximum chunk length = 4, Chunk overlap = 0

    This is the first sentence. This is the second sentence.

    Maximum chunk length = 4, Chunk overlap = 0, Delimiter = . (period)

    This is the first sentence.
    This is the second sentence.
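Delimiter-based splitting can be sketched the same way. This is an illustrative approximation, not the playground's implementation: the text is split at each delimiter, and the delimiter is kept attached so sentences or clauses remain intact:

```python
def split_on_delimiter(text, delimiter):
    """Split text into chunks that end at the delimiter, reattaching
    the delimiter so each sentence or clause stays intact.
    Assumes the text ends with the delimiter."""
    pieces = [p.strip() for p in text.split(delimiter) if p.strip()]
    return [p + delimiter for p in pieces]

text = "This is the first sentence. This is the second sentence."
print(split_on_delimiter(text, "."))
# ['This is the first sentence.', 'This is the second sentence.']
```

Choosing a delimiter that matches your document's structure (a period for prose, a newline for paragraph-oriented files) keeps each chunk semantically complete, which generally improves retrieval quality.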