Deploy Multi-Agent Research Workflows with Red Hat AI and NVIDIA

Build an academic research agent on Red Hat AI Factory with NVIDIA, powered by vLLM models and platform capabilities for observability, governance, and scale.

Table of Contents

Detailed Description
Requirements
Deploy
Customization
References
Tags

Detailed Description

Academic research often requires users to move between quick fact-finding, source discovery, literature review, synthesis, and longer-form reporting. Researchers, faculty, students, and institutional teams may need to gather context from academic papers, web sources, internal knowledge bases, uploaded files, and domain-specific reference material. As the volume of available information grows, the challenge is not only finding relevant sources, but also determining what matters, comparing findings, preserving citation traceability, and turning fragmented information into a useful research output.

This AI quickstart demonstrates how an agentic research application can support academic research workflows by combining fast cited responses with deeper, multi-step investigation. For simple questions, the application can return concise answers with supporting sources. For more complex research requests, it can plan the work, gather information across available tools, ask clarifying questions when needed, and generate a more complete research report. This makes the application useful for research discovery, topic exploration, literature review support, policy research, competitive analysis, and other knowledge-intensive workflows that require both speed and source-backed reasoning.

Built as a customized version of the NVIDIA AI-Q Blueprint for Red Hat AI, this application shows how enterprise-grade research agents can run with NVIDIA models on Red Hat AI Factory with NVIDIA. The AI-Q Blueprint is built on the NVIDIA NeMo Agent Toolkit and LangChain Deep Agents, providing teams with a production-ready foundation for building intelligent research workflows. The quickstart adapts this upstream pattern for Red Hat AI environments and adds enterprise platform capabilities such as scalable model serving, observability, governance, and flexible deployment options. It highlights how teams can bring agentic research workflows into hybrid cloud environments while maintaining the operational control needed for production AI applications.

Architecture Diagrams

AI-Q Architecture on Red Hat AI

This architecture diagram shows a customized NVIDIA AI-Q research workflow running on Red Hat AI Factory with NVIDIA. AI-Q routes user requests across different research paths, from simple responses to shallow, tool-augmented research and deeper multi-step research with planning, sub-agents, and report generation.

The workflow is backed by a small set of shared model endpoints rather than one model per agent component. In this quickstart, models can be served with vLLM on Red Hat AI Enterprise or accessed through NVIDIA NGC cloud inference. The application can also connect to web search, academic search, uploaded enterprise data, and a RAG knowledge layer to support cited, source-grounded responses.

Red Hat AI Enterprise adds the platform capabilities needed to operate the application in production-like environments, including scalable model serving, observability, governance, and hybrid cloud deployment flexibility. The diagram represents the AI-Q workflow and supporting services, while the red callouts highlight Red Hat AI Enterprise additions such as vLLM-based serving and observability.

Requirements

Minimum Hardware Requirements

GPU Requirements (for local vLLM deployment)

This deployment uses quantized and smaller models to reduce GPU memory usage, and it can optionally use a MIG configuration for further GPU optimization. These requirements apply when models are deployed locally on your GPUs with vLLM (not when using NGC cloud inference).

Models deployed on your cluster:

RedHatAI/gpt-oss-120b (Orchestrator): ~80GB VRAM (quantized)
RedHatAI/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 (Intent & Researcher): ~25-30GB VRAM (quantized)
nvidia/Nemotron-Mini-4B-Instruct (Summary): ~8-10GB VRAM

Standard deployment requirements (full GPUs, not using MIG):

3x NVIDIA H100 (80GB) or A100 80GB
- GPU 0: gpt-oss-120b (orchestrator) - 1 GPU (~80GB)
- GPU 1: nemotron-nano-30b (intent & researcher) - 1 GPU (~30GB)
- GPU 2: nemotron-mini-4b (summary) - 1 GPU (~10GB)
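
Before installing, it can help to confirm that the cluster advertises at least the three GPUs listed above. The following is a minimal sketch, not part of the quickstart: the helper name `sum_gpus` is hypothetical, and it assumes GPUs are exposed under the `nvidia.com/gpu` resource name. With MIG enabled, the advertised resource names may differ depending on the MIG strategy in use.

```shell
# Sum per-node GPU counts read from stdin, one value per line.
# Blank lines (nodes without GPUs) and non-numeric lines are ignored.
# NOTE: sum_gpus is a hypothetical helper, not part of the quickstart.
sum_gpus() {
  awk '/^[0-9]+$/ { total += $1 } END { print total + 0 }'
}

# Illustrative usage against a live cluster (not executed here):
#   oc get nodes \
#     -o jsonpath='{range .items[*]}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}' \
#     | sum_gpus
```

A result below 3 suggests that the standard (non-MIG) layout will not fit as described.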

Optional: Multi-Instance GPU (MIG) optimization

MIG allows you to partition GPUs into smaller slices, enabling multiple models to share a single GPU efficiently and reduce overall GPU requirements.

NOTE: MIG examples are based on H100 MIG profiles

With MIG (all-balanced profile): 2x H100 GPUs minimum
- GPU 0: 2x 3g.47gb (gpt-oss-120b with tensor parallelism across 2 slices)
- GPU 1: 1x 3g.47gb (nemotron-nano-30b) + 1x 1g.12gb (nemotron-mini-4b)

See deploy/helm/vllm-models/values.yaml for detailed MIG configuration examples and options.

Alternative: NGC API Cloud Deployment (No GPU Required)

When using NVIDIA NGC API for cloud-hosted inference, no local GPU resources are required. This is the quickest way to get started and test AI-Q.

Storage

Based on default deployment configuration (deploy/helm/aiq-rh/values.yaml):

PostgreSQL PersistentVolumeClaim: 10GB
- Single PVC (aiq-postgres-data) for job metadata, agent checkpoints, and research summaries
ChromaDB and application data: Uses ephemeral storage (emptyDir)
- Data does not persist across pod restarts in default configuration
- To persist ChromaDB vectors and documents, add a PVC for the backend's /app/data volume mount
Container images: Standard container registry pull and caching (size varies by deployment target)

Minimum recommended: Ensure adequate node storage for PVCs plus container image caching
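
To persist ChromaDB data as described above, a PersistentVolumeClaim has to exist before it can be mounted. The sketch below only renders a claim manifest. The claim name `aiq-chroma-data` and the `10Gi` default are hypothetical values, and wiring the claim to the backend's /app/data mount depends on the chart's values, which are not shown here.

```shell
# Print a PersistentVolumeClaim manifest for ChromaDB data.
# ASSUMPTION: the default claim name and size are illustrative only;
# they are not taken from the quickstart's Helm chart.
render_chroma_pvc() {
  name="${1:-aiq-chroma-data}"
  size="${2:-10Gi}"
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${name}
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: ${size}
EOF
}

# Illustrative usage (not executed here):
#   render_chroma_pvc | oc apply -n ns-aiq -f -
```

Check deploy/helm/aiq-rh/values.yaml for the supported way to reference a claim from the backend deployment.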

Minimum Software Requirements

Red Hat OpenShift Container Platform (tested with v4.20)
Red Hat OpenShift AI v3.3.2+ (tested with v3.3.2)
NVIDIA GPU Operator v24.6.0+
Helm CLI
OpenShift Client CLI (oc)
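
A quick way to confirm that the required command-line tools are installed is to look each one up on the PATH. This is a small sketch. The helper name `missing_tools` is hypothetical, and it checks only for presence, not for a particular version.

```shell
# Print the name of every tool from the argument list that is not on PATH.
# NOTE: missing_tools is a hypothetical helper, not part of the quickstart.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Illustrative usage:
#   missing_tools helm oc git
# Versions can then be inspected with:
#   helm version --short
#   oc version --client
```

Empty output means every listed tool was found.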

Required User Permissions

cluster-admin or namespace admin permissions for creating resources in your target namespace
Ability to create PersistentVolumeClaims
Ability to create Secrets
For vLLM deployment: Permissions to create KServe InferenceServices
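
The permissions listed above can be checked up front with `oc auth can-i`. The sketch below is one way to do that. The helper name `check_permissions` is hypothetical, and the resource list simply mirrors the items above, with `inferenceservices.serving.kserve.io` relevant only for vLLM deployments.

```shell
# Report whether the current user may create each required resource type
# in the given namespace. Prints "<resource>: yes|no|unknown".
# NOTE: check_permissions is a hypothetical helper, not part of the quickstart.
check_permissions() {
  ns="$1"
  for resource in persistentvolumeclaims secrets inferenceservices.serving.kserve.io; do
    # "oc auth can-i" exits non-zero on "no", so tolerate the failure.
    answer="$(oc auth can-i create "$resource" -n "$ns" 2>/dev/null || true)"
    printf '%s: %s\n' "$resource" "${answer:-unknown}"
  done
}

# Illustrative usage (requires an active oc login):
#   check_permissions ns-aiq
```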

Deploy

The following instructions will deploy the Red Hat Research AI quickstart to your Red Hat AI Enterprise environment using simple Helm deployments.

Prerequisites

Before deployment, ensure you have the following in place:

OpenShift cluster with OpenShift AI installed (see version requirements above)
OpenShift AI has a DataScienceCluster resource with the kserve and dashboard components set to Managed
For vLLM deployment: GPU nodes available with NVIDIA GPU Operator installed
For NGC deployment: No GPU infrastructure required

Obtain the following API keys:

NVIDIA_API_KEY (required for NGC model deployment)
- Get your API key at: https://org.ngc.nvidia.com/setup/api-key
- Sign up for NIM access at: https://build.nvidia.com/
TAVILY_API_KEY (optional but recommended for web search functionality)
- Sign up at: https://tavily.com/
SERPER_API_KEY (optional for academic paper search via Google Scholar)
- Sign up at: https://serper.dev/

Note: At least one data source (Tavily web search, Serper paper search, or uploaded documents) is required to enable research functionality beyond basic conversational queries.
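
Because research functionality depends on at least one data source, it can be useful to check which search keys are set before creating secrets. The following sketch reports the configured search providers. The helper name `configured_sources` is hypothetical, and uploaded documents are not covered because they are added later through the UI.

```shell
# Print the search data sources that have a non-empty API key in the
# current shell environment ("tavily" and/or "serper").
# NOTE: configured_sources is a hypothetical helper, not part of the quickstart.
configured_sources() {
  [ -n "${TAVILY_API_KEY:-}" ] && printf 'tavily\n'
  [ -n "${SERPER_API_KEY:-}" ] && printf 'serper\n'
  return 0
}

# Illustrative usage:
#   if [ -z "$(configured_sources)" ]; then
#     echo "No search keys set; only uploaded documents will be available." >&2
#   fi
```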

Install

Clone the AI quickstart repository and check out the quickstart deployment branch:

git clone https://github.com/rh-ai-quickstart/rh-research
cd rh-research
git checkout quickstart

# Initialize submodules (if building custom images or wanting to review source code)
git submodule update --init --recursive

Note: The submodule initialization step is only required if you plan to build custom container images from source. The pre-built images work without submodules.

Ensure you are logged into your OpenShift cluster as cluster-admin or namespace admin:

oc whoami

Set environment variables for API keys:

# NVIDIA API key (required for NGC models, optional for vLLM model pulls)
export NVIDIA_API_KEY="nvapi-..."

# Tavily API key for web search (optional but recommended)
export TAVILY_API_KEY="tvly-..."

# Serper API key for paper search (optional)
export SERPER_API_KEY="..."

Create namespace and secrets:

# Create namespace
oc create namespace ns-aiq

# Create application secrets
oc create secret generic aiq-credentials -n ns-aiq \
  --from-literal=NVIDIA_API_KEY="$NVIDIA_API_KEY" \
  --from-literal=TAVILY_API_KEY="$TAVILY_API_KEY" \
  --from-literal=SERPER_API_KEY="$SERPER_API_KEY" \
  --from-literal=DB_USER_NAME="aiq" \
  --from-literal=DB_USER_PASSWORD="aiq_dev"

# For NGC-based deployments, create image pull secret
oc create secret docker-registry ngc-api -n ns-aiq \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password="$NVIDIA_API_KEY"
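
The database password in the command above is a fixed development value. For anything beyond a demo, a generated value may be preferable. The sketch below shows one way to produce a random hexadecimal password. The helper name `gen_password` is hypothetical, and whether other components expect the default password is not confirmed here.

```shell
# Print a random lowercase hexadecimal string built from N random bytes
# (default 16 bytes, which yields 32 characters).
# NOTE: gen_password is a hypothetical helper, not part of the quickstart.
gen_password() {
  bytes="${1:-16}"
  od -An -N"$bytes" -tx1 /dev/urandom | tr -d ' \n'
}

# Illustrative usage when creating the secret (not executed here):
#   --from-literal=DB_USER_PASSWORD="$(gen_password)"
```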

Choose your deployment option:

AI quickstart decision tree:

Do you have GPU infrastructure?
├─ NO  → Option B: NGC Cloud Models (easy onramp, no GPU needed)
└─ YES → Do you want to run models locally?
          ├─ YES → Option A: vLLM Local Models (recommended for production)
          └─ NO  → Option B: NGC Cloud Models
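
The decision tree above can also be written as a small function, which may be handy in install scripts. This sketch covers only Options A and B. The function name `choose_option` and its `yes`/`no` arguments are illustrative.

```shell
# Return the deployment option letter for the decision tree:
#   $1 = "yes" if GPU infrastructure is available
#   $2 = "yes" if models should run locally
# NOTE: choose_option is a hypothetical helper, not part of the quickstart.
choose_option() {
  has_gpu="$1"
  run_local="$2"
  if [ "$has_gpu" = "yes" ] && [ "$run_local" = "yes" ]; then
    echo "A"   # vLLM Local Models
  else
    echo "B"   # NGC Cloud Models
  fi
}
```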

Option A: vLLM Local Models

Deploy models locally on your GPUs for full deployment control and integration with the Red Hat AI Enterprise observability stack.

cd deploy/helm

# Step 1: Deploy vLLM models via KServe
helm install vllm-models vllm-models/ \
  -n ns-aiq

# Wait for InferenceServices to be ready (2-5 minutes for model downloads)
oc get inferenceservices -n ns-aiq -w

# Step 2: Deploy AI-Q application with vLLM configuration and Red Hat branding
helm install aiq aiq-rh/ \
  -n ns-aiq \
  -f aiq-rh/values-vllm.yaml \
  -f aiq-rh/values-branding.yaml

# Verify deployment
oc get pods -n ns-aiq
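
As a non-interactive alternative to watching InferenceServices with `-w` in Step 1, `oc wait` can block until every InferenceService in the namespace reports the Ready condition. This is a sketch only. The helper name `wait_for_models` and the 900-second default timeout are assumptions, and large model downloads may need a longer timeout.

```shell
# Block until all InferenceServices in the namespace are Ready,
# or fail when the timeout expires.
# NOTE: wait_for_models and its default timeout are hypothetical.
wait_for_models() {
  ns="${1:-ns-aiq}"
  timeout="${2:-900s}"
  oc wait --for=condition=Ready inferenceservice --all -n "$ns" --timeout="$timeout"
}

# Illustrative usage between Step 1 and Step 2 (not executed here):
#   wait_for_models ns-aiq 1800s
```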

What you get:

LLM inference via local vLLM servers on your GPUs
Embedded LlamaIndex with ChromaDB for document storage
Full control over model selection and hosting
Data stays within your cluster
Red Hat branded UI with custom favicon

Option B: NGC Cloud Models

Use NVIDIA's cloud-hosted model inference without GPU infrastructure.

cd deploy/helm

# Deploy AI-Q with default NGC configuration and Red Hat branding
helm install aiq aiq-rh/ \
  -n ns-aiq \
  -f aiq-rh/values-branding.yaml

# Verify deployment
oc get pods -n ns-aiq

What you get:

LLM inference via NGC API (cloud-hosted, pay-per-use)
Embedded LlamaIndex with ChromaDB for document storage
No GPU infrastructure needed
Fastest way to get started
Red Hat branded UI with custom favicon

Advanced Options:

The NVIDIA AI-Q Blueprint is designed to work optionally with the NVIDIA RAG Blueprint as a RAG backend. We have published an AI quickstart based on this RAG blueprint, similarly customized for Red Hat AI Enterprise deployments, that may be used with this research assistant AI quickstart.

RAG AI quickstart based on NVIDIA RAG Blueprint

To integrate with the RAG quickstart, see the full deployment guide for the following configuration options:

Option C: vLLM + RAG Blueprint (aiq-rh/values-vllm-frag.yaml)
Option D: NGC + RAG Blueprint (aiq-rh/values-frag.yaml)

See Deployment Guide for complete instructions.

Verify Installation

Check all deployed pods are running:

oc get pods -n ns-aiq

Expected pods (all deployments):

aiq-backend-* - Main application backend
aiq-frontend-* - Web UI
aiq-postgres-* - PostgreSQL database

Additional pods (vLLM deployment only):

gpt-oss-120b-predictor-* - Orchestrator model server
nemotron-nano-30b-predictor-* - Intent & researcher model server
nemotron-mini-4b-predictor-* - Summary model server
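
The expected pod names above can be compared against the actual pod list with a short script. The sketch below checks name prefixes only and does not inspect pod status. The helper name `missing_pods` is hypothetical.

```shell
# Read pod names from stdin (one per line) and print every expected
# prefix from the argument list that has no matching "<prefix>-..." pod.
# NOTE: missing_pods is a hypothetical helper, not part of the quickstart.
missing_pods() {
  names="$(cat)"
  for prefix in "$@"; do
    printf '%s\n' "$names" | grep -q "^${prefix}-" || printf '%s\n' "$prefix"
  done
}

# Illustrative usage (not executed here):
#   oc get pods -n ns-aiq -o name | sed 's|^pod/||' \
#     | missing_pods aiq-backend aiq-frontend aiq-postgres
```

For vLLM deployments, add `gpt-oss-120b-predictor`, `nemotron-nano-30b-predictor`, and `nemotron-mini-4b-predictor` to the argument list. Empty output means a pod exists for every expected prefix.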

Using the research assistant AI quickstart

Get the frontend URL:

echo "https://$(oc get route -n ns-aiq aiq-frontend -o jsonpath='{.spec.host}')"

Navigate to the frontend UI in your browser
Test the agent with different query types:

Simple greeting (meta response - a few seconds):

Hello, what can you do?

Expected: Friendly greeting explaining AI-Q capabilities within 2-5 seconds.

Shallow research (quick research with citations - 10-30 seconds):

What is Red Hat OpenShift?

Expected: Factual answer with web search citations within 10-30 seconds.

Deep research (comprehensive analysis - 2-5 minutes):

Provide a comprehensive analysis of Kubernetes security best practices

Expected: Multi-section structured report with planning steps, research progress updates, and comprehensive citations. Overall end-to-end processing time varies.

(Optional) Upload documents for knowledge retrieval:

Click the upload button to add PDF, DOCX, Markdown or TXT files. Once uploaded, the agent can answer questions based on your document content:

What information is in the document I uploaded?

Expected: Answer synthesized from your uploaded documents with citations to specific sections.

For detailed verification steps and troubleshooting, see the User Verification Guide.

Delete

Uninstall the quickstart deployment:

# Delete AI-Q application
helm uninstall aiq -n ns-aiq

# For vLLM deployments, delete model servers
helm uninstall vllm-models -n ns-aiq

# Delete all PVCs to remove data
oc delete pvc --all -n ns-aiq

# (Optional) Delete the entire namespace
oc delete namespace ns-aiq

Customization

This quickstart focuses on deploying AI-Q on Red Hat OpenShift AI using pre-built container images. The following customization options are available:

Quick Configuration Changes

UI Branding: Pre-built images include Red Hat branding by default. The main install commands use values-branding.yaml to add a custom favicon and demonstrate runtime branding customization. You may edit this file to change colors, logos, or text for custom demos without rebuilding images. See Customization Reference for branding details.
Model Selection: Edit deploy/helm/vllm-models/values.yaml to change vLLM models
Agent Behavior: Modify inline ConfigMaps in values files (e.g., aiq-rh/values-vllm.yaml)
Data Sources: Configure API keys via the aiq-credentials secret
RAG Integration: Update RAG_SERVER_URL and RAG_INGEST_URL environment variables
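
For the Data Sources item above, an API key can be changed after installation by patching a single entry in the aiq-credentials secret. The following is a sketch under assumptions: the helper name `set_credential` is hypothetical, and it is not confirmed here when the backend re-reads the secret, so running pods may need to be restarted before a new value takes effect.

```shell
# Set one key in the aiq-credentials secret to a new value.
#   $1 = namespace, $2 = key name, $3 = plain-text value
# Secret data must be base64-encoded, so the value is encoded first.
# NOTE: set_credential is a hypothetical helper, not part of the quickstart.
set_credential() {
  ns="$1"
  key="$2"
  value="$3"
  encoded="$(printf '%s' "$value" | base64 | tr -d '\n')"
  oc patch secret aiq-credentials -n "$ns" --type merge \
    -p "{\"data\":{\"$key\":\"$encoded\"}}"
}

# Illustrative usage (not executed here):
#   set_credential ns-aiq TAVILY_API_KEY "$TAVILY_API_KEY"
```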

Building from Source

Pre-built container images include Red Hat-specific patches applied to the upstream AI-Q v2.1.0 source. To build custom images with your own modifications, see Customization Reference for:

Patch workflow and application
Building custom frontend/backend images
Model selection and configuration
Agent behavior tuning

The customization guide provides step-by-step instructions for working with the source code and patches.

Container Images & Versioning

This quickstart is based on NVIDIA AI-Q Blueprint v2.1.0 with Red Hat-specific patches. The deployment uses pre-built container images:

Backend: quay.io/tasmith/aiq-backend-redhat:v2.1.0-nv
- NVIDIA AI-Q v2.1.0
Frontend: quay.io/tasmith/aiq-frontend-redhat:2.1.0
- NVIDIA AI-Q v2.1.0 + patches 0002-0003 (runtime branding + Red Hat defaults)

Patches are maintained in patches/aiq/ and applied during the container build process. See Customization Reference for patch details and build instructions.

Additional Resources:

Deployment Guide - All four deployment options (vLLM, NGC, RAG AI quickstart)
Configuration Reference - YAML parameter reference for advanced configuration

References

NVIDIA NeMo Agent Toolkit - Framework for building production-ready AI agents
NVIDIA AI-Q Blueprint - Upstream project repository
LangChain Deep Agents - Multi-agent orchestration framework
vLLM - High-throughput and memory-efficient inference engine for LLMs
NVIDIA Nemotron - Family of open models with open weights optimized for specialized AI agents
NVIDIA RAG Blueprint - Enterprise RAG infrastructure
Red Hat AI Quickstarts - Collection of AI blueprints for Red Hat AI

License

This AI quickstart is based on the NVIDIA AI-Q Blueprint, which is licensed under the Apache License 2.0. This repository contains Red Hat-specific customizations and deployment configurations for the upstream AI-Q project.

AI-Q Project License: See licenses/LICENSE for the Apache License 2.0 text
Third-Party Dependencies: See licenses/LICENSE-THIRD-PARTY for all third-party software licenses
Deployment and Patch Code: See LICENSE for license content related to the custom code within this repository.

Note: This is not the official NVIDIA AI-Q Blueprint repository. For the upstream project, see NVIDIA-AI-Blueprints/aiq.
