Summarize and analyze your observability data

Create an interactive dashboard to analyze AI model performance and OpenShift cluster metrics using Prometheus.

Table of Contents

Detailed description
Requirements
Deploy
References

Detailed description

The OpenShift AI Observability Summarizer turns OpenShift + OpenShift AI observability signals into plain-English, actionable insights. Instead of stitching together dashboards and raw metrics, teams can quickly understand performance, cost drivers, and operational risks across AI workloads and the platform that runs them.

The Challenge

AI platform and SRE teams routinely need to answer questions like:

“Is my model latency increasing? What changed?”
“Are GPUs saturated or underutilized?”
“Which namespace/workload is causing resource pressure?”
“Do logs/traces correlate with the metric spike?”

Today, these answers often require jumping between tools, writing queries, correlating signals across systems, and then producing a shareable narrative for stakeholders.

Our Solution

The project provides a metrics-aware AI summarization layer on top of OpenShift observability:

Curated + dynamically validated metrics catalog (OpenShift + GPU) used for accurate query selection
Natural-language Chat with Prometheus that generates queries and explains results
Dashboards for OpenShift fleet metrics, vLLM metrics, and hardware accelerators
Report generation (HTML/PDF/Markdown) for sharing
Optional alerting integrations (Slack notifications) when enabled

Features

Console-native experience: Works where platform teams already operate—inside the OpenShift Console.
Faster time-to-answer: Turn “what changed?” questions into clear, structured summaries in minutes.
Fleet + namespace insights: Move from cluster-wide trends to namespace-level signals without context switching.
GPU & model visibility: Understand accelerator utilization and model-serving performance using the same workflow.
Explainable outputs: Summaries come with supporting details so engineers can validate conclusions quickly.
Shareable reporting: Export clean HTML/PDF/Markdown reports for stakeholders and incident reviews.

Current Manual Process

Without this project, teams typically:

Manually browse dashboards and write queries
Manually identify the “right” metrics among thousands of candidates
Copy/paste charts and raw values into a document
Correlate symptoms across:
- metrics (Prometheus/Thanos)
- GPU telemetry (DCGM exporter)
- traces (Tempo)
- logs (Loki)
Recreate the same analysis for each incident, model rollout, or performance regression

Our Solution Stack

Data sources

Prometheus / Thanos: OpenShift and workload metrics
vLLM: model-serving metrics (when vLLM is used)
DCGM exporter (optional): GPU metrics (temperature, power, utilization, memory)

Observability add-ons (optional / stack-managed)

OpenTelemetry Collector: collects traces and enables auto-instrumentation flows
Tempo: trace storage/query (integrated into OpenShift Console via UI plugin)
Loki: log aggregation/query (integrated into OpenShift Console via UI plugin)
MinIO: object storage backend for traces/logs persistence

Application components

MCP server: metrics analysis, report generation, and AI tool-calling surface
LLM runtime: local model deployment by default (or connect to an existing model via LLM_URL)
UI
- OpenShift Console Plugin (default, DEV_MODE=false)
- Standalone React UI (DEV_MODE=true)

Architecture diagrams

[Figure: Architecture diagram]

Simple flow

Access the AI Observability experience in OpenShift—either embedded in the OpenShift Console or via the standalone UI.
Choose your analysis context (cluster, namespace, or model view) and a time range, or ask a question using the chat interface.
The platform gathers the relevant observability signals from your environment, covering both the current state and how it has changed over time.
It synthesizes those signals into actionable insights, highlighting what appears healthy or unhealthy, the most likely contributors, and recommended next checks.
Share outcomes easily by exporting the findings as a report when needed.

Requirements

Minimum hardware requirements

CPU: 4 cores (8 recommended)
Memory: 8 GiB RAM (16 GiB recommended)
Storage: 20 GiB (50 GiB recommended)
GPU: optional (recommended for DCGM + model workloads)

Minimum software requirements

OpenShift: 4.18.33+
OpenShift AI: 2.16.2+
CLI tools:
- oc
- helm v3.x
- yq
- jq (used by Makefile flows)
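
The CLI tools and the minimum OpenShift version listed above can be checked with a short preflight script before installing. This is a minimal sketch and not part of the project: the helper names `missing_tools` and `version_ge` are hypothetical, `version_ge` assumes a `sort` that supports `-V` (as in GNU coreutils), and the cluster version lookup assumes you are already logged in with `oc`. It does not check the OpenShift AI version.

```shell
# Hypothetical helper: print the names of the given tools that are not on PATH.
missing_tools() {
  local tool missing=""
  for tool in "$@"; do
    # `command -v` exits non-zero when the tool cannot be found.
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="${missing:+$missing }$tool"
    fi
  done
  printf '%s\n' "$missing"
}

# Hypothetical helper: succeed when version $1 is at least version $2.
# Relies on `sort -V` (version sort); the smaller version sorts first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Check the CLI tools named in the requirements.
missing="$(missing_tools oc helm yq jq)"
if [ -n "$missing" ]; then
  echo "Missing CLI tools: $missing" >&2
else
  echo "All required CLI tools found."
fi

# Compare the cluster version with the documented minimum (only when oc is available).
if command -v oc >/dev/null 2>&1; then
  current="$(oc get clusterversion version -o jsonpath='{.status.desired.version}' 2>/dev/null || true)"
  if [ -n "$current" ] && ! version_ge "$current" "4.18.33"; then
    echo "OpenShift $current is older than the required 4.18.33" >&2
  fi
fi
```

Version sort is used rather than a plain string comparison so that, for example, 4.9.x is correctly treated as older than 4.18.x.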

Required user permissions

OpenShift: cluster-admin (or equivalent privileges for installing console plugins, cluster-wide observability components, and required RBAC).
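
If you are unsure whether your account has sufficient privileges, `oc auth can-i` can test individual actions. The sketch below is illustrative only: `check_install_permissions` is a hypothetical helper, and the list of verb/resource pairs is an assumption about what "installing console plugins and required RBAC" involves, not the project's exact permission set.

```shell
# Hypothetical helper: report whether the current user may perform a few
# cluster-scoped actions. Returns non-zero if any check is denied.
check_install_permissions() {
  local failed=0 pair verb resource
  # Each entry is "<verb> <resource>"; the list is an assumption for illustration.
  for pair in \
    "create consoleplugins.console.openshift.io" \
    "create clusterroles.rbac.authorization.k8s.io" \
    "create clusterrolebindings.rbac.authorization.k8s.io"; do
    verb="${pair%% *}"
    resource="${pair#* }"
    if oc auth can-i "$verb" "$resource" >/dev/null 2>&1; then
      echo "ok:     $verb $resource"
    else
      echo "denied: $verb $resource"
      failed=1
    fi
  done
  return "$failed"
}

# Run the checks only when oc is installed; requires an active login.
if command -v oc >/dev/null 2>&1; then
  check_install_permissions || echo "Some checks were denied; cluster-admin may be required." >&2
fi
```

Passing these checks does not guarantee the install will succeed; it only catches the most obvious permission gaps early.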

Deploy

Quick Start - OpenShift Deployment

Default (production-style): OpenShift Console Plugin UI

make install NAMESPACE=your-namespace

Want to install with existing LLMs?

To avoid Hugging Face and the local model download, use an already-running model endpoint:

make install NAMESPACE=your-namespace LLM_URL=http://your-llm-endpoint

For developers (standalone/development mode): React UI route

make install NAMESPACE=your-namespace DEV_MODE=true
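
The three install variants above differ only in the make variables they pass. As a rough sketch, a small wrapper can assemble the command from its inputs. `install_cmd` is a hypothetical helper, not something the project ships; it only prints the command so you can review it before running it. Note that it also allows LLM_URL and DEV_MODE together, a combination the text above does not explicitly document.

```shell
# Hypothetical helper: print the `make install` command for a given setup.
#   $1 = target namespace (required)
#   $2 = URL of an existing model endpoint (optional; empty deploys a local model)
#   $3 = "true" to install the standalone React UI instead of the console plugin
install_cmd() {
  local cmd="make install NAMESPACE=$1"
  if [ -n "${2:-}" ]; then
    cmd="$cmd LLM_URL=$2"
  fi
  if [ "${3:-false}" = "true" ]; then
    cmd="$cmd DEV_MODE=true"
  fi
  printf '%s\n' "$cmd"
}

install_cmd your-namespace
# prints: make install NAMESPACE=your-namespace
install_cmd your-namespace http://your-llm-endpoint
# prints: make install NAMESPACE=your-namespace LLM_URL=http://your-llm-endpoint
install_cmd your-namespace "" true
# prints: make install NAMESPACE=your-namespace DEV_MODE=true
```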

Access

Console Plugin (default): OpenShift Console → left navigation → AI Observability
React UI (DEV_MODE=true): OpenShift Console → Networking → Routes (route name typically aiobs-react-ui)

Optional console menus

Traces menu: make enable-tracing-ui
Logs menu: make enable-logging-ui

Quick Start - Local Development

Use the local dev helper to port-forward dependencies and run local components.

uv sync
./scripts/local-dev.sh -n your-namespace

Usage

Enable AI Assistance

Navigate to settings and configure a model:

[Screenshot: Settings page]

You can either set an API_KEY or add a custom model. Supported providers include OpenAI, Gemini, Anthropic, and Meta.

[Screenshot: API page]

[Screenshot: Model page]

Once the model configuration is set, select your model from the dropdown:

[Screenshot: Dropdown page]

OpenShift Console Plugin (default)

Open the OpenShift Console and navigate to AI Observability from the left navigation.

[Screenshot: Overview page]

vLLM metrics

Use this page to understand model-serving performance when vLLM metrics are present.

[Screenshot: vLLM Metrics page]

Hardware Accelerator

Use this page to review accelerator-related signals (for example GPU utilization/health when available).

[Screenshot: Hardware Metrics page]

OpenShift metrics

Use this page to analyze cluster-wide and namespace-scoped OpenShift metrics.

[Screenshot: OpenShift Metrics page]

Chat with Prometheus

Ask questions in natural language and get a query + explanation back.

[Screenshot: Chat page]
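
To validate a query the chat returns, you can run it yourself against the Prometheus HTTP API. The following is a sketch under stated assumptions: `prom_query` is a hypothetical helper, the example PromQL (average GPU utilization per host from the DCGM exporter) is only an illustration of the kind of query involved and not necessarily what the tool generates, and the commented usage assumes the cluster exposes the `thanos-querier` route in `openshift-monitoring`.

```shell
# Hypothetical helper: run an instant query against a Prometheus-compatible HTTP API.
#   $1 = base URL, $2 = PromQL expression, $3 = bearer token (optional)
prom_query() {
  if [ -n "${3:-}" ]; then
    curl -sS -G "$1/api/v1/query" \
      -H "Authorization: Bearer $3" \
      --data-urlencode "query=$2"
  else
    curl -sS -G "$1/api/v1/query" --data-urlencode "query=$2"
  fi
}

# Example usage (commented out because it needs a live cluster and login):
# host="$(oc get route thanos-querier -n openshift-monitoring -o jsonpath='{.spec.host}')"
# prom_query "https://$host" 'avg by (Hostname) (DCGM_FI_DEV_GPU_UTIL)' "$(oc whoami -t)"
```

The response is JSON, so it can be piped through `jq` for inspection. If the route uses a certificate your machine does not trust, curl will need the appropriate CA configuration.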

You can also browse the suggested metrics and choose from the questions listed there:

[Screenshot: Metric page]

[Screenshot: Question page]

Reports

Export reports when needed (HTML/PDF/Markdown).

Observe → Traces / Observe → Logs (optional)

If enabled in your cluster, use:

Observe → Traces to view traces
Observe → Logs to query logs

Delete

Uninstall the deployment from the namespace:

make uninstall NAMESPACE=your-namespace

References

Uses Prometheus and Thanos
Uses Tempo for traces
Uses Loki for logs
Uses vLLM for model serving (when applicable)
Integrates with OpenTelemetry for distributed tracing and observability
