All About AI Observability (The Context Window #03)
In this episode of The Context Window, Grafana's livestream series on AI in observability, I'm joined by Tiffany Jernigan and the engineers behind the product, Alexander Sniffin and Jack Gordley, for a deep dive into AI Observability in Grafana Cloud. We unpack what it actually is: a new way to instrument your AI apps and collect canonical data about them, sitting alongside your existing telemetry. We cover what an evaluation means in this context, including the difference between online evals, which run on live traffic, and offline evals, which run on dataset conversations. Then we walk through real demos: setting up AI Observability and evaluators, the analytics view, instrumenting a local coding agent, and Jack's system prompt analysis tool. We also dig into topics that get less airtime elsewhere, such as LLM-as-judge as a method, the evaluators that catch the most bugs, and the cheating-LLM phenomenon.
Timestamps
- 00:00:00 — Introductions
- 00:02:00 — The last month in AI news
- 00:11:09 — What is AI Observability in Grafana Cloud?
- 00:19:20 — What is an evaluation?
- 00:21:05 — The origin of AI Observability
- 00:25:09 — Demo: Setting up AI Observability and evaluators
- 00:32:04 — What is LLM as judge?
- 00:38:43 — Demo: AI Observability Analytics
- 00:40:52 — AI O11y is based on OpenTelemetry
- 00:42:17 — Demo: Instrumenting a local coding agent
- 00:47:18 — Potential future agentic use cases
- 00:52:00 — Evaluators that catch the most bugs
- 01:02:23 — Demo: System prompt analysis
- 01:05:11 — Guess the prompt
Resources
News from the episode
- Claude Opus 4.7 release
- Gemma 4
- Qwen 3.6
- Introducing o11y-bench: an open benchmark for observability agents
- Updates to GitHub Copilot interaction data usage policy
- GrafanaCON 2026 announcements
AI Observability
- AI Observability docs
- Online evaluations on Grafana Cloud
- OpenTelemetry integration with AI Observability
- The Sigil SDK
Mentioned
- OpenCode — the open-source coding agent Jack instruments
- Anthropic: Emotion concepts and their function in a large language model
- OpenClaw