Continuous Profiling, the OpenTelemetry Way (Grafana ❤️🔥 OpenTelemetry Community Call #7)

In this community call, my coworker Tiffany Jernigan and I sat down with Christian Simon, who works on Grafana Pyroscope, to dig into profiling as the fourth signal in OpenTelemetry. Metrics, logs, and traces are the classic three pillars, but a significant portion of latency lives inside a span, in the actual code execution, where those three can’t see. Profiling fills that gap by giving you continuous, code-level visibility into where time is being spent.

What we covered

🔥 Profiling as the fourth signal

Most observability conversations stop at logs, metrics, and traces. But when you see a slow span and ask “what was the code actually doing?”, none of the three pillars can answer. Christian made the case that profiling is the missing piece: continuous, low-overhead, code-level visibility, typically visualized as flame graphs that attribute latency to specific functions and execution paths.

📊 What flame graphs actually tell you

If you haven’t worked with flame graphs before: wide bars = lots of time spent in that function, and the stack reads bottom-up. Once you understand how to read them, they become surprisingly fast to interpret. We walked through what to look for and how to spot the patterns that matter.
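To make “wide bars” concrete, here is a minimal sketch of how bar widths can be derived from raw stack samples. It assumes each sample is simply a tuple of function names, root first; the function names and the `fold_stacks` / `frame_widths` helpers are made up for illustration and are not any profiler’s actual API.

```python
from collections import Counter

def fold_stacks(samples):
    """Count how often each identical stack was sampled."""
    return Counter(tuple(s) for s in samples)

def frame_widths(folded):
    """Map each stack prefix to its sample count, i.e. the width of its bar."""
    widths = Counter()
    for stack, count in folded.items():
        # A frame is "on screen" for every sample that has it on the stack,
        # so each prefix of the stack gets credited with the sample.
        for depth in range(1, len(stack) + 1):
            widths[stack[:depth]] += count
    return widths

# Hypothetical samples: five snapshots of the call stack, root first.
samples = [
    ("main", "handle_request", "parse_json"),
    ("main", "handle_request", "query_db"),
    ("main", "handle_request", "query_db"),
    ("main", "handle_request", "query_db"),
    ("main", "gc"),
]

widths = frame_widths(fold_stacks(samples))
total = len(samples)
for prefix, count in sorted(widths.items()):
    indent = "  " * (len(prefix) - 1)
    print(f"{indent}{prefix[-1]}: {count}/{total} samples")
```

In this toy data, `query_db` shows up in three of five samples, so it would be the widest bar under `handle_request` and the first place to look.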

🌐 The current state of profiling in OpenTelemetry

Profiling is the newest signal to enter the OpenTelemetry spec, and the conversation around it is still evolving. Christian gave a clear-eyed status update on where things are: what’s stable, what’s experimental, and what the path to GA looks like.

🚀 Running profiling in production

The most common pushback on continuous profiling is “what’s the overhead?” The honest answer is that modern continuous profilers typically add 1–3% CPU overhead, which makes them safe to run in production all the time. We talked through deployment patterns, including Kubernetes / containerized environments, and how to think about cost and storage.
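One rough way to reason about where an overhead figure in that range can come from: a sampling profiler pays a small cost per sample, so the overhead on a core is roughly the sampling rate times the cost of one sample. The sketch below is a back-of-the-envelope model only. The per-sample costs are assumed values for illustration, not measurements of Pyroscope or any other profiler.

```python
def sampling_overhead(sample_hz: float, cost_per_sample_us: float) -> float:
    """Estimated fraction of one CPU core spent taking samples."""
    return sample_hz * cost_per_sample_us / 1_000_000

# Assumed, illustrative inputs (rate in Hz, cost in microseconds per sample).
scenarios = [(20, 100), (100, 100), (100, 300)]

for hz, cost_us in scenarios:
    fraction = sampling_overhead(hz, cost_us)
    print(f"{hz} Hz at {cost_us} us/sample -> {fraction:.2%} of one core")
```

The useful takeaway is that overhead scales with the sampling rate, so lowering the rate is the main knob if a workload turns out to be sensitive.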

🧡 Correlating profiles with traces and metrics

This is where profiling really earns its place. Trace ID → profile is the killer pattern: you see a slow span, you click through to the flame graph for that exact execution, and you immediately see which function ate the time. Christian showed how this correlation works in practice and how it changes incident response.
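Conceptually, the correlation comes down to tagging each profile sample with the span that was active when it was taken, then filtering by that tag. Here is a minimal sketch of the idea, assuming a hypothetical `Sample` record with a `span_id` field; the record shape, IDs, and function names are all invented for illustration and do not reflect the actual OpenTelemetry or Pyroscope data model.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Sample:
    stack: Tuple[str, ...]      # frames, root first
    span_id: Optional[str]      # hypothetical label; None if no span was active

def hot_functions_for_span(samples, span_id):
    """Count samples per leaf function, restricted to one span."""
    leaves = Counter()
    for sample in samples:
        if sample.span_id == span_id:
            leaves[sample.stack[-1]] += 1
    return leaves

# Hypothetical samples collected while two spans were running.
samples = [
    Sample(("main", "checkout", "render"), "span-a"),
    Sample(("main", "checkout", "compute_tax"), "span-a"),
    Sample(("main", "checkout", "compute_tax"), "span-a"),
    Sample(("main", "search", "rank"), "span-b"),
    Sample(("main", "gc"), None),
]

# "Click through" from the slow span to the code that ran inside it.
print(hot_functions_for_span(samples, "span-a").most_common(1))
```

Filtering first and aggregating second is what turns a service-wide flame graph into one scoped to a single slow request.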

🀝 How to get involved

OpenTelemetry profiling is genuinely an area where the community has room to shape the future. If the conversation sparked something, there’s a working group, open issues, and a friendly path in.

Resources

Thanks to Christian for joining, to Tiffany for co-hosting, and to everyone who showed up live with questions.
