From Reactive to Proactive: How SREs Can Optimize Their Application Services Before Users Are Affected
Engineering teams are building AI agents capable of correlating signals across millions of data points, executing tool calls, and maintaining state through complex reasoning. However, the non-deterministic nature of agents makes it difficult to understand quality and explain variance across outcomes. Evals are how teams make agent performance measurable and repeatable, moving beyond the final answer to evaluate the entire chain of reasoning.
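To make trajectory-level evals concrete, here is a minimal sketch in Python. Every name in it (AgentStep, EvalCase, run_eval, and the scoring heuristics) is hypothetical and invented for illustration; this is not Datadog's eval system or the Bits AI SRE API, only one way to score the chain of reasoning alongside the final answer.

```python
from dataclasses import dataclass

# Hypothetical types for illustration only, not a Datadog API.

@dataclass
class AgentStep:
    tool: str          # tool the agent called at this step
    observation: str   # what the tool returned

@dataclass
class EvalCase:
    prompt: str                # incident scenario fed to the agent
    expected_tools: list[str]  # tool calls a correct trajectory uses
    expected_answer: str       # ground-truth root cause

def score_trajectory(steps: list[AgentStep], case: EvalCase) -> float:
    """Fraction of expected tool calls the agent actually made, in any order."""
    called = {s.tool for s in steps}
    return sum(1 for t in case.expected_tools if t in called) / len(case.expected_tools)

def score_answer(answer: str, case: EvalCase) -> float:
    """Crude substring grader; real systems use rubric or LLM graders."""
    return 1.0 if case.expected_answer.lower() in answer.lower() else 0.0

def run_eval(agent, cases: list[EvalCase]) -> dict[str, float]:
    """Run every case through the agent and average both scores.

    `agent` is any callable returning (steps, final_answer). Because agents
    are non-deterministic, repeating the run makes variance visible.
    """
    traj, ans = [], []
    for case in cases:
        steps, answer = agent(case.prompt)
        traj.append(score_trajectory(steps, case))
        ans.append(score_answer(answer, case))
    n = len(cases)
    return {"trajectory": sum(traj) / n, "answer": sum(ans) / n}

# Example with a stub agent that checks recent deploys, then metrics.
fake_agent = lambda prompt: (
    [AgentStep("list_recent_deploys", "deploy at 14:02"),
     AgentStep("query_metrics", "error rate spiked at 14:03")],
    "Root cause: bad deploy at 14:02",
)
cases = [EvalCase("checkout errors spiking",
                  ["list_recent_deploys", "query_metrics"], "bad deploy")]
print(run_eval(fake_agent, cases))  # {'trajectory': 1.0, 'answer': 1.0}
```

Scoring the trajectory separately from the answer is what distinguishes this from a simple pass/fail check: an agent that guesses the right root cause without consulting the right evidence scores well on the answer but poorly on the trajectory.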
In the high-stakes world of incident response, agent performance isn't abstract; it's whether tools can reliably accelerate triage, produce accurate RCAs, and guide remediation during an engineer's highest-pressure moments. In this session, we'll explore how Datadog uses evaluations as the core engineering loop to objectively measure Bits AI SRE, and what we learned building that system. Whether you're building your own AI agents or using Datadog Bits AI SRE, you'll leave understanding how rigorous evals drive better reasoning on the problems that matter most.