
About this Session

Large Language Model (LLM) applications are nondeterministic: the same input can produce different outputs from run to run. Traditional testing methods and operational metrics are not sufficient to measure output quality, accuracy, or safety in agentic workflows. When you change a prompt, a model, or your application's architecture, how do you know whether the change actually made things better?


In this workshop, you'll learn how to use Datadog LLM Observability's Experiments and Evaluations features to systematically measure and improve the quality of your agentic AI applications. Evaluations let you measure quality in production, and Experiments let you validate changes offline against real production traces. Together, they enable deliberate, data-driven improvement instead of guesswork or trial and error.
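
As a rough illustration, here is a minimal sketch of what a runtime evaluation might look like with the ddtrace Python SDK. The app name, the call_llm and score_relevance helpers, and the "answer_relevance" label are all hypothetical, and SDK signatures vary by version, so check the current LLM Observability docs before relying on this:

```python
# Minimal sketch: attach a custom evaluation to a traced LLM workflow.
# Assumes DD_API_KEY (and related Datadog config) is set in the environment.
from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import workflow

LLMObs.enable(ml_app="support-agent")  # "support-agent" is a hypothetical app name


def call_llm(question: str) -> str:
    # Placeholder for your real model call.
    return "stub answer"


def score_relevance(question: str, answer: str) -> float:
    # Placeholder evaluator; a real one might use heuristics or an LLM judge.
    return 1.0 if question and answer else 0.0


@workflow
def answer_question(question: str) -> str:
    answer = call_llm(question)
    # Export the active span so the evaluation is attached to this trace.
    span_context = LLMObs.export_span(span=None)
    LLMObs.submit_evaluation(
        span_context=span_context,
        label="answer_relevance",  # the quality dimension you care about
        metric_type="score",
        value=score_relevance(question, answer),
    )
    return answer
```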


Through a hands-on lab, you'll work through a full development loop: identifying a quality issue, creating an evaluator that defines what "good" means for your use case, running an experiment to compare variations, and confirming improvements with runtime evaluations and monitors. By the end of the workshop, you'll have the skills to build a continuous feedback loop between production monitoring and pre-deployment testing, so you can ship improvements with confidence.
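
The experiment step can be pictured as a plain offline loop, sketched here in generic Python rather than the Datadog Experiments SDK: a dataset of production examples, an evaluator that encodes what "good" means, and a score comparison between the current variant and a candidate. All names and data below are illustrative:

```python
# Illustrative offline experiment loop (plain Python, not the Datadog SDK).
from statistics import mean
from typing import Callable


def evaluator(expected: str, output: str) -> float:
    # Defines "good" for this use case; here, a simple exact match.
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_experiment(app: Callable[[str], str], dataset: list[dict]) -> float:
    # Run one variant over the whole dataset and return its mean score.
    return mean(evaluator(row["expected"], app(row["input"])) for row in dataset)


# In practice these rows would be pulled from real production traces.
dataset = [
    {"input": "What is 2 + 2?", "expected": "4"},
    {"input": "Capital of France?", "expected": "Paris"},
]


def baseline(question: str) -> str:
    return "4" if "2 + 2" in question else "Rome"  # current behavior


def candidate(question: str) -> str:
    return "4" if "2 + 2" in question else "Paris"  # proposed prompt change


print(f"baseline score:  {run_experiment(baseline, dataset):.2f}")
print(f"candidate score: {run_experiment(candidate, dataset):.2f}")
```

A higher candidate score on the same dataset is the signal that the change is worth shipping; the same evaluator can then run as a runtime evaluation to confirm the improvement holds in production.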
