Operating Reliable Data in Production with Datadog Data Observability

About this Session

Reliability is a persistent hurdle for data teams, as pipeline failures often go undetected until they hit downstream consumers. In production, a pipeline can look “green” while the outcome is silently wrong: data arrives late, loads incompletely, or drifts in ways that traditional monitors miss. Datadog Data Observability addresses this by unifying quality and jobs monitoring from production to consumption, helping teams detect and remediate issues faster while optimizing both cost and performance.
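To make the “green but wrong” failure mode concrete, here is a minimal, hand-rolled sketch of the freshness and volume checks that a quality monitor automates. The function name, thresholds, and dataset fields are illustrative assumptions, not Datadog’s API:

```python
from datetime import datetime, timedelta, timezone

def check_dataset(last_loaded_at, row_count, expected_rows,
                  max_staleness=timedelta(hours=2)):
    """Return issues a 'green' pipeline status would not surface.

    All thresholds here (2h staleness, 90% completeness) are assumed
    values for illustration only.
    """
    issues = []
    now = datetime.now(timezone.utc)
    # Freshness: the job may have "succeeded" yet delivered stale data.
    if now - last_loaded_at > max_staleness:
        issues.append("freshness: data is late")
    # Volume: a partial load can pass a binary success/failure check.
    if row_count < 0.9 * expected_rows:
        issues.append("volume: incomplete load")
    return issues
```

A pipeline status check alone would miss both conditions; monitoring the data itself is what surfaces them.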

In this hands-on workshop, you’ll operationalize a data pipeline and its curated datasets in Datadog. You’ll use Jobs Monitoring to troubleshoot failed runs and correlate performance with logs and infrastructure to identify bottlenecks and reduce costs. You will also implement Quality Monitoring with anomaly detection to catch data drift before it impacts production. Using end-to-end Data Lineage, you’ll learn to scope the “blast radius” of an incident and trace it back to the specific job or transformation that introduced it.
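Scoping a blast radius is, at its core, a reachability query over the lineage graph. The sketch below shows the idea with a toy graph and a breadth-first traversal; the asset names and graph shape are invented for illustration and do not reflect Datadog’s lineage model:

```python
from collections import deque

# Toy lineage graph: edges point from an asset to its downstream consumers.
LINEAGE = {
    "raw_events": ["stg_events"],
    "stg_events": ["fct_orders", "fct_sessions"],
    "fct_orders": ["revenue_dashboard", "churn_features"],
    "fct_sessions": ["engagement_dashboard"],
    "churn_features": ["churn_model_endpoint"],
}

def blast_radius(node):
    """Every downstream asset reachable from the failing node (BFS)."""
    seen, queue = set(), deque([node])
    while queue:
        current = queue.popleft()
        for child in LINEAGE.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen
```

Running the same traversal in reverse (parents instead of children) gives the upstream trace back to the job or transformation that introduced the issue.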

To close the loop, you’ll connect these data signals to a downstream consumer—an ML service endpoint—to confirm service health and detect behavior shifts. By the end of the session, you’ll know how to run a complete data incident response workflow in Datadog, moving from symptom to root cause faster while ensuring your data remains reliable for any use case.
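One common way to quantify a behavior shift in a downstream consumer is to compare the distribution of its recent outputs against a baseline, for example with the Population Stability Index (PSI). This is a generic drift statistic, shown here only to illustrate the concept, not as Datadog’s detection method:

```python
import math

def psi(expected, actual):
    """Population Stability Index between two binned distributions.

    `expected` and `actual` are bin proportions (each summing to 1).
    A common rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift,
    > 0.25 significant shift.
    """
    return sum(
        (a - e) * math.log(a / e)
        for e, a in zip(expected, actual)
        if e > 0 and a > 0
    )
```

Applied to an ML endpoint, the baseline bins might hold last week’s prediction scores and the actual bins today’s, so a rising PSI flags a behavior shift even while the service itself reports healthy.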
