Datadog Core Skills for Site Reliability Engineers (SREs) - Pre-Day
About this Session
When production incidents hit, SREs need to triage alerts, correlate signals across services, and pinpoint root causes quickly. In complex microservice architectures, knowing how to move between observability signals efficiently is critical.
In this hands-on workshop, you'll respond to two production incidents on a microservices-based e-commerce platform. You'll investigate using Real User Monitoring (RUM), Session Replay, Application Performance Monitoring (APM), Error Tracking, Infrastructure Monitoring, Log Management, and Metrics. For each incident, you'll isolate the root cause, remediate the issue, and set up proactive monitoring. You'll also use Bits AI SRE to see how AI-powered investigation can accelerate your workflow.
By the end of this workshop, you'll have practical experience using Datadog's core observability tools to investigate and resolve real-world production incidents, and to monitor proactively for them.