Build with LLM Observability: From Setup to Signal
About this Session
Large Language Models (LLMs) power modern AI applications, but their unpredictable behavior and complex workflows make it difficult to diagnose issues, optimize performance, and understand how they process data. Without visibility into each step of an LLM chain, troubleshooting and improving efficiency can be challenging.
Datadog’s LLM Observability provides visibility into operational performance, helping you ensure the quality, safety, and security of your LLM applications. End-to-end tracing captures input and output, latency metrics, token usage, and errors. By tracing each step in the LLM chain—including embedding, retrieval, and generation—teams can identify the root causes of unexpected outputs, latency, and errors, helping them troubleshoot performance issues and control costs.
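The chain steps traced here (embedding, retrieval, generation) can be pictured with a toy RAG pipeline. This is a sketch only: the bag-of-words "embedding" and the template "generation" step are stand-ins for the real embedding-model and OpenAI SDK calls used in the workshop.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Embedding step (toy): a bag-of-words count vector stands in
    # for a real embedding-model call.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str]) -> str:
    # Retrieval step: return the document most similar to the query.
    q = embed(query)
    return max(docs, key=lambda d: cosine(q, embed(d)))

def generate(query: str, context: str) -> str:
    # Generation step: in the workshop this is an OpenAI SDK call
    # against a local model; here it is a template stand-in.
    return f"Q: {query}\nContext: {context}"

docs = [
    "Datadog LLM Observability traces each step of an LLM chain.",
    "Tokens are billed per request by most model providers.",
]
query = "How does Datadog trace an LLM chain?"
answer = generate(query, retrieve(query, docs))
```

Each of these three functions corresponds to a span Datadog would capture, which is what lets a trace attribute bad output or latency to a specific step.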
In this hands-on workshop, you’ll build a chatbot application with a Retrieval-Augmented Generation (RAG) workflow using the OpenAI Python SDK to make calls to local models. You’ll instrument the application for Datadog’s LLM Observability, using auto-instrumentation and manual in-code setup to collect traces. Then, you’ll analyze these traces to connect application behavior with steps in the LLM chain, identify areas for improvement, apply changes, and observe results.
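The auto-instrumentation path mentioned above can be as simple as launching the app under Datadog's `ddtrace-run` wrapper with LLM Observability enabled. The snippet below is a sketch: the environment variable names come from the `ddtrace` library, and the API key, app name, and `chatbot.py` entry point are placeholders to be replaced with your own values.

```shell
# Install the Datadog tracing library alongside the OpenAI SDK
pip install ddtrace openai

# Enable LLM Observability and run the app under auto-instrumentation.
# Assumes agentless mode; <your-api-key> is a placeholder.
DD_LLMOBS_ENABLED=1 \
DD_LLMOBS_ML_APP=rag-chatbot \
DD_LLMOBS_AGENTLESS_ENABLED=1 \
DD_API_KEY=<your-api-key> \
ddtrace-run python chatbot.py
```

Auto-instrumentation captures the OpenAI SDK calls automatically; the manual in-code setup covered in the workshop adds custom spans around the retrieval and other non-LLM steps of the chain.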
By the end of this workshop, you’ll have practical experience using Datadog’s observability tools to understand LLM application behavior and improve performance.
Related Sessions
From Ingestion to AI: Ensuring Data Reliability Across the Full Lifecycle
From Reactive to Proactive: How SREs Can Optimize Their Application Services Before Users Are Affected
Speakers