Tracing
Track agent behavior and evaluate performance in real-time with OpenTelemetry-based tracing.
Tracing provides comprehensive observability for your AI agents, automatically capturing execution traces, spans, and performance metrics. All tracing is built on OpenTelemetry standards, so you can monitor agent behavior regardless of implementation language.
Quickstart
Initialize the Tracer
Set up your tracer with your project configuration:
from judgeval.tracer import Tracer, wrap
from openai import OpenAI
# Initialize tracer (singleton pattern: create one shared instance per project, even for multi-file agents)
judgment = Tracer(project_name="default_project")
# Auto-trace LLM calls - supports OpenAI, Anthropic, Together, Google GenAI, and Groq
client = wrap(OpenAI())
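The same wrap() call applies to the other supported provider clients. A minimal sketch, assuming you also have the anthropic package installed:
# A sketch: wrapping another supported provider's client works the same way
from anthropic import Anthropic

anthropic_client = wrap(Anthropic())  # Anthropic calls are now auto-traced like the OpenAI ones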
Instrument Your Agent
Add tracing decorators to capture agent behavior:
class QAAgent:
    def __init__(self, client):
        self.client = client

    @judgment.observe(span_type="tool")
    def process_query(self, query):
        response = self.client.chat.completions.create(
            model="gpt-5-mini",
            messages=[
                {"role": "system", "content": "You are a helpful assistant"},
                {"role": "user", "content": f"I have a query: {query}"}
            ]
        )  # Automatically traced
        return f"Response: {response.choices[0].message.content}"

    @judgment.agent()
    @judgment.observe(span_type="function")
    def invoke_agent(self, query):
        result = self.process_query(query)
        return result

if __name__ == "__main__":
    agent = QAAgent(client)
    print(agent.invoke_agent("What is the capital of the United States?"))
Key Components:
- @judgment.observe() captures tool interactions, inputs, outputs, and execution time
- wrap() automatically tracks all LLM API calls, including token usage and costs
- @judgment.agent() identifies which agent is responsible for each tool call in multi-agent systems
All traced data flows to the Judgment platform in real-time with zero latency impact on your application.
What Gets Captured
The Tracer automatically captures comprehensive execution data:
- Execution Flow: Function call hierarchy, execution duration, and parent-child span relationships
- LLM Interactions: Model parameters, prompts, responses, token usage, and cost per API call
- Agent Behavior: Tool usage, function inputs/outputs, state changes, and error states
- Performance Metrics: Latency per span, total execution time, and cost tracking
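For example, the parent-child span relationships come directly from nesting decorated functions. A minimal sketch, reusing the tracer from the Quickstart:
# A minimal sketch: nested @judgment.observe calls produce parent-child spans
from judgeval.tracer import Tracer

judgment = Tracer(project_name="default_project")

@judgment.observe(span_type="tool")
def fetch_context(query: str) -> str:
    return f"Context for: {query}"

@judgment.observe(span_type="function")
def answer(query: str) -> str:
    # fetch_context's span is recorded as a child of answer's span,
    # with per-span latency captured automatically
    return fetch_context(query)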
OpenTelemetry Integration
Judgment's tracing is built on OpenTelemetry, the industry-standard observability framework. This means:
Standards Compliance:
- Compatible with existing OpenTelemetry tooling
- Follows OTEL semantic conventions
- Integrates with OTEL collectors and exporters
Advanced Configuration:
You can integrate Judgment's tracer with your existing OpenTelemetry setup:
from judgeval.tracer import Tracer
from opentelemetry.sdk.trace import TracerProvider

tracer_provider = TracerProvider()

# Initialize with OpenTelemetry resource attributes
judgment = Tracer(
    project_name="default_project",
    resource_attributes={
        "service.name": "my-ai-agent",
        "service.version": "1.2.0",
        "deployment.environment": "production"
    }
)

# Connect to your existing OTEL infrastructure
tracer_provider.add_span_processor(judgment.get_processor())
tracer = tracer_provider.get_tracer(__name__)

# Use native OTEL spans alongside Judgment decorators
def process_request(question: str) -> str:
    with tracer.start_as_current_span("process_request_span") as span:
        span.set_attribute("input", question)
        answer = answer_question(question)  # answer_question: placeholder for your own traced function
        span.set_attribute("output", answer)
        return answer
Resource Attributes:
Resource attributes describe the entity producing telemetry data. Common attributes include:
- service.name - Name of your service
- service.version - Version number
- deployment.environment - Environment (production, staging, etc.)
- service.namespace - Logical grouping
See the OpenTelemetry Resource specification for standard attributes.
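If you manage your own TracerProvider, the same attributes can also be expressed as a native OTEL Resource using the standard SDK:
# The same attributes expressed as a native OpenTelemetry Resource
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider

resource = Resource.create({
    "service.name": "my-ai-agent",
    "service.version": "1.2.0",
    "deployment.environment": "production",
})
tracer_provider = TracerProvider(resource=resource)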
Multi-Agent System Tracing
Track which agent is responsible for each tool call in complex multi-agent systems.
Example Multi-Agent System
# main.py
from planning_agent import PlanningAgent

if __name__ == "__main__":
    planning_agent = PlanningAgent("planner-1")
    goal = "Build a multi-agent system"
    result = planning_agent.invoke_agent(goal)
    print(result)

# utils.py
from judgeval.tracer import Tracer

judgment = Tracer(project_name="multi-agent-system")

# planning_agent.py
from utils import judgment
from research_agent import ResearchAgent
from task_agent import TaskAgent

class PlanningAgent:
    def __init__(self, id):
        self.id = id

    @judgment.agent()  # Only needed on the agent's entry point
    @judgment.observe()
    def invoke_agent(self, goal):
        print(f"Agent {self.id} is planning for goal: {goal}")
        research_agent = ResearchAgent("Researcher1")
        task_agent = TaskAgent("Tasker1")
        research_results = research_agent.invoke_agent(goal)
        task_result = task_agent.invoke_agent(research_results)
        return f"Results from planning and executing for goal '{goal}': {task_result}"

    @judgment.observe()  # No @judgment.agent() needed on non-entry-point methods
    def random_tool(self):
        pass

# research_agent.py
from utils import judgment

class ResearchAgent:
    def __init__(self, id):
        self.id = id

    @judgment.agent()
    @judgment.observe()
    def invoke_agent(self, topic):
        return f"Research notes for topic: {topic}: Findings and insights include..."

# task_agent.py
from utils import judgment

class TaskAgent:
    def __init__(self, id):
        self.id = id

    @judgment.agent()
    @judgment.observe()
    def invoke_agent(self, task):
        result = f"Performed task: {task}, here are the results: Results include..."
        return result
The trace clearly shows which agent called which method:

Distributed Tracing
Distributed tracing allows you to track requests across multiple services and systems, providing end-to-end visibility into complex workflows. This is essential for understanding how your AI agents interact with external services and how data flows through your distributed architecture.
Sending Trace State
When your agent needs to propagate trace context to downstream services, you can manually extract and send trace context.
Dependencies:
uv add judgeval requests
from judgeval.tracer import Tracer
from opentelemetry.propagate import inject
import requests

judgment = Tracer(
    project_name="distributed-system",
    resource_attributes={"service.name": "agent-client"},
)

@judgment.observe(span_type="function")
def call_external_service(data):
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer ...",
    }
    inject(headers)  # Inject the current trace context into the outgoing headers
    response = requests.post(
        "http://localhost:8001/process",
        json=data,
        headers=headers
    )
    return response.json()

if __name__ == "__main__":
    result = call_external_service({"query": "Hello from client"})
    print(result)
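For context, inject() writes W3C Trace Context headers into the carrier dict, provided it runs inside an active span (as it does inside call_external_service above). A small illustration:
# Illustration: inside an active span, inject() populates the carrier with
# W3C Trace Context headers that downstream services can extract
headers = {}
inject(headers)
# headers now contains e.g.:
# {'traceparent': '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01'}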
Dependencies:
npm install judgeval @opentelemetry/api
import { context, propagation } from "@opentelemetry/api";
import { NodeTracer, TracerConfiguration } from "judgeval";

const config = TracerConfiguration.builder()
  .projectName("distributed-system")
  .resourceAttributes({ "service.name": "agent-client" })
  .build();
const judgment = await NodeTracer.createWithConfiguration(config);

async function makeRequest(url: string, options: RequestInit = {}): Promise<any> {
  const headers = {};
  // Inject the current trace context into the outgoing headers
  propagation.inject(context.active(), headers);
  const response = await fetch(url, {
    ...options,
    headers: { "Content-Type": "application/json", ...headers },
  });
  if (!response.ok) {
    throw new Error(`HTTP error! status: ${response.status}`);
  }
  return response.json();
}

async function callExternalService(data: any) {
  const callExternal = judgment.observe(async function callExternal(data: any) {
    return await makeRequest("http://localhost:8001/process", {
      method: "POST",
      body: JSON.stringify(data),
    });
  }, "span");
  return callExternal(data);
}

const result = await callExternalService({ message: "Hello!" });
console.log(result);
Receiving Trace State
When your service receives requests from other services, you can use middleware to automatically extract and set the trace context for all incoming requests.
Dependencies:
uv add judgeval fastapi uvicorn opentelemetry-instrumentation-fastapi
from judgeval.tracer import Tracer
from opentelemetry.propagate import extract
from opentelemetry import context as otel_context
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from fastapi import FastAPI, Request

judgment = Tracer(
    project_name="distributed-system",
    resource_attributes={"service.name": "agent-server"},
)

app = FastAPI()
FastAPIInstrumentor.instrument_app(app)

@app.middleware("http")
async def trace_context_middleware(request: Request, call_next):
    # Extract the upstream trace context and attach it for the duration of the request
    ctx = extract(dict(request.headers))
    token = otel_context.attach(ctx)
    try:
        response = await call_next(request)
        return response
    finally:
        otel_context.detach(token)

@judgment.observe(span_type="function")
def process_request(data):
    return {"message": "Hello from Python server!", "received_data": data}

@app.post("/process")
async def handle_process(request: Request):
    result = process_request(await request.json())
    return result

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8001)
Dependencies:
npm install judgeval @opentelemetry/api express
import express from "express";
import { NodeTracer, TracerConfiguration } from "judgeval";
import { context, propagation } from "@opentelemetry/api";

const config = TracerConfiguration.builder()
  .projectName("distributed-system")
  .resourceAttributes({ "service.name": "agent-server" })
  .build();
const judgment = await NodeTracer.createWithConfiguration(config);

const app = express();
app.use(express.json());

// Extract the upstream trace context and run each handler within it
app.use((req, res, next) => {
  const parentContext = propagation.extract(context.active(), req.headers);
  context.with(parentContext, () => {
    next();
  });
});

async function processRequest(data: any) {
  const process = judgment.observe(async function processRequest(data: any) {
    return { message: "Hello from server!", received_data: data };
  }, "span");
  return process(data);
}

app.post("/process", async (req, res) => {
  const result = await processRequest(req.body);
  res.json(result);
});

app.listen(8001, () => console.log("Server running on port 8001"));

Toggling Monitoring
If your setup requires you to toggle monitoring intermittently, you can disable it by:
- Setting the JUDGMENT_MONITORING environment variable to false (disables tracing):
export JUDGMENT_MONITORING=false
- Setting the JUDGMENT_EVALUATIONS environment variable to false (disables scoring on traces):
export JUDGMENT_EVALUATIONS=false
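For a quick local toggle, you can also set the variables in-process. A minimal sketch, assuming the variables are read when the Tracer is initialized:
import os

# Assumption: judgeval reads these variables at Tracer initialization,
# so they must be set before the Tracer is constructed
os.environ["JUDGMENT_MONITORING"] = "false"   # disables tracing
os.environ["JUDGMENT_EVALUATIONS"] = "false"  # disables scoring on traces

from judgeval.tracer import Tracer
judgment = Tracer(project_name="default_project")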