Tracing

Track agent behavior and evaluate performance in real time with OpenTelemetry-based tracing.

Tracing provides comprehensive observability for your AI agents, automatically capturing execution traces, spans, and performance metrics. All tracing is built on OpenTelemetry standards, so you can monitor agent behavior regardless of implementation language.

Quickstart

Initialize the Tracer

tracer.py
from judgeval import Tracer

Tracer.init(project_name="default_project")
tracer.ts
import { Tracer } from "judgeval";

await Tracer.init({
    projectName: "default_project",
});

Trace your Agent

Tracing captures your agent's inputs, outputs, tool calls, and LLM calls to help you debug and analyze agent behavior.

Note: This example uses OpenAI. Make sure you have OPENAI_API_KEY set in your environment variables before running.

To properly trace your agent, apply the @Tracer.observe() decorator to all of your agent's functions and tools, including LLM calls.

trace_agent.py
from openai import OpenAI
from judgeval import Tracer
import time

Tracer.init(project_name="default_project")
openai = OpenAI()

@Tracer.observe(span_type="tool") 
def format_task(question: str) -> str:
    time.sleep(0.5)
    return f"Please answer the following question: {question}"

@Tracer.observe(span_type="llm") 
def openai_completion(prompt: str) -> str:
    response = openai.chat.completions.create(
        model="gpt-5.2",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

@Tracer.observe(span_type="tool") 
def answer_question(prompt: str) -> str:
    time.sleep(0.3)
    return openai_completion(prompt)

@Tracer.observe(span_type="function") 
def run_agent(question: str) -> str:
    task = format_task(question)
    answer = answer_question(task)
    return answer

if __name__ == "__main__":
    result = run_agent("What is the capital of the United States?")
    print(result)

To properly trace your agent, wrap all of your agent's functions and tools, including LLM calls, with Tracer.observe(...).

traceAgent.ts
import { Tracer } from "judgeval";
import OpenAI from "openai";

const openai = new OpenAI({
    apiKey: process.env.OPENAI_API_KEY,
});

await Tracer.init({
    projectName: "default_project",
});

const runAgent = Tracer.observe(async function runAgent( 
    question: string
): Promise<string> {
    const task = await formatTask(question);
    const answer = await answerQuestion(task);
    return answer;
},
"function");

const formatTask = Tracer.observe(async function formatTask( 
    question: string
): Promise<string> {
    await new Promise((resolve) => setTimeout(resolve, 500));
    return `Please answer the following question: ${question}`;
},
"tool");

const answerQuestion = Tracer.observe(async function answerQuestion( 
    prompt: string
): Promise<string> {
    await new Promise((resolve) => setTimeout(resolve, 300));
    return await openAICompletion(prompt);
},
"tool");

const openAICompletion = Tracer.observe(async function openAICompletion( 
    prompt: string
): Promise<string> {
    const response = await openai.chat.completions.create({
        model: "gpt-5.2",
        messages: [{ role: "user", content: prompt }],
    });
    return response.choices[0]?.message.content || "No answer";
},
"llm");

await runAgent("What is the capital of the United States?");
await Tracer.shutdown();

Agents deployed in server environments (Express, Next.js, etc.) don't require await Tracer.shutdown(), since the tracer exports spans automatically throughout the application lifecycle.

Congratulations! You've just created your first trace. It should look like this:

Image of a basic trace

You can also use auto-instrumentation to trace LLM calls without manually using @Tracer.observe().


What Gets Captured

The Tracer automatically captures comprehensive execution data:

  • Execution Flow: Function call hierarchy, execution duration, and parent-child span relationships
  • LLM Interactions: Model parameters, prompts, responses, token usage, and cost per API call
  • Agent Behavior: Tool usage, function inputs/outputs, state changes, and error states
  • Performance Metrics: Latency per span, total execution time, and cost tracking

Grouping Traces into Sessions

Sessions allow you to group related traces together, providing a conversation-level view of user interactions with your agent. By associating traces with a session ID, you can analyze entire conversations, track behavior patterns across multiple requests, and understand how your agent performs over extended interactions.

Setting Session IDs

Use set_session_id() to associate traces with a session. All child spans within that trace will automatically inherit the session ID.

Learn more about set_session_id() in the Tracer SDK Reference.

session_example.py
from judgeval import Tracer, wrap
from openai import OpenAI
import uuid

Tracer.init(project_name="default_project")
openai = wrap(OpenAI())

session_id = str(uuid.uuid4())

@Tracer.observe(span_type="function")
def chat_turn(user_message: str) -> str:
    Tracer.set_session_id(session_id)
    response = openai.chat.completions.create(
        model="gpt-5.2",
        messages=[{"role": "user", "content": user_message}]
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(chat_turn("Hello! What's the weather like?"))
    print(chat_turn("Can you recommend a restaurant nearby?"))
    print(chat_turn("Thanks for your help!"))
sessionExample.ts
import { Tracer } from "judgeval";
import OpenAI from "openai";
import { randomUUID } from "crypto";

await Tracer.init({
    projectName: "default_project",
});

const openai = new OpenAI({
    apiKey: process.env.OPENAI_API_KEY,
});

// Generate a unique session ID for this conversation
const sessionId = randomUUID();

const chatTurn = Tracer.observe(async function chatTurn(
    userMessage: string
): Promise<string> {
    Tracer.setSessionId(sessionId);  // Associate this trace with the session
    const response = await openai.chat.completions.create({
        model: "gpt-5.2",
        messages: [{ role: "user", content: userMessage }],
    });
    return response.choices[0]?.message.content || "No response";
}, "function");

// Multiple traces, all associated with the same session
await chatTurn("Hello! What's the weather like?");
await chatTurn("Can you recommend a restaurant nearby?");
await chatTurn("Thanks for your help!");

await Tracer.shutdown();

Note: Set the session ID on the root span of each trace for it to appear correctly in the Sessions table.

What Sessions Capture

Sessions aggregate data from all associated traces to provide comprehensive insights:

  • Session ID: The unique identifier for the session
  • Created At: The earliest trace start time in the session
  • Duration: Time between the earliest trace start time and the latest trace end time in the session
  • LLM Cost: Total LLM cost summed across all traces in the session
  • Trace Count: Number of traces associated with the session
  • Behaviors: Aggregated behaviors; if any trace in the session exhibits a behavior, the entire session is marked with that behavior
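To make these aggregation rules concrete, here is a small pure-Python sketch (illustrative only, not SDK code) showing how the session metrics above derive from individual trace records:

```python
# Illustrative only: how session metrics derive from trace data (not SDK code).
# Timestamps are in seconds, costs in dollars; behaviors are string labels.
traces = [
    {"start": 0.0, "end": 2.5, "llm_cost": 0.003, "behaviors": {"refund_requested"}},
    {"start": 3.0, "end": 4.0, "llm_cost": 0.001, "behaviors": set()},
    {"start": 5.0, "end": 9.0, "llm_cost": 0.002, "behaviors": {"escalation"}},
]

session = {
    "created_at": min(t["start"] for t in traces),  # earliest trace start
    "duration": max(t["end"] for t in traces) - min(t["start"] for t in traces),
    "llm_cost": sum(t["llm_cost"] for t in traces),  # summed across all traces
    "trace_count": len(traces),
    # A session exhibits a behavior if any of its traces does (set union).
    "behaviors": set().union(*(t["behaviors"] for t in traces)),
}
```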

Viewing Sessions

Navigate to the Sessions tab in Monitoring to view all sessions:

Sessions table view

Click on any session to see all traces within that session:

Session detail view showing traces

Project Routing

Multi-Project Support allows you to route traces to different projects at runtime. Initialize multiple tracers for different projects, then use set_active() to switch which project receives traces.

project_routing.py
import os
from judgeval import Tracer, wrap
from openai import OpenAI

env = os.getenv("ENVIRONMENT", "staging")  # "staging" or "production"

tracer_staging = Tracer.init(project_name="staging - my_agent", set_active=(env == "staging"))
tracer_prod = Tracer.init(project_name="production - my_agent", set_active=(env == "production"))

openai = wrap(OpenAI())

@Tracer.observe(span_type="function")
def handle_request(query: str) -> str:
    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": query}]
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(handle_request("What is the capital of France?"))

Note: Only one tracer can be active at a time. Call tracer.set_active() to switch which project receives traces. Switching is not allowed while a root span is active.


Manual Attribute Setting

You can manually set attributes on spans to add custom metadata or explicitly capture input/output data. This is useful when you want to add additional context that isn't automatically captured.

Setting Input and Output

Use Tracer.set_input() and Tracer.set_output() to explicitly set input and output data on the current span:

manual_input_output.py
from judgeval import Tracer
from openai import OpenAI

Tracer.init(project_name="default_project")
openai = OpenAI()

@Tracer.observe(span_type="function")
def process_query(user_query: str) -> str:
    Tracer.set_input(user_query)  

    response = openai.chat.completions.create(
        model="gpt-5.2",
        messages=[{"role": "user", "content": user_query}]
    )

    result = response.choices[0].message.content

    Tracer.set_output(result)  

    return result

result = process_query("What is the capital of France?")
print(result)

Setting Custom Attributes

Use Tracer.set_attribute() to add custom metadata to spans:

custom_attributes.py
from judgeval import Tracer
from openai import OpenAI

Tracer.init(project_name="default_project")
openai = OpenAI()

@Tracer.observe(span_type="function")
def analyze_sentiment(text: str, user_id: str) -> str:
    Tracer.set_attribute("user_id", user_id)  
    Tracer.set_attribute("text_length", len(text))  
    Tracer.set_attribute("analysis_type", "sentiment")  

    response = openai.chat.completions.create(
        model="gpt-5.2",
        messages=[
            {"role": "system", "content": "Analyze the sentiment of the text."},
            {"role": "user", "content": text}
        ]
    )

    result = response.choices[0].message.content
    Tracer.set_output(result)

    return result

result = analyze_sentiment("I love this product!", "user_123")
print(result)

Setting Multiple Attributes

Use Tracer.set_attributes() to set multiple attributes at once:

multiple_attributes.py
from judgeval import Tracer

Tracer.init(project_name="default_project")

@Tracer.observe(span_type="function")
def process_order(order_id: str, customer_id: str, total: float):
    Tracer.set_attributes({  
        "order_id": order_id,  
        "customer_id": customer_id,  
        "order_total": total,  
        "currency": "USD",  
        "payment_method": "credit_card"
    })  

    return {"status": "processed", "order_id": order_id}

result = process_order("order_123", "customer_456", 99.99)

Auto-Instrumentation

Auto-instrumentation automatically traces LLM client calls without manually wrapping each call with observe(). This reduces boilerplate code and ensures all LLM interactions are captured.

Python supports auto-instrumentation through the wrap() function. It automatically tracks all LLM API calls including token usage, costs, and streaming responses for both sync and async clients.

Refer to Model Providers for more information on supported providers.

auto_instrument.py
from judgeval import Tracer, wrap
from openai import OpenAI

Tracer.init(project_name="default_project")

openai = wrap(OpenAI())

@Tracer.observe(span_type="function")
def ask_question(question: str) -> str:
    response = openai.chat.completions.create(
        model="gpt-5.2",
        messages=[{"role": "user", "content": question}]
    )
    return response.choices[0].message.content

result = ask_question("What is the capital of France?")
print(result)

In TypeScript, auto-instrumentation is configured through OpenTelemetry instrumentation packages. To correctly implement auto-instrumentation on LLM calls, you need to do all of the following:

  1. Initialize an instrumentation file to be preloaded before the application starts.
  2. Register OpenTelemetry instrumentations on Tracer before calling Tracer.init().
  3. Bundle your application using CommonJS.
instrumentation.ts
import { OpenAIInstrumentation } from "@opentelemetry/instrumentation-openai";
import { Tracer } from "judgeval";

Tracer.registerOTELInstrumentation(new OpenAIInstrumentation()); 

await Tracer.init({
    projectName: "auto_instrumentation_example",
});
import "./instrumentation";
import { Tracer } from "judgeval";
import OpenAI from "openai";

function requireEnv(name: string): string {
    const value = process.env[name];
    if (!value) {
        throw new Error(`Environment variable ${name} is not set`);
    }
    return value;
}

const OPENAI_API_KEY = requireEnv("OPENAI_API_KEY");

const openai = new OpenAI({
    apiKey: OPENAI_API_KEY,
});

async function _chatWithUser(userMessage: string): Promise<string> {
    const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: userMessage },
    ];

    const completion = await openai.chat.completions.create({
        model: "gpt-5.2",
        messages,
    });

    const result = completion.choices[0].message.content || "";

    console.log(`User: ${userMessage}`);
    console.log(`Assistant: ${result}`);

    Tracer.asyncEvaluate("answer_relevancy", {
        input: userMessage,
        actual_output: result,
    });

    return result;
}

(async () => {
    const chatWithUser = Tracer.observe(_chatWithUser); 

    const result = await chatWithUser("What is the capital of France?");
    console.log(result);

    // Give pending async evaluations time to complete before shutdown.
    await new Promise((resolve) => setTimeout(resolve, 10000));
    await Tracer.shutdown();
})();

OpenTelemetry Integration

Judgment's tracing is built on OpenTelemetry, the industry-standard observability framework. This means:

Standards Compliance

  • Compatible with existing OpenTelemetry tooling
  • Follows OTEL semantic conventions
  • Integrates with OTEL collectors and exporters

Advanced Configuration

You can integrate Judgment's tracer with your existing OpenTelemetry setup:

otel_integration.py
from judgeval import Tracer
from opentelemetry.sdk.trace import TracerProvider

tracer_provider = TracerProvider()

tracer = Tracer.init(project_name="default_project")

tracer_provider.add_span_processor(tracer.get_span_processor()) 
otel_tracer = tracer_provider.get_tracer(__name__)

def process_request(question: str) -> str:
    with otel_tracer.start_as_current_span("process_request_span") as span:
        span.set_attribute("input", question)
        answer = answer_question(question)  # tool defined in the quickstart example above
        span.set_attribute("output", answer)
    return answer

Resource Attributes

Resource attributes describe the entity producing telemetry data. Common attributes include:

  • service.name - Name of your service
  • service.version - Version number
  • deployment.environment - Environment (production, staging, etc.)
  • service.namespace - Logical grouping

See the OpenTelemetry Resource specification for standard attributes.
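Resource attributes can be defined as a plain mapping and passed to Tracer.init via its resource_attributes parameter, as in the distributed tracing examples below. A sketch (the attribute values are placeholders):

```python
# Resource attributes describing this service (OTEL semantic conventions).
# Values here are placeholders; pass the mapping to
# Tracer.init(..., resource_attributes=resource_attributes).
resource_attributes = {
    "service.name": "my-agent",             # distinguishes services in a distributed system
    "service.version": "1.2.0",             # placeholder version number
    "deployment.environment": "production",
    "service.namespace": "agents",          # logical grouping
}
```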

Distributed Tracing

Distributed tracing allows you to track requests across multiple services and systems, providing end-to-end visibility into complex workflows. This is essential for understanding how your AI agents interact with external services and how data flows through your distributed architecture.

Important Configuration Notes:

  • Project Name: Use the same project_name across all services so traces appear in the same project in the Judgment platform
  • Service Name: Set distinct service.name in resource attributes to differentiate between services in your distributed system

Sending Trace State

When your agent needs to propagate trace context to downstream services, you can manually extract and send trace context.

uv add judgeval requests
pip install judgeval requests
agent.py
from judgeval import Tracer, propagation
import requests

Tracer.init(
    project_name="distributed-system",
    resource_attributes={"service.name": "agent-client"},
)

@Tracer.observe(span_type="function") 
def call_external_service(data):
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer ...",
    }
    propagation.inject(headers)

    response = requests.post(
        "http://localhost:8001/process",
        json=data,
        headers=headers
    )

    return response.json()

if __name__ == "__main__":
    result = call_external_service({"query": "Hello from client"})
    print(result)
npm install judgeval @opentelemetry/api
yarn add judgeval @opentelemetry/api
pnpm add judgeval @opentelemetry/api
bun add judgeval @opentelemetry/api
agent.ts
import { Tracer, propagation } from "judgeval";

await Tracer.init({
    projectName: "distributed-system",
    resourceAttributes: { "service.name": "agent-client" },
});

async function makeRequest(url: string, options: RequestInit = {}): Promise<any> {
    const headers: Record<string, string> = {};
    propagation.inject(headers); 

    const response = await fetch(url, {
        ...options,
        headers: { "Content-Type": "application/json", ...headers },
    });

    if (!response.ok) {
        throw new Error(`HTTP error! status: ${response.status}`);
    }

    return response.json();
}

const callExternalService = Tracer.observe(async function (data: any) {
    return await makeRequest("http://localhost:8001/process", {
        method: "POST",
        body: JSON.stringify(data),
    });
}, "span");

const result = await callExternalService({ message: "Hello!" });
console.log(result);
await Tracer.shutdown();

Receiving Trace State

When your service receives requests from other services, you can use middleware to automatically extract and set the trace context for all incoming requests.

uv add judgeval fastapi uvicorn
pip install judgeval fastapi uvicorn
service.py
from judgeval import Tracer, propagation
from opentelemetry import context as otel_context
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from fastapi import FastAPI, Request

Tracer.init(
    project_name="distributed-system",
    resource_attributes={"service.name": "agent-server"},
)

app = FastAPI()

FastAPIInstrumentor.instrument_app(app)

@app.middleware("http")
async def trace_context_middleware(request: Request, call_next):
    ctx = propagation.extract(dict(request.headers))
    token = otel_context.attach(ctx)
    try:
        response = await call_next(request)
        return response
    finally:
        otel_context.detach(token)

@Tracer.observe(span_type="function") 
def process_request(data):
    return {"message": "Hello from Python server!", "received_data": data}

@app.post("/process")
async def handle_process(request: Request):
    result = process_request(await request.json())
    return result

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8001)
npm install judgeval @opentelemetry/api express
yarn add judgeval @opentelemetry/api express
pnpm add judgeval @opentelemetry/api express
bun add judgeval @opentelemetry/api express
service.ts
import express from "express";
import { Tracer } from "judgeval";

await Tracer.init({
    projectName: "distributed-system",
    resourceAttributes: { "service.name": "agent-server" },
});

const app = express();
app.use(express.json());

const processRequest = Tracer.observe(async function (data: any) {
    return { message: "Hello from server!", received_data: data };
}, "span");

app.post("/process", async (req, res) => {
    await Tracer.continueTrace(req.headers, async () => {
        const result = await processRequest(req.body);
        res.json(result);
    });
});

app.listen(8001, () => console.log("Server running on port 8001"));

Testing Distributed Tracing:

  1. Start the server (Python FastAPI or TypeScript Express) on port 8001
  2. Run the client (Python or TypeScript) to send requests to the server
  3. View traces in the Judgment platform to see the distributed trace flow

The client examples will automatically send trace context to the server, creating a complete distributed trace across both services.
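By default, OpenTelemetry propagates context using the W3C Trace Context format: the injected traceparent header carries the trace ID and parent span ID. A sketch of its layout (the IDs below are illustrative values from the W3C spec):

```python
# W3C Trace Context: traceparent = "{version}-{trace_id}-{parent_span_id}-{trace_flags}"
traceparent = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"

version, trace_id, parent_span_id, trace_flags = traceparent.split("-")
assert len(trace_id) == 32        # 16-byte trace ID, hex-encoded
assert len(parent_span_id) == 16  # 8-byte parent span ID, hex-encoded
assert trace_flags == "01"        # sampled flag set
```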

Distributed tracing trace tree visualization

Toggling Monitoring

If your setup requires you to toggle monitoring intermittently, you can disable monitoring by:

  • Setting the JUDGMENT_MONITORING environment variable to false (disables tracing):
    export JUDGMENT_MONITORING=false
  • Setting the JUDGMENT_EVALUATIONS environment variable to false (disables scoring on traces):
    export JUDGMENT_EVALUATIONS=false
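These variables can also be set from code (a sketch; it assumes they are read at initialization time, so set them before calling Tracer.init()):

```python
import os

# Disable tracing and trace scoring for this process.
# Set these before calling Tracer.init().
os.environ["JUDGMENT_MONITORING"] = "false"
os.environ["JUDGMENT_EVALUATIONS"] = "false"
```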

Next Steps

  • Tracer SDK Reference - Explore the complete Tracer API including span access, metadata, and advanced configuration