Tracer
Complete reference for the Tracer Python SDK
Tracer API Reference
The Tracer is your primary interface for adding observability to your AI agents. It provides methods for tracing function execution, evaluating performance, and collecting comprehensive environment interaction data.
Tracer()
Initialize a Tracer object.
Parameters
api_key
str
Optional
Recommended - set using the JUDGMENT_API_KEY
environment variable
organization_id
str
Optional
Recommended - set using the JUDGMENT_ORG_ID
environment variable
project_name
str
Default: "default_project"
Optional
Project name override
deep_tracing
bool
Default: False
Optional
Whether to enable deep tracing, which will trace all nested function calls without the need to decorate each function.
enable_monitoring
bool
Default: True
Optional
If you need to toggle monitoring on and off
enable_evaluations
bool
Default: True
Optional
If you need to toggle evaluations on and off for async_evaluate()
use_s3
bool
Default: False
Optional
Whether to use S3 for storage
s3_bucket_name
str
Default: "None"
Optional
Name of the S3 bucket to use
s3_aws_access_key_id
str
Default: "None"
Optional
AWS access key ID for S3
s3_region_name
str
Default: "None"
Optional
AWS region name for S3
trace_across_async_contexts
bool
Default: False
Optional
Whether to trace across async contexts
span_batch_size
int
Default: 50
Optional
Number of spans to batch before sending
span_flush_interval
float
Default: 1.0
Optional
Time in seconds between automatic flushes
span_num_workers
int
Default: 10
Optional
Number of worker threads for span processing
Example Code
from judgeval import Tracer
tracer = Tracer()
@tracer.observe()
Decorator that traces the execution of a function, creating a span that records its inputs, outputs, and duration. This is useful for capturing intermediate steps, tool calls, or decisions made by the agent.
Parameters
func
Callable
Required
The function to decorate (automatically provided when used as decorator)
name
str
Default: "None"
Optional
Optional custom name for the span (defaults to function name)
"custom_span_name"
span_type
str
Default: "span"
Optional
Label for the span. Use 'tool' for functions that should be tracked and exported as agent tools
"tool"
Example Code
from openai import OpenAI
from judgeval.common.tracer import Tracer
client = OpenAI()
tracer = Tracer(project_name='default_project', deep_tracing=False)
@tracer.observe(span_type="tool")
def search_web(query):
return f"Results for: {query}"
@tracer.observe(span_type="retriever")
def get_database(query):
return f"Database results for: {query}"
@tracer.observe(span_type="function")
def run_agent(user_query):
# Use tools based on query
if "database" in user_query:
info = get_database(user_query)
else:
info = search_web(user_query)
prompt = f"Context: {info}, Question: {user_query}"
# Generate response
response = client.chat.completions.create(
model="gpt-4.1",
messages=[{"role": "user", "content": prompt}]
)
return response.choices[0].message.content
@tracer.observe_tools()
Automatically adds @observe(span_type='tool')
to all methods in a class.
Parameters
cls
Class
Required
The class to decorate (automatically provided when used as decorator)
exclude_methods
List[str]
Default: ["__init__", "__new__", "__del__", "__str__", "__repr__"]
Optional
List of method names to skip decorating. Defaults to common magic methods
["__init__", "private_method"]
include_private
bool
Default: False
Optional
Whether to decorate methods starting with underscore. Defaults to False
False
warn_on_double_decoration
bool
Default: True
Optional
Whether to print warnings when skipping already-decorated methods. Defaults to True
True
deep_tracing
bool
Default: False
Optional
Whether to enable deep tracing for this function and all nested calls. If None, uses the tracer's default setting.
True
Example Code
@tracer.observe_tools()
class SearchTool:
def search_web(self, query):
return f"Web results for: {query}"
def search_docs(self, query):
return f"Document results for: {query}"
def _private_helper(self):
# This won't be traced by default
return "helper"
class MyAgent(SearchTool):
@tracer.observe(span_type="function")
def run_agent(self, user_query):
# Use inherited tools
if "docs" in user_query:
info = self.search_docs(user_query)
else:
info = self.search_web(user_query)
return f"Agent response based on: {info}"
# All public methods from SearchTool are automatically traced
agent = MyAgent()
result = agent.run_agent("Find web results") # Both calls are traced
wrap()
Wraps an API client to add tracing capabilities. Supports OpenAI, Together, Anthropic, and Google GenAI clients. Patches methods like .create
, Anthropic's .stream
, and OpenAI's .responses.create
and .beta.chat.completions.parse
methods using a wrapper class.
Parameters
client
Any
Required
API client to wrap (OpenAI, Anthropic, Together, Google GenAI, Groq)
OpenAI()
trace_across_async_contexts
bool
Default: False
Optional
Whether to trace across async contexts
True
Example Code
from openai import OpenAI
from judgeval import wrap
client = OpenAI()
wrapped_client = wrap(client)
# All API calls are now automatically traced
response = wrapped_client.chat.completions.create(
model="gpt-4.1",
messages=[{"role": "user", "content": "Hello"}]
)
# Streaming calls are also traced
stream = wrapped_client.chat.completions.create(
model="gpt-4.1",
messages=[{"role": "user", "content": "Hello"}],
stream=True
)
Evaluation & Logging
tracer.async_evaluate()
Runs quality evaluations on the current trace/span using specified scorers. You can provide either an Example object or individual evaluation parameters (input, actual_output, etc.).
Parameters
scorer
Union[APIScorerConfig, BaseScorer]
Required
An evaluation scorer to run
FaithfulnessScorer()
example
Example
Required
Example object containing evaluation data
sampling_rate
float
Default: 1
Optional
A float between 0 and 1 representing the chance the eval should be sampled.
0.75 # Eval occurs 75% of the time
Example Code
from judgeval.scorers import FaithfulnessScorer
from judgeval.data import Example
question = "What is the capital of France?"
answer = "Paris is the capital of France"
# Create example object
example = Example(
input=question,
actual_output=answer,
expected_output="Paris",
context=["France is a country in Europe"]
)
# Evaluate using Example
tracer.async_evaluate(
scorer=FaithfulnessScorer(),
example=example,
model="gpt-4.1",
sampling_rate=0.5
)
return answer
tracer.log()
Log a message with the current span context
Example Code
def search_process(query):
tracer.log("Starting search", label="info")
try:
results = perform_search(query)
tracer.log(f"Found {len(results)} results", label="success", score=1)
return results
except Exception as e:
tracer.log(f"Search failed: {e}", label="error", score=0)
raise
Metadata & Organization
tracer.update_metadata()
Set metadata for the current trace.
Parameters
metadata
dict
Required
Metadata as a dictionary for any miscellaneous information you want to track on your trace
Supported special keys:
- customer_id: ID of the customer using this trace
- tags: List of tags for this trace
- has_notification: Whether this trace has a notification
- name: Name of the trace
Any other keys will be stored as custom metadata.
Example Code
def process_user_request(user_id, request):
# Add metadata to the current trace
tracer.update_metadata({
"user_id": user_id,
"environment": "production",
"experiment_id": "exp_456",
"version": "1.2.3"
})
return handle_request(request)
tracer.set_reward_score()
Set the reward score for this trace to be used for RL or SFT.
Parameters
Example Code
def process_response(response):
# Calculate reward score
reward = calculate_reward(response)
tracer.set_reward_score(reward)
# Or set multiple reward scores
tracer.set_reward_score({
"helpfulness": 0.9,
"accuracy": 0.8,
"safety": 1.0
})
return response
tracer.set_customer_id()
Set the customer ID for the current trace.
Parameters
Example Code
def handle_customer_request(customer_id, request):
tracer.set_customer_id(customer_id)
return process_request(request)
tracer.set_tags()
Set the tags for the current trace.
Parameters
Example Code
def experimental_feature(data):
tracer.set_tags(["experiment", "feature_v2", "production"])
return new_algorithm(data)
Advanced Features
@tracer.agent()
Class decorator for multi-agent systems that assigns a unique identifier to each agent and enables tracking of its internal state variables. Essential for monitoring and debugging complex multi-agent workflows where multiple agents interact and you need to track each agent's behavior and state separately.
Parameters
identifier
str
Required
The identifier to associate with the decorated class. This will be used as the instance name in traces.
"user_agent"
track_state
bool
Default: False
Optional
Whether to automatically capture the state (attributes) of instances before and after function execution. Defaults to False
.
True
track_attributes
List[str]
Default: None
Optional
Optional list of specific attribute names to track. If None, all non-private attributes (not starting with '_') will be tracked when track_state=True.
["memory", "goals"]
field_mappings
Dict[str, str]
Default: None
Optional
Optional dictionary mapping internal attribute names to display names in the captured state.
For example: {"system_prompt": "instructions"}
will capture the instructions
attribute as system_prompt
in the state.
{"system_prompt": "instructions"}
Example Code
judgment = Tracer(project_name="default_project")
@judgment.agent(identifier="name", track_state=True)
class Agent(AgentTools, AgentBase):
"""An AI agent."""
def __init__(self, name):
self.name = name
self.function_map = {
"func": self.function,
...
}
@judgment.observe(span_type="function")
def process_request(self, user_request):
"""Process a user request using all available tools."""
pass
Current Span Access
tracer.get_current_span()
Returns the current span object for direct access to span properties and methods, useful for debugging and inspection.
Available Span Properties
The current span object provides these properties for inspection and debugging:
Property | Type | Description |
---|---|---|
span_id | str | Unique identifier for this span |
trace_id | str | ID of the parent trace |
function | str | Name of the function being traced |
span_type | str | Type of span ("span", "tool", "llm", "evaluation", "chain") |
inputs | dict | Input parameters for this span |
output | Any | Output/result of the span execution |
duration | float | Execution time in seconds |
depth | int | Nesting depth in the trace hierarchy |
parent_span_id | str | None | ID of the parent span (if nested) |
agent_name | str | None | Name of the agent executing this span |
has_evaluation | bool | Whether this span has evaluation runs |
evaluation_runs | List[EvaluationRun] | List of evaluations run on this span |
usage | TraceUsage | None | Token usage and cost information |
error | Dict[str, Any] | None | Error information if span failed |
state_before | dict | None | Agent state before execution |
state_after | dict | None | Agent state after execution |
Example Usage
@tracer.observe(span_type="tool")
def debug_tool(query):
span = tracer.get_current_span()
if span:
# Access span properties for debugging
print(f"🔧 Executing {span.function} (ID: {span.span_id})")
print(f"📊 Depth: {span.depth}, Type: {span.span_type}")
print(f"📥 Inputs: {span.inputs}")
# Check parent relationship
if span.parent_span_id:
print(f"👆 Parent span: {span.parent_span_id}")
# Monitor execution state
if span.agent_name:
print(f"🤖 Agent: {span.agent_name}")
result = perform_search(query)
# Check span after execution
if span:
print(f"📤 Output: {span.output}")
print(f"⏱️ Duration: {span.duration}s")
if span.has_evaluation:
print(f"✅ Has {len(span.evaluation_runs)} evaluations")
if span.error:
print(f"❌ Error: {span.error}")
return result
Getting Started
from judgeval import Tracer
# Initialize tracer
tracer = Tracer(
api_key="your_api_key",
project_name="default_project"
)
# Basic function tracing
@tracer.observe(span_type="agent")
def my_agent(query):
tracer.update_metadata({"user_query": query})
result = process_query(query)
tracer.log("Processing completed", label="info")
return result
# Auto-trace LLM calls
from openai import OpenAI
from judgeval import wrap
client = wrap(OpenAI())
response = client.chat.completions.create(...) # Automatically traced