Judgeval Python SDK

Tracer

Track agent behavior and evaluate performance in real-time with the Tracer class.

The Tracer class provides comprehensive observability for AI agents and LLM applications. It automatically captures execution traces, spans, and performance metrics while enabling real-time evaluation and monitoring through the Judgment platform.

from judgeval.tracer import Tracer, wrap
from openai import OpenAI

judgment = Tracer(
    project_name="default_project"
)
client = wrap(OpenAI())

class QAAgent:
    def __init__(self, client):
        self.client = client

    @judgment.observe(span_type="tool")
    def process_query(self, query):
        response = self.client.chat.completions.create(
            model="gpt-5",
            messages=[
                {"role": "system", "content": "You are a helpful assitant"},
                {"role": "user", "content": f"I have a query: {query}"}]
        )
        return f"Response: {response.choices[0].message.content}"

    @judgment.agent()
    @judgment.observe(span_type="agent")
    def invoke_agent(self, query):
        result = self.process_query(query)
        return result


if __name__ == "__main__":
    agent = QAAgent(client)
    print(agent.invoke_agent("What is the capital of the United States?"))

Tracer()

Initialize a Tracer object.

Tracer(
    api_key: str,
    organization_id: str,
    project_name: str,
    enable_monitoring: bool = True,
    enable_evaluations: bool = True,
    resource_attributes: Optional[Dict[str, Any]] = None
)

The Tracer is implemented as a singleton: only one instance exists per application, and multiple Tracer() initializations return the same instance. All tracing is built on OpenTelemetry standards, ensuring compatibility with the broader observability ecosystem.
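For example, a second initialization returns the existing instance (a minimal sketch of the behavior described above):

from judgeval.tracer import Tracer

first = Tracer(project_name="default_project")
second = Tracer(project_name="default_project")  # returns the existing instance

# Both names refer to the same singleton object
assert first is second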

Parameters

api_key:str

Recommended: set with JUDGMENT_API_KEY environment variable.

organization_id:str

Recommended: set with JUDGMENT_ORG_ID environment variable.

project_name:str
Project name override
Default: "default_project"
enable_monitoring:bool
Toggle monitoring
Default: True
enable_evaluations:bool
Toggle evaluations for async_evaluate()
Default: True
resource_attributes:Optional[Dict[str, Any]]

OpenTelemetry resource attributes to attach to all spans. Resource attributes describe the entity producing the telemetry data (e.g., service name, version, environment). See the OpenTelemetry Resource specification for standard attributes.

Default: None

Example

tracer.py
from judgeval.tracer import Tracer

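# With JUDGMENT_API_KEY and JUDGMENT_ORG_ID set in the environment,
# no credentials need to be passed here.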
judgment = Tracer(
    project_name="default_project"
)

@tracer.observe()

Decorate a function to record its execution as a span during a trace. This is useful for capturing intermediate steps, tool results, or decisions made by the agent.

Optionally, provide a scorer config to run an evaluation on the trace.

Parameters

func (required): Callable

The function to decorate (automatically provided when used as decorator)

name:str

Optional custom name for the span (defaults to function name)

Default: None
span_type:str

Type of span to create. Available options:

  • "span": General span (default)
  • "tool": For functions that should be tracked and exported as agent tools
  • "function": For main functions or entry points
  • "llm": For language model calls (automatically applied to wrapped clients)

LLM clients wrapped using wrap() automatically use the "llm" span type without needing manual decoration.

Default: "span"
scorer_config:TraceScorerConfig

Configuration for running an evaluation on the trace or sub-trace. When scorer_config is provided, a trace evaluation will be run for the sub-trace/span tree with the decorated function as the root. See TraceScorerConfig for more details.

Default: None

Examples

from judgeval.tracer import Tracer

judgment = Tracer(project_name="default_project")

@judgment.observe(span_type="function") 
def answer_question(question: str) -> str:
  answer = "The capital of the United States is Washington, D.C."
  return answer

@judgment.observe(span_type="tool") 
def process_request(question: str) -> str:
  answer = answer_question(question)
  return answer

if __name__ == "__main__":
  print(process_request("What is the capital of the United States?"))
from openai import OpenAI
from judgeval.tracer import Tracer, wrap

tracer = Tracer(project_name='default_project')
client = wrap(OpenAI())  # wrap the client so the LLM call below is traced

@tracer.observe(span_type="tool")
def search_web(query):
    return f"Results for: {query}"

@tracer.observe(span_type="retriever")
def get_database(query):
    return f"Database results for: {query}"

@tracer.observe(span_type="function")
def run_agent(user_query):
    # Use tools based on query
    if "database" in user_query:
        info = get_database(user_query)
    else:
        info = search_web(user_query)

    prompt = f"Context: {info}, Question: {user_query}"

    # Generate response
    response = client.chat.completions.create(
        model="gpt-5",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content
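The name parameter described above can replace the default span name; a minimal sketch using the same tracer (search_web_v2 is a hypothetical helper):

@tracer.observe(name="web_search", span_type="tool")
def search_web_v2(query):
    # Recorded in traces as "web_search" instead of the function name
    return f"Results for: {query}"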

wrap()

Wraps an API client to add tracing capabilities. Supports OpenAI, Together, Anthropic, Google GenAI, and Groq clients. Patches methods such as .create, Anthropic's .stream, and OpenAI's .responses.create and .beta.chat.completions.parse using a wrapper class.

Parameters

client (required): Any

API client to wrap (OpenAI, Anthropic, Together, Google GenAI, Groq)

Example

wrapped_api_client.py
from openai import OpenAI
from judgeval.tracer import wrap

client = OpenAI()
wrapped_client = wrap(client)

# All API calls are now automatically traced
response = wrapped_client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "Hello"}]
)

# Streaming calls are also traced
stream = wrapped_client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True
)

tracer.async_evaluate()

Runs a quality evaluation on the current trace/span using the specified scorer. You can provide either an Example object or individual evaluation parameters (input, actual_output, etc.).

Parameters

scorer (required): Union[APIScorerConfig, BaseScorer]

An evaluation scorer to run. See Configuration Types for available scorer options.

Example: FaithfulnessScorer()
example (required): Example

Example object containing evaluation data. See Example for structure details.

model:str
Model name for evaluation
Default: None
Example: "gpt-5"
sampling_rate:float

A float between 0 and 1 giving the probability that the evaluation will be sampled and run

Default: 1.0

Example

async_evaluate.py
from judgeval.scorers import AnswerRelevancyScorer
from judgeval.data import Example
from judgeval.tracer import Tracer

judgment = Tracer(project_name="default_project")

@judgment.observe(span_type="function")
def agent(question: str) -> str:
    answer = "Paris is the capital of France"

    # Create example object
    example = Example(
        input=question,
        actual_output=answer,
    )

    # Evaluate using Example
    judgment.async_evaluate(
        scorer=AnswerRelevancyScorer(threshold=0.5),
        example=example,
        model="gpt-5",
        sampling_rate=0.9
    )

    return answer

if __name__ == "__main__":
    print(agent("What is the capital of France?"))

@tracer.agent()

Method decorator for agentic systems that assigns an identifier to each agent and enables tracking of their internal state variables. Essential for monitoring and debugging single or multi-agent systems where you need to track each agent's behavior and state separately. This decorator should be used on the entry point method of your agent class.

Parameters

identifier:str

The name of an instance attribute on the decorated class; that attribute's value is used as the agent's instance name in traces.

Examples

Basic Usage

agent.py
from judgeval.tracer import Tracer

judgment = Tracer(project_name="default_project")

class TravelAgent:
    def __init__(self, id):
        self.id = id

    @judgment.observe(span_type="tool")
    def book_flight(self, destination):
        return f"Flight booked to {destination}!"

    @judgment.agent(identifier="id")
    @judgment.observe(span_type="function")
    def invoke_agent(self, destination):
        flight_info = self.book_flight(destination)
        return f"Here is your requested flight info: {flight_info}"

if __name__ == "__main__":
    agent = TravelAgent("travel_agent_1")
    print(agent.invoke_agent("Paris"))

    agent2 = TravelAgent("travel_agent_2")
    print(agent2.invoke_agent("New York"))

Multi-Agent System Tracing

main.py
from planning_agent import PlanningAgent

if __name__ == "__main__":
    planning_agent = PlanningAgent("planner-1")
    goal = "Build a multi-agent system"
    result = planning_agent.invoke_agent(goal)
    print(result)
utils.py
from judgeval.tracer import Tracer

judgment = Tracer(project_name="multi-agent-system")
planning_agent.py
from utils import judgment
from research_agent import ResearchAgent
from task_agent import TaskAgent

class PlanningAgent:
    def __init__(self, id):
        self.id = id

    @judgment.agent() # Only add @judgment.agent() to the entry point function of the agent
    @judgment.observe()
    def invoke_agent(self, goal):
        print(f"Agent {self.id} is planning for goal: {goal}")

        research_agent = ResearchAgent("Researcher1")
        task_agent = TaskAgent("Tasker1")

        research_results = research_agent.invoke_agent(goal)
        task_result = task_agent.invoke_agent(research_results)

        return f"Results from planning and executing for goal '{goal}': {task_result}"

    @judgment.observe() # No need to add @judgment.agent() here
    def random_tool(self):
        pass
research_agent.py
from utils import judgment

class ResearchAgent:
    def __init__(self, id):
        self.id = id

    @judgment.agent()
    @judgment.observe()
    def invoke_agent(self, topic):
        return f"Research notes for topic: {topic}: Findings and insights include..."
task_agent.py
from utils import judgment

class TaskAgent:
    def __init__(self, id):
        self.id = id

    @judgment.agent()
    @judgment.observe()
    def invoke_agent(self, task):
        result = f"Performed task: {task}, here are the results: Results include..."
        return result

The trace will appear in the Judgment platform, clearly indicating which agent called which method. Each agent's tool calls are associated with their respective classes, making it easy to follow the execution flow across your multi-agent system.


TraceScorerConfig()

Initialize a TraceScorerConfig object for running an evaluation on the trace.

Parameters

scorer (required): TraceAPIScorerConfig
The scorer to run on the trace
model:str
Model name for evaluation
Default: "gpt-4.1"
sampling_rate:float

A float between 0 and 1 giving the probability that the evaluation will be sampled and run

Default: 1.0
run_condition:Callable[..., bool]

A function that returns a boolean indicating whether the eval should be run. When TraceScorerConfig is used in @tracer.observe(), run_condition is called with the decorated function's arguments. For example, lambda x: x > 10 will only run when x > 10.

Default: None

Example

A trace eval will be run for the trace/sub-trace with the process_request() function as the root.

trace_scorer_config.py
judgment = Tracer(project_name="default_project")

# Retrieve a trace scorer to be used with the TraceScorerConfig
trace_scorer = TracePromptScorer.get(name="sample_trace_scorer")

# A trace eval is only triggered if process_request() is called with x > 10
@judgment.observe(
  span_type="function",
  scorer_config=TraceScorerConfig(
    scorer=trace_scorer,
    sampling_rate=1.0,
    run_condition=lambda x: x > 10
  )
)
def process_request(x):
  return x + 1

tracer.get_current_span()

Returns the current span object for direct access to span properties and methods, useful for debugging and inspection.

Attributes

The current span object provides these properties for inspection and debugging:

span_id:str
Unique identifier for this span
trace_id:str
ID of the parent trace
function:str
Name of the function being traced
span_type:str

Type of span ("span", "tool", "llm", "evaluation", "chain")

inputs:dict
Input parameters for this span
output:Any
Output/result of the span execution
duration:float
Execution time in seconds
depth:int
Nesting depth in the trace hierarchy
parent_span_id:str | None
ID of the parent span (if nested)
agent_name:str | None
Name of the agent executing this span
has_evaluation:bool
Whether this span has evaluation runs
evaluation_runs:List[EvaluationRun]
List of evaluations run on this span
usage:TraceUsage | None
Token usage and cost information
error:Dict[str, Any] | None
Error information if span failed
state_before:dict | None
Agent state before execution
state_after:dict | None
Agent state after execution

Example Usage

@tracer.observe(span_type="tool")
def debug_tool(query):
    span = tracer.get_current_span()
    if span:
        # Access span properties for debugging
        print(f"Executing {span.function} (ID: {span.span_id})")
        print(f"Depth: {span.depth}, Type: {span.span_type}")
        print(f"Inputs: {span.inputs}")

        # Check parent relationship
        if span.parent_span_id:
            print(f"Parent span: {span.parent_span_id}")

        # Monitor execution state
        if span.agent_name:
            print(f"Agent: {span.agent_name}")

    result = perform_search(query)

    # Check span after execution
    if span:
        print(f"Output: {span.output}")
        print(f"Duration: {span.duration}s")
        if span.has_evaluation:
            print(f"Has {len(span.evaluation_runs)} evaluations")
        if span.error:
            print(f"Error: {span.error}")

    return result

Advanced Tracing

tracer.set_customer_id

You can associate a span and its child spans with a customer_id, which enables further insights through our usage dashboards. Note that for the trace to be associated with a customer_id, you must set it on the root span of the trace for it to show up in the monitoring tables.
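A minimal sketch, assuming set_customer_id() takes the customer ID string as its only argument and is called inside the trace's root span:

from judgeval.tracer import Tracer

judgment = Tracer(project_name="default_project")

@judgment.observe(span_type="function")
def handle_request(query: str, customer_id: str) -> str:
    # Setting the customer ID on the root span attributes the whole trace
    judgment.set_customer_id(customer_id)  # signature assumed
    return f"Handled: {query}"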

tracer.set_tag

Coming soon.

OpenTelemetry TracerProvider

tracer_otel.py
from judgeval.tracer import Tracer
from opentelemetry.sdk.trace import TracerProvider

tracer_provider = TracerProvider()

# Initialize tracer with OpenTelemetry configuration
judgment = Tracer(
    project_name="default_project",
    resource_attributes={
        "service.name": "my-ai-agent",
        "service.version": "1.2.0",
        "deployment.environment": "production"
    }
)

tracer_provider.add_span_processor(judgment.get_processor())
tracer = tracer_provider.get_tracer(__name__)

def answer_question(question: str) -> str:
    with tracer.start_as_current_span("answer_question_span") as span:
        span.set_attribute("question", question)
        answer = "The capital of the United States is Washington, D.C."
        span.set_attribute("answer", answer)
    return answer

def process_request(question: str) -> str:
    with tracer.start_as_current_span("process_request_span") as span:
        span.set_attribute("input", question)
        answer = answer_question(question)
        span.set_attribute("output", answer)
    return answer

if __name__ == "__main__":
    print(process_request("What is the capital of the United States?"))