Judgeval Python SDK

Tracer

Track agent behavior and evaluate performance in real-time with the Tracer class.

The Tracer class provides comprehensive observability for AI agents and LLM applications. It automatically captures execution traces, spans, and performance metrics while enabling real-time evaluation and monitoring through the Judgment platform.

from judgeval.tracer import Tracer, wrap
from openai import OpenAI

judgment = Tracer(
    project_name="default_project"
)
client = wrap(OpenAI())

class QAAgent:
    def __init__(self, client):
        self.client = client

    @judgment.observe(span_type="tool")
    def process_query(self, query):
        response = self.client.chat.completions.create(
            model="gpt-5",
            messages=[
                {"role": "system", "content": "You are a helpful assitant"},
                {"role": "user", "content": f"I have a query: {query}"}]
        )
        return f"Response: {response.choices[0].message.content}"

    @judgment.agent()
    @judgment.observe(span_type="agent")
    def invoke_agent(self, query):
        result = self.process_query(query)
        return result


if __name__ == "__main__":
    agent = QAAgent(client)
    print(agent.invoke_agent("What is the capital of the United States?"))

Tracer()

Initialize a Tracer object.

Tracer(
    api_key: str,
    organization_id: str,
    project_name: str,
    enable_monitoring: bool = True,
    enable_evaluations: bool = True,
    resource_attributes: Optional[Dict[str, Any]] = None
)

The Tracer is implemented as a singleton: only one instance exists per application, and multiple Tracer() initializations return the same instance. All tracing is built on OpenTelemetry standards, ensuring compatibility with the broader observability ecosystem.
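For example, a second initialization returns the existing instance (a minimal sketch of the behavior described above):

from judgeval.tracer import Tracer

first = Tracer(project_name="default_project")
second = Tracer(project_name="default_project")  # returns the existing instance

# Both names refer to the same singleton object
assert first is second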

Parameters

api_key:str

Recommended: set with JUDGMENT_API_KEY environment variable.

organization_id:str

Recommended: set with JUDGMENT_ORG_ID environment variable.

project_name:str
Project name override
Default: "default_project"
enable_monitoring:bool
Toggle monitoring
Default: True
enable_evaluations:bool
Toggle evaluations for async_evaluate()
Default: True
resource_attributes:Optional[Dict[str, Any]]

OpenTelemetry resource attributes to attach to all spans. Resource attributes describe the entity producing the telemetry data (e.g., service name, version, environment). See the OpenTelemetry Resource specification for standard attributes.

Default: None

Example

tracer.py
from judgeval.tracer import Tracer

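# With JUDGMENT_API_KEY and JUDGMENT_ORG_ID set in the environment,
# no credentials need to be passed here.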
judgment = Tracer(
    project_name="default_project"
)

@tracer.observe()

Decorate a function to record its execution as a span during a trace. This is useful for capturing intermediate steps, tool results, or decisions made by the agent.

Optionally, provide a scorer config to run an evaluation on the trace.

Parameters

func (required): Callable

The function to decorate (automatically provided when used as decorator)

name:str

Optional custom name for the span (defaults to function name)

Default: None
span_type:str

Type of span to create. Available options:

  • "span": General span (default)
  • "tool": For functions that should be tracked and exported as agent tools
  • "function": For main functions or entry points
  • "llm": For language model calls (automatically applied to wrapped clients)

LLM clients wrapped using wrap() automatically use the "llm" span type without needing manual decoration.

Default: "span"
scorer_config:TraceScorerConfig

Configuration for running an evaluation on the trace or sub-trace. When scorer_config is provided, a trace evaluation will be run for the sub-trace/span tree with the decorated function as the root. See TraceScorerConfig for more details.

Default: None

Examples

from judgeval.tracer import Tracer

judgment = Tracer(project_name="default_project")

@judgment.observe(span_type="function") 
def answer_question(question: str) -> str:
  answer = "The capital of the United States is Washington, D.C."
  return answer

@judgment.observe(span_type="tool") 
def process_request(question: str) -> str:
  answer = answer_question(question)
  return answer

if __name__ == "__main__":
  print(process_request("What is the capital of the United States?"))
from openai import OpenAI
from judgeval.tracer import Tracer, wrap

tracer = Tracer(project_name='default_project')
client = wrap(OpenAI())  # wrap the client so the LLM call below is traced

@tracer.observe(span_type="tool")
def search_web(query):
    return f"Results for: {query}"

@tracer.observe(span_type="retriever")
def get_database(query):
    return f"Database results for: {query}"

@tracer.observe(span_type="function")
def run_agent(user_query):
    # Use tools based on query
    if "database" in user_query:
        info = get_database(user_query)
    else:
        info = search_web(user_query)

    prompt = f"Context: {info}, Question: {user_query}"

    # Generate response
    response = client.chat.completions.create(
        model="gpt-5",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content
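The name parameter described above can replace the default span name; a minimal sketch using the same tracer (search_web_v2 is a hypothetical helper):

@tracer.observe(name="web_search", span_type="tool")
def search_web_v2(query):
    # Recorded in traces as "web_search" instead of the function name
    return f"Results for: {query}"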

wrap()

Wraps an API client to add tracing capabilities. Supports OpenAI, Together, Anthropic, Google GenAI, and Groq clients. Patches methods such as .create, Anthropic's .stream, and OpenAI's .responses.create and .beta.chat.completions.parse using a wrapper class.

Parameters

client (required): Any

API client to wrap (OpenAI, Anthropic, Together, Google GenAI, Groq)

Example

wrapped_api_client.py
from openai import OpenAI
from judgeval.tracer import wrap

client = OpenAI()
wrapped_client = wrap(client)

# All API calls are now automatically traced
response = wrapped_client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "Hello"}]
)

# Streaming calls are also traced
stream = wrapped_client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True
)

tracer.async_evaluate()

Runs a quality evaluation on the current trace/span using the specified scorer. You can provide either an Example object or individual evaluation parameters (input, actual_output, etc.).

Parameters

scorer (required): Union[APIScorerConfig, BaseScorer]

An evaluation scorer to run. See Configuration Types for available scorer options.

Example: FaithfulnessScorer()
example (required): Example

Example object containing evaluation data. See Example for structure details.

model:str
Model name for evaluation
Default: None
Example: "gpt-5"
sampling_rate:float

A float between 0 and 1 giving the probability that the evaluation will be sampled and run

Default: 1.0

Example

async_evaluate.py
from judgeval.scorers import AnswerRelevancyScorer
from judgeval.data import Example
from judgeval.tracer import Tracer

judgment = Tracer(project_name="default_project")

@judgment.observe(span_type="function")
def agent(question: str) -> str:
    answer = "Paris is the capital of France"

    # Create example object
    example = Example(
        input=question,
        actual_output=answer,
    )

    # Evaluate using Example
    judgment.async_evaluate(
        scorer=AnswerRelevancyScorer(threshold=0.5),
        example=example,
        model="gpt-5",
        sampling_rate=0.9
    )

    return answer

if __name__ == "__main__":
    print(agent("What is the capital of France?"))

@tracer.agent()

Method decorator for agentic systems that assigns an identifier to each agent and enables tracking of their internal state variables. Essential for monitoring and debugging single or multi-agent systems where you need to track each agent's behavior and state separately. This decorator should be used on the entry point method of your agent class.

Parameters

identifier:str

The name of an instance attribute on the decorated class; that attribute's value is used as the agent's instance name in traces.

Examples

Basic Usage

agent.py
from judgeval.tracer import Tracer

judgment = Tracer(project_name="default_project")

class TravelAgent:
    def __init__(self, id):
        self.id = id

    @judgment.observe(span_type="tool")
    def book_flight(self, destination):
        return f"Flight booked to {destination}!"

    @judgment.agent(identifier="id")
    @judgment.observe(span_type="function")
    def invoke_agent(self, destination):
        flight_info = self.book_flight(destination)
        return f"Here is your requested flight info: {flight_info}"

if __name__ == "__main__":
    agent = TravelAgent("travel_agent_1")
    print(agent.invoke_agent("Paris"))

    agent2 = TravelAgent("travel_agent_2")
    print(agent2.invoke_agent("New York"))

Multi-Agent System Tracing

main.py
from planning_agent import PlanningAgent

if __name__ == "__main__":
    planning_agent = PlanningAgent("planner-1")
    goal = "Build a multi-agent system"
    result = planning_agent.invoke_agent(goal)
    print(result)
utils.py
from judgeval.tracer import Tracer

judgment = Tracer(project_name="multi-agent-system")
planning_agent.py
from utils import judgment
from research_agent import ResearchAgent
from task_agent import TaskAgent

class PlanningAgent:
    def __init__(self, id):
        self.id = id

    @judgment.agent() # Only add @judgment.agent() to the entry point function of the agent
    @judgment.observe()
    def invoke_agent(self, goal):
        print(f"Agent {self.id} is planning for goal: {goal}")

        research_agent = ResearchAgent("Researcher1")
        task_agent = TaskAgent("Tasker1")

        research_results = research_agent.invoke_agent(goal)
        task_result = task_agent.invoke_agent(research_results)

        return f"Results from planning and executing for goal '{goal}': {task_result}"

    @judgment.observe() # No need to add @judgment.agent() here
    def random_tool(self):
        pass
research_agent.py
from utils import judgment

class ResearchAgent:
    def __init__(self, id):
        self.id = id

    @judgment.agent()
    @judgment.observe()
    def invoke_agent(self, topic):
        return f"Research notes for topic: {topic}: Findings and insights include..."
task_agent.py
from utils import judgment

class TaskAgent:
    def __init__(self, id):
        self.id = id

    @judgment.agent()
    @judgment.observe()
    def invoke_agent(self, task):
        result = f"Performed task: {task}, here are the results: Results include..."
        return result

The trace will appear in the Judgment platform, clearly indicating which agent called which method. Each agent's tool calls are associated with their respective classes, making it easy to follow the execution flow across your multi-agent system.


TraceScorerConfig()

Initialize a TraceScorerConfig object for running an evaluation on the trace.

Parameters

scorer (required): TraceAPIScorerConfig
The scorer to run on the trace
model:str
Model name for evaluation
Default: "gpt-4.1"
sampling_rate:float

A float between 0 and 1 giving the probability that the evaluation will be sampled and run

Default: 1.0
run_condition:Callable[..., bool]

A function that returns a boolean indicating whether the eval should be run. When TraceScorerConfig is used in @tracer.observe(), run_condition is called with the decorated function's arguments. For example, lambda x: x > 10 will only run when x > 10.

Default: None

Example

A trace eval will be run for the trace/sub-trace with the process_request() function as the root.

trace_scorer_config.py
judgment = Tracer(project_name="default_project")

# Retrieve a trace scorer to be used with the TraceScorerConfig
trace_scorer = TracePromptScorer.get(name="sample_trace_scorer")

# A trace eval is only triggered if process_request() is called with x > 10
@judgment.observe(
  span_type="function",
  scorer_config=TraceScorerConfig(
    scorer=trace_scorer,
    sampling_rate=1.0,
    run_condition=lambda x: x > 10
  )
)
def process_request(x):
  return x + 1

tracer.get_current_span()

Returns the current span object for direct access to span properties and methods, useful for debugging and inspection.

Attributes

The current span object provides these properties for inspection and debugging:

span_id:str
Unique identifier for this span
trace_id:str
ID of the parent trace
function:str
Name of the function being traced
span_type:str

Type of span ("span", "tool", "llm", "evaluation", "chain")

inputs:dict
Input parameters for this span
output:Any
Output/result of the span execution
duration:float
Execution time in seconds
depth:int
Nesting depth in the trace hierarchy
parent_span_id:str | None
ID of the parent span (if nested)
agent_name:str | None
Name of the agent executing this span
has_evaluation:bool
Whether this span has evaluation runs
evaluation_runs:List[EvaluationRun]
List of evaluations run on this span
usage:TraceUsage | None
Token usage and cost information
error:Dict[str, Any] | None
Error information if span failed
state_before:dict | None
Agent state before execution
state_after:dict | None
Agent state after execution

Example Usage

@tracer.observe(span_type="tool")
def debug_tool(query):
    span = tracer.get_current_span()
    if span:
        # Access span properties for debugging
        print(f"Executing {span.function} (ID: {span.span_id})")
        print(f"Depth: {span.depth}, Type: {span.span_type}")
        print(f"Inputs: {span.inputs}")

        # Check parent relationship
        if span.parent_span_id:
            print(f"Parent span: {span.parent_span_id}")

        # Monitor execution state
        if span.agent_name:
            print(f"Agent: {span.agent_name}")

    result = perform_search(query)

    # Check span after execution
    if span:
        print(f"Output: {span.output}")
        print(f"Duration: {span.duration}s")
        if span.has_evaluation:
            print(f"Has {len(span.evaluation_runs)} evaluations")
        if span.error:
            print(f"Error: {span.error}")

    return result

Advanced Tracing

tracer.set_customer_id

You can associate a span and its child spans with a customer_id, which enables further insights through our usage dashboards. Note that for the trace to be associated with a customer_id, you must set it on the root span of the trace for it to show up in the monitoring tables.
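A minimal sketch, assuming set_customer_id() takes the customer ID string as its only argument and is called inside the trace's root span:

from judgeval.tracer import Tracer

judgment = Tracer(project_name="default_project")

@judgment.observe(span_type="function")
def handle_request(query: str, customer_id: str) -> str:
    # Setting the customer ID on the root span attributes the whole trace
    judgment.set_customer_id(customer_id)  # signature assumed
    return f"Handled: {query}"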

tracer.set_tag

Coming soon.

OpenTelemetry TracerProvider

tracer_otel.py
from judgeval.tracer import Tracer
from opentelemetry.sdk.trace import TracerProvider

tracer_provider = TracerProvider()

# Initialize tracer with OpenTelemetry configuration
judgment = Tracer(
    project_name="default_project",
    resource_attributes={
        "service.name": "my-ai-agent",
        "service.version": "1.2.0",
        "deployment.environment": "production"
    }
)

tracer_provider.add_span_processor(judgment.get_processor())
tracer = tracer_provider.get_tracer(__name__)

def answer_question(question: str) -> str:
    with tracer.start_as_current_span("answer_question_span") as span:
        span.set_attribute("question", question)
        answer = "The capital of the United States is Washington, D.C."
        span.set_attribute("answer", answer)
    return answer

def process_request(question: str) -> str:
    with tracer.start_as_current_span("process_request_span") as span:
        span.set_attribute("input", question)
        answer = answer_question(question)
        span.set_attribute("output", answer)
    return answer

if __name__ == "__main__":
    print(process_request("What is the capital of the United States?"))