Introduction
Learn how to use all features of the Judgment tracing module.
Quickstarts
Tracing
To get started with tracing, you can decorate your functions with the @judgment.observe() decorator.
For LLM API clients, you can capture generation telemetry using the wrap() function.
from judgeval.tracer import Tracer, wrap
from openai import OpenAI

client = wrap(OpenAI())  # tracks all LLM calls
judgment = Tracer(project_name="my_project")

@judgment.observe(span_type="tool")
def format_question(question: str) -> str:
    # dummy tool
    return f"Question : {question}"

@judgment.observe(span_type="function")
def run_agent(prompt: str) -> str:
    task = format_question(prompt)
    response = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": task}]
    )
    return response.choices[0].message.content

run_agent("What is the capital of the United States?")
You should see your trace on the Judgment platform in seconds. Check your terminal for a link to view the trace!
Integrate With Online Evals
You can run evals on your traces with any of judgeval's built-in scorers in real time, enabling you to flag and alert on regressions in production.
Running an online eval takes a single call to the async_evaluate() function.
In this example, we'll use the AnswerRelevancyScorer to evaluate the relevance of the agent's response to the user's query.
from judgeval.common.tracer import Tracer, wrap
from judgeval.scorers import AnswerRelevancyScorer
from openai import OpenAI

client = wrap(OpenAI())
judgment = Tracer(project_name="my_project")

@judgment.observe(span_type="tool")
def format_question(question: str) -> str:
    # dummy tool
    return f"Question : {question}"

@judgment.observe(span_type="function")
def run_agent(prompt: str) -> str:
    task = format_question(prompt)
    response = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": task}]
    )
    answer = response.choices[0].message.content
    judgment.async_evaluate(
        scorers=[AnswerRelevancyScorer(threshold=0.5)],
        input=task,
        actual_output=answer,
        model="gpt-4.1"
    )
    print("Online evaluation submitted.")
    return answer

run_agent("What is the capital of the United States?")
You should see the online eval results on the Judgment platform shortly after the trace is recorded. Evals can take time to execute, so they may appear slightly delayed on the UI. Once the eval is complete, you should see it attached to your trace like this:

Trace Your Agents
When working with multi-agent systems, it is useful to see at a glance which agent called which method throughout a trace.
To help with this, you can also decorate a class with the @judgment.identify() decorator.
from judgeval.common.tracer import Tracer

judgment = Tracer(project_name="multi_agent_system")

# judgment.identify() specifies that agents will be
# identified by their "name" attribute.
@judgment.identify(identifier="name")
class SimpleAgent:
    def __init__(self, name: str):
        self.name = name

    @judgment.observe(span_type="tool")
    def send_message(self, content: str) -> str:
        return f"Message sent with content: {content}"

@judgment.observe(span_type="function")
def main():
    alice = SimpleAgent("Alice")  # agent will be identified as "Alice"
    bob = SimpleAgent("Bob")
    alice.send_message("Hello Bob, how are you?")
    bob.send_message("I'm good Alice, thanks for asking!")

main()
The trace should show up in the Judgment platform clearly indicating which agent called which method (using square brackets):

After running this trace, you can export the complete agent environment data from the Judgment platform:
- Navigate to your trace in the platform
- Click the "Fetch Tool Calls" button in the trace view
The export includes:
- All tool calls by agent: which specific agent made each tool call, with full attribution
- Input/environment reactions: how each agent responded to environmental inputs and state changes
- Trajectories of the entire trace: the complete execution path showing agent decision flows and interaction patterns
- Added metadata: all custom metadata set via judgment.set_metadata() calls (see the sketch below)
The exported data includes comprehensive information about each agent's state and behavior, making it easy to analyze multi-agent interactions, debug complex scenarios, and optimize agents for your environment.
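As a quick illustration, here is a minimal sketch of attaching custom metadata from inside a traced function so it shows up in the export. The keyword-argument form passed to set_metadata() is an assumption; check the SDK reference for the exact signature in your judgeval version.

from judgeval.common.tracer import Tracer

judgment = Tracer(project_name="multi_agent_system")

@judgment.observe(span_type="function")
def plan_route(destination: str) -> str:
    # Attach custom metadata to the current trace so it appears in the export.
    # NOTE: the keyword arguments below are an assumption about the signature
    # of set_metadata(); adjust to match your installed judgeval version.
    judgment.set_metadata(agent_role="planner", destination=destination)
    return f"Route planned to {destination}"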
Toggling Monitoring
If your setup requires you to toggle monitoring in production environments, you can disable tracing and/or online evaluations by:
- Setting the JUDGMENT_MONITORING environment variable to false (disables tracing):
  export JUDGMENT_MONITORING=false
- Setting the JUDGMENT_EVALUATIONS environment variable to false (disables async_evaluate calls):
  export JUDGMENT_EVALUATIONS=false
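If you prefer to control these flags from application code rather than the shell, one option is to set the environment variables before the tracer is initialized. This is a sketch under the assumption that judgeval reads these variables at tracer initialization; the export commands above remain the canonical approach for shell-managed deployments.

import os

# Assumption: JUDGMENT_MONITORING and JUDGMENT_EVALUATIONS are read when the
# Tracer is created, so set them before constructing it in your entrypoint.
os.environ["JUDGMENT_MONITORING"] = "false"   # disables tracing
os.environ["JUDGMENT_EVALUATIONS"] = "false"  # disables async_evaluate calls

from judgeval.common.tracer import Tracer

judgment = Tracer(project_name="my_project")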