Tracer
Capture execution traces and LLM performance metrics for your application.
Tracer is the primary way to add observability to AI agents and LLM
pipelines. It records spans (units of work), automatically captures
inputs/outputs, and exports everything to the Judgment dashboard.
Getting started:
- Call Tracer.init() to create and activate a tracer.
- Decorate your functions with @Tracer.observe() to trace them.
- Optionally wrap LLM clients with Tracer.wrap() for automatic token/cost tracking.
Basic setup and usage:

    from judgeval import Tracer

    tracer = Tracer.init(project_name="search-assistant")

    @Tracer.observe(span_type="tool")
    def search(query: str) -> str:
        return vector_db.search(query)

    @Tracer.observe(span_type="agent")
    async def answer(question: str) -> str:
        context = search(question)
        return await llm.generate(question, context)

Wrap an LLM client for automatic instrumentation:

    from openai import OpenAI

    openai = Tracer.wrap(OpenAI())

Attributes
TRACER_NAME
JUDGEVAL_TRACER_INSTRUMENTING_MODULE_NAME
project_name
project_id
api_key
organization_id
api_url
environment
serializer
init()
Create and activate a new Tracer.
This is the recommended way to initialize tracing. Credentials are
read from environment variables (JUDGMENT_API_KEY, JUDGMENT_ORG_ID,
JUDGMENT_API_URL) when not passed explicitly. If credentials are
missing, the tracer still works but spans won't be exported.
    tracer = Tracer.init(
        project_name="search-assistant",
        environment="production",
    )

def init(project_name=None, api_key=None, organization_id=None, api_url=None, environment=None, set_active=True, serializer=safe_serialize, resource_attributes=None, sampler=None, span_limits=None, span_processors=None) -> Tracer:

Parameters
project_name
:Optional[str]
Your Judgment project name. Required for span export.
None
api_key
:Optional[str]
Judgment API key. Defaults to JUDGMENT_API_KEY env var.
None
organization_id
:Optional[str]
Organization ID. Defaults to JUDGMENT_ORG_ID env var.
None
api_url
:Optional[str]
API endpoint URL. Defaults to JUDGMENT_API_URL env var.
None
environment
:Optional[str]
Label for this deployment (e.g. "staging",
"production"). Shows up in the Judgment dashboard.
None
set_active
:bool
If True (default), sets this as the global tracer so
@Tracer.observe() and other static methods use it.
True
serializer
:Callable[[Any], str]
Custom serializer for span inputs/outputs.
safe_serialize
resource_attributes
:Optional[Dict[str, Any]]
Extra OpenTelemetry resource attributes.
None
sampler
:Optional[Sampler]
Custom OpenTelemetry sampler.
None
span_limits
:Optional[SpanLimits]
OpenTelemetry span limits.
None
span_processors
:Optional[Sequence[SpanProcessor]]
Additional span processors appended after the default Judgment processor.
None
Returns
Tracer - A configured and active Tracer instance.
set_active()
Set this tracer as the globally active tracer.
def set_active() -> bool:

Returns
bool - True if the tracer was successfully activated.
get_span_exporter()
Return the span exporter for this tracer.
Returns a no-op exporter when monitoring is disabled.
def get_span_exporter() -> JudgmentSpanExporter:

Returns
JudgmentSpanExporter - The JudgmentSpanExporter (or no-op variant) for this tracer.
get_span_processor()
Return the span processor for this tracer.
Returns a no-op processor when monitoring is disabled.
def get_span_processor() -> JudgmentSpanProcessor:

Returns
JudgmentSpanProcessor - The JudgmentSpanProcessor (or no-op variant) for this tracer.
get_current_span()
Return the currently active span from the Judgment tracer provider.
def get_current_span() -> opentelemetry.trace.Span:

Returns
opentelemetry.trace.Span - The active Span object.
force_flush()
Send all pending spans to Judgment immediately.
Call this before your process exits (e.g. in a serverless function) to ensure no spans are lost. Does not shut down the tracer.
    def lambda_handler(event, context):
        result = process(event)
        Tracer.force_flush()
        return result

def force_flush(timeout_millis=30000) -> bool:

Parameters
timeout_millis
:int
Maximum wait time in milliseconds.
30000
Returns
bool - True if all spans were flushed within the timeout.
shutdown()
Flush pending spans and shut down the tracer.
Call this on application exit to ensure all data is exported before the process terminates.
def shutdown(timeout_millis=30000) -> None:

Parameters
timeout_millis
:int
Maximum wait time in milliseconds.
30000
Returns
None
registerOTELInstrumentation()
Register a third-party OpenTelemetry instrumentor with Judgment.
Use this to route spans from libraries like opentelemetry-instrumentation-requests
through the Judgment trace pipeline.
def registerOTELInstrumentation(instrumentor) -> None:

Parameters

instrumentor
required
An OpenTelemetry instrumentor instance to register.
Returns
None
start_span()
Start a new span that must be ended manually with span.end().
Prefer the span context manager for automatic lifecycle management.
def start_span(name, attributes=None) -> opentelemetry.trace.Span:

Parameters
name
required:str
Name for the new span.
attributes
:Optional[Dict[str, Any]]
Optional dictionary of initial span attributes.
None
Returns
opentelemetry.trace.Span - The newly started Span.
start_as_current_span()
Start a span and set it as the current span in the context.
def start_as_current_span(name, attributes=None) -> Iterator[Span]:

Parameters
name
required:str
Name for the new span.
attributes
:Optional[Dict[str, Any]]
Optional dictionary of initial span attributes.
None
Returns
Iterator[Span]
span()
Open a child span using a with block.
Use this for tracing a section of code that isn't a standalone function. Exceptions are automatically recorded on the span.
    with Tracer.span("process-results"):
        results = parse(raw_data)
        Tracer.set_attribute("result_count", len(results))

def span(span_name) -> Iterator[Span]:

Parameters
span_name
required:str
Name for this span (visible in the dashboard).
Returns
Iterator[Span]
observe()
Decorator that automatically traces a function call.
Wraps any sync or async function in a span. Inputs and outputs are captured automatically. Works with or without parentheses.
Basic usage:

    @Tracer.observe(span_type="tool")
    def search(query: str) -> list[str]:
        return vector_db.search(query)

Async functions work the same way:

    @Tracer.observe(span_type="agent")
    async def answer(question: str) -> str:
        context = search(question)
        return await llm.generate(question, context)

Without parentheses (uses default settings):

    @Tracer.observe
    def my_function():
        ...

def observe(func=None, span_type='span', span_name=None, record_input=True, record_output=True, disable_generator_yield_span=False) -> C | Callable[[C], C]:

Parameters
func
:Optional[C]
The function to wrap (set implicitly when used as
@Tracer.observe without parentheses).
None
span_type
:Optional[str]
The kind of span. Use "tool", "agent", "llm",
or "function" to categorize work in the dashboard.
Defaults to "span".
'span'
span_name
:Optional[str]
Override the span name (defaults to the function name).
None
record_input
:bool
Capture and store function arguments. Set to False for functions with sensitive or very large inputs.
True
record_output
:bool
Capture and store the return value.
True
disable_generator_yield_span
:bool
Suppress per-yield child spans for generator functions.
False
Returns
C | Callable[[C], C]
wrap()
Wrap an LLM client for automatic tracing of all API calls.
Supported providers: OpenAI, Anthropic, Together AI, and Google GenAI. Once wrapped, every API call made through the client is recorded as a span with model name, token counts, and cost.
    from openai import OpenAI
    from anthropic import Anthropic

    openai = Tracer.wrap(OpenAI())
    anthropic = Tracer.wrap(Anthropic())

def wrap(client) -> TClient:

Parameters
client
required:TClient
An LLM provider client instance (e.g. OpenAI(),
Anthropic()).
Returns
TClient - The same client instance, now instrumented with tracing.
set_span_kind()
Set the judgment.span_kind attribute on the current span.
def set_span_kind(kind) -> None:

Parameters

kind
required:str
The span kind to record (e.g. "llm", "tool").
Returns
None
set_llm_span()
Mark the current span as an LLM span.

def set_llm_span() -> None:

Returns
None

set_tool_span()
Mark the current span as a tool span.

def set_tool_span() -> None:

Returns
None

set_general_span()
Mark the current span as a general-purpose span.

def set_general_span() -> None:

Returns
None
set_attribute()
Attach a custom key-value pair to the current span.
Use this to record application-specific metadata that you want to see in the Judgment dashboard. Non-primitive values (dicts, lists, objects) are serialized to strings automatically.
    Tracer.set_attribute("user_tier", "premium")
    Tracer.set_attribute("search_results_count", len(results))

def set_attribute(key, value) -> None:

Parameters
key
required:str
Attribute name (e.g. "user_tier", "search_results_count").
value
required:Any
The value to record.
Returns
None
set_attributes()
Set multiple custom attributes on the current span at once.
def set_attributes(attributes) -> None:

Parameters
attributes
required:Dict[str, Any]
Dictionary of key-value pairs to set.
Returns
None
set_input()
Manually set the input for the current span.
Use when @observe(record_input=False) is set but you want to
record a sanitized or transformed version of the input.
def set_input(input_data) -> None:

Parameters
input_data
required:Any
The input value to record.
Returns
None
set_output()
Manually set the output for the current span.
Use when @observe(record_output=False) is set but you want to
record a sanitized or transformed version of the output.
def set_output(output_data) -> None:

Parameters
output_data
required:Any
The output value to record.
Returns
None
recordLLMMetadata()
Record model, token usage, and cost on the current span.
If you're using Tracer.wrap() this is called automatically. Use
this method when you need to record metadata for a custom LLM
integration.
    @Tracer.observe(span_type="llm")
    def call_custom_model(prompt: str) -> str:
        response = my_model.generate(prompt)
        Tracer.recordLLMMetadata({
            "model": "my-model-v2",
            "output_tokens": response.usage.output,
            "total_cost_usd": response.usage.cost,
        })
        return response.text

def recordLLMMetadata(metadata) -> None:

Parameters
metadata
required:LLMMetadata
A dict with keys like model, provider,
non_cached_input_tokens, output_tokens, and
total_cost_usd. All fields are optional.
Returns
None
set_customer_id()
Associate the current trace with a customer.
Once set, this ID propagates to all child spans and enables per-customer analytics in the Judgment dashboard. Call this early in your request handler.
    @Tracer.observe(span_type="agent")
    def handle_request(user_id: str, question: str):
        Tracer.set_customer_id(user_id)
        return answer(question)

def set_customer_id(customer_id) -> None:

Parameters
customer_id
required:str
Your internal customer identifier.
Returns
None
set_customer_user_id()
Set the customer user ID on the current span and propagate to children.
def set_customer_user_id(customer_user_id) -> None:

Parameters
customer_user_id
required:str
The customer user ID to associate with this trace.
Returns
None
set_session_id()
Associate the current trace with a conversation session.
Groups multiple requests into a session in the Judgment dashboard. Propagates to all child spans. Call this early in your request handler.
    @Tracer.observe(span_type="agent")
    def handle_message(session_id: str, message: str):
        Tracer.set_session_id(session_id)
        return chatbot.respond(message)

def set_session_id(session_id) -> None:

Parameters
session_id
required:str
Your session or conversation identifier.
Returns
None
tag()
Add tags to the current trace for filtering in the dashboard.
Tags are sent asynchronously and appear in the Judgment monitoring view. Useful for marking traces by feature, experiment, or user segment.
    Tracer.tag("rag-pipeline")
    Tracer.tag(["experiment-v2", "premium-user"])

def tag(tags) -> None:

Parameters
tags
required:str | list[str]
A single tag string or a list of tags.
Returns
None
async_evaluate()
Run a hosted evaluation on this span when it completes.
The evaluation is queued and processed server-side by the Judgment platform after the span ends. Use this to score live traffic without blocking your application.
    @Tracer.observe(span_type="agent")
    def answer(question: str) -> str:
        response = llm.generate(question)
        Tracer.async_evaluate(
            "faithfulness",
            {"input": question, "actual_output": response},
        )
        return response

def async_evaluate(judge, example=None) -> None:

Parameters
judge
required:str
Name of the hosted judge/scorer (e.g. "faithfulness",
"answer_relevancy").
example
:Optional[Dict[str, Any]]
Optional dict with evaluation data. Keys like input,
actual_output, expected_output, and retrieval_context
are commonly used.
None
Returns
None