Judgeval
The main entry point for interacting with the Judgment platform.
Judgeval connects to your Judgment project and gives you access to
evaluations, datasets, and prompt versioning through
convenient properties.
Credentials are resolved in order: explicit arguments first, then
environment variables JUDGMENT_API_KEY, JUDGMENT_ORG_ID, and
JUDGMENT_API_URL.
Raises
ValueError: If any required credential or project_name is missing.
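The resolution order described above can be illustrated with a minimal sketch. It is not the SDK's implementation: `resolve_credential` and the `fake_env` mapping are hypothetical names used only to show "explicit argument first, then environment variable, otherwise ValueError".

```python
import os
from typing import Mapping, Optional


def resolve_credential(
    explicit: Optional[str],
    env_var: str,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the explicit value if given, else fall back to the environment."""
    env = os.environ if env is None else env
    value = explicit if explicit else env.get(env_var)
    if not value:
        # Mirrors the documented failure mode: a missing value raises ValueError.
        raise ValueError(f"Missing credential: pass it explicitly or set {env_var}")
    return value


# Hypothetical environment, passed in so the sketch does not touch os.environ.
fake_env = {"JUDGMENT_API_KEY": "jdg_from_env", "JUDGMENT_ORG_ID": "org_from_env"}

api_key = resolve_credential("jdg_explicit", "JUDGMENT_API_KEY", fake_env)  # explicit wins
org_id = resolve_credential(None, "JUDGMENT_ORG_ID", fake_env)              # falls back to env
```

Passing the environment as a parameter keeps the helper easy to test; the real client presumably reads the process environment directly.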
Minimal setup (credentials from environment variables):
from judgeval import Judgeval
client = Judgeval(project_name="search-assistant")

Explicit credentials:
client = Judgeval(
    project_name="search-assistant",
    api_key="jdg_...",
    organization_id="org_...",
)

Once initialized, use the evaluation, datasets, and prompts
properties:
eval_runner = client.evaluation.create()
dataset = client.datasets.get(name="golden-set")
prompt = client.prompts.get(name="system-prompt", tag="production")

Attributes
evaluation
Access evaluations for scoring examples with hosted or custom judges.
Use .create() to get an Evaluation you
can call .run() on.
eval_runner = client.evaluation.create()
results = eval_runner.run(
    examples=examples,
    scorers=["faithfulness", "answer_relevancy"],
    eval_run_name="nightly-eval",
)

offline_tests
Access offline tests: test configs and dataset-backed test runs.
Use .create_config() to bind a dataset
to judges, and .run() to execute a test run (optionally
driving an agent entrypoint and asserting a pass
condition).
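A pass condition can encode more than "no scorer errored". The sketch below assumes the callable receives the example fields and a list of scorer results, the two-argument shape used by the lambda in the example for this property. `ScorerResult` is a stand-in class written for this sketch, not the SDK's type, and its `score` attribute and the `threshold` parameter are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ScorerResult:
    """Stand-in for a scorer result; not the SDK's class."""
    name: str
    score: Optional[float] = None  # assumed attribute
    error: Optional[str] = None    # attribute referenced by the documented lambda


def pass_condition(
    fields: Dict[str, Any],
    scorers: List[ScorerResult],
    threshold: float = 0.7,
) -> bool:
    """Pass only if no scorer errored and every score meets the threshold."""
    for result in scorers:
        if result.error is not None:
            return False
        if result.score is None or result.score < threshold:
            return False
    return True


fields = {"input": "What is 2+2?"}
ok = pass_condition(fields, [ScorerResult("helpfulness", score=0.9)])
failed = pass_condition(fields, [ScorerResult("helpfulness", error="timeout")])
```

A named function like this could be passed wherever the lambda is used, which makes the pass criteria easier to unit test than an inline expression.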
config = client.offline_tests.create_config(
    name="nightly-regression",
    dataset="golden-set",
    judges=["helpfulness"],
)
result = client.offline_tests.run(
    test_config="nightly-regression",
    agent_function=my_agent,
    pass_condition_fn=lambda fields, scorers: all(
        s.error is None for s in scorers
    ),
    assert_test=True,
)

datasets
Manage datasets of evaluation examples.
Use .create(), .get(), or .list() to work
with datasets.
dataset = client.datasets.create(
    name="golden-set",
    schema={
        "type": "object",
        "properties": {
            "input": {"type": "string"},
            "expected_output": {"type": "string"},
        },
    },
    examples=[
        Example.create(input="What is 2+2?", expected_output="4"),
    ],
)

prompts
Manage versioned prompt templates with tagging support.
Use .create(), .get(), .tag(), or .list()
to work with prompts.
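Prompt templates use double-brace placeholders such as {{product}}, which are filled in when the prompt is compiled. As a rough illustration of that substitution, and not the SDK's actual implementation, a stand-alone sketch might look like the following. `compile_template` is a hypothetical helper, and raising KeyError for a missing variable is an assumption about desirable behavior rather than documented behavior.

```python
import re

# Matches {{name}} with optional whitespace inside the braces.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def compile_template(template: str, **variables: str) -> str:
    """Replace each {{name}} placeholder with the matching keyword argument."""

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in variables:
            # Assumed behavior: fail loudly rather than leave a raw placeholder.
            raise KeyError(f"No value supplied for placeholder {name!r}")
        return str(variables[name])

    return _PLACEHOLDER.sub(substitute, template)


compiled = compile_template(
    "You are a helpful assistant for {{product}}.",
    product="Acme Search",
)
print(compiled)  # You are a helpful assistant for Acme Search.
```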
prompt = client.prompts.create(
    name="system-prompt",
    prompt="You are a helpful assistant for {{product}}.",
    tags=["v1"],
)
compiled = prompt.compile(product="Acme Search")

agent_judges
Manage Agent Judges (prompt-based scorers) on the platform.
Use .create() or .update() to create
and update prompt-based Agent Judges.
judge = client.agent_judges.create(
    name="helpfulness",
    prompt="Score the assistant's helpfulness from 0 to 1.",
    model="gpt-5.2",
    score_type="numeric",
)
client.agent_judges.update(
    judge_id=judge.judge_id,
    prompt="Updated rubric prompt.",
)

offline_tracer()
Create and activate an OfflineTracer for this project.
Reuses the credentials supplied to this Judgeval instance. Each
completed root span appends an Example to dataset, carrying
the offline trace id and the static example_fields.
client = Judgeval(project_name="default_project")
results: list[Example] = []
tracer = client.offline_tracer(
    dataset=results,
    example_fields={
        "input": item.input,
        "golden_output": item.golden_output,
    },
)

def offline_tracer(*, dataset, example_fields=None, environment=None, set_active=True, serializer=safe_serialize, resource_attributes=None, sampler=None, span_limits=None, span_processors=None) -> 'OfflineTracer':

Parameters
dataset
required: List['Example']
List that receives an Example for each completed root span.

example_fields
Optional[Dict[str, Any]], default: None
Fields included on every Example in dataset
(e.g. {"input": ..., "golden_output": ...}).

environment
Optional[str], default: None
Deployment environment label.

set_active
bool, default: True
If True, register this as the active tracer.

serializer
Callable[[Any], str], default: safe_serialize
Custom serializer for span inputs/outputs.

resource_attributes
Optional[Dict[str, Any]], default: None
Extra OTel resource attributes.

sampler
Optional['Sampler'], default: None
Custom OTel sampler.

span_limits
Optional['SpanLimits'], default: None
OTel span limits.

span_processors
Optional[Sequence['SpanProcessor']], default: None
Additional span processors appended after the default offline processor.

Returns
'OfflineTracer'
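
To make the collection behavior concrete (one record appended per completed root span, carrying a trace id plus the static example_fields), here is a self-contained sketch using a stand-in class. `CollectingTracer`, its `span()` method, and the `offline_trace_id` key are all hypothetical; the real OfflineTracer is built on OpenTelemetry and appends Example objects rather than plain dicts.

```python
import uuid
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List, Optional


class CollectingTracer:
    """Stand-in for illustration only; not the SDK's OfflineTracer."""

    def __init__(
        self,
        dataset: List[Dict[str, Any]],
        example_fields: Optional[Dict[str, Any]] = None,
    ) -> None:
        self.dataset = dataset
        self.example_fields = dict(example_fields or {})
        self._depth = 0  # nesting level of currently open spans

    @contextmanager
    def span(self, name: str) -> Iterator[None]:
        is_root = self._depth == 0
        trace_id = uuid.uuid4().hex if is_root else None
        self._depth += 1
        try:
            yield
        finally:
            self._depth -= 1
            if is_root:
                # Only a completed root span produces a record; child spans do not.
                self.dataset.append(
                    {"offline_trace_id": trace_id, **self.example_fields}
                )


results: List[Dict[str, Any]] = []
tracer = CollectingTracer(results, {"input": "What is 2+2?", "golden_output": "4"})

with tracer.span("agent"):          # root span
    with tracer.span("tool_call"):  # child span, adds nothing to results
        pass
```

After the outer block exits, results holds a single record with the static fields and a generated trace id, which is the shape of data the offline test flow can then score.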