Judgeval

The main entry point for interacting with the Judgment platform.

Judgeval connects to your Judgment project and gives you access to evaluations, datasets, and prompt versioning through convenient properties.

Credentials are resolved in order: explicit arguments first, then environment variables JUDGMENT_API_KEY, JUDGMENT_ORG_ID, and JUDGMENT_API_URL.

Raises ValueError if any required credential or project_name is missing.
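The lookup order described above can be sketched as follows. This is a minimal illustration, not judgeval's implementation: `resolve_credential` is a hypothetical helper, and the `fake_env` mapping stands in for the process environment so the example does not depend on your shell.

```python
import os
from typing import Mapping, Optional


# Hypothetical helper, not part of judgeval: it only illustrates the lookup order.
def resolve_credential(
    explicit: Optional[str],
    env_var: str,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the explicit value if given, otherwise the environment variable."""
    env = os.environ if env is None else env
    value = explicit or env.get(env_var)
    if not value:
        # Mirrors the documented behavior: a missing credential is a ValueError.
        raise ValueError(f"Missing credential: pass it explicitly or set {env_var}")
    return value


# An explicit argument wins over the environment.
fake_env = {"JUDGMENT_API_KEY": "from-env"}
print(resolve_credential("from-arg", "JUDGMENT_API_KEY", fake_env))  # from-arg
print(resolve_credential(None, "JUDGMENT_API_KEY", fake_env))        # from-env
```

Keeping the explicit argument first makes it possible to override a machine-wide environment setting for a single client, for example in tests.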

Minimal setup (credentials from environment variables):

from judgeval import Judgeval

client = Judgeval(project_name="search-assistant")

Explicit credentials:

client = Judgeval(
    project_name="search-assistant",
    api_key="jdg_...",
    organization_id="org_...",
)

Once initialized, use the evaluation, datasets, and prompts properties:

eval_runner = client.evaluation.create()
dataset = client.datasets.get(name="golden-set")
prompt = client.prompts.get(name="system-prompt", tag="production")

Attributes

evaluation

Access evaluations for scoring examples with hosted or custom judges.

Use .create() to get an Evaluation you can call .run() on.

eval_runner = client.evaluation.create()
results = eval_runner.run(
    examples=examples,
    scorers=["faithfulness", "answer_relevancy"],
    eval_run_name="nightly-eval",
)

datasets

Manage datasets of evaluation examples.

Use .create(), .get(), or .list() to work with datasets.

dataset = client.datasets.create(
    name="golden-set",
    examples=[
        Example.create(input="What is 2+2?", expected_output="4"),
    ],
)

prompts

Manage versioned prompt templates with tagging support.

Use .create(), .get(), .tag(), or .list() to work with prompts.

prompt = client.prompts.create(
    name="system-prompt",
    prompt="You are a helpful assistant for {{product}}.",
    tags=["v1"],
)
compiled = prompt.compile(product="Acme Search")
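To make the `{{product}}` placeholder concrete, here is a rough sketch of what compiling such a template could look like. `compile_template` is a hypothetical stand-in written for this illustration. It is not the code behind `prompt.compile()`, and leaving unknown placeholders untouched is an assumption of the sketch rather than documented behavior.

```python
import re


# Hypothetical stand-in for prompt.compile(); not judgeval's implementation.
def compile_template(template: str, **variables: str) -> str:
    """Replace each {{name}} placeholder with the matching keyword argument."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # Assumption of this sketch: unknown placeholders are left as they are.
        return str(variables.get(name, match.group(0)))

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


compiled = compile_template(
    "You are a helpful assistant for {{product}}.",
    product="Acme Search",
)
print(compiled)  # You are a helpful assistant for Acme Search.
```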

offline_tracer()

Create and activate an OfflineTracer for this project.

Reuses the credentials supplied to this Judgeval instance. Each completed root span appends a new Example to the dataset list, carrying the trace's offline_trace_id and the static example_fields.

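client = Judgeval(project_name="default_project")
results: list[Example] = []  # caller-owned list; the tracer appends to it
tracer = client.offline_tracer(
    dataset=results,
    # `item` is assumed to be one record of your own golden data, defined elsewhere
    example_fields={
        "input": item.input,
        "golden_output": item.golden_output,
    },
)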
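The "caller-owned list" contract can be pictured with a small stand-alone sketch. Everything in it is hypothetical: `SketchExample`, `SketchCollector`, and `on_root_span_end` are stand-ins invented for illustration, not judgeval's Example, OfflineTracer, or any real hook. It only shows a list being mutated in place, once per completed root span.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


# Stand-ins for illustration only; these are NOT judgeval's Example or OfflineTracer.
@dataclass
class SketchExample:
    offline_trace_id: str
    fields: Dict[str, Any] = field(default_factory=dict)


class SketchCollector:
    def __init__(
        self,
        dataset: List[SketchExample],
        example_fields: Optional[Dict[str, Any]] = None,
    ) -> None:
        self.dataset = dataset  # caller-owned list, mutated in place
        self.example_fields = dict(example_fields or {})

    def on_root_span_end(self) -> None:
        """Hypothetical hook: called once per completed root span."""
        trace_id = uuid.uuid4().hex  # placeholder for the real trace id
        # Copy the static fields so each example owns its own dict.
        self.dataset.append(SketchExample(trace_id, dict(self.example_fields)))


results: List[SketchExample] = []
collector = SketchCollector(results, {"input": "What is 2+2?", "golden_output": "4"})
collector.on_root_span_end()
collector.on_root_span_end()
print(len(results))  # 2
```

Because the list belongs to the caller, the collected examples remain available after tracing ends without asking the tracer for them.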

def offline_tracer(
    *,
    dataset,
    example_fields=None,
    environment=None,
    set_active=True,
    serializer=safe_serialize,
    resource_attributes=None,
    sampler=None,
    span_limits=None,
    span_processors=None,
) -> 'OfflineTracer':

Parameters

dataset
required

List['Example']

Caller-owned list. Each completed root span appends a new Example carrying the offline_trace_id of the trace and the static example_fields.

example_fields

Optional[Dict[str, Any]]

Static fields copied onto every emitted example (e.g. {"input": ..., "golden_output": ...}).

Default:

None

environment

Optional[str]

Deployment environment label.

Default:

None

set_active

bool

If True, register this as the active tracer.

Default:

True

serializer

Callable[[Any], str]

Custom serializer for span inputs/outputs.

Default:

safe_serialize
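Any function matching Callable[[Any], str] fits the serializer parameter. The sketch below shows one possible replacement built on the standard library. The name `compact_json_serializer` is made up for this example, and how the tracer invokes the serializer internally is not specified here.

```python
import json
from datetime import date
from typing import Any


def compact_json_serializer(value: Any) -> str:
    """Serialize to compact JSON, falling back to str() for unsupported types."""
    return json.dumps(value, default=str, separators=(",", ":"), sort_keys=True)


print(compact_json_serializer({"query": "2+2", "when": date(2024, 1, 2)}))
# {"query":"2+2","when":"2024-01-02"}
```

Going by the signature above, such a function would be passed as serializer=compact_json_serializer when calling offline_tracer().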

resource_attributes

Optional[Dict[str, Any]]

Extra OTel resource attributes.

Default:

None

sampler

Optional['Sampler']

Custom OTel sampler.

Default:

None

span_limits

Optional['SpanLimits']

OTel span limits.

Default:

None

span_processors

Optional[Sequence['SpanProcessor']]

Additional span processors appended after the default offline processor.

Default:

None

Returns

'OfflineTracer'
