AgentJudge

A prompt-based Agent Judge stored on the Judgment platform.

ScoreType

Default: Literal['numeric', 'binary', 'categorical']
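For illustration, the alias can be reproduced locally and used to check a value before it is passed to the platform. This is a minimal sketch: is_valid_score_type is a hypothetical helper, not part of the Judgeval SDK.

```python
from typing import Literal, get_args

# Local mirror of the alias documented above.
ScoreType = Literal["numeric", "binary", "categorical"]


def is_valid_score_type(value: str) -> bool:
    """Return True if value is one of the literals allowed by ScoreType."""
    # Hypothetical helper for illustration; not an SDK function.
    return value in get_args(ScoreType)


print(is_valid_score_type("numeric"))  # True
print(is_valid_score_type("percent"))  # False
```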

Agent Judges are LLM-driven scorers. The prompt field holds the rubric that the agent-judge harness applies when scoring an output. Versions are managed implicitly: calling .update() writes a new minor version of the underlying prompt scorer, matching the default "save" flow in the UI.

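# Import path assumed from the judgeval package; adjust if your install differs.
from judgeval import Judgeval

client = Judgeval(project_name="my-project")

# Create a numeric judge with a rubric prompt.
judge = client.agent_judges.create(
    name="helpfulness",
    prompt="Score the assistant's helpfulness from 0 to 1.",
    model="gpt-5.2",
    score_type="numeric",
)

# Updating the prompt writes a new minor version of the same judge.
judge = client.agent_judges.update(
    judge_id=judge.judge_id,
    prompt="Updated rubric prompt.",
)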
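The implicit versioning described above can be pictured with a small local model. The sketch below is an assumption about the behaviour, not the platform's implementation: LocalJudge and apply_update are hypothetical names, the starting version 1.0 is invented for the example, and the only rule taken from the text is that an update writes a new minor version.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class LocalJudge:
    """Hypothetical stand-in for a stored judge; not the SDK's AgentJudge."""

    name: str
    prompt: str
    major_version: int = 1  # assumed starting value
    minor_version: int = 0  # assumed starting value


def apply_update(judge: LocalJudge, prompt: Optional[str] = None) -> LocalJudge:
    """Return a copy with the new prompt and the minor version bumped by one."""
    return replace(
        judge,
        prompt=judge.prompt if prompt is None else prompt,
        minor_version=judge.minor_version + 1,
    )


v0 = LocalJudge(name="helpfulness", prompt="Score the assistant's helpfulness from 0 to 1.")
v1 = apply_update(v0, prompt="Updated rubric prompt.")
print(v1.major_version, v1.minor_version)  # 1 1
```

Because the dataclass is frozen, each update yields a new object and the earlier version stays readable, which loosely mirrors keeping older scorer versions around.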

Attributes

judge_id: str

Unique judge identifier on the Judgment platform.

name: str

Human-readable name of the judge (unique per project).

prompt: str

Rubric prompt template used by the agent judge.

model: str

LiteLLM model ID driving the agent judge (e.g. "gpt-5.2").

score_type: ScoreType

One of "numeric", "binary", or "categorical".

description: Optional[str]

Optional description stored on the scorer version.

Default: None

judge_description: Optional[str]

Optional human-readable description shown in the UI.

Default: None

categories: Optional[List[Dict[str, Any]]]

Choice list for categorical judges (e.g. [{"name": "good", "description": "..."}, ...]).

Default: None
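As a sketch of the shape described above, the helper below checks that each category is a dict with a non-empty name. The validate_categories function and its rules are assumptions for illustration; the keys "name" and "description" come from the example in the text, the description strings are made up, and the platform may accept or require other keys.

```python
from typing import Any, Dict, List


def validate_categories(categories: List[Dict[str, Any]]) -> List[str]:
    """Return category names, raising ValueError on a malformed or duplicate entry."""
    # Hypothetical client-side check; the platform's own validation may differ.
    names: List[str] = []
    for index, category in enumerate(categories):
        name = category.get("name")
        if not isinstance(name, str) or not name:
            raise ValueError(f"category {index} needs a non-empty 'name'")
        if name in names:
            raise ValueError(f"duplicate category name: {name!r}")
        names.append(name)
    return names


# Illustrative choice list for a categorical judge.
categories = [
    {"name": "good", "description": "Fully answers the question."},
    {"name": "bad", "description": "Misses or contradicts the question."},
]
print(validate_categories(categories))  # ['good', 'bad']
```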

min_score: Optional[float]

Lower bound for numeric judges (defaults to 0).

Default: None

max_score: Optional[float]

Upper bound for numeric judges (defaults to 1).

Default: None
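To make the bound handling concrete, the sketch below resolves unset bounds to the documented defaults of 0 and 1 and checks a score against them. resolve_bounds and in_range are hypothetical helpers; treating None as "use the default" is an assumption drawn from the attribute descriptions above, not confirmed SDK behaviour.

```python
from typing import Optional, Tuple


def resolve_bounds(
    min_score: Optional[float] = None,
    max_score: Optional[float] = None,
) -> Tuple[float, float]:
    """Fill in the documented defaults (0 and 1) for bounds left as None."""
    low = 0.0 if min_score is None else float(min_score)
    high = 1.0 if max_score is None else float(max_score)
    if low >= high:
        raise ValueError(f"min_score ({low}) must be below max_score ({high})")
    return low, high


def in_range(
    score: float,
    min_score: Optional[float] = None,
    max_score: Optional[float] = None,
) -> bool:
    """Return True if score lies within the resolved bounds, inclusive."""
    low, high = resolve_bounds(min_score, max_score)
    return low <= score <= high


print(resolve_bounds())          # (0.0, 1.0)
print(in_range(0.7))             # True
print(in_range(7, max_score=5))  # False
```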

major_version: Optional[int]

Latest major version of the underlying prompt scorer.

Default: None

minor_version: Optional[int]

Latest minor version of the underlying prompt scorer.

Default: None
