AgentJudge

ScoreType

Default:

Literal['numeric', 'binary', 'categorical']
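As a rough illustration of how this Literal alias can back a runtime check, the sketch below redefines ScoreType locally, since its import path is not given here, and tests a string against it with typing.get_args. The helper name is_score_type is hypothetical and not part of the SDK.

```python
from typing import Literal, get_args

# Local copy of the alias shown above; the real import path is not stated here.
ScoreType = Literal["numeric", "binary", "categorical"]


def is_score_type(value: str) -> bool:
    """Return True if value is one of the allowed ScoreType strings."""
    return value in get_args(ScoreType)
```

A check like this can catch a misspelled score_type before any request is sent.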

A prompt-based Agent Judge stored on the Judgment platform.

Agent Judges are LLM-driven scorers. The prompt field is the rubric prompt used by the agent-judge harness when scoring an output. Versions are managed implicitly: calling .update() writes a new minor version of the underlying prompt scorer, matching the default "save" flow in the UI.

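from judgeval import Judgeval

# Create a numeric judge in the project.
client = Judgeval(project_name="my-project")
judge = client.agent_judges.create(
    name="helpfulness",
    prompt="Score the assistant's helpfulness from 0 to 1.",
    model="gpt-5.2",
    score_type="numeric",
)

# Updating the prompt writes a new minor version of the underlying prompt scorer.
judge = client.agent_judges.update(
    judge_id=judge.judge_id,
    prompt="Updated rubric prompt.",
)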
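The implicit versioning described above can be pictured with a small local model. This is only a sketch: the Version class and its bump_minor method are hypothetical, it assumes each update increments the minor version by one and leaves the major version unchanged, and it makes no calls to the Judgment platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """Hypothetical stand-in for a judge's (major_version, minor_version) pair."""
    major: int
    minor: int

    def bump_minor(self) -> "Version":
        # Assumption: an update keeps the major version and adds one to the minor.
        return Version(self.major, self.minor + 1)


# Illustrative starting point only; the source does not document an initial version.
before = Version(major=1, minor=0)
after = before.bump_minor()
print(f"{before.major}.{before.minor} -> {after.major}.{after.minor}")
```

In practice the authoritative values are the major_version and minor_version attributes on the judge returned by the client.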

Attributes

judge_id

str

Unique judge identifier on the Judgment platform.

name

str

Human-readable name of the judge (unique per project).

prompt

str

Rubric prompt template used by the agent judge.

model

str

LiteLLM model id driving the agent judge (e.g. "gpt-5.2").

score_type

ScoreType

One of "numeric", "binary", or "categorical".

description

Optional[str]

Optional description stored on the scorer version.

Default:

None

judge_description

Optional[str]

Optional human-readable description shown in the UI.

Default:

None

min_score

Optional[float]

Lower bound for numeric judges. When left as None, the bound defaults to 0.

Default:

None

max_score

Optional[float]

Upper bound for numeric judges. When left as None, the bound defaults to 1.

Default:

None

major_version

Optional[int]

Latest major version of the underlying prompt scorer.

Default:

None

minor_version

Optional[int]

Latest minor version of the underlying prompt scorer.

Default:

None
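To make the optional fields concrete, here is a minimal local sketch of a record with the same attribute names and types as listed above. JudgeRecord and effective_bounds are hypothetical names, not part of the SDK, and the judge_id value is a placeholder. The helper assumes that an unset min_score or max_score falls back to 0 and 1 respectively, as the attribute descriptions suggest.

```python
from dataclasses import dataclass
from typing import Literal, Optional, Tuple

ScoreType = Literal["numeric", "binary", "categorical"]


@dataclass
class JudgeRecord:
    """Hypothetical local mirror of the documented AgentJudge attributes."""
    judge_id: str
    name: str
    prompt: str
    model: str
    score_type: ScoreType
    description: Optional[str] = None
    judge_description: Optional[str] = None
    min_score: Optional[float] = None
    max_score: Optional[float] = None
    major_version: Optional[int] = None
    minor_version: Optional[int] = None

    def effective_bounds(self) -> Tuple[float, float]:
        # Assumption: None means "use the documented fallback" (0 and 1).
        low = 0.0 if self.min_score is None else self.min_score
        high = 1.0 if self.max_score is None else self.max_score
        return low, high


record = JudgeRecord(
    judge_id="judge-123",  # placeholder id, not a real judge
    name="helpfulness",
    prompt="Score the assistant's helpfulness from 0 to 1.",
    model="gpt-5.2",
    score_type="numeric",
)
low, high = record.effective_bounds()
```

Keeping the raw Optional fields separate from the resolved bounds preserves the distinction between "not set" and "explicitly set to the default".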
