AgentJudge
A prompt-based Agent Judge stored on the Judgment platform.
ScoreType
Literal['numeric', 'binary', 'categorical']
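Because ScoreType is a Literal alias, a value read from configuration can be checked against it before it is passed on as score_type. The sketch below redefines the alias locally so that it runs on its own; the helper name is_valid_score_type is hypothetical and not part of the SDK.

```python
from typing import Literal, get_args

# Local copy of the alias shown above, so the sketch does not import the SDK.
ScoreType = Literal["numeric", "binary", "categorical"]


def is_valid_score_type(value: str) -> bool:
    """Return True when value is one of the ScoreType literals (hypothetical helper)."""
    return value in get_args(ScoreType)


print(is_valid_score_type("numeric"))
print(is_valid_score_type("percent"))
```

typing.get_args returns the literal members as a tuple, so the check stays in sync with the alias if a member is added later.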
Agent Judges are LLM-driven scorers. The prompt field is the rubric
prompt used by the agent-judge harness when scoring an output. Versions
are managed implicitly — calling .update() writes a new minor version of
the underlying prompt scorer (matching the default "save" flow in the UI).
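from judgeval import Judgeval

client = Judgeval(project_name="my-project")

# Create a numeric judge from a rubric prompt.
judge = client.agent_judges.create(
    name="helpfulness",
    prompt="Score the assistant's helpfulness from 0 to 1.",
    model="gpt-5.2",
    score_type="numeric",
)

# Updating the prompt writes a new minor version of the underlying scorer.
judge = client.agent_judges.update(
    judge_id=judge.judge_id,
    prompt="Updated rubric prompt.",
)

Attributes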
judge_id : str
Unique judge identifier on the Judgment platform.

name : str
Human-readable name of the judge (unique per project).

prompt : str
Rubric prompt template used by the agent judge.

model : str
LiteLLM model id driving the agent judge (e.g. "gpt-5.2").

score_type : ScoreType
One of "numeric", "binary", or "categorical".

description : Optional[str], default None
Optional description stored on the scorer version.

judge_description : Optional[str], default None
Optional human-readable description shown in the UI.

categories : Optional[List[Dict[str, Any]]], default None
Choice list for categorical judges (e.g. [{"name": "good", "description": "..."}, ...]).

min_score : Optional[float], default None
Lower bound for numeric judges (defaults to 0).

max_score : Optional[float], default None
Upper bound for numeric judges (defaults to 1).

major_version : Optional[int], default None
Latest major version of the underlying prompt scorer.

minor_version : Optional[int], default None
Latest minor version of the underlying prompt scorer.
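
The optional fields interact with score_type: min_score and max_score apply to numeric judges, and categories applies to categorical ones. The following is a minimal local sketch of those relationships, not the SDK's own class or validation logic. JudgeSketch, score_bounds, and check_judge are hypothetical names, and the specific checks (lower bound below upper bound, a non-empty category list, a "name" key per category) are assumptions made for illustration. Only the fallback bounds of 0 and 1 come from the attribute descriptions above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class JudgeSketch:
    """Hypothetical stand-in for a few AgentJudge fields; not the SDK class."""
    name: str
    score_type: str
    categories: Optional[List[Dict[str, Any]]] = field(default=None)
    min_score: Optional[float] = None
    max_score: Optional[float] = None


def score_bounds(judge: JudgeSketch) -> Tuple[float, float]:
    """Resolve numeric bounds, falling back to the documented defaults 0 and 1."""
    low = 0.0 if judge.min_score is None else judge.min_score
    high = 1.0 if judge.max_score is None else judge.max_score
    return low, high


def check_judge(judge: JudgeSketch) -> List[str]:
    """Collect problems found by the assumed checks; an empty list means none."""
    problems: List[str] = []
    if judge.score_type == "numeric":
        low, high = score_bounds(judge)
        if low >= high:  # assumed rule: the range must be non-empty
            problems.append(f"min_score {low} must be below max_score {high}")
    if judge.score_type == "categorical":
        if not judge.categories:  # assumed rule: at least one choice
            problems.append("categorical judges need a non-empty categories list")
        else:
            for i, category in enumerate(judge.categories):
                if "name" not in category:  # assumed rule: each choice is named
                    problems.append(f"categories[{i}] is missing 'name'")
    return problems


numeric = JudgeSketch(name="helpfulness", score_type="numeric")
print(score_bounds(numeric))
print(check_judge(numeric))

unnamed = JudgeSketch(
    name="tone",
    score_type="categorical",
    categories=[{"description": "no name given"}],
)
print(check_judge(unnamed))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a judge definition at once.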