Judges
View and manage judges and their online-evaluation settings.
Commands
| Command | Description |
|---|---|
| judges create | Create a prompt judge. |
| judges delete | Delete judges. |
| judges get | Get a judge by ID. |
| judges get-settings | Get a judge’s online-evaluation settings. |
| judges init | Initialise a skeleton custom judge file. |
| judges list | List judges in a project. |
| judges models | List judge models. |
| judges set-tag | Add or remove a version tag on a judge. |
| judges update | Update a judge. |
| judges update-settings | Update a judge’s online-evaluation settings. |
| judges upload | Upload a custom judge bundle to a project. |
judges create
Create a prompt judge.
Create a new prompt judge in a project. The judge runs the supplied prompt against the configured LLM to score spans.
```
judgment judges create [OPTIONS] <PROJECT_ID> <NAME> <MODEL> <PROMPT>
```

Arguments
| Name | Required |
|---|---|
| PROJECT_ID | yes |
| NAME | yes |
| MODEL | yes |
| PROMPT | yes |
Options
| Flag | Type | Required | Description |
|---|---|---|---|
| --judge-description | text | no | Human-readable description shown in the UI. |
| --description | text | no | Description stored on the underlying scorer version. |
| --score-type | numeric, binary, categorical | yes | — |
| --categories | text | no | List of {name, description} choices for categorical judges. Ignored for other score types. |
| --min-score | number | no | Lower bound for numeric judges. Defaults to 0. |
| --max-score | number | no | Upper bound for numeric judges. Defaults to 1. |
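For example, a numeric judge could be created as in the sketch below; the project ID, judge name, model ID, and prompt text are placeholders, and the model ID should be checked against judgment judges models for what is available in your environment:

```
# Create a numeric prompt judge that scores helpfulness on a 0-1 scale.
# <PROJECT_ID>, the model ID, and the prompt are placeholder values.
judgment judges create <PROJECT_ID> helpfulness gpt-4o \
  "Rate how helpful the assistant's final answer is, from 0 (useless) to 1 (fully helpful)." \
  --score-type numeric \
  --min-score 0 \
  --max-score 1 \
  --judge-description "Scores answer helpfulness from 0 to 1."
```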
judges delete
Delete judges.
Delete one or more judges by ID. Behaviors that reference these judges are also removed.
```
judgment judges delete [OPTIONS] <PROJECT_ID>
```

Arguments
| Name | Required |
|---|---|
| PROJECT_ID | yes |
Options
| Flag | Type | Required | Description |
|---|---|---|---|
| --judge-ids | text | yes | Judge UUIDs to delete. |
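A minimal sketch of a delete call; the IDs are placeholders, and repeating --judge-ids to delete several judges in one call is an assumption about how the option accepts multiple values:

```
# Delete two judges by UUID; IDs are placeholders, and repeating the
# --judge-ids flag for multiple values is assumed, not confirmed.
judgment judges delete <PROJECT_ID> \
  --judge-ids <JUDGE_UUID_1> \
  --judge-ids <JUDGE_UUID_2>
```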
judges get
Get a judge by ID.
Return full detail (including all versions) for a single judge.
```
judgment judges get <PROJECT_ID> <JUDGE_ID>
```

Arguments
| Name | Required |
|---|---|
| PROJECT_ID | yes |
| JUDGE_ID | yes |
judges get-settings
Get a judge’s online-evaluation settings.
```
judgment judges get-settings <PROJECT_ID> <JUDGE_ID>
```

Arguments
| Name | Required |
|---|---|
| PROJECT_ID | yes |
| JUDGE_ID | yes |
judges init
Initialise a skeleton custom judge file.
```
judgment judges init [OPTIONS]
```

Options
| Flag | Type | Required | Description |
|---|---|---|---|
| -t, --response-type | binary, categorical, numeric | yes | Response type for the judge. |
| -n, --name | text | yes | Judge class name (must be a valid Python identifier). |
| -p, --init-path | text | no | Directory in which to create the judge file. |
| -r, --include-requirements | boolean | no | Also create an empty requirements.txt next to the judge file. |
| -y, --yes | boolean | no | Skip the file creation confirmation prompt. |
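As a sketch, the following scaffolds a numeric judge class; the class name and target directory are placeholders:

```
# Scaffold a numeric custom judge named LatencyJudge under ./judges,
# create an empty requirements.txt next to it, and skip the confirmation.
judgment judges init -t numeric -n LatencyJudge -p ./judges -r -y
```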
judges list
List judges in a project.
List every judge in a project, including prompt, code, and custom (uploaded) judges. Returns each judge with its current configuration and online-evaluation settings.
```
judgment judges list <PROJECT_ID>
```

Arguments
| Name | Required |
|---|---|
| PROJECT_ID | yes |
judges models
List judge models.
List the chat models available for use as the LLM backing a prompt judge.
```
judgment judges models
```

judges set-tag
Add or remove a version tag on a judge.
Add or remove a tag (e.g. prod) on a specific version of a judge. Use --action add to set the tag and --action remove to clear it.
```
judgment judges set-tag [OPTIONS] <PROJECT_ID> <JUDGE_ID> <TAG>
```

Arguments
| Name | Required |
|---|---|
| PROJECT_ID | yes |
| JUDGE_ID | yes |
| TAG | yes |
Options
| Flag | Type | Required | Description |
|---|---|---|---|
| --major-version | number | yes | Major version of the judge to tag. |
| --minor-version | number | yes | Minor version of the judge to tag. |
| --action | add, remove | yes | — |
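For example, to place a prod tag on version 2.0 of a judge and later clear it (IDs and version numbers are placeholders):

```
# Tag version 2.0 of the judge as "prod".
judgment judges set-tag <PROJECT_ID> <JUDGE_ID> prod \
  --major-version 2 --minor-version 0 --action add

# Remove the same tag again.
judgment judges set-tag <PROJECT_ID> <JUDGE_ID> prod \
  --major-version 2 --minor-version 0 --action remove
```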
judges update
Update a judge.
Update a judge's model, prompt, description, score type, categories, score bounds, agent prompts, or version tags. Pass --target-major-version/--target-minor-version to update a specific version; otherwise the latest version is updated.
```
judgment judges update [OPTIONS] <PROJECT_ID> <JUDGE_ID>
```

Arguments
| Name | Required |
|---|---|
| PROJECT_ID | yes |
| JUDGE_ID | yes |
Options
| Flag | Type | Required | Description |
|---|---|---|---|
| --judge-description | text | no | New UI description (pass null to clear). |
| --score-type | numeric, binary, categorical | no | — |
| --description | text | no | New scorer-version description (pass null to clear). |
| --model | text | no | New LiteLLM model ID. Use judgment judges models to list the available IDs. |
| --prompt | text | no | New prompt template. |
| --categories | text | no | List of {name, description} choices for categorical judges. Ignored for other score types. |
| --min-score | number | no | Updated lower bound for numeric judges. |
| --max-score | number | no | Updated upper bound for numeric judges. |
| --target-major-version | number | no | Major version to write to. If it does not exist, a new version is created. |
| --target-minor-version | number | no | Minor version to write to. If it does not exist, a new version is created. |
| --source-major-version | number | no | Major version to copy unspecified fields from. Defaults to the latest version. |
| --source-minor-version | number | no | Minor version to copy unspecified fields from. Defaults to the latest version. |
| --agent-prompts | text | no | For agent judges only: replacement list of named sub-prompts ({name, prompt}). |
| --new-behaviors | text | no | New behaviors to attach to this judge. Each entry: {value, description?, category_ids?}. |
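Two illustrative updates follow, one against the latest version and one targeting a specific version; the IDs, model name, and prompt text are placeholders:

```
# Change the model and prompt on the latest version of a judge.
judgment judges update <PROJECT_ID> <JUDGE_ID> \
  --model gpt-4o-mini \
  --prompt "Rate the factual accuracy of the answer from 0 to 1."

# Write a prompt change to version 2.1 specifically; if that version
# does not exist yet, a new version is created.
judgment judges update <PROJECT_ID> <JUDGE_ID> \
  --prompt "Updated prompt text." \
  --target-major-version 2 --target-minor-version 1
```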
judges update-settings
Update a judge’s online-evaluation settings.
Update how often and on which spans a judge runs online. Pass --evaluation-mode continuous with a sampling rate to score spans automatically, or --evaluation-mode on_demand to score only via manual `judgment traces evaluate` calls.
```
judgment judges update-settings [OPTIONS] <PROJECT_ID> <JUDGE_ID>
```

Arguments
| Name | Required |
|---|---|
| PROJECT_ID | yes |
| JUDGE_ID | yes |
Options
| Flag | Type | Required | Description |
|---|---|---|---|
| --evaluation-mode | continuous, on_demand | yes | — |
| --sampling-rate | number | yes | Percent (0–100) of qualifying spans to score. |
| --span-triggers | text | no | JSON array of span filters that restrict which spans the judge evaluates. Pass [] to evaluate all spans. Use field: "span_name" to match on span names, or field: "span_attribute" with key: "<attr>" to match on a span attribute's value. Triggers are ANDed together; a span must match every entry to be evaluated. See the shape below. |
| --session-scoring | boolean | no | When true, run the judge at session granularity instead of per-span. |
--span-triggers shape

```
[
  {
    "field": "span_name" | "span_attribute",
    "operator": "contains" | "equals" | "exists",
    "value": "<string>",
    "key": "<attribute key>"?
  },
  ...
]
```
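Putting the flags together, a continuous configuration might look like the sketch below; the sampling rate and trigger value are illustrative, and the triggers are passed as a single quoted JSON string on the assumption that the option accepts raw JSON on the command line:

```
# Continuously score 25% of spans whose name contains "llm_call".
# IDs, the rate, and the trigger value are placeholders.
judgment judges update-settings <PROJECT_ID> <JUDGE_ID> \
  --evaluation-mode continuous \
  --sampling-rate 25 \
  --span-triggers '[{"field": "span_name", "operator": "contains", "value": "llm_call"}]'
```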
judges upload
Upload a custom judge bundle to a project.
The entrypoint must define a class that inherits from `Judge`, `TraceCustomScorer`, or `ExampleCustomScorer`, parameterised with a response type (`BinaryResponse`, `NumericResponse`, or a `CategoricalResponse` subclass with `categories`).

```
judgment judges upload [OPTIONS] <ENTRYPOINT_PATH>
```

Arguments
| Name | Required |
|---|---|
| ENTRYPOINT_PATH | yes |
Options
| Flag | Type | Required | Description |
|---|---|---|---|
-p, --project-id | text | yes | Project ID to upload the judge to. |
-r, --requirements | path | no | Path to a requirements.txt file to install with the judge. |
-i, --include | path | no | Additional file or directory to include in the bundle (repeatable). |
-n, --name | text | no | Custom judge name. Defaults to the detected class name. |
-m, --bump-major | boolean | no | Bump the major version when re-uploading an existing judge. |
-y, --yes | boolean | no | Skip the upload confirmation prompt. |
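A sketch of an upload, assuming an entrypoint file my_judge.py with a requirements file and a helper directory alongside it (all names and the project ID are placeholders):

```
# Bundle my_judge.py with its requirements and a helpers/ directory,
# name the judge MyCustomJudge, and skip the confirmation prompt.
judgment judges upload my_judge.py \
  -p <PROJECT_ID> \
  -r requirements.txt \
  -i helpers/ \
  -n MyCustomJudge \
  -y
```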