Judges
View and manage judges and their online-evaluation settings.
Commands
| Command | Description |
|---|---|
| judges create | Create a prompt judge. |
| judges delete | Delete judges. |
| judges get | Get a judge by ID. |
| judges get-settings | Get a judge’s online-evaluation settings. |
| judges init | Initialise a skeleton custom judge file. |
| judges list | List judges in a project. |
| judges models | List judge models. |
| judges set-tag | Add or remove a version tag on a judge. |
| judges update | Update a judge. |
| judges update-settings | Update a judge’s online-evaluation settings. |
| judges upload | Upload a custom judge bundle to a project. |
judges create
Create a prompt judge.
Create a new prompt judge in a project. The judge runs the supplied prompt against the configured LLM to score spans.
```
judgment judges create [OPTIONS] <PROJECT_ID> <NAME> <MODEL> <PROMPT>
```

Arguments
| Name | Required |
|---|---|
| PROJECT_ID | yes |
| NAME | yes |
| MODEL | yes |
| PROMPT | yes |
Options
| Flag | Type | Required | Description |
|---|---|---|---|
| --judge-description | text | no | Human-readable description shown in the UI. |
| --description | text | no | Description stored on the underlying scorer version. |
| --score-type | numeric, binary, categorical | yes | — |
| --categories | text | no | List of {name, description} choices for categorical judges. Ignored for other score types. |
| --min-score | number | no | Lower bound for numeric judges. Defaults to 0. |
| --max-score | number | no | Upper bound for numeric judges. Defaults to 1. |
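For example, a numeric judge could be created as in the sketch below; the project ID, judge name, model ID, and prompt text are placeholders, and the model ID should be checked against judgment judges models for what is available in your environment:

```
# Create a numeric prompt judge that scores helpfulness on a 0-1 scale.
# <PROJECT_ID>, the model ID, and the prompt are placeholder values.
judgment judges create <PROJECT_ID> helpfulness gpt-4o \
  "Rate how helpful the assistant's final answer is, from 0 (useless) to 1 (fully helpful)." \
  --score-type numeric \
  --min-score 0 \
  --max-score 1 \
  --judge-description "Scores answer helpfulness from 0 to 1."
```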
judges delete
Delete judges.
Delete one or more judges by ID. Behaviors that reference these judges are also removed.
```
judgment judges delete [OPTIONS] <PROJECT_ID>
```

Arguments
| Name | Required |
|---|---|
| PROJECT_ID | yes |
Options
| Flag | Type | Required | Description |
|---|---|---|---|
| --judge-ids | text | yes | Judge UUIDs to delete. |
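A minimal sketch of a delete call; the IDs are placeholders, and repeating --judge-ids to delete several judges in one call is an assumption about how the option accepts multiple values:

```
# Delete two judges by UUID; IDs are placeholders, and repeating the
# --judge-ids flag for multiple values is assumed, not confirmed.
judgment judges delete <PROJECT_ID> \
  --judge-ids <JUDGE_UUID_1> \
  --judge-ids <JUDGE_UUID_2>
```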
judges get
Get a judge by ID.
Return full detail (including all versions) for a single judge.
```
judgment judges get <PROJECT_ID> <JUDGE_ID>
```

Arguments
| Name | Required |
|---|---|
| PROJECT_ID | yes |
| JUDGE_ID | yes |
judges get-settings
Get a judge’s online-evaluation settings.
```
judgment judges get-settings <PROJECT_ID> <JUDGE_ID>
```

Arguments
| Name | Required |
|---|---|
| PROJECT_ID | yes |
| JUDGE_ID | yes |
judges init
Initialise a skeleton custom judge file.
```
judgment judges init [OPTIONS]
```

Options
| Flag | Type | Required | Description |
|---|---|---|---|
| -t, --response-type | binary, categorical, numeric | yes | Response type for the judge. |
| -n, --name | text | yes | Judge class name (must be a valid Python identifier). |
| -p, --init-path | text | no | Directory in which to create the judge file. |
| -r, --include-requirements | boolean | no | Also create an empty requirements.txt next to the judge file. |
| -y, --yes | boolean | no | Skip the file creation confirmation prompt. |
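As a sketch, the following scaffolds a numeric judge class; the class name and target directory are placeholders:

```
# Scaffold a numeric custom judge named LatencyJudge under ./judges,
# create an empty requirements.txt next to it, and skip the confirmation.
judgment judges init -t numeric -n LatencyJudge -p ./judges -r -y
```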
judges list
List judges in a project.
List every judge in a project, including prompt, code, and custom (uploaded) judges. Returns each judge with its current configuration and online-evaluation settings.
```
judgment judges list <PROJECT_ID>
```

Arguments
| Name | Required |
|---|---|
| PROJECT_ID | yes |
judges models
List judge models.
List the chat models available for use as the LLM backing a prompt judge.
```
judgment judges models
```

judges set-tag
Add or remove a version tag on a judge.
Add or remove a tag (e.g. prod) on a specific version of a judge. Use --action add to set the tag and --action remove to clear it.
```
judgment judges set-tag [OPTIONS] <PROJECT_ID> <JUDGE_ID> <TAG>
```

Arguments
| Name | Required |
|---|---|
| PROJECT_ID | yes |
| JUDGE_ID | yes |
| TAG | yes |
Options
| Flag | Type | Required | Description |
|---|---|---|---|
| --major-version | number | yes | Major version of the judge to tag. |
| --minor-version | number | yes | Minor version of the judge to tag. |
| --action | add, remove | yes | — |
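For example, to place a prod tag on version 2.0 of a judge and later clear it (IDs and version numbers are placeholders):

```
# Tag version 2.0 of the judge as "prod".
judgment judges set-tag <PROJECT_ID> <JUDGE_ID> prod \
  --major-version 2 --minor-version 0 --action add

# Remove the same tag again.
judgment judges set-tag <PROJECT_ID> <JUDGE_ID> prod \
  --major-version 2 --minor-version 0 --action remove
```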
judges update
Update a judge.
Update a judge's model, prompt, description, score type, categories, score bounds, agent prompts, or version tags. Pass --target-major-version/--target-minor-version to update a specific version; otherwise the latest version is updated.
```
judgment judges update [OPTIONS] <PROJECT_ID> <JUDGE_ID>
```

Arguments
| Name | Required |
|---|---|
| PROJECT_ID | yes |
| JUDGE_ID | yes |
Options
| Flag | Type | Required | Description |
|---|---|---|---|
| --judge-description | text | no | New UI description (pass null to clear). |
| --score-type | numeric, binary, categorical | no | — |
| --description | text | no | New scorer-version description (pass null to clear). |
| --model | text | no | New LiteLLM model ID. Use judgment judges models to list the available IDs. |
| --prompt | text | no | New prompt template. |
| --categories | text | no | List of {name, description} choices for categorical judges. Ignored for other score types. |
| --min-score | number | no | Updated lower bound for numeric judges. |
| --max-score | number | no | Updated upper bound for numeric judges. |
| --target-major-version | number | no | Major version to write to. If it does not exist, a new version is created. |
| --target-minor-version | number | no | Minor version to write to. If it does not exist, a new version is created. |
| --source-major-version | number | no | Major version to copy unspecified fields from. Defaults to the latest version. |
| --source-minor-version | number | no | Minor version to copy unspecified fields from. Defaults to the latest version. |
| --agent-prompts | text | no | For agent judges only: replacement list of named sub-prompts ({name, prompt}). |
| --new-behaviors | text | no | New behaviors to attach to this judge. Each entry: {value, description?, category_ids?}. |
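Two illustrative updates follow, one against the latest version and one targeting a specific version; the IDs, model name, and prompt text are placeholders:

```
# Change the model and prompt on the latest version of a judge.
judgment judges update <PROJECT_ID> <JUDGE_ID> \
  --model gpt-4o-mini \
  --prompt "Rate the factual accuracy of the answer from 0 to 1."

# Write a prompt change to version 2.1 specifically; if that version
# does not exist yet, a new version is created.
judgment judges update <PROJECT_ID> <JUDGE_ID> \
  --prompt "Updated prompt text." \
  --target-major-version 2 --target-minor-version 1
```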
judges update-settings
Update a judge’s online-evaluation settings.
Update how often and on which spans a judge runs online. Pass --evaluation-mode continuous with a sampling rate to score spans automatically, or --evaluation-mode on_demand to score only via manual `judgment traces evaluate` calls.
```
judgment judges update-settings [OPTIONS] <PROJECT_ID> <JUDGE_ID>
```

Arguments
| Name | Required |
|---|---|
| PROJECT_ID | yes |
| JUDGE_ID | yes |
Options
| Flag | Type | Required | Description |
|---|---|---|---|
| --evaluation-mode | continuous, on_demand | yes | — |
| --sampling-rate | number | yes | Percent (0–100) of qualifying spans to score. |
| --span-triggers | text | no | JSON array of span filters that restrict which spans the judge evaluates. Pass [] to evaluate all spans. Use field: "span_name" to match on span names, or field: "span_attribute" with key: "<attr>" to match on a span attribute's value. Triggers are ANDed together; a span must match every entry to be evaluated. See the shape below. |
| --session-scoring | boolean | no | When true, run the judge at session granularity instead of per-span. |
--span-triggers shape

```
[
  {
    "field": "span_name" | "span_attribute",
    "operator": "contains" | "equals" | "exists",
    "value": "<string>",
    "key": "<attribute key>"?
  },
  ...
]
```
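Putting the flags together, a continuous configuration might look like the sketch below; the sampling rate and trigger value are illustrative, and the triggers are passed as a single quoted JSON string on the assumption that the option accepts raw JSON on the command line:

```
# Continuously score 25% of spans whose name contains "llm_call".
# IDs, the rate, and the trigger value are placeholders.
judgment judges update-settings <PROJECT_ID> <JUDGE_ID> \
  --evaluation-mode continuous \
  --sampling-rate 25 \
  --span-triggers '[{"field": "span_name", "operator": "contains", "value": "llm_call"}]'
```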
judges upload
Upload a custom judge bundle to a project.
The entrypoint must define a class that inherits from `Judge`, `TraceCustomScorer`, or `ExampleCustomScorer`, parameterised with a response type (`BinaryResponse`, `NumericResponse`, or a `CategoricalResponse` subclass with `categories`).

```
judgment judges upload [OPTIONS] <ENTRYPOINT_PATH>
```

Arguments
| Name | Required |
|---|---|
| ENTRYPOINT_PATH | yes |
Options
| Flag | Type | Required | Description |
|---|---|---|---|
-p, --project-id | text | yes | Project ID to upload the judge to. |
-r, --requirements | path | no | Path to a requirements.txt file to install with the judge. |
-i, --include | path | no | Additional file or directory to include in the bundle (repeatable). |
-n, --name | text | no | Custom judge name. Defaults to the detected class name. |
-m, --bump-major | boolean | no | Bump the major version when re-uploading an existing judge. |
-y, --yes | boolean | no | Skip the upload confirmation prompt. |
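A sketch of an upload, assuming an entrypoint file my_judge.py with a requirements file and a helper directory alongside it (all names and the project ID are placeholders):

```
# Bundle my_judge.py with its requirements and a helpers/ directory,
# name the judge MyCustomJudge, and skip the confirmation prompt.
judgment judges upload my_judge.py \
  -p <PROJECT_ID> \
  -r requirements.txt \
  -i helpers/ \
  -n MyCustomJudge \
  -y
```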