Judges

View and manage judges and their online-evaluation settings.

Commands

| Command | Description |
| --- | --- |
| `judges create` | Create a prompt judge. |
| `judges delete` | Delete judges. |
| `judges get` | Get a judge by ID. |
| `judges get-settings` | Get a judge’s online-evaluation settings. |
| `judges init` | Initialise a skeleton custom judge file. |
| `judges list` | List judges in a project. |
| `judges models` | List judge models. |
| `judges set-tag` | Add or remove a version tag on a judge. |
| `judges update` | Update a judge. |
| `judges update-settings` | Update a judge’s online-evaluation settings. |
| `judges upload` | Upload a custom judge bundle to a project. |

judges create

Create a prompt judge.

Create a new prompt judge in a project. The judge runs the supplied prompt against the configured LLM model to score spans.

judgment judges create [OPTIONS] <PROJECT_ID> <NAME> <MODEL> <PROMPT>

Arguments

| Name | Required |
| --- | --- |
| `PROJECT_ID` | yes |
| `NAME` | yes |
| `MODEL` | yes |
| `PROMPT` | yes |

Options

| Flag | Type | Required | Description |
| --- | --- | --- | --- |
| `--judge-description` | text | no | Human-readable description shown in the UI. |
| `--description` | text | no | Description stored on the underlying scorer version. |
| `--score-type` | `numeric`, `binary`, `categorical` | yes | Score type the judge produces. |
| `--categories` | text | no | List of `{name, description}` choices for categorical judges. Ignored for other score types. |
| `--min-score` | number | no | Lower bound for numeric judges. Defaults to 0. |
| `--max-score` | number | no | Upper bound for numeric judges. Defaults to 1. |
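For example, a hypothetical invocation creating a numeric relevance judge (the project ID, judge name, model, and prompt below are all placeholders):

```shell
# Create a numeric 0–1 prompt judge in project "proj_123" (placeholder ID)
judgment judges create \
  --score-type numeric \
  --min-score 0 \
  --max-score 1 \
  --judge-description "Scores how relevant the answer is to the question" \
  proj_123 relevance-judge gpt-4o \
  "Rate the relevance of the answer to the question from 0 to 1."
```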

judges delete

Delete judges.

Delete one or more judges by ID. Behaviors that reference these judges are also removed.

judgment judges delete [OPTIONS] <PROJECT_ID>

Arguments

| Name | Required |
| --- | --- |
| `PROJECT_ID` | yes |

Options

| Flag | Type | Required | Description |
| --- | --- | --- | --- |
| `--judge-ids` | text | yes | Judge UUIDs to delete. |
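A sketch of deleting a single judge (both IDs are placeholders; how `--judge-ids` accepts multiple values — a repeated flag or a delimited list — is not specified above):

```shell
# Delete one judge by UUID from project "proj_123" (placeholder values)
judgment judges delete \
  --judge-ids judge-uuid-1 \
  proj_123
```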

judges get

Get a judge by ID.

Return full detail (including all versions) for a single judge.

judgment judges get <PROJECT_ID> <JUDGE_ID>

Arguments

| Name | Required |
| --- | --- |
| `PROJECT_ID` | yes |
| `JUDGE_ID` | yes |

judges get-settings

Get a judge’s online-evaluation settings.

judgment judges get-settings <PROJECT_ID> <JUDGE_ID>

Arguments

| Name | Required |
| --- | --- |
| `PROJECT_ID` | yes |
| `JUDGE_ID` | yes |

judges init

Initialise a skeleton custom judge file.

judgment judges init [OPTIONS]

Options

| Flag | Type | Required | Description |
| --- | --- | --- | --- |
| `-t, --response-type` | `binary`, `categorical`, `numeric` | yes | Response type for the judge. |
| `-n, --name` | text | yes | Judge class name (must be a valid Python identifier). |
| `-p, --init-path` | text | no | Directory in which to create the judge file. |
| `-r, --include-requirements` | boolean | no | Also create an empty requirements.txt next to the judge file. |
| `-y, --yes` | boolean | no | Skip the file creation confirmation prompt. |
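For instance, scaffolding a numeric judge class without prompts (the class name and directory are placeholders):

```shell
# Create a skeleton numeric judge named LatencyJudge under ./judges,
# along with an empty requirements.txt, skipping confirmation
judgment judges init \
  -t numeric \
  -n LatencyJudge \
  -p ./judges \
  -r \
  -y
```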

judges list

List judges in a project.

List every judge in a project, including prompt, code, and custom (uploaded) judges. Returns each judge with its current configuration and online-evaluation settings.

judgment judges list <PROJECT_ID>

Arguments

| Name | Required |
| --- | --- |
| `PROJECT_ID` | yes |

judges models

List judge models.

List the chat models available for use as the LLM backing a prompt judge.

judgment judges models

judges set-tag

Add or remove a version tag on a judge.

Add or remove a tag (e.g. `prod`) on a specific version of a judge. Pass `--action add` to set the tag and `--action remove` to clear it.

judgment judges set-tag [OPTIONS] <PROJECT_ID> <JUDGE_ID> <TAG>

Arguments

| Name | Required |
| --- | --- |
| `PROJECT_ID` | yes |
| `JUDGE_ID` | yes |
| `TAG` | yes |

Options

| Flag | Type | Required | Description |
| --- | --- | --- | --- |
| `--major-version` | number | yes | Major version of the judge to tag. |
| `--minor-version` | number | yes | Minor version of the judge to tag. |
| `--action` | `add`, `remove` | yes | Whether to add or remove the tag. |
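A hypothetical call tagging version 2.0 of a judge as `prod` (the project and judge IDs are placeholders):

```shell
# Attach the "prod" tag to version 2.0 of a judge
judgment judges set-tag \
  --major-version 2 \
  --minor-version 0 \
  --action add \
  proj_123 judge_456 prod
```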

judges update

Update a judge.

Update a judge’s model, prompt, description, score type, categories, score bounds, agent prompts, or version tags. Pass `--target-major-version`/`--target-minor-version` to update a specific version; otherwise the latest version is updated.

judgment judges update [OPTIONS] <PROJECT_ID> <JUDGE_ID>

Arguments

| Name | Required |
| --- | --- |
| `PROJECT_ID` | yes |
| `JUDGE_ID` | yes |

Options

| Flag | Type | Required | Description |
| --- | --- | --- | --- |
| `--judge-description` | text | no | New UI description (pass `null` to clear). |
| `--score-type` | `numeric`, `binary`, `categorical` | no | New score type. |
| `--description` | text | no | New scorer-version description (pass `null` to clear). |
| `--model` | text | no | New LiteLLM model ID. Use `judges models` to list available IDs. |
| `--prompt` | text | no | New prompt template. |
| `--categories` | text | no | List of `{name, description}` choices for categorical judges. Ignored for other score types. |
| `--min-score` | number | no | Updated lower bound for numeric judges. |
| `--max-score` | number | no | Updated upper bound for numeric judges. |
| `--target-major-version` | number | no | Major version to write to. If it does not exist, a new version is created. |
| `--target-minor-version` | number | no | Minor version to write to. If it does not exist, a new version is created. |
| `--source-major-version` | number | no | Major version to copy unspecified fields from. Defaults to the latest version. |
| `--source-minor-version` | number | no | Minor version to copy unspecified fields from. Defaults to the latest version. |
| `--agent-prompts` | text | no | For agent judges only: replacement list of named sub-prompts (`{name, prompt}`). |
| `--new-behaviors` | text | no | New behaviors to attach to this judge. Each entry: `{value, description?, category_ids?}`. |
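A sketch of updating the latest version in place (the model ID, prompt, and both IDs are placeholders; fields not passed are copied from the source version):

```shell
# Swap the backing model and prompt on the latest version of a judge
judgment judges update \
  --model gpt-4o-mini \
  --prompt "Rate the factual accuracy of the answer from 0 to 1." \
  proj_123 judge_456
```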

judges update-settings

Update a judge’s online-evaluation settings.

Update how often and on which spans a judge runs online. Pass `--evaluation-mode continuous` with a sampling rate to evaluate automatically, or `--evaluation-mode on_demand` to require manual `judgment traces evaluate` calls.

judgment judges update-settings [OPTIONS] <PROJECT_ID> <JUDGE_ID>

Arguments

| Name | Required |
| --- | --- |
| `PROJECT_ID` | yes |
| `JUDGE_ID` | yes |

Options

| Flag | Type | Required | Description |
| --- | --- | --- | --- |
| `--evaluation-mode` | `continuous`, `on_demand` | yes | Whether the judge runs automatically or only on demand. |
| `--sampling-rate` | number | yes | Percent (0–100) of qualifying spans to score. |
| `--span-triggers` | text | no | JSON array of span filters that restrict which spans the judge evaluates. Pass `[]` to evaluate all spans. Use `field: "span_name"` to match on span names, or `field: "span_attribute"` with `key: "<attr>"` to match on a span attribute’s value. Triggers are ANDed together; a span must match every entry to be evaluated. |
| `--session-scoring` | boolean | no | When true, run the judge at session granularity instead of per-span. |

`--span-triggers` shape:

```
[
  {
    "field": "span_name" | "span_attribute",
    "operator": "contains" | "equals" | "exists",
    "value": "<string>",
    "key": "<attribute key>"?
  },
  ...
]
```
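Putting the pieces together, a hypothetical call that continuously scores a quarter of matching spans (the trigger value and both IDs are placeholders):

```shell
# Continuously score 25% of spans whose name contains "llm"
judgment judges update-settings \
  --evaluation-mode continuous \
  --sampling-rate 25 \
  --span-triggers '[{"field": "span_name", "operator": "contains", "value": "llm"}]' \
  proj_123 judge_456
```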

judges upload

Upload a custom judge bundle to a project.

The entrypoint must define a class that inherits from `Judge`, `TraceCustomScorer`, or `ExampleCustomScorer`, parameterised with a response type (`BinaryResponse`, `NumericResponse`, or a `CategoricalResponse` subclass with `categories`).

judgment judges upload [OPTIONS] <ENTRYPOINT_PATH>

Arguments

| Name | Required |
| --- | --- |
| `ENTRYPOINT_PATH` | yes |

Options

| Flag | Type | Required | Description |
| --- | --- | --- | --- |
| `-p, --project-id` | text | yes | Project ID to upload the judge to. |
| `-r, --requirements` | path | no | Path to a requirements.txt file to install with the judge. |
| `-i, --include` | path | no | Additional file or directory to include in the bundle (repeatable). |
| `-n, --name` | text | no | Custom judge name. Defaults to the detected class name. |
| `-m, --bump-major` | boolean | no | Bump the major version when re-uploading an existing judge. |
| `-y, --yes` | boolean | no | Skip the upload confirmation prompt. |
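For example, a hypothetical upload of an entrypoint with its dependencies and a support directory (the project ID, file, and directory names are placeholders):

```shell
# Bundle my_judge.py, its requirements, and helpers/ into project "proj_123"
judgment judges upload \
  -p proj_123 \
  -r requirements.txt \
  -i helpers/ \
  -y \
  my_judge.py
```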