Behaviors

View and manage behaviors (the binary or categorical labels judges assign).

Commands

Command                      Description
behaviors create-binary      Create a binary (yes/no) behavior.
behaviors create-classifier  Create a classifier (multi-label) behavior.
behaviors delete             Delete a behavior.
behaviors get                Get a behavior with judge details and stats.
behaviors list               List behaviors.
behaviors update             Update a behavior’s description.

behaviors create-binary

Create a binary (yes/no) behavior. The judge LLM uses your prompt to decide true/false on each qualifying span.

judgment behaviors create-binary [OPTIONS] <PROJECT_ID> <NAME> <PROMPT>

Arguments

Name        Required
PROJECT_ID  yes
NAME        yes
PROMPT      yes

Options

Flag                 Type  Required  Description
--description        text  no        Human-readable description shown in the UI.
--model              text  no        LLM model ID used by the judge prompt. Defaults to "gpt-5.2" when omitted.
--category-ids       text  no        UUIDs of categories to attach the behavior to. Pass an array of category UUIDs.
--advanced-settings  text  no        JSON object overriding the judge's online-evaluation configuration. All four fields are required when this is supplied (see "--advanced-settings shape" below). In continuous mode the judge runs automatically on qualifying spans; on_demand mode requires a manual judgment traces evaluate call. online_sampling_rate is the percentage (0–100) of matching spans to score.
--judge-id           text  no        Attach the new behavior to an existing judge instead of creating one. The judge must be score_type=binary and have no existing behaviors.

--advanced-settings shape

{
  "online_evaluation_mode": "continuous" | "on_demand",
  "online_sampling_rate": <number 0-100>,
  "online_span_triggers": [
    {"field":"span_name"|"span_attribute","operator":"contains"|"equals"|"exists","value":"<string>","key":"<attr-key>"?}
  ],
  "online_session_scoring": <bool>
}
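As a rough illustration of the shape above, the following Python sketch builds an --advanced-settings value, checks the two constraints stated in this section (all four fields present, sampling rate between 0 and 100), and assembles the create-binary argument list. The project ID, behavior name, prompt, and trigger value are hypothetical placeholders, and the validation helper is not part of the CLI; the sketch only prints the command rather than running it.

```python
import json
import shlex

ALLOWED_MODES = {"continuous", "on_demand"}
REQUIRED_FIELDS = {
    "online_evaluation_mode",
    "online_sampling_rate",
    "online_span_triggers",
    "online_session_scoring",
}


def validate_advanced_settings(settings: dict) -> None:
    """Raise ValueError if the dict does not match the documented shape."""
    missing = REQUIRED_FIELDS - settings.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if settings["online_evaluation_mode"] not in ALLOWED_MODES:
        raise ValueError("online_evaluation_mode must be continuous or on_demand")
    if not 0 <= settings["online_sampling_rate"] <= 100:
        raise ValueError("online_sampling_rate must be between 0 and 100")


# Hypothetical settings: score 25% of spans whose name contains "checkout".
settings = {
    "online_evaluation_mode": "continuous",
    "online_sampling_rate": 25,
    "online_span_triggers": [
        {"field": "span_name", "operator": "contains", "value": "checkout"}
    ],
    "online_session_scoring": False,
}
validate_advanced_settings(settings)

argv = [
    "judgment", "behaviors", "create-binary",
    "--advanced-settings", json.dumps(settings),
    "proj-123",                        # hypothetical PROJECT_ID
    "refund-requested",                # hypothetical NAME
    "Did the user ask for a refund?",  # hypothetical PROMPT
]

# shlex.join quotes the JSON so the printed line can be pasted into a shell.
print(shlex.join(argv))
```

Serializing with json.dumps and quoting with shlex.join avoids hand-escaping the JSON on the command line.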

behaviors create-classifier

Create a classifier (multi-label) behavior. The judge LLM picks one of the supplied options for each qualifying span.

judgment behaviors create-classifier [OPTIONS] <PROJECT_ID> <NAME> <PROMPT>

Arguments

Name        Required
PROJECT_ID  yes
NAME        yes
PROMPT      yes

Options

Flag                 Type  Required  Description
--options            text  yes       JSON array of the allowed output categories the classifier judge can return. Must contain at least one option (see "--options shape" below).
--model              text  no        LLM model ID used by the judge prompt. Defaults to "gpt-5.2" when omitted.
--category-ids       text  no        UUIDs of categories to attach the behavior to. Pass an array of category UUIDs.
--advanced-settings  text  no        JSON object overriding the judge's online-evaluation configuration. All four fields are required when this is supplied (see "--advanced-settings shape" below). In continuous mode the judge runs automatically on qualifying spans; on_demand mode requires a manual judgment traces evaluate call. online_sampling_rate is the percentage (0–100) of matching spans to score.
--judge-id           text  no        Attach the new behavior to an existing judge instead of creating one. The judge must be score_type=categorical and have no existing behaviors.

--options shape

[
  {"name":"<label>", "description":"<optional human description>", "category_ids":["<uuid>", ...]},
  ...
]

--advanced-settings shape

{
  "online_evaluation_mode": "continuous" | "on_demand",
  "online_sampling_rate": <number 0-100>,
  "online_span_triggers": [
    {"field":"span_name"|"span_attribute","operator":"contains"|"equals"|"exists","value":"<string>","key":"<attr-key>"?}
  ],
  "online_session_scoring": <bool>
}
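A minimal Python sketch of building the required --options array for create-classifier follows. The labels, descriptions, project ID, behavior name, and prompt are all hypothetical. Passing an empty category_ids list is an assumption, since this section does not say whether that field may be omitted or empty.

```python
import json
import shlex

# Hypothetical sentiment labels and their human-readable descriptions.
labels = {
    "positive": "User expresses satisfaction.",
    "negative": "User expresses frustration.",
    "neutral": "No clear sentiment.",
}

# One entry per label, following the documented --options shape.
# Assumption: an empty category_ids list is accepted.
options = [
    {"name": name, "description": description, "category_ids": []}
    for name, description in labels.items()
]

# The reference states the array must contain at least one option.
if not options:
    raise ValueError("--options must contain at least one option")

argv = [
    "judgment", "behaviors", "create-classifier",
    "--options", json.dumps(options),
    "proj-123",                        # hypothetical PROJECT_ID
    "sentiment",                       # hypothetical NAME
    "Classify the user's sentiment.",  # hypothetical PROMPT
]

print(shlex.join(argv))
```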

behaviors delete

Delete a behavior.

judgment behaviors delete [OPTIONS] <PROJECT_ID> <BEHAVIOR_ID>

Arguments

Name         Required
PROJECT_ID   yes
BEHAVIOR_ID  yes

Options

Flag                 Type     Required  Description
--delete-scorer      boolean  no        When true, also delete the underlying prompt scorer if no other behaviors reference it.
--delete-all-values  boolean  no        For classifier behaviors, when true deletes every category row for this judge (not just the provided behavior_id). Ignored for binary behaviors.

behaviors get

Get a behavior with judge details and stats.

judgment behaviors get [OPTIONS] <PROJECT_ID> <BEHAVIOR_ID>

Arguments

Name         Required
PROJECT_ID   yes
BEHAVIOR_ID  yes

Options

Flag          Type  Required  Description
--start-date  text  no        Optional ISO 8601 start date for stats.
--end-date    text  no        Optional ISO 8601 end date for stats.
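One way to produce the two date values is sketched below in Python: it computes a seven-day window ending at a fixed UTC timestamp and formats both ends as ISO 8601 strings. The project and behavior IDs are hypothetical, and it is an assumption that the CLI accepts full timestamps with a UTC offset rather than date-only values.

```python
import shlex
from datetime import datetime, timedelta, timezone


def stats_window(end: datetime, days: int) -> tuple[str, str]:
    """Return (start, end) as ISO 8601 strings for a window of `days` days."""
    start = end - timedelta(days=days)
    return start.isoformat(), end.isoformat()


# Fixed end point so the example is reproducible; use datetime.now(timezone.utc)
# for a window ending at the current time.
end = datetime(2025, 1, 8, tzinfo=timezone.utc)
start_date, end_date = stats_window(end, days=7)

argv = [
    "judgment", "behaviors", "get",
    "--start-date", start_date,
    "--end-date", end_date,
    "proj-123",      # hypothetical PROJECT_ID
    "behavior-456",  # hypothetical BEHAVIOR_ID
]

print(shlex.join(argv))
```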

behaviors list

List every behavior in a project along with rolled-up trace counts and last-seen stats.

judgment behaviors list <PROJECT_ID>

Arguments

Name        Required
PROJECT_ID  yes

behaviors update

Update a behavior’s description.

judgment behaviors update [OPTIONS] <PROJECT_ID> <BEHAVIOR_ID>

Arguments

Name         Required
PROJECT_ID   yes
BEHAVIOR_ID  yes

Options

Flag           Type  Required  Description
--description  text  no        New human-readable description for the behavior.