Offline Test Runner
AgentFunction
Callable[..., Any]
PassConditionFn
Callable[[Dict[str, Any], List[ScorerData]], bool]
TERMINAL_STATUSES
frozenset(cancelled)
EXAMPLES_PAGE_SIZE
100
ITEMS_PAGE_SIZE
200
normalize_judge_versions()
Validate and normalize judge_versions entries.
Each entry must identify a judge by name (or judge_id) and may pin
a tag, version, or major_version/minor_version pair. Judges
not listed default to their prod tag (else latest) server-side.
Raises
ValueError: If an entry is not a dict or identifies no judge.
def normalize_judge_versions(judge_versions) -> Optional[List[Dict[str, Any]]]:
Parameters
judge_versions
required:Optional[List[JudgeVersionPin]]
Returns
Optional[List[Dict[str, Any]]]
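The validation rules described above can be illustrated with a minimal sketch. This is not the library's implementation: the function name `normalize_judge_versions_sketch`, the key tuples, the `None`-in/`None`-out behavior, and the dropping of unrecognized keys are all assumptions made for illustration. Only the key names (`name`, `judge_id`, `tag`, `version`, `major_version`, `minor_version`) and the `ValueError` conditions come from the description.

```python
from typing import Any, Dict, List, Optional

# Key names taken from the description above; grouping them this way is an assumption.
_IDENTITY_KEYS = ("name", "judge_id")
_PIN_KEYS = ("tag", "version", "major_version", "minor_version")


def normalize_judge_versions_sketch(
    judge_versions: Optional[List[Any]],
) -> Optional[List[Dict[str, Any]]]:
    """Hypothetical stand-in that mirrors the documented validation rules."""
    if judge_versions is None:
        # Assumption: no pins supplied means nothing to normalize.
        return None

    normalized: List[Dict[str, Any]] = []
    for entry in judge_versions:
        if not isinstance(entry, dict):
            raise ValueError(f"judge_versions entry must be a dict, got {type(entry).__name__}")
        if not any(entry.get(key) for key in _IDENTITY_KEYS):
            raise ValueError("judge_versions entry must identify a judge by name or judge_id")

        # Keep only the identity and pin keys that are actually set.
        pin = {key: entry[key] for key in _IDENTITY_KEYS if entry.get(key)}
        pin.update({key: entry[key] for key in _PIN_KEYS if entry.get(key) is not None})
        normalized.append(pin)
    return normalized


# "helpfulness" and "staging" are made-up values for illustration.
pins = normalize_judge_versions_sketch([{"name": "helpfulness", "tag": "staging"}])
print(pins)  # [{'name': 'helpfulness', 'tag': 'staging'}]
```

Judges omitted from the list are not touched here; per the description, their default (prod tag, else latest) is resolved server-side.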
build_agent_kwargs()
Map an example's data fields onto the agent entrypoint's parameters.
Each declared parameter is filled from the example field of the same name,
or from a custom-mapped field via field_mapping (which maps each
parameter name to the dataset field it reads). Example fields the entrypoint
does not declare are ignored -- so a dataset can carry extra columns (e.g. trace) the agent
doesn't use -- unless the entrypoint accepts **kwargs, in which case the
leftover fields are passed through too. The match succeeds as long as the
example supplies every required (no-default) parameter.
Raises
TypeError: Only if a required parameter has no matching example field.
def build_agent_kwargs(agent_function, data, field_mapping=None) -> Dict[str, Any]:
Parameters
agent_function
required:AgentFunction
data
required:Dict[str, Any]
field_mapping
:Optional[Dict[str, str]]
None
Returns
Dict[str, Any]
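The mapping rules described above can be sketched with `inspect.signature`. This is an illustrative reimplementation under stated assumptions, not the library's code: the name `build_agent_kwargs_sketch`, the example `agent` functions, and the dataset fields `prompt` and `trace` are hypothetical, and positional-only parameters are not handled specially.

```python
import inspect
from typing import Any, Callable, Dict, Optional


def build_agent_kwargs_sketch(
    agent_function: Callable[..., Any],
    data: Dict[str, Any],
    field_mapping: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Hypothetical stand-in that follows the documented matching rules."""
    mapping = field_mapping or {}  # parameter name -> dataset field it reads
    kwargs: Dict[str, Any] = {}
    consumed = set()
    accepts_var_kwargs = False

    for name, param in inspect.signature(agent_function).parameters.items():
        if param.kind is inspect.Parameter.VAR_KEYWORD:
            accepts_var_kwargs = True
            continue
        if param.kind is inspect.Parameter.VAR_POSITIONAL:
            continue  # *args cannot be filled by name

        field = mapping.get(name, name)
        if field in data:
            kwargs[name] = data[field]
            consumed.add(field)
        elif param.default is inspect.Parameter.empty:
            # Required (no-default) parameter with no matching example field.
            raise TypeError(f"example has no field {field!r} for required parameter {name!r}")

    if accepts_var_kwargs:
        # Leftover fields pass through only when the entrypoint takes **kwargs.
        for field, value in data.items():
            if field not in consumed and field not in kwargs:
                kwargs[field] = value
    return kwargs


def agent(question: str, context: str = "") -> str:
    return question


example = {"prompt": "What is 2 + 2?", "trace": ["step"]}
print(build_agent_kwargs_sketch(agent, example, {"question": "prompt"}))
# {'question': 'What is 2 + 2?'}
```

In this sketch the unused `trace` column is dropped because `agent` does not declare it; an entrypoint declared as `def agent(question, **kwargs)` would receive `trace` as well.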
_parse_reason()
Coerce a stored scorer reason into the {text, citations?} wire shape.
def _parse_reason(raw) -> Dict[str, Any]:
Parameters
raw
required:Any
Returns
Dict[str, Any]
_reason_text()
def _reason_text(raw) -> Optional[str]:
Parameters
raw
required:Any
Returns
Optional[str]