Judgment Labs Logo

OfflineTestRunner

Executes the offline-test lifecycle for a test config (the TypeScript port of the Python `OfflineTestRunner`): resolve the dataset version, optionally run the agent to produce offline traces, create the test run, wait for terminal status, fetch results, evaluate the pass condition, and report successes.

Executes the offline-test lifecycle for a test config (the TypeScript port of the Python OfflineTestRunner): resolve the dataset version, optionally run the agent to produce offline traces, create the test run, wait for terminal status, fetch results, evaluate the pass condition, and report successes.

runAgent()

Run the agent once per dataset example, producing one offline trace each.

NOTE: the offline-tracer lifecycle here (active-tracer swap, async observe, per-example trace attribution) still needs validation against a live run.

async function runAgent(agentFunction: AgentFunction, examples: ExampleRow[]): Promise<Record<string, string>>

Parameters

agentFunction

required

:

AgentFunction

examples

required

:

ExampleRow[]

Returns

Promise<Record<string, string>>