OfflineTestRunner
Executes the offline-test lifecycle for a test config (the TypeScript port of the Python `OfflineTestRunner`): resolve the dataset version, optionally run the agent to produce offline traces, create the test run, wait for terminal status, fetch results, evaluate the pass condition, and report successes.
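The lifecycle above can be sketched as follows. This is a minimal illustration, not the SDK's implementation. `LifecycleApi`, `runOfflineLifecycle`, the `RunStatus` values, and the pass condition (a minimum fraction of passing examples) are all hypothetical stand-ins, and the final reporting step is omitted.

```typescript
// Hypothetical shapes: none of these names are taken from the real SDK.
type RunStatus = "pending" | "running" | "completed" | "failed";

interface ExampleResult {
  exampleId: string;
  passed: boolean;
}

interface LifecycleApi {
  resolveDatasetVersion(datasetId: string): Promise<string>;
  produceOfflineTraces?(datasetVersion: string): Promise<Record<string, string>>;
  createTestRun(datasetVersion: string, traces?: Record<string, string>): Promise<string>;
  getStatus(runId: string): Promise<RunStatus>;
  fetchResults(runId: string): Promise<ExampleResult[]>;
}

const TERMINAL: ReadonlySet<RunStatus> = new Set<RunStatus>(["completed", "failed"]);

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function runOfflineLifecycle(
  api: LifecycleApi,
  datasetId: string,
  opts: { minPassRate: number; pollMs?: number; produceTraces?: boolean },
): Promise<{ runId: string; status: RunStatus; passed: boolean; passRate: number }> {
  // 1. Resolve the dataset version to test against.
  const version = await api.resolveDatasetVersion(datasetId);
  // 2. Optionally run the agent to produce offline traces.
  const traces =
    opts.produceTraces && api.produceOfflineTraces
      ? await api.produceOfflineTraces(version)
      : undefined;
  // 3. Create the test run.
  const runId = await api.createTestRun(version, traces);
  // 4. Poll until the run reaches a terminal status.
  let status = await api.getStatus(runId);
  while (!TERMINAL.has(status)) {
    await sleep(opts.pollMs ?? 1000);
    status = await api.getStatus(runId);
  }
  // 5. Fetch per-example results (only meaningful for a completed run).
  const results = status === "completed" ? await api.fetchResults(runId) : [];
  // 6. Assumed pass condition: fraction of passing examples >= threshold.
  const passRate =
    results.length === 0 ? 0 : results.filter((r) => r.passed).length / results.length;
  const passed = status === "completed" && passRate >= opts.minPassRate;
  return { runId, status, passed, passRate };
}
```

Polling with a fixed interval keeps the sketch simple; a production runner would more likely add a timeout or backoff so a stuck run cannot block forever.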
runAgent()
Run the agent once per dataset example, producing one offline trace each.
NOTE: the offline-tracer lifecycle here (active-tracer swap, async
observe, per-example trace attribution) still needs validation against a
live run.
async function runAgent(agentFunction: AgentFunction, examples: ExampleRow[]): Promise<Record<string, string>>
Parameters
agentFunction (required): AgentFunction
examples (required): ExampleRow[]
Returns
Promise<Record<string, string>>
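A minimal sketch of how a per-example loop with an active-tracer swap might look. It assumes `ExampleRow` carries an `id` and an `input`, and that the returned record maps example IDs to trace IDs; neither is confirmed by this reference. `Tracer`, `activeTracer`, and `newOfflineTracer` are hypothetical stand-ins for the real offline tracer.

```typescript
// Hypothetical shapes; the SDK's real AgentFunction and ExampleRow may differ.
type ExampleRow = { id: string; input: unknown };
// The agent may be sync or async; its return value is ignored here.
type AgentFunction = (input: unknown) => unknown;

interface Tracer {
  traceId: string;
}

// Stand-in for a process-wide "active tracer" that instrumented code reads.
let activeTracer: Tracer | undefined;

let traceCounter = 0;
function newOfflineTracer(exampleId: string): Tracer {
  traceCounter += 1;
  return { traceId: `offline-${exampleId}-${traceCounter}` };
}

async function runAgent(
  agentFunction: AgentFunction,
  examples: ExampleRow[],
): Promise<Record<string, string>> {
  const traceIds: Record<string, string> = {};
  // Sequential on purpose: one global active tracer cannot safely be
  // shared by concurrently running examples.
  for (const example of examples) {
    const tracer = newOfflineTracer(example.id);
    const previous = activeTracer;
    activeTracer = tracer; // swap in the per-example tracer
    try {
      await agentFunction(example.input); // awaiting a non-promise is harmless
    } finally {
      activeTracer = previous; // always restore, even if the agent throws
    }
    traceIds[example.id] = tracer.traceId; // attribute the trace to its example
  }
  return traceIds;
}
```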
OfflineTestResult
The outcome of an offline test run, returned by `client.offlineTests.run()`.
OfflineTestsFactory
Create test configs and execute offline test runs. Access via `client.offlineTests`. A *test config* pairs a dataset with a set of platform judges; a *test run* evaluates one dataset version and stores per-example results.
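To make the config/run relationship concrete, here is a sketch with hypothetical types. The field names, the numeric judge scores, and the `meanScoreByJudge` helper are assumptions for illustration, not part of the `client.offlineTests` API.

```typescript
// Hypothetical shapes illustrating the description above; field names are assumptions.
interface TestConfig {
  id: string;
  datasetId: string;
  judgeIds: string[]; // platform judges applied to every example
}

interface PerExampleResult {
  exampleId: string;
  scores: Record<string, number>; // judgeId -> score (assumed numeric)
}

interface TestRun {
  id: string;
  configId: string;
  datasetVersion: string; // a run evaluates exactly one dataset version
  results: PerExampleResult[];
}

// Mean score per judge across a run's examples; NaN if a judge has no scores.
function meanScoreByJudge(config: TestConfig, run: TestRun): Record<string, number> {
  if (run.configId !== config.id) throw new Error("run does not belong to this config");
  const means: Record<string, number> = {};
  for (const judgeId of config.judgeIds) {
    const scores = run.results
      .map((r) => r.scores[judgeId])
      .filter((s): s is number => typeof s === "number"); // skip missing scores
    means[judgeId] =
      scores.length === 0 ? NaN : scores.reduce((a, b) => a + b, 0) / scores.length;
  }
  return means;
}
```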