OfflineTestsFactory
Create test configs and execute offline test runs. Access via `client.offlineTests`. A *test config* pairs a dataset with a set of platform judges; a *test run* evaluates one dataset version and stores per-example results.
const config = await client.offlineTests.createConfig("nightly", "golden-set", [
  "helpfulness",
  "faithfulness",
]);
const result = await client.offlineTests.run("nightly");
console.log(result?.uiResultsUrl);
createConfig()
Create a test config binding a dataset to a set of judges.
async function createConfig(name: string, dataset: string, judges: JudgeRef[], description?: string | undefined): Promise<TestConfig | null>
Parameters
name
required:string
Name for the test config.
dataset
required:string
Dataset name or dataset id.
judges
required:JudgeRef[]
Judges to attach. Each entry is either a judge name or an object of the form { judgeId } or { name }.
description
optional:string | undefined
Optional human-readable description.
Returns
Promise<TestConfig | null>
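The `judges` parameter accepts mixed reference styles. As a rough sketch of how such a union could be handled, the snippet below assumes `JudgeRef` is `string | { judgeId: string } | { name: string }`. That shape is inferred from the parameter description, not taken from the SDK's type definitions. `describeJudgeRef` is a hypothetical helper and the UUID is a placeholder.

```typescript
// Assumed shape, inferred from the parameter description; the SDK's real
// JudgeRef type may differ.
type JudgeRef = string | { judgeId: string } | { name: string };

// Hypothetical helper: label each reference by how it identifies the judge.
function describeJudgeRef(ref: JudgeRef): string {
  if (typeof ref === "string") return `name:${ref}`; // bare judge name
  if ("judgeId" in ref) return `id:${ref.judgeId}`;  // reference by id
  return `name:${ref.name}`;                         // reference by name object
}

const judges: JudgeRef[] = [
  "helpfulness",
  { name: "faithfulness" },
  { judgeId: "00000000-0000-4000-8000-000000000000" }, // placeholder UUID
];

const labels = judges.map(describeJudgeRef);
console.log(labels);
```

If the assumption holds, an array like `judges` could be passed directly as the third argument of `createConfig()`.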
getConfig()
Fetch a test config by id (UUID) or name.
async function getConfig(testConfig: string): Promise<TestConfig | null>
Parameters
testConfig
required:string
Test config id (UUID) or name.
Returns
Promise<TestConfig | null>
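Most methods on this page resolve to a value or `null`, and this reference does not say when `null` is returned. One way to avoid repeated null checks is a small guard, sketched below. `orThrow`, `TestConfigLike`, and `fakeGetConfig` are hypothetical names used only for illustration; `fakeGetConfig` is a synchronous stand-in for the awaited result of `getConfig()`.

```typescript
// Hypothetical helper, not part of the SDK: convert a nullable result into
// either a usable value or a thrown error.
function orThrow<T>(value: T | null, what: string): T {
  if (value === null) {
    throw new Error(`${what} returned null`);
  }
  return value;
}

// Stand-in for the SDK's TestConfig; the real type's fields are not listed here.
interface TestConfigLike {
  name: string;
}

const knownConfigs: TestConfigLike[] = [{ name: "nightly" }];

// Stand-in lookup that mimics a "value or null" result.
function fakeGetConfig(name: string): TestConfigLike | null {
  for (const candidate of knownConfigs) {
    if (candidate.name === name) return candidate;
  }
  return null;
}

const config = orThrow(fakeGetConfig("nightly"), "getConfig");
console.log(config.name);

let message = "";
try {
  orThrow(fakeGetConfig("missing"), "getConfig");
} catch (err) {
  message = err instanceof Error ? err.message : String(err);
}
console.log(message);
```

With the real client, the same guard could wrap an awaited call such as `orThrow(await client.offlineTests.getConfig("nightly"), "getConfig")`.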
listConfigs()
List test configs in the project, optionally filtered to one dataset.
async function listConfigs(datasetId?: string | undefined): Promise<TestConfig[] | null>
Parameters
datasetId
optional:string | undefined
Dataset id to filter by. When omitted, all test configs in the project are listed.
Returns
Promise<TestConfig[] | null>
deleteConfig()
Delete a test config by id. Returns false if the project is unresolved.
async function deleteConfig(testConfigId: string): Promise<boolean>
Parameters
testConfigId
required:string
Id of the test config to delete.
Returns
Promise<boolean>
run()
Run an offline test for a test config.
Without agentFunction, the judges score each example's existing trace.
With agentFunction, the agent runs once per example first and the judges
score the resulting agent trace.
async function run(testConfig: string | TestConfig, options: OfflineRunOptions = {}): Promise<OfflineTestResult | null>
Parameters
testConfig
required:string | TestConfig
Test config name, id, or TestConfig object.
options
optional:OfflineRunOptions
Run options (agent function, judge versions, dataset version, pass condition, assert, timeout). Defaults to {}.
Returns
Promise<OfflineTestResult | null>
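The difference between the two modes of `run()` can be pictured with a toy model. The sketch below is not the SDK's implementation. `Example`, `AgentFn`, `Judge`, and `scoreExamples` are hypothetical stand-ins, and the agent is kept synchronous for brevity.

```typescript
// Toy model of the two modes described above. Every type here is a
// hypothetical stand-in; none of it is the SDK's implementation.
interface Example {
  input: string;
  existingTrace: string;
}
type AgentFn = (input: string) => string; // assumed shape, synchronous for brevity
type Judge = (trace: string) => number;   // assumed: returns a numeric score

function scoreExamples(examples: Example[], judge: Judge, agentFunction?: AgentFn): number[] {
  return examples.map((example) => {
    // With an agent function, a fresh trace is produced once per example;
    // without one, the trace already stored with the example is scored.
    const trace = agentFunction ? agentFunction(example.input) : example.existingTrace;
    return judge(trace);
  });
}

const examples: Example[] = [
  { input: "hi", existingTrace: "hello" },
  { input: "bye", existingTrace: "goodbye!" },
];
const lengthJudge: Judge = (trace) => trace.length;        // placeholder judge
const echoAgent: AgentFn = (input) => input.toUpperCase(); // placeholder agent

const storedScores = scoreExamples(examples, lengthJudge);           // [5, 8]
const freshScores = scoreExamples(examples, lengthJudge, echoAgent); // [2, 3]
console.log(storedScores, freshScores);
```

The two score arrays differ because the judge sees different traces in each mode, which is the point of passing an agent function.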
OfflineTestRunner
Executes the offline-test lifecycle for a test config (the TypeScript port of the Python `OfflineTestRunner`): resolve the dataset version, optionally run the agent to produce offline traces, create the test run, wait for terminal status, fetch results, evaluate the pass condition, and report successes.
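The "evaluate the pass condition" step can be thought of as a predicate over the per-example results. The sketch below assumes a result shape and a pass policy for illustration only. `ExampleResult`, `PassCondition`, and `allScoresAtLeast` are hypothetical and may not match the SDK's `PassConditionFn` or its result types.

```typescript
// Hypothetical result shape: one score per judge for each example.
interface ExampleResult {
  exampleId: string;
  scores: Record<string, number>;
}

// Hypothetical predicate type; the SDK's PassConditionFn may look different.
type PassCondition = (results: ExampleResult[]) => boolean;

// Assumed policy: pass only when every judge score on every example
// meets the threshold.
function allScoresAtLeast(threshold: number): PassCondition {
  return (results) =>
    results.every((result) =>
      Object.keys(result.scores).every((judge) => result.scores[judge] >= threshold),
    );
}

const results: ExampleResult[] = [
  { exampleId: "ex-1", scores: { helpfulness: 0.9, faithfulness: 0.8 } },
  { exampleId: "ex-2", scores: { helpfulness: 0.6, faithfulness: 0.95 } },
];

const strict = allScoresAtLeast(0.7);  // fails: ex-2 helpfulness is 0.6
const lenient = allScoresAtLeast(0.5); // passes: every score is at least 0.5
console.log(strict(results), lenient(results));
```

Under this policy an empty result list passes vacuously, so a real pass condition might also want to check that at least one example was evaluated.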
TestConfig
A reusable offline-test configuration (a dataset and a set of platform judges). Created via `client.offlineTests.createConfig()`. It is one of the types for the offline-tests SDK surface, which mirror the Python `judgeval.offline_tests` module (TestConfig, OfflineTestResult, JudgeVersionPin, AgentFunction, PassConditionFn) in TypeScript idiom.