OfflineTestsFactory

Create test configs and execute offline test runs. Access via `client.offlineTests`. A *test config* pairs a dataset with a set of platform judges; a *test run* evaluates one dataset version and stores per-example results.

const config = await client.offlineTests.createConfig("nightly", "golden-set", [
  "helpfulness",
  "faithfulness",
]);
const result = await client.offlineTests.run("nightly");
console.log(result?.uiResultsUrl);

createConfig()

Create a test config binding a dataset to a set of judges.

async function createConfig(name: string, dataset: string, judges: JudgeRef[], description?: string | undefined): Promise<TestConfig | null>

Parameters

name (required): string
Name for the test config.

dataset (required): string
Dataset name or dataset id.

judges (required): JudgeRef[]
Judges to attach (judge names, or { judgeId } / { name }).

description: string | undefined
Optional human-readable description.

Returns

Promise<TestConfig | null>
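The judges argument accepts three forms, and they can presumably be mixed in one list. The sketch below assumes JudgeRef is a union of a plain string and the two object shapes named in the parameter description; the type actually exported by the SDK may differ. describeJudgeRef is a hypothetical helper used only to show how each form can be told apart.

```typescript
// Assumed shape of JudgeRef, inferred from the parameter description above.
// The SDK's real exported type may differ.
type JudgeRef = string | { judgeId: string } | { name: string };

// Hypothetical helper (not part of the SDK): render a reference as a label.
function describeJudgeRef(ref: JudgeRef): string {
  if (typeof ref === "string") return `name:${ref}`;
  if ("judgeId" in ref) return `id:${ref.judgeId}`;
  return `name:${ref.name}`;
}

// A mixed list such as one might pass as the `judges` argument of createConfig().
const judges: JudgeRef[] = [
  "helpfulness",
  { name: "faithfulness" },
  { judgeId: "00000000-0000-0000-0000-000000000000" }, // placeholder id
];

const labels = judges.map(describeJudgeRef);
console.log(labels.join(", "));
```

Referring to a judge by id may be preferable when judge names can be renamed on the platform, though this page does not say how names are resolved.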


getConfig()

Fetch a test config by id (UUID) or name.

async function getConfig(testConfig: string): Promise<TestConfig | null>

Parameters

testConfig (required): string
Test config id (UUID) or name.

Returns

Promise<TestConfig | null>
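Because getConfig() accepts a name and can resolve to null, one common pattern is "fetch, otherwise create". The following is a minimal sketch, not SDK code: it assumes a null result means the config was not found (this page does not state when null is returned), the id and name fields on TestConfig are assumptions, and getOrCreateConfig and makeFakeTests are hypothetical helpers. The in-memory stand-in exists only so the sketch runs without network access.

```typescript
// Assumed minimal shapes; field names on TestConfig are guesses.
interface TestConfig { id: string; name: string; }
type JudgeRef = string | { judgeId: string } | { name: string };

// Only the two documented methods this sketch needs.
interface OfflineTestsLike {
  getConfig(testConfig: string): Promise<TestConfig | null>;
  createConfig(name: string, dataset: string, judges: JudgeRef[],
               description?: string): Promise<TestConfig | null>;
}

// Hypothetical helper: reuse an existing config, otherwise create it.
async function getOrCreateConfig(
  tests: OfflineTestsLike, name: string, dataset: string, judges: JudgeRef[],
): Promise<TestConfig> {
  const existing = await tests.getConfig(name);
  if (existing) return existing;
  const created = await tests.createConfig(name, dataset, judges);
  if (!created) throw new Error(`could not create test config "${name}"`);
  return created;
}

// In-memory stand-in for client.offlineTests (illustration only).
function makeFakeTests(): OfflineTestsLike {
  const store: TestConfig[] = [];
  return {
    async getConfig(key) {
      const hits = store.filter((c) => c.id === key || c.name === key);
      return hits.length > 0 ? hits[0] : null;
    },
    async createConfig(name) {
      const cfg: TestConfig = { id: `cfg-${store.length + 1}`, name };
      store.push(cfg);
      return cfg;
    },
  };
}

const fakeTests = makeFakeTests();
getOrCreateConfig(fakeTests, "nightly", "golden-set", ["helpfulness"])
  .then((cfg) => console.log(cfg.name));
```

With a real client, `client.offlineTests` would be passed in place of the stand-in, provided its methods match the signatures documented here.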


listConfigs()

List test configs in the project, optionally filtered to one dataset.

async function listConfigs(datasetId?: string | undefined): Promise<TestConfig[] | null>

Parameters

datasetId: string | undefined
Dataset id to filter by. Omit to list every test config in the project.

Returns

Promise<TestConfig[] | null>
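Since listConfigs() can resolve to null, callers need to handle that case before iterating. The sketch below shows one way to do so, treating null as an empty list. That choice is an assumption made for brevity and may hide a failure worth reporting. The TestConfig fields, the configNames helper, and the in-memory fakeLister are all hypothetical.

```typescript
// Assumed minimal shape; field names are guesses.
interface TestConfig { id: string; name: string; }

// Only the documented method this sketch needs.
interface ConfigLister {
  listConfigs(datasetId?: string): Promise<TestConfig[] | null>;
}

// Hypothetical helper: return sorted config names, treating null as "none".
async function configNames(tests: ConfigLister, datasetId?: string): Promise<string[]> {
  const configs = (await tests.listConfigs(datasetId)) ?? [];
  return configs.map((c) => c.name).sort();
}

// In-memory stand-in for client.offlineTests (illustration only).
const fakeLister: ConfigLister = {
  async listConfigs(datasetId) {
    const all = [
      { id: "cfg-1", name: "weekly", datasetId: "ds-2" },
      { id: "cfg-2", name: "nightly", datasetId: "ds-1" },
    ];
    return datasetId === undefined ? all : all.filter((c) => c.datasetId === datasetId);
  },
};

configNames(fakeLister).then((names) => console.log(names.join(", ")));
```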


deleteConfig()

Delete a test config by id. Returns false if the project is unresolved.

async function deleteConfig(testConfigId: string): Promise<boolean>

Parameters

testConfigId (required): string
Id of the test config to delete.

Returns

Promise<boolean>
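Unlike getConfig(), deleteConfig() is documented as taking an id only. When only a name is at hand, one option is to resolve it first. This is a sketch under assumptions: TestConfig is assumed to expose an id field, and deleteConfigByName and makeFakeRemover are hypothetical helpers, with the stand-in used so the example runs offline.

```typescript
// Assumed minimal shape; the `id` field name is a guess.
interface TestConfig { id: string; name: string; }

// Only the two documented methods this sketch needs.
interface ConfigRemover {
  getConfig(testConfig: string): Promise<TestConfig | null>;
  deleteConfig(testConfigId: string): Promise<boolean>;
}

// Hypothetical helper: look a config up by name, then delete it by id.
async function deleteConfigByName(tests: ConfigRemover, name: string): Promise<boolean> {
  const cfg = await tests.getConfig(name);
  if (!cfg) return false; // nothing resolved, so nothing to delete
  return tests.deleteConfig(cfg.id);
}

// In-memory stand-in for client.offlineTests (illustration only).
function makeFakeRemover(initial: TestConfig[]): ConfigRemover {
  let configs = initial.slice();
  return {
    async getConfig(key) {
      const hits = configs.filter((c) => c.id === key || c.name === key);
      return hits.length > 0 ? hits[0] : null;
    },
    async deleteConfig(id) {
      const before = configs.length;
      configs = configs.filter((c) => c.id !== id);
      return configs.length < before;
    },
  };
}

const fakeRemover = makeFakeRemover([{ id: "cfg-1", name: "nightly" }]);
deleteConfigByName(fakeRemover, "nightly").then((ok) => console.log("deleted:", ok));
```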


run()

Run an offline test for a test config.

Without agentFunction, the judges score each example's existing trace. With agentFunction, the agent runs once per example first and the judges score the resulting agent trace.

async function run(testConfig: string | TestConfig, options: OfflineRunOptions = {}): Promise<OfflineTestResult | null>

Parameters

testConfig (required): string | TestConfig
Test config name, id, or TestConfig object.

options: OfflineRunOptions (default: {})
Run options (agent function, judge versions, dataset version, pass condition, assert, timeout).

Returns

Promise<OfflineTestResult | null>
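The two modes described above differ only in whether agentFunction is supplied. The sketch below illustrates that difference against an in-memory stand-in, not the real SDK. Only the agentFunction option and the uiResultsUrl field are named on this page; the agent's signature, the example shape, the other type fields, the makeFakeRunner helper, and the placeholder URL are all assumptions.

```typescript
// Assumed shapes; only `agentFunction` and `uiResultsUrl` appear on this page.
type Example = Record<string, unknown>;
type AgentFunction = (example: Example) => Promise<unknown>; // signature is a guess
interface OfflineRunOptions { agentFunction?: AgentFunction; }
interface OfflineTestResult { uiResultsUrl?: string; }
interface TestConfig { id: string; name: string; }

interface OfflineRunner {
  run(testConfig: string | TestConfig,
      options?: OfflineRunOptions): Promise<OfflineTestResult | null>;
}

// In-memory stand-in: invokes the agent once per example when one is given.
function makeFakeRunner(examples: Example[]): OfflineRunner & { agentCalls(): number } {
  let calls = 0;
  return {
    agentCalls: () => calls,
    async run(testConfig: string | TestConfig, options: OfflineRunOptions = {}) {
      if (options.agentFunction) {
        for (const example of examples) {
          await options.agentFunction(example);
          calls += 1;
        }
      }
      const name = typeof testConfig === "string" ? testConfig : testConfig.name;
      return { uiResultsUrl: `https://example.invalid/results/${name}` }; // placeholder
    },
  };
}

async function demo(): Promise<number> {
  const runner = makeFakeRunner([{ input: "a" }, { input: "b" }]);
  // Without agentFunction: judges would score each example's existing trace.
  await runner.run("nightly");
  // With agentFunction: the agent runs once per example before judging.
  const result = await runner.run("nightly", {
    agentFunction: async (example) => `echo:${String(example["input"])}`,
  });
  console.log(result?.uiResultsUrl);
  return runner.agentCalls();
}
```

Calling `demo()` runs both modes in sequence. Against the real client, the second call would correspond to `client.offlineTests.run("nightly", { agentFunction })`, assuming the agent function matches whatever signature the SDK expects.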