Evaluation


Score a batch of examples using hosted scorers or custom judges.

Two modes are supported:

- Hosted scorers: pass scorer names as strings (e.g. "faithfulness", "answer_relevancy"). Evaluation runs server-side on the Judgment platform.
- Custom judges: pass Judge subclass instances for in-process evaluation with your own scoring logic.
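To make the custom-judge mode more concrete, the following self-contained sketch shows what in-process scoring amounts to: a judge subclass holds the scoring logic, and every example is scored locally, producing one result per example. SketchJudge, KeywordOverlapJudge, runLocally, and the example and result fields are hypothetical illustrations, not the SDK's Judge API. Check the SDK's Judge class for the real method names and signatures.

```typescript
// Hypothetical types for illustration only; these are NOT the SDK's Example or Judge classes.
interface SketchExample {
  input: string;
  actualOutput: string;
}

interface SketchResult {
  score: number;
  success: boolean;
}

// Hypothetical judge base class: subclasses supply their own scoring logic.
abstract class SketchJudge {
  readonly threshold: number;
  constructor(threshold: number) {
    this.threshold = threshold;
  }
  abstract score(example: SketchExample): Promise<number>;
}

// Hypothetical judge: scores 1 if the output shares any word with the input, else 0.
class KeywordOverlapJudge extends SketchJudge {
  async score(example: SketchExample): Promise<number> {
    const inputWords = new Set(
      example.input.toLowerCase().split(/\W+/).filter(Boolean),
    );
    const outputWords = example.actualOutput.toLowerCase().split(/\W+/);
    return outputWords.some((w) => inputWords.has(w)) ? 1 : 0;
  }
}

// In-process evaluation: apply the judge to every example, one result per example.
async function runLocally(
  examples: SketchExample[],
  judge: SketchJudge,
): Promise<SketchResult[]> {
  return Promise.all(
    examples.map(async (ex) => {
      const score = await judge.score(ex);
      return { score, success: score >= judge.threshold };
    }),
  );
}
```

With the real SDK, the equivalent step is passing your Judge subclass instances as scorers to run() instead of calling a local loop like runLocally.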

Create an Evaluation via client.evaluation.create(), then call .run() to execute scorers against your examples.

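// Assumes client is an initialized Judgment client and examples is the array of examples to score.
const evaluation = client.evaluation.create();
const results = await evaluation.run({
  examples,
  // Hosted scorers, referenced by name and evaluated server-side.
  scorers: ["faithfulness", "answer_relevancy"],
  // Name used to identify this evaluation run.
  evalRunName: "nightly-eval",
});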

run()

Run scorers against your examples and return results.

Pass either hosted scorer names (strings) or custom Judge instances. Mixing both in one call is not supported.
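Because one run() call accepts only one kind of scorer, it can help to check a scorer list before calling it. The sketch below classifies a list as hosted or custom and throws on a mix. JudgeBase, ScorerSpec, classifyScorers, and LengthJudge are hypothetical stand-ins, not SDK exports, and the rejection of an empty list is a choice made by this sketch, not documented SDK behavior.

```typescript
// Hypothetical stand-in for the SDK's Judge base class; illustration only.
abstract class JudgeBase {
  abstract readonly name: string;
}

// A scorer is either a hosted scorer name or a custom judge instance.
type ScorerSpec = string | JudgeBase;

// Returns "hosted" if every scorer is a string, "custom" if none are,
// and throws if the list mixes the two kinds (or, in this sketch, is empty).
function classifyScorers(scorers: ScorerSpec[]): "hosted" | "custom" {
  if (scorers.length === 0) {
    throw new Error("At least one scorer is required.");
  }
  const hostedCount = scorers.filter((s) => typeof s === "string").length;
  if (hostedCount === scorers.length) return "hosted";
  if (hostedCount === 0) return "custom";
  throw new Error(
    "Mixing hosted scorer names and custom judges in one run() call is not supported.",
  );
}

// Hypothetical custom judge used only to exercise the check.
class LengthJudge extends JudgeBase {
  readonly name = "length";
}
```

If you need both kinds of scoring for the same examples, one option is to split the list and make two run() calls, one per mode.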

async function run(options: EvaluationRunOptions): Promise<ScoringResult[]>

Parameters

options (EvaluationRunOptions, required): Evaluation configuration, including the examples, the scorers, and the run name.

Returns

Promise<ScoringResult[]>: resolves to an array of ScoringResult objects, one per example.
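Since run() returns one result per example, a common follow-up is to aggregate them. The sketch below computes a pass rate. It assumes each result exposes a boolean success field; ScoringResultLike and passRate are hypothetical, and the actual fields of ScoringResult should be confirmed against the SDK's type definition.

```typescript
// Hypothetical minimal shape; the real ScoringResult likely carries more fields.
// The success field is an assumption, not confirmed by this page.
interface ScoringResultLike {
  success: boolean;
}

// Fraction of examples whose result is marked successful (0 for an empty list).
function passRate(results: ScoringResultLike[]): number {
  if (results.length === 0) return 0;
  const passed = results.filter((r) => r.success).length;
  return passed / results.length;
}
```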
