Evaluation
Score a batch of examples using hosted scorers or custom judges. Two modes are supported:

- **Hosted scorers**: pass scorer names as strings (e.g. `"faithfulness"`, `"answer_relevancy"`). Evaluation runs server-side on the Judgment platform.
- **Custom judges**: pass `Judge` subclass instances for in-process evaluation with your own scoring logic.

Create an `Evaluation` via `client.evaluation.create()`, then call `.run()` to execute scorers against your examples.
const evaluation = client.evaluation.create();
const results = await evaluation.run({
  examples,
  scorers: ["faithfulness", "answer_relevancy"],
  evalRunName: "nightly-eval",
});

run()
Run scorers against your examples and return results.
Pass either hosted scorer names (strings) or custom Judge instances. Mixing both in one call is not supported.
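Because a mixed list is not supported, it can help to check the scorer list before calling `run()`. The following is a minimal sketch, not part of the SDK: `JudgeLike`, `ScorerInput`, and `detectScorerMode` are hypothetical names introduced for illustration, and rejecting an empty list is an assumption rather than documented behavior.

```typescript
// Hypothetical stand-in for a custom judge instance.
// The SDK's real Judge class is not shown on this page.
interface JudgeLike {
  name: string;
}

type ScorerInput = string | JudgeLike;
type ScorerMode = "hosted" | "custom";

// Returns which mode a scorer list uses, or throws if the list is empty or mixed.
function detectScorerMode(scorers: ScorerInput[]): ScorerMode {
  if (scorers.length === 0) {
    // Assumption: an empty scorer list is treated as a caller error.
    throw new Error("At least one scorer is required.");
  }
  const hostedCount = scorers.filter((s) => typeof s === "string").length;
  if (hostedCount === scorers.length) return "hosted";
  if (hostedCount === 0) return "custom";
  throw new Error(
    "Hosted scorer names and custom judges cannot be mixed; use separate run() calls."
  );
}
```

If you need both kinds of scoring for the same examples, one option is to split the list and issue two `run()` calls, one per mode.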
async function run(options: EvaluationRunOptions): Promise<ScoringResult[]>

Parameters
options: EvaluationRunOptions (required)
Evaluation configuration including examples, scorers, and run name.
Returns
Promise<ScoringResult[]> - A list of ScoringResult objects, one per example.
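Since `run()` returns one result per example, a caller may want to pair each result with the example it came from. The sketch below assumes results come back in the same order as the input examples, which this page does not state. `pairByIndex` is a hypothetical helper, not an SDK function, and it is generic so that it makes no assumptions about the fields of `ScoringResult`.

```typescript
// Pairs each example with the result at the same index.
// Assumption: results are returned in input order (not confirmed by this page).
function pairByIndex<E, R>(
  examples: readonly E[],
  results: readonly R[]
): Array<{ example: E; result: R }> {
  if (examples.length !== results.length) {
    throw new Error(
      `Expected ${examples.length} results, got ${results.length}.`
    );
  }
  // The length check above guarantees results[i] exists for every i.
  return examples.map((example, i) => ({ example, result: results[i] as R }));
}
```

The length check doubles as a sanity test that every example was scored before any downstream reporting.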