Judge
Base class for building custom evaluation scorers. Subclass `Judge` and implement the `score` method to create your own evaluation logic.
```typescript
class ContainsAnswer extends Judge<BinaryResponse> {
  // Passes when the actual output contains the expected output, ignoring case.
  async score(data: Example): Promise<BinaryResponse> {
    const expected = (data.get("expected_output") as string).toLowerCase();
    const actual = (data.get("actual_output") as string).toLowerCase();
    const found = actual.includes(expected);
    return {
      value: found,
      reason: found ? "Found" : "Not found",
    };
  }
}
```

OfflineJudgmentSpanProcessor
Span processor used by `OfflineTracer`. Extends `JudgmentSpanProcessor` (so it inherits batched export, span state, and partial-emit support) and additionally appends a new `Example` to the caller-supplied `dataset` list whenever a *root* span ends. Each emitted example carries the `offline_trace_id` of the trace plus any static `exampleFields` configured at init time.
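The root-span behavior described above can be illustrated with a minimal, self-contained sketch. It does not use the SDK: `SketchSpan`, `SketchExample`, and `RootSpanCollector` are hypothetical stand-ins invented for illustration, and the sketch assumes a root span is one with no parent span ID. Batched export, span state, and partial emits from `JudgmentSpanProcessor` are left out.

```typescript
// Hypothetical stand-ins for illustration; none of these types come from the SDK.
interface SketchSpan {
  traceId: string;
  spanId: string;
  parentSpanId?: string; // assumed to be undefined for a root span
}

type SketchExample = Record<string, unknown>;

class RootSpanCollector {
  private readonly dataset: SketchExample[];
  private readonly exampleFields: Record<string, unknown>;

  constructor(
    dataset: SketchExample[],
    exampleFields: Record<string, unknown> = {},
  ) {
    this.dataset = dataset; // caller-supplied list, mutated in place
    this.exampleFields = exampleFields; // static fields copied onto every example
  }

  // Called when a span ends; only root spans append an example.
  onEnd(span: SketchSpan): void {
    if (span.parentSpanId !== undefined) {
      return; // child span: nothing is appended
    }
    this.dataset.push({
      ...this.exampleFields,
      offline_trace_id: span.traceId,
    });
  }
}

const dataset: SketchExample[] = [];
const collector = new RootSpanCollector(dataset, { suite: "smoke" });

collector.onEnd({ traceId: "t1", spanId: "s2", parentSpanId: "s1" }); // child span
collector.onEnd({ traceId: "t1", spanId: "s1" }); // root span
```

In this sketch the child span leaves `dataset` untouched, and the root span adds one entry holding the trace ID alongside the static `suite` field.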