Evaluation/Scorers
Introduction
Scorers are used to evaluate agent systems based on specific criteria.
Overview
Want to see a new scorer?
We're always adding new scorers to judgeval. If you have a suggestion, please let us know by opening a GitHub issue!
Categories of Scorers
judgeval supports three implementations of scorers:
Default Scorers: plug-and-play scorers carefully crafted by our research team.
Custom Scorers: powerful scorers that you can tailor to your own agent systems.
Classifier Scorers: custom scorers that evaluate outputs against a prompt you provide.
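To make the custom-scorer idea concrete, here is a minimal, self-contained sketch of what a scorer boils down to: a scoring method that returns a value, compared against a threshold. The class and method names below are purely illustrative, not judgeval's actual interface.

```python
# Hypothetical sketch of a custom scorer; names are illustrative only,
# not judgeval's actual API.
class KeywordScorer:
    """Scores an output by the fraction of required keywords it contains."""

    def __init__(self, keywords, threshold=0.5):
        self.keywords = [k.lower() for k in keywords]
        self.threshold = threshold

    def score(self, output: str) -> float:
        # Fraction of required keywords present in the output.
        hits = sum(1 for k in self.keywords if k in output.lower())
        return hits / len(self.keywords)

    def passed(self, output: str) -> bool:
        # A scorer typically passes when its score meets the threshold.
        return self.score(output) >= self.threshold


scorer = KeywordScorer(["refund", "policy"], threshold=1.0)
print(scorer.score("Our refund policy lasts 30 days."))  # 1.0
print(scorer.passed("No relevant keywords here."))       # False
```

The same pattern generalizes: a classifier scorer would replace the keyword check with a call to an LLM prompted to classify the output.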
Running Scorers
All scorers in judgeval can be run uniformly through the JudgmentClient. All scorers are set to run in async mode by default in order to support parallelized evaluations over large datasets.
```python
from judgeval import JudgmentClient

example = ...  # your choice of example to evaluate
scorer = ...   # your choice of scorer

client = JudgmentClient()
results = client.run_evaluation(
    examples=[example],
    scorers=[scorer],
    model="gpt-4.1",
)
```
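To see why async execution matters for large datasets, here is a self-contained sketch using plain asyncio (this illustrates the concurrency pattern, not judgeval's internals): when each scorer call is I/O-bound, running them concurrently makes the total wall time roughly one call's latency instead of the sum over all examples.

```python
import asyncio
import time

# a_score simulates a slow, network-bound scorer call (e.g. an LLM judge);
# it is a stand-in for illustration, not judgeval code.
async def a_score(example: str) -> float:
    await asyncio.sleep(0.1)  # simulated per-call latency
    return float(len(example) % 2)

async def run_all(examples):
    # gather() schedules all scorer calls concurrently, so 20 calls with
    # 0.1s latency each finish in roughly 0.1s rather than 2s.
    return await asyncio.gather(*(a_score(e) for e in examples))

examples = [f"example-{i}" for i in range(20)]
start = time.perf_counter()
scores = asyncio.run(run_all(examples))
elapsed = time.perf_counter() - start
print(len(scores), round(elapsed, 2))
```

Running the scorers sequentially instead would take twenty times the per-call latency, which is why parallelized evaluation is the default for large datasets.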