Evaluation/Scorers
Introduction
Scorers are used to evaluate agent systems based on specific criteria.
Overview
Want to see a new scorer?
We're always adding new scorers to judgeval. If you have a suggestion, please let us know by opening a GitHub issue!
Categories of Scorers
judgeval supports three implementations of scorers:
Default Scorers: plug-and-play scorers carefully crafted by our research team.
Custom Scorers: powerful scorers that you can tailor to your own agent systems.
Classifier Scorers: custom scorers that evaluate outputs against a prompt you provide.
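To make the custom-scorer idea concrete, here is a minimal, self-contained sketch of what a scorer boils down to: a scoring method that returns a value, compared against a threshold. The class and method names below are purely illustrative, not judgeval's actual interface.

```python
# Hypothetical sketch of a custom scorer; names are illustrative only,
# not judgeval's actual API.
class KeywordScorer:
    """Scores an output by the fraction of required keywords it contains."""

    def __init__(self, keywords, threshold=0.5):
        self.keywords = [k.lower() for k in keywords]
        self.threshold = threshold

    def score(self, output: str) -> float:
        # Fraction of required keywords present in the output.
        hits = sum(1 for k in self.keywords if k in output.lower())
        return hits / len(self.keywords)

    def passed(self, output: str) -> bool:
        # A scorer typically passes when its score meets the threshold.
        return self.score(output) >= self.threshold


scorer = KeywordScorer(["refund", "policy"], threshold=1.0)
print(scorer.score("Our refund policy lasts 30 days."))  # 1.0
print(scorer.passed("No relevant keywords here."))       # False
```

The same pattern generalizes: a classifier scorer would replace the keyword check with a call to an LLM prompted to classify the output.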
Running Scorers
All scorers in judgeval can be run uniformly through the JudgmentClient. All scorers are set to run in async mode by default in order to support parallelized evaluations over large datasets.
```python
from judgeval import JudgmentClient

example = ...  # your choice of example to evaluate
scorer = ...   # your choice of scorer

client = JudgmentClient()
results = client.run_evaluation(
    examples=[example],
    scorers=[scorer],
    model="gpt-4.1",
)
```
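To see why async execution matters for large datasets, here is a self-contained sketch using plain asyncio (this illustrates the concurrency pattern, not judgeval's internals): when each scorer call is I/O-bound, running them concurrently makes the total wall time roughly one call's latency instead of the sum over all examples.

```python
import asyncio
import time

# a_score simulates a slow, network-bound scorer call (e.g. an LLM judge);
# it is a stand-in for illustration, not judgeval code.
async def a_score(example: str) -> float:
    await asyncio.sleep(0.1)  # simulated per-call latency
    return float(len(example) % 2)

async def run_all(examples):
    # gather() schedules all scorer calls concurrently, so 20 calls with
    # 0.1s latency each finish in roughly 0.1s rather than 2s.
    return await asyncio.gather(*(a_score(e) for e in examples))

examples = [f"example-{i}" for i in range(20)]
start = time.perf_counter()
scores = asyncio.run(run_all(examples))
elapsed = time.perf_counter() - start
print(len(scores), round(elapsed, 2))
```

Running the scorers sequentially instead would take twenty times the per-call latency, which is why parallelized evaluation is the default for large datasets.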