Custom Scorers
The `JudgevalScorer` is the abstraction for custom evaluation logic. Whether your evaluation logic is a simple algorithm, an LLM judge, or a multi-agent system, you can use a `JudgevalScorer` for evaluation.
Implement a JudgevalScorer
To implement your own custom scorer, you must:
Inherit from the `JudgevalScorer` class and name your scorer
This will help `judgeval` integrate your scorer into evaluation runs.
```python
from judgeval.scorers import JudgevalScorer

class SampleScorer(JudgevalScorer):
    ...

    @property
    def __name__(self):
        return "Sample Scorer"
```
Implement the `__init__()` method
`JudgevalScorer`s have required attributes that must be set in the `__init__()` method. For instance, you must set a `threshold` that determines what counts as success or failure for your scorer.
There are additional optional attributes that can be set here for even more flexibility:
| Attribute | Type | Description |
|---|---|---|
| `score_type` | `str` | The name of your scorer. This is displayed in the Judgment platform. |
| `include_reason` | `bool` | Whether your scorer includes a reason for the score in its results. Only applicable to LLM judge-based scorers. |
| `async_mode` | `bool` | Whether your scorer runs asynchronously during evaluations. |
| `strict_mode` | `bool` | Whether your scorer fails if the score is not perfect (1.0). |
| `verbose_mode` | `bool` | Whether your scorer produces verbose logs. |
| `custom_example` | `bool` | Whether your scorer runs on custom examples. |
```python
class SampleScorer(JudgevalScorer):
    def __init__(
        self,
        threshold=0.5,
        score_type="Sample Scorer",
        include_reason=True,
        async_mode=True,
        strict_mode=False,
        verbose_mode=True,
    ):
        super().__init__(score_type=score_type, threshold=threshold)
        self.threshold = 1 if strict_mode else threshold
        # Optional attributes
        self.include_reason = include_reason
        self.async_mode = async_mode
        self.strict_mode = strict_mode
        self.verbose_mode = verbose_mode
```
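For instance, the scorer defined above could be instantiated like this (the argument values are purely illustrative):

```python
# A stricter configuration with a higher passing bar and quieter logs.
scorer = SampleScorer(threshold=0.8, verbose_mode=False)

# strict_mode overrides the threshold, so any score below 1.0 fails.
strict_scorer = SampleScorer(strict_mode=True)
```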
Implement the `score_example()` and `a_score_example()` methods
The `score_example()` and `a_score_example()` methods take an `Example` object and execute your scorer to produce a `float` score between 0 and 1. Optionally, you can include a reason to accompany the score if applicable (e.g. for LLM judge-based scorers).
These methods are the core of your scorer, and you can implement them in any way you want. Be creative!
Here's a sample implementation that integrates everything we've covered:
```python
class SampleScorer(JudgevalScorer):
    ...

    def score_example(self, example, *args, **kwargs):
        try:
            # run_scorer_logic, justify_score, and make_logs are placeholders
            # for your own scoring, justification, and logging logic.
            self.score = run_scorer_logic(example)
            if self.include_reason:
                self.reason = justify_score(example, self.score)
            if self.verbose_mode:
                self.verbose_logs = make_logs(example, self.reason, self.score)
            self.success = self.score >= self.threshold
        except Exception as e:
            self.error = str(e)
            self.success = False

    async def a_score_example(self, example, *args, **kwargs):
        try:
            self.score = await a_run_scorer_logic(example)  # async version
            if self.include_reason:
                self.reason = justify_score(example, self.score)
            if self.verbose_mode:
                self.verbose_logs = make_logs(example, self.reason, self.score)
            self.success = self.score >= self.threshold
        except Exception as e:
            self.error = str(e)
            self.success = False
```
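To make this concrete, here's a minimal sketch of a purely algorithmic scorer that measures keyword coverage. The class name is hypothetical, and it assumes the `Example` object exposes an `actual_output` field; adapt it to your own scoring logic.

```python
from judgeval.scorers import JudgevalScorer

class KeywordCoverageScorer(JudgevalScorer):
    """Hypothetical scorer: fraction of required keywords present in the output."""

    def __init__(self, keywords, threshold=0.5):
        super().__init__(score_type="Keyword Coverage", threshold=threshold)
        self.keywords = [k.lower() for k in keywords]

    def score_example(self, example, *args, **kwargs):
        try:
            output = example.actual_output.lower()  # assumed Example field
            hits = sum(1 for k in self.keywords if k in output)
            self.score = hits / len(self.keywords) if self.keywords else 0.0
            self.reason = f"Matched {hits}/{len(self.keywords)} required keywords."
            self.success = self.score >= self.threshold
        except Exception as e:
            self.error = str(e)
            self.success = False

    async def a_score_example(self, example, *args, **kwargs):
        # The scoring logic is synchronous, so the async version simply delegates.
        return self.score_example(example, *args, **kwargs)

    @property
    def __name__(self):
        return "Keyword Coverage"
```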
Implement the `_success_check()` method
When executing an evaluation run, `judgeval` checks whether your scorer passed by calling its `_success_check()` method. You can implement this method in any way you want, but it should return a `bool`. Here's a perfectly valid implementation:
```python
class SampleScorer(JudgevalScorer):
    ...

    def _success_check(self) -> bool:
        if self.error is not None:
            return False
        return self.score >= self.threshold  # or return self.success if it was set
```
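With all three pieces in place, your custom scorer can be passed to an evaluation run just like a built-in scorer. Here's a minimal sketch, assuming the `JudgmentClient` and `Example` APIs from `judgeval`; the example fields and model name are illustrative:

```python
from judgeval import JudgmentClient
from judgeval.data import Example

client = JudgmentClient()

example = Example(
    input="What is the capital of France?",
    actual_output="The capital of France is Paris.",
)

# Custom scorers are passed exactly like built-in ones.
results = client.run_evaluation(
    examples=[example],
    scorers=[SampleScorer(threshold=0.7)],
    model="gpt-4o",  # judge model used by any LLM-based scoring logic
)
```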
Cookbooks
Code Style Scorers
Implement a scorer that evaluates the quality of code style, suitable for a PR review bot.
Cold Email Scorer
Implement a scorer that evaluates the quality of cold emails, suitable for a sales automation tool.