Custom Scorers
We know every evaluation is unique, so we provide two types of Custom Scorers to meet your specific needs.
Implement a ExampleScorer
To implement your own scorer with more complex code logic, you must:
Inherit from the ExampleScorer
class
from judgeval.scorers.example_scorer import ExampleScorer
class ResolutionScorer(ExampleScorer):
name: str = "Resolution Scorer"
ExampleScorer has the following attributes that you can access:
Attribute | Type | Description | Default |
---|---|---|---|
name | str | The name of your scorer to be displayed on the Judgment platform. | "Custom" |
score | float | The score of the scorer. | N/A |
threshold | float | The threshold for the scorer. | 0.5 |
reason | str | A description for why the score was given. | N/A |
error | str | An error message if the scorer fails. | N/A |
additional_metadata | dict | Additional metadata to be added to the scorer | N/A |
Define your Custom Example Class
You can create your own custom Example class by inheriting from the base Example object. This allows you to configure any fields you want to score.
from judgeval.data import Example
class CustomerRequest(Example):
request: str
response: str
example = CustomerRequest(
request="Where is my package?",
response="Your package will arrive tomorrow at 10:00 AM.",
)
Implement the a_score_example()
method
The a_score_example()
method takes an Example
object and execute your scorer to produce a float
(between 0 and 1) score.
Optionally, you can include a reason to accompany the score if applicable (e.g. for LLM judge-based scorers).
These methods are the core of your scorer, and you can implement them in any way you want. Be creative!
class ResolutionScorer(ExampleScorer):
name: str = "Resolution Scorer"
# This is using the CustomerRequest class we defined in the previous step
async def a_score_example(self, example: CustomerRequest):
# Replace this logic with your own scoring logic
score = await scoring_function(example.request, example.response)
self.reason = justify_score(example.request, example.response, score)
return score
(Optional) Override the success_check()
method
By default, we determine a scorer to be successful if it has a score greater than or equal to the threshold and there were no errors.
If you wish to change this behavior, you can override the success_check()
method to your liking.
class ResolutionScorer(ExampleScorer):
name: str = "Resolution Scorer"
threshold: float = 1
def success_check(self):
return self.score < self.threshold
Implementation Example
Here is a basic implementation of implementing a ExampleScorer.
from judgeval import JudgmentClient
from judgeval.data import Example
from judgeval.scorers.example_scorer import ExampleScorer
client = JudgmentClient()
class CustomerRequest(Example):
request: str
response: str
class ResolutionScorer(ExampleScorer):
name: str = "Resolution Scorer"
async def a_score_example(self, example: CustomerRequest):
# Replace this logic with your own scoring logic
if "package" in example.response:
self.reason = "The response contains the word 'package'"
return 1
else:
self.reason = "The response does not contain the word 'package'"
return 0
example = CustomerRequest(request="Where is my package?", response="Your package will arrive tomorrow at 10:00 AM.")
res = client.run_evaluation(
examples=[example],
scorers=[ResolutionScorer()],
project_name="default_project",
)
Cookbooks
Code Style Scorers
Implement a scorer that evaluates the quality of code style, suitable for a PR review bot.
Cold Email Scorer
Implement a scorer that evaluates the quality of cold emails, suitable for a sales automation tool.