Custom Scorers
Score your agent behavior using code and LLMs.
judgeval provides abstractions to implement custom scorers arbitrarily in code, enabling full flexibility in your scoring logic and use cases.
You can use any combination of code, custom LLMs as a judge, or library dependencies.
Your scorers can be automatically versioned and synced with the Judgment Platform to be run in production with zero latency impact.
Implement a CustomScorer
Inherit from the ExampleScorer class
from judgeval.scorers.example_scorer import ExampleScorer
class ResolutionScorer(ExampleScorer):
name: str = "Resolution Scorer"ExampleScorer has the following attributes that you can access:
The name of your scorer to be displayed on the Judgment platform.
Additional metadata to be added to the scorer
Define your Custom Example Class
You can create your own custom Example class by inheriting from the base Example object. This allows you to configure any fields you want to score.
from judgeval.data import Example
class CustomerRequest(Example):
request: str
response: str
example = CustomerRequest(
request="Where is my package?",
response="Your package will arrive tomorrow at 10:00 AM.",
)Implement the a_score_example() method
The a_score_example() method takes an Example object and executes your scorer asynchronously to produce a float (between 0 and 1) score.
The only requirement for a_score_example() is that it:
- Take an
Exampleas an argument - Returns a
floatbetween 0 and 1
You can optionally set the self.reason attribute, depending on your preference.
This method is the core of your scorer, and you can implement it in any way you want.
class ResolutionScorer(ExampleScorer):
name: str = "Resolution Scorer"
# This is using the CustomerRequest class we defined in the previous step
async def a_score_example(self, example: CustomerRequest):
# Replace this logic with your own scoring logic
score = await scoring_function(example.request, example.response)
self.reason = justify_score(example.request, example.response, score)
return scoreImplementation Example
Here is a basic implementation of implementing a ExampleScorer.
from judgeval import JudgmentClient
from judgeval.data import Example
from judgeval.scorers.example_scorer import ExampleScorer
client = JudgmentClient()
class CustomerRequest(Example):
request: str
response: str
class ResolutionScorer(ExampleScorer):
name: str = "Resolution Scorer"
async def a_score_example(self, example: CustomerRequest):
# Replace this logic with your own scoring logic
if "package" in example.response:
self.reason = "The response contains the word 'package'"
return 1
else:
self.reason = "The response does not contain the word 'package'"
return 0
example = CustomerRequest(
request="Where is my package?",
response="Your package will arrive tomorrow at 10:00 AM."
)
res = client.run_evaluation(
examples=[example],
scorers=[ResolutionScorer()],
project_name="default_project",
)