ExampleCustomScorer
A custom scorer class for creating specialized evaluation logic for examples
A custom scorer class for creating specialized evaluation logic for individual examples. The scorer is generic and must be parameterized with a response type (BinaryResponse, CategoricalResponse, or NumericResponse).
See Example for more details on the Example type.
Generic Parameters
Rrequired
:BinaryResponse | CategoricalResponse | NumericResponseThe response type that the scorer returns. Determines the structure of the scoring result.
Methods
scorerequired
:async defMeasures the score on a single example. Must be implemented by subclasses. Returns a typed response based on the generic parameter.
async def score(self, data: Example) -> BinaryResponse:
# Custom scoring logic here
return BinaryResponse(value=True, reason="...")Response Types
All response types include a value, reason, and optional citations field:
-
BinaryResponse: For true/false evaluations
value(bool): The boolean resultreason(str): Explanation for the resultcitations(Optional[List[Citation]]): References to specific trace spans
-
CategoricalResponse: For categorical evaluations
value(str): The category namereason(str): Explanation for the resultcitations(Optional[List[Citation]]): References to specific trace spans
-
NumericResponse: For numeric evaluations
value(float): The numeric scorereason(str): Explanation for the resultcitations(Optional[List[Citation]]): References to specific trace spans
Usage
from judgeval.v1.data import Example
from judgeval.v1.hosted import ExampleCustomScorer, BinaryResponse
class CorrectnessScorer(ExampleCustomScorer[BinaryResponse]):
async def score(self, data: Example) -> BinaryResponse:
actual_output = data.get_property("actual_output")
expected_output = data.get_property("expected_output")
if expected_output in actual_output:
return BinaryResponse(
value=True,
reason="The answer contains the expected output."
)
return BinaryResponse(
value=False,
reason="The answer does not contain the expected output."
)