ExampleCustomScorer

A custom scorer class for creating specialized evaluation logic for individual examples. The scorer is generic and must be parameterized with a response type (BinaryResponse, CategoricalResponse, or NumericResponse).

See Example for more details on the Example type.

Generic Parameters

R (required): BinaryResponse | CategoricalResponse | NumericResponse

The response type that the scorer returns. Determines the structure of the scoring result.
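
To illustrate how a generic parameter can tie a scorer to its return type, here is a minimal sketch that uses only the standard library. `SketchScorer`, `AlwaysPass`, and the two response dataclasses are hypothetical stand-ins written for this example, not the SDK's classes; the real ExampleCustomScorer may be implemented differently.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Generic, TypeVar, get_args


@dataclass
class BinaryResponse:  # stand-in for illustration, not the SDK class
    value: bool
    reason: str


@dataclass
class NumericResponse:  # stand-in for illustration, not the SDK class
    value: float
    reason: str


# R may only be one of the listed response types.
R = TypeVar("R", BinaryResponse, NumericResponse)


class SketchScorer(ABC, Generic[R]):
    """Hypothetical base class: subclasses fix R and implement score()."""

    @abstractmethod
    async def score(self, data: dict) -> R:
        ...


class AlwaysPass(SketchScorer[BinaryResponse]):
    # R is bound to BinaryResponse, so score() is annotated to return one.
    async def score(self, data: dict) -> BinaryResponse:
        return BinaryResponse(value=True, reason="stub scorer")


# The chosen response type is recoverable from the parameterized base class.
bound_type = get_args(AlwaysPass.__orig_bases__[0])[0]
result = asyncio.run(AlwaysPass().score({}))
```

A static type checker would flag a subclass declared with `SketchScorer[BinaryResponse]` whose `score` is annotated to return a different response type, which is the practical benefit of parameterizing the base class.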

Methods

score (required): async def

Measures the score on a single example. Must be implemented by subclasses. Returns a typed response based on the generic parameter.

score.py
async def score(self, data: Example) -> BinaryResponse:
    # Custom scoring logic here
    return BinaryResponse(value=True, reason="...")

Response Types

All response types include a value, a reason, and an optional citations field:

  • BinaryResponse: For true/false evaluations
    • value (bool): The boolean result
    • reason (str): Explanation for the result
    • citations (Optional[List[Citation]]): References to specific trace spans
  • CategoricalResponse: For categorical evaluations
    • value (str): The category name
    • reason (str): Explanation for the result
    • citations (Optional[List[Citation]]): References to specific trace spans
  • NumericResponse: For numeric evaluations
    • value (float): The numeric score
    • reason (str): Explanation for the result
    • citations (Optional[List[Citation]]): References to specific trace spans
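
The three response shapes listed above can be pictured as plain dataclasses. This is a sketch of the documented fields only: the class bodies are stand-ins rather than the SDK's definitions, the `None` default for citations is an assumption, and `Citation` is left as an opaque placeholder because its fields are not described here.

```python
from dataclasses import dataclass
from typing import List, Optional


class Citation:
    """Opaque placeholder; the real Citation fields are not covered here."""


@dataclass
class BinaryResponse:
    value: bool
    reason: str
    citations: Optional[List[Citation]] = None  # assumed default


@dataclass
class CategoricalResponse:
    value: str
    reason: str
    citations: Optional[List[Citation]] = None  # assumed default


@dataclass
class NumericResponse:
    value: float
    reason: str
    citations: Optional[List[Citation]] = None  # assumed default


# One instance of each shape; citations is omitted because it is optional.
passed = BinaryResponse(value=True, reason="Output matches the reference.")
tone = CategoricalResponse(value="formal", reason="No contractions or slang.")
overlap = NumericResponse(value=0.75, reason="3 of 4 expected keywords found.")
```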

Usage

from judgeval.v1.data import Example
from judgeval.v1.hosted import ExampleCustomScorer, BinaryResponse

class CorrectnessScorer(ExampleCustomScorer[BinaryResponse]):
    async def score(self, data: Example) -> BinaryResponse:
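        # Read the actual and expected outputs stored on the example.
        actual_output = data.get_property("actual_output")
        expected_output = data.get_property("expected_output")

        # Substring check: pass when the expected text appears in the actual output.
        if expected_output in actual_output: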
            return BinaryResponse(
                value=True,
                reason="The answer contains the expected output."
            )

        return BinaryResponse(
            value=False,
            reason="The answer does not contain the expected output."
        )
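
For a graded rather than pass/fail result, the same pattern can return a NumericResponse. The sketch below is self-contained so that it runs without the SDK installed: `FakeExample` and the local `NumericResponse` dataclass are stand-ins, `keyword_overlap_score` is a hypothetical helper, and returning `None` for a missing property is an assumption. In a real scorer, the function body would live in the `score` method of an `ExampleCustomScorer[NumericResponse]` subclass.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class NumericResponse:  # stand-in for the SDK class
    value: float
    reason: str


class FakeExample:
    """Stand-in exposing only get_property, as used in the usage example."""

    def __init__(self, **properties: Any) -> None:
        self._properties: Dict[str, Any] = properties

    def get_property(self, name: str) -> Any:
        # Assumption: a missing property yields None.
        return self._properties.get(name)


async def keyword_overlap_score(data: FakeExample) -> NumericResponse:
    # Lowercase and split on whitespace; treat a missing property as empty text.
    actual = (data.get_property("actual_output") or "").lower().split()
    expected = set((data.get_property("expected_output") or "").lower().split())
    if not expected:
        return NumericResponse(value=0.0, reason="No expected output to compare.")

    # Fraction of distinct expected words that appear in the actual output.
    found = sum(1 for word in expected if word in actual)
    return NumericResponse(
        value=found / len(expected),
        reason=f"{found} of {len(expected)} expected words found.",
    )


example = FakeExample(
    actual_output="Paris is the capital of France",
    expected_output="capital Paris Berlin France",
)
response = asyncio.run(keyword_overlap_score(example))
```

Because `score` is a coroutine, exercising the logic locally needs an event loop; `asyncio.run` is enough for a quick check like this one.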