TraceCustomScorer

Base class for custom scorers that implement specialized evaluation logic for traces. The class is generic and must be parameterized with a response type (BinaryResponse, CategoricalResponse, or NumericResponse).

See Trace for details on the Trace type and TraceSpan properties.

Generic Parameters

R (required): BinaryResponse | CategoricalResponse | NumericResponse

The response type that the scorer returns. Determines the structure of the scoring result.

Methods

score (required): async def

Computes the score for a trace. Must be implemented by subclasses. Returns a response whose type matches the generic parameter.

score.py

async def score(self, data: Trace) -> NumericResponse:
    # Custom scoring logic here
    return NumericResponse(value=1.0, reason="...")
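To illustrate how the generic parameter ties to the return type of score, here is a minimal, self-contained sketch of the pattern using typing.Generic. It is not the judgeval implementation: SketchScorer, SketchNumericResponse, and LengthScorer are hypothetical stand-ins, and the trace is modelled as a plain list of dicts purely for illustration.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Generic, TypeVar


@dataclass
class SketchNumericResponse:
    # Hypothetical stand-in mirroring the documented value/reason fields.
    value: float
    reason: str


R = TypeVar("R")


class SketchScorer(ABC, Generic[R]):
    # Hypothetical stand-in for a generic scorer base class.
    @abstractmethod
    async def score(self, data: list) -> R:
        """Subclasses must override this and return an R."""


class LengthScorer(SketchScorer[SketchNumericResponse]):
    async def score(self, data: list) -> SketchNumericResponse:
        # Score a trace by the number of spans it contains.
        return SketchNumericResponse(
            value=float(len(data)),
            reason=f"Trace has {len(data)} span(s).",
        )


def base_is_abstract() -> bool:
    # Instantiating the base without implementing score() raises TypeError.
    try:
        SketchScorer()
        return False
    except TypeError:
        return True


result = asyncio.run(
    LengthScorer().score([{"span_kind": "llm"}, {"span_kind": "tool"}])
)
```

In this sketch, a type checker can flag a subclass whose score returns a different response type than the one it was parameterized with, which is the practical benefit of declaring the parameter.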

Response Types

Every response type has a value field, a reason field, and an optional citations field:

BinaryResponse: For true/false evaluations
- value (bool): The boolean result
- reason (str): Explanation for the result
- citations (Optional[List[Citation]]): References to specific trace spans

CategoricalResponse: For categorical evaluations
- value (str): The category name
- reason (str): Explanation for the result
- citations (Optional[List[Citation]]): References to specific trace spans

NumericResponse: For numeric evaluations
- value (float): The numeric score
- reason (str): Explanation for the result
- citations (Optional[List[Citation]]): References to specific trace spans
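
The field layout above can be pictured with plain dataclasses. This is only a sketch of the documented shapes: the Sketch* names are hypothetical, SketchCitation is an empty placeholder because the Citation fields are not described in this section, and the real judgeval classes may be implemented differently (for example, with validation).

```python
from dataclasses import dataclass
from typing import List, Optional


class SketchCitation:
    """Placeholder: the real Citation fields are not described in this section."""


@dataclass
class SketchBinaryResponse:
    value: bool
    reason: str
    citations: Optional[List[SketchCitation]] = None


@dataclass
class SketchCategoricalResponse:
    value: str
    reason: str
    citations: Optional[List[SketchCitation]] = None


@dataclass
class SketchNumericResponse:
    value: float
    reason: str
    citations: Optional[List[SketchCitation]] = None


# One instance of each shape; citations is omitted because it is optional.
passed = SketchBinaryResponse(value=True, reason="No failed tool calls.")
label = SketchCategoricalResponse(value="helpful", reason="Answer addressed the question.")
rating = SketchNumericResponse(value=0.75, reason="3 of 4 steps succeeded.")
```

The three shapes differ only in the type of value, so choosing a response type mostly comes down to whether the evaluation is a pass/fail check, a label, or a number.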

Usage

from judgeval.v1.data import Trace
from judgeval.v1.hosted import TraceCustomScorer, NumericResponse

class ToolCallScorer(TraceCustomScorer[NumericResponse]):
    async def score(self, data: Trace) -> NumericResponse:
        tool_calls = [span for span in data if span["span_kind"] == "tool"]

        return NumericResponse(
            value=float(len(tool_calls)),
            reason=f"Agent made {len(tool_calls)} tool call(s)."
        )
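
To try scoring logic like this outside the platform, one option is to run the same filter against a hand-built trace with asyncio. The sketch below does not import judgeval. It assumes, as the example above does, that a trace can be iterated as dict-like spans with a span_kind key. The sample spans and the count_tool_calls helper are hypothetical.

```python
import asyncio
from typing import Dict, List, Tuple


async def count_tool_calls(spans: List[Dict[str, str]]) -> Tuple[float, str]:
    # Same filter as ToolCallScorer.score, returning (value, reason)
    # instead of a NumericResponse so the sketch has no dependencies.
    tool_calls = [span for span in spans if span.get("span_kind") == "tool"]
    return float(len(tool_calls)), f"Agent made {len(tool_calls)} tool call(s)."


# Hypothetical trace: only the span_kind key is taken from the example above.
sample_trace = [
    {"span_kind": "llm"},
    {"span_kind": "tool"},
    {"span_kind": "tool"},
]

value, reason = asyncio.run(count_tool_calls(sample_trace))
print(value, reason)
```

Keeping the counting logic in a small coroutine like this makes it easy to unit test before wrapping it in a TraceCustomScorer subclass.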