TraceCustomScorer
A custom scorer class for creating specialized evaluation logic for traces. The scorer is generic and must be parameterized with a response type (BinaryResponse, CategoricalResponse, or NumericResponse).
See Trace for details on the Trace type and TraceSpan properties.
Generic Parameters
R (required): BinaryResponse | CategoricalResponse | NumericResponse
The response type that the scorer returns. It determines the structure of the scoring result.
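To illustrate what parameterizing a scorer with R means, here is a minimal, self-contained sketch of the same generic pattern built on Python's typing.Generic. ScorerSketch, SpanCountScorer, and the NumericResponse dataclass are hypothetical stand-ins, not judgeval's classes, and spans are modeled as plain dicts. With judgeval installed you would subclass TraceCustomScorer[NumericResponse] instead. In this sketch the type parameter is only checked by static type checkers; Python itself does not enforce it at runtime.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Generic, List, Optional, TypeVar


# Hypothetical stand-in for judgeval's NumericResponse, limited to the documented fields.
@dataclass
class NumericResponse:
    value: float
    reason: str
    citations: Optional[List[Any]] = None


# R plays the role of the generic parameter; this sketch leaves it unbound.
R = TypeVar("R")


class ScorerSketch(ABC, Generic[R]):
    """Hypothetical base class illustrating the pattern; not judgeval's implementation."""

    @abstractmethod
    async def score(self, data: Any) -> R:
        """Subclasses must override this coroutine."""


class SpanCountScorer(ScorerSketch[NumericResponse]):
    # Binding R to NumericResponse lets a static type checker verify
    # that score() returns a NumericResponse.
    async def score(self, data: Any) -> NumericResponse:
        spans = list(data)
        return NumericResponse(
            value=float(len(spans)),
            reason=f"Trace has {len(spans)} span(s).",
        )


# Spans are modeled as plain dicts here purely for illustration.
trace = [{"span_kind": "llm"}, {"span_kind": "tool"}]
result = asyncio.run(SpanCountScorer().score(trace))
print(result.value, result.reason)
```

Because score is declared abstract in the sketch, trying to instantiate ScorerSketch directly raises a TypeError, which mirrors the requirement that subclasses implement score.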
Methods
score (required): async def
Computes the score for a trace. Must be implemented by subclasses. Returns a response of type R, the generic parameter.
async def score(self, data: Trace) -> NumericResponse:
    # Custom scoring logic here
    return NumericResponse(value=1.0, reason="...")

Response Types
All response types include a value, reason, and optional citations field:
- BinaryResponse: For true/false evaluations
  - value (bool): The boolean result
  - reason (str): Explanation for the result
  - citations (Optional[List[Citation]]): References to specific trace spans
- CategoricalResponse: For categorical evaluations
  - value (str): The category name
  - reason (str): Explanation for the result
  - citations (Optional[List[Citation]]): References to specific trace spans
- NumericResponse: For numeric evaluations
  - value (float): The numeric score
  - reason (str): Explanation for the result
  - citations (Optional[List[Citation]]): References to specific trace spans
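The sketch below shows how the same trace might be scored under each of the three response types. The dataclasses are hypothetical stand-ins that mirror only the fields listed above (citations is typed loosely because the Citation fields are not described here), the scoring function names are illustrative, and spans are assumed to be dicts with a "span_kind" key, as in the Usage example. In a real scorer, each function body would live in the score method of a TraceCustomScorer subclass parameterized with the matching response type.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


# Hypothetical stand-ins mirroring the documented fields of each response type.
@dataclass
class BinaryResponse:
    value: bool
    reason: str
    citations: Optional[List[Any]] = None


@dataclass
class CategoricalResponse:
    value: str
    reason: str
    citations: Optional[List[Any]] = None


@dataclass
class NumericResponse:
    value: float
    reason: str
    citations: Optional[List[Any]] = None


# Assumption: each span is dict-like with a "span_kind" key.
Span = Dict[str, Any]


def count_tool_spans(spans: List[Span]) -> int:
    return sum(1 for span in spans if span.get("span_kind") == "tool")


async def used_any_tool(spans: List[Span]) -> BinaryResponse:
    # Binary: did the agent call at least one tool?
    n = count_tool_spans(spans)
    return BinaryResponse(value=n > 0, reason=f"Found {n} tool span(s).")


async def tool_usage_category(spans: List[Span]) -> CategoricalResponse:
    # Categorical: bucket the tool-call count into a named label.
    n = count_tool_spans(spans)
    label = "none" if n == 0 else "single" if n == 1 else "multiple"
    return CategoricalResponse(value=label, reason=f"{n} tool span(s), labeled '{label}'.")


async def tool_span_ratio(spans: List[Span]) -> NumericResponse:
    # Numeric: fraction of spans that are tool calls (0.0 for an empty trace).
    n = count_tool_spans(spans)
    ratio = n / len(spans) if spans else 0.0
    return NumericResponse(value=ratio, reason=f"{n} of {len(spans)} span(s) are tool calls.")


spans = [{"span_kind": "llm"}, {"span_kind": "tool"}, {"span_kind": "tool"}]
for check in (used_any_tool, tool_usage_category, tool_span_ratio):
    print(asyncio.run(check(spans)))
```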
Usage
from judgeval.v1.data import Trace
from judgeval.v1.hosted import TraceCustomScorer, NumericResponse

class ToolCallScorer(TraceCustomScorer[NumericResponse]):
    async def score(self, data: Trace) -> NumericResponse:
        # Keep only the spans recorded as tool calls
        tool_calls = [span for span in data if span["span_kind"] == "tool"]
        return NumericResponse(
            value=float(len(tool_calls)),
            reason=f"Agent made {len(tool_calls)} tool call(s)."
        )