
TraceCustomScorer

A custom scorer class for creating specialized evaluation logic for traces. The scorer is generic and must be parameterized with a response type (BinaryResponse, CategoricalResponse, or NumericResponse).

See Trace for details on the Trace type and TraceSpan properties.

Generic Parameters

R (required): BinaryResponse | CategoricalResponse | NumericResponse

The response type that the scorer returns. Determines the structure of the scoring result.
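To illustrate what binding the generic parameter means, here is a minimal sketch that runs without the SDK installed. ScorerSketch, HasSpansScorer, and the BinaryResponse dataclass are hypothetical stand-ins written for this example. They are not the SDK's own classes, and the real TraceCustomScorer may be implemented differently.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Generic, TypeVar


# Hypothetical stand-in for the SDK's BinaryResponse (value and reason only).
@dataclass
class BinaryResponse:
    value: bool
    reason: str


# R plays the role of the generic parameter described above.
R = TypeVar("R")


# Hypothetical stand-in for TraceCustomScorer; not the SDK's implementation.
class ScorerSketch(ABC, Generic[R]):
    @abstractmethod
    async def score(self, data: Any) -> R:
        """Subclasses return an instance of the response type bound to R."""


# Binding R to BinaryResponse fixes the return type of score().
class HasSpansScorer(ScorerSketch[BinaryResponse]):
    async def score(self, data: Any) -> BinaryResponse:
        # Pass when the trace contains at least one span.
        spans = list(data)
        return BinaryResponse(
            value=len(spans) > 0,
            reason=f"Trace contains {len(spans)} span(s).",
        )


result = asyncio.run(HasSpansScorer().score([{"span_kind": "tool"}]))
```

With this pattern, a static type checker can flag a subclass whose score method returns a different response type than the one it was parameterized with.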

Methods

score (required): async def

Computes a score for a trace. Subclasses must implement this method. It returns the response type bound to the generic parameter R.

score.py
async def score(self, data: Trace) -> NumericResponse:
    # Custom scoring logic here
    return NumericResponse(value=1.0, reason="...")

Response Types

Every response type has a value field, a reason field, and an optional citations field:

  • BinaryResponse: For true/false evaluations

    • value (bool): The boolean result
    • reason (str): Explanation for the result
    • citations (Optional[List[Citation]]): References to specific trace spans
  • CategoricalResponse: For categorical evaluations

    • value (str): The category name
    • reason (str): Explanation for the result
    • citations (Optional[List[Citation]]): References to specific trace spans
  • NumericResponse: For numeric evaluations

    • value (float): The numeric score
    • reason (str): Explanation for the result
    • citations (Optional[List[Citation]]): References to specific trace spans
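The field layout listed above can be sketched with plain dataclasses. These are local stand-ins that mirror only the documented field names and types. The SDK's own classes may use a different base class or add validation. The structure of Citation is not described in this section, so the sketch types citations loosely as a list of Any. All values are made up for illustration.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


# Local stand-ins mirroring the documented fields; not the SDK's classes.
@dataclass
class BinaryResponse:
    value: bool
    reason: str
    citations: Optional[List[Any]] = None  # Citation shape is assumed unknown


@dataclass
class CategoricalResponse:
    value: str
    reason: str
    citations: Optional[List[Any]] = None


@dataclass
class NumericResponse:
    value: float
    reason: str
    citations: Optional[List[Any]] = None


# Illustrative values only; citations is omitted, so it defaults to None.
passed = BinaryResponse(value=True, reason="Final answer was produced.")
label = CategoricalResponse(value="on_topic", reason="Stayed within scope.")
score = NumericResponse(value=0.75, reason="3 of 4 tool calls succeeded.")
```

The three types differ only in the type of value, so choosing one comes down to whether the evaluation is pass/fail, a named category, or a number.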

Usage

from judgeval.v1.data import Trace
from judgeval.v1.hosted import TraceCustomScorer, NumericResponse

class ToolCallScorer(TraceCustomScorer[NumericResponse]):
    async def score(self, data: Trace) -> NumericResponse:
        # Keep only the spans recorded as tool calls.
        tool_calls = [span for span in data if span["span_kind"] == "tool"]

        return NumericResponse(
            value=float(len(tool_calls)),
            reason=f"Agent made {len(tool_calls)} tool call(s)."
        )
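One way to check the counting logic before wiring it into the SDK is to run it locally against hand-written spans. The sketch below assumes a trace can be treated as an iterable of dictionaries with a span_kind key, as the example above does. ToolCallScorerSketch, the NumericResponse dataclass, and the sample spans are hypothetical stand-ins, so the code runs without judgeval installed.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Dict, List


# Stand-in for NumericResponse so the sketch runs without the SDK.
@dataclass
class NumericResponse:
    value: float
    reason: str


class ToolCallScorerSketch:
    # Mirrors ToolCallScorer above without the SDK base class;
    # .get() additionally tolerates spans that lack a span_kind key.
    async def score(self, data: List[Dict[str, Any]]) -> NumericResponse:
        tool_calls = [span for span in data if span.get("span_kind") == "tool"]
        return NumericResponse(
            value=float(len(tool_calls)),
            reason=f"Agent made {len(tool_calls)} tool call(s).",
        )


# Hypothetical sample spans; only the span_kind key from the example is used.
sample_trace = [
    {"span_kind": "llm"},
    {"span_kind": "tool"},
    {"span_kind": "tool"},
]

# score() is a coroutine, so it needs an event loop to run.
result = asyncio.run(ToolCallScorerSketch().score(sample_trace))
print(result.value, result.reason)
```

Because score is declared with async def, calling it directly returns a coroutine. asyncio.run is one way to drive it to completion in a script or test.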