Judgeval Python-v1 SDKResponse Types

ScoringResult

Complete evaluation results for a single example

Contains the output of one or more scorers applied to a single example. Represents the complete evaluation results for one input with its actual output, expected output, and all applied scorer results.

successrequired

:bool
Whether the evaluation was successful. True when all scorers applied to this example returned a success.

scorers_datarequired

:List[ScorerData]

List of individual scorer results for this evaluation

name

:Optional[str]

Optional name identifier for this scoring result

data_object

:Optional[Union[OtelTraceSpan, Example]]

The original example or trace object that was evaluated

trace_id

:Optional[str]

Unique identifier linking this result to trace data

run_duration

:Optional[float]

Time taken to complete the evaluation in seconds

evaluation_cost

:Optional[float]
Estimated cost of running the evaluation (e.g., API costs)

Methods

to_dict()

:Dict[str, Any]

Convert the scoring result to a dictionary format for API serialization

Usage Examples

from judgeval import Judgeval
from judgeval.v1.data.example import Example

client = Judgeval(project_name="default_project")

# Run an evaluation
examples = [
    Example.create(
        input="What is the capital of France?",
        expected_output="Paris",
        actual_output="Paris is the capital city of France."
    )
]

results = client.evaluation.create().run(
    examples=examples,
    scorers=[client.scorers.built_in.answer_relevancy()]
)

# Process results
for result in results:
    if result.success:
        print(f"Evaluation succeeded in {result.run_duration:.2f}s")
        if result.evaluation_cost:
            print(f"Cost: ${result.evaluation_cost:.4f}")
        
        for scorer_data in result.scorers_data:
            print(f"  {scorer_data.name}: {scorer_data.score}")
    else:
        print("Evaluation failed")
        for scorer_data in result.scorers_data:
            if not scorer_data.success:
                print(f"  {scorer_data.name} failed: {scorer_data.error or 'Score below threshold'}")
    
    # Access the original example
    if result.data_object:
        print(f"Evaluated example: {result.data_object}")
    
    # Access trace ID if available
    if result.trace_id:
        print(f"Trace ID: {result.trace_id}")