# Data Types Reference

Complete reference for all data types used in the JudgmentEval SDK.

## Overview
The JudgmentEval SDK uses a well-defined set of data types to ensure consistency across all components. This section provides comprehensive documentation for all types you'll encounter when working with evaluations, datasets, tracing, and scoring.
## Quick Reference

| Type Category | Key Types | Primary Use Cases |
|---|---|---|
| Core Types | `Example`, `Trace`, `ExampleScorer` | Dataset creation, evaluation runs, tracing |
| Configuration Types | `APIScorerConfig`, `BaseScorer` | Setting up scorers and SDK components |
| Response Types | `EvaluationResult`, `JudgmentAPIError` | Handling results and errors |
## Type Categories

### Core Data Types

Essential objects that represent the fundamental concepts in JudgmentEval:

- `Example` - Input/output pairs for evaluation
- `Trace` - Execution traces from AI agent runs
- `ExampleScorer` - Pairing of examples with scoring methods
### Configuration Types

Objects used to configure SDK behavior and customize evaluation:

- `APIScorerConfig` - Configuration for API-based scorers
- `BaseScorer` - Base class for custom scoring logic (see the sketch under Configuring Scorers below)
- Utility Types - Common configuration patterns
### Response & Exception Types

Types returned by SDK methods and exceptions that may be raised:

- `JudgmentAPIError` - Primary SDK exception type
- `EvaluationResult` - Results from evaluation runs
- `DatasetInfo` - Dataset operation results
## Common Usage Patterns

### Creating Examples

```python
from judgeval import Example

# Basic example
example = Example(
    input="What is the capital of France?",
    expected_output="Paris",
)

# With metadata
example_with_context = Example(
    input="Explain machine learning",
    expected_output="Machine learning is...",
    metadata={"topic": "AI", "difficulty": "intermediate"},
)
```
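In practice you'll often build many examples at once from existing records. A minimal sketch follows; the `records` list and its field names are illustrative, not part of the SDK:

```python
from judgeval import Example

# Hypothetical source data; only Example itself comes from the SDK.
records = [
    {"question": "What is 2 + 2?", "answer": "4"},
    {"question": "Which planet is largest?", "answer": "Jupiter"},
]

# Build one Example per record.
examples = [
    Example(input=r["question"], expected_output=r["answer"])
    for r in records
]
```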
### Configuring Scorers

```python
from judgeval.scorers import APIScorerConfig, PromptScorer

# API-based scorer
api_config = APIScorerConfig(
    name="accuracy_checker",
    prompt="Rate accuracy from 1-5",
)

# Custom scorer instance
custom_scorer = PromptScorer(
    name="custom_evaluator",
    prompt="Evaluate response quality...",
)
```
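`PromptScorer` covers prompt-driven evaluation; for arbitrary scoring logic, `BaseScorer` is the documented base class. The sketch below is a hedged illustration of subclassing it: the `score` method name, its single-`Example` argument, and the float return value are assumptions about the interface, so confirm them against the `BaseScorer` reference before relying on them.

```python
from judgeval import Example
from judgeval.scorers import BaseScorer

class ConcisenessScorer(BaseScorer):
    """Toy scorer that rewards short answers (illustrative only)."""

    # Assumed hook: one Example in, a 0.0-1.0 score out. The real
    # BaseScorer interface may use a different method name or signature.
    def score(self, example: Example) -> float:
        # A production scorer would normally inspect the model's answer;
        # this toy version just checks the reference answer's length.
        text = example.expected_output or ""
        return 1.0 if len(text) <= 200 else 0.5
```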
### Handling Results

```python
from judgeval import JudgmentClient, JudgmentAPIError

# Assumes credentials are configured, e.g. via environment variables
client = JudgmentClient()

try:
    result = client.evaluate(examples=[...], scorers=[...])
    print(f"Average score: {result.aggregate_scores['mean']}")
    for example_result in result.results:
        print(f"Score: {example_result.score}")
except JudgmentAPIError as e:
    print(f"Evaluation failed: {e.message}")
## Type Import Reference

Most types can be imported directly from the main package:

```python
# Core types
from judgeval import Example, ExampleScorer

# Scorer configurations
from judgeval.scorers import APIScorerConfig, BaseScorer, PromptScorer

# Client and exceptions
from judgeval import JudgmentClient, JudgmentAPIError

# Dataset operations
from judgeval import Dataset
```
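As a quick illustration of the `Dataset` import in use, the sketch below shows what a dataset round-trip returning `DatasetInfo` might look like. Every call and field here (`Dataset(name=...)`, `add_examples`, `client.push_dataset`, `info.name`, `info.example_count`) is a hypothetical stand-in for the pattern, not a confirmed API; see the Dataset page for the real surface.

```python
from judgeval import Dataset, Example, JudgmentClient

client = JudgmentClient()  # assumes credentials come from the environment

# Hypothetical construction of a named dataset holding Examples.
dataset = Dataset(name="geography-qa")
dataset.add_examples([
    Example(input="What is the capital of France?", expected_output="Paris"),
])

# Hypothetical push that returns a DatasetInfo summary.
info = client.push_dataset(dataset)
print(info.name, info.example_count)  # hypothetical DatasetInfo fields
```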
## Next Steps

- Explore Core Types to understand fundamental SDK objects
- Review Configuration Types for customizing SDK behavior
- Check Response Types for proper error handling

For practical examples, see the individual SDK component documentation:

- Tracer - For tracing and observability
- Dataset - For dataset management
- JudgmentClient - For evaluation operations