# Data Types Reference

Complete reference for all data types used in the JudgmentEval SDK.

## Overview
The JudgmentEval SDK uses a well-defined set of data types to ensure consistency across all components. This section provides comprehensive documentation for all types you'll encounter when working with evaluations, datasets, tracing, and scoring.
## Quick Reference

| Type Category | Key Types | Primary Use Cases |
|---|---|---|
| Core Types | `Example`, `Trace`, `ExampleScorer` | Dataset creation, evaluation runs, tracing |
| Configuration Types | `APIScorerConfig`, `BaseScorer` | Setting up scorers and SDK components |
| Response Types | `EvaluationResult`, `JudgmentAPIError` | Handling results and errors |
## Type Categories

### Core Data Types

Essential objects that represent the fundamental concepts in JudgmentEval:

- `Example` - Input/output pairs for evaluation
- `Trace` - Execution traces from AI agent runs
- `ExampleScorer` - Pairing of examples with scoring methods
### Configuration Types

Objects used to configure SDK behavior and customize evaluation:

- `APIScorerConfig` - Configuration for API-based scorers
- `BaseScorer` - Base class for custom scoring logic (see the sketch under Configuring Scorers below)
- Utility Types - Common configuration patterns
### Response & Exception Types

Types returned by SDK methods and exceptions that may be raised:

- `JudgmentAPIError` - Primary SDK exception type
- `EvaluationResult` - Results from evaluation runs
- `DatasetInfo` - Dataset operation results
## Common Usage Patterns

### Creating Examples

```python
from judgeval import Example

# Basic example
example = Example(
    input="What is the capital of France?",
    expected_output="Paris",
)

# With metadata
example_with_context = Example(
    input="Explain machine learning",
    expected_output="Machine learning is...",
    metadata={"topic": "AI", "difficulty": "intermediate"},
)
```
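In practice you'll often build many examples at once from existing records. A minimal sketch follows; the `records` list and its field names are illustrative, not part of the SDK:

```python
from judgeval import Example

# Hypothetical source data; only Example itself comes from the SDK.
records = [
    {"question": "What is 2 + 2?", "answer": "4"},
    {"question": "Which planet is largest?", "answer": "Jupiter"},
]

# Build one Example per record.
examples = [
    Example(input=r["question"], expected_output=r["answer"])
    for r in records
]
```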
### Configuring Scorers

```python
from judgeval.scorers import APIScorerConfig, PromptScorer

# API-based scorer
api_config = APIScorerConfig(
    name="accuracy_checker",
    prompt="Rate accuracy from 1-5",
)

# Custom scorer instance
custom_scorer = PromptScorer(
    name="custom_evaluator",
    prompt="Evaluate response quality...",
)
```
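`PromptScorer` covers prompt-driven evaluation; for arbitrary scoring logic, `BaseScorer` is the documented base class. The sketch below is a hedged illustration of subclassing it: the `score` method name, its single-`Example` argument, and the float return value are assumptions about the interface, so confirm them against the `BaseScorer` reference before relying on them.

```python
from judgeval import Example
from judgeval.scorers import BaseScorer

class ConcisenessScorer(BaseScorer):
    """Toy scorer that rewards short answers (illustrative only)."""

    # Assumed hook: one Example in, a 0.0-1.0 score out. The real
    # BaseScorer interface may use a different method name or signature.
    def score(self, example: Example) -> float:
        # A production scorer would normally inspect the model's answer;
        # this toy version just checks the reference answer's length.
        text = example.expected_output or ""
        return 1.0 if len(text) <= 200 else 0.5
```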
### Handling Results

```python
from judgeval import JudgmentClient, JudgmentAPIError

# Assumes credentials are configured, e.g. via environment variables
client = JudgmentClient()

try:
    result = client.evaluate(examples=[...], scorers=[...])
    print(f"Average score: {result.aggregate_scores['mean']}")
    for example_result in result.results:
        print(f"Score: {example_result.score}")
except JudgmentAPIError as e:
    print(f"Evaluation failed: {e.message}")
## Type Import Reference

Most types can be imported directly from the main package:

```python
# Core types
from judgeval import Example, ExampleScorer

# Scorer configurations
from judgeval.scorers import APIScorerConfig, BaseScorer, PromptScorer

# Client and exceptions
from judgeval import JudgmentClient, JudgmentAPIError

# Dataset operations
from judgeval import Dataset
```
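As a quick illustration of the `Dataset` import in use, the sketch below shows what a dataset round-trip returning `DatasetInfo` might look like. Every call and field here (`Dataset(name=...)`, `add_examples`, `client.push_dataset`, `info.name`, `info.example_count`) is a hypothetical stand-in for the pattern, not a confirmed API; see the Dataset page for the real surface.

```python
from judgeval import Dataset, Example, JudgmentClient

client = JudgmentClient()  # assumes credentials come from the environment

# Hypothetical construction of a named dataset holding Examples.
dataset = Dataset(name="geography-qa")
dataset.add_examples([
    Example(input="What is the capital of France?", expected_output="Paris"),
])

# Hypothetical push that returns a DatasetInfo summary.
info = client.push_dataset(dataset)
print(info.name, info.example_count)  # hypothetical DatasetInfo fields
```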
## Next Steps

- Explore Core Types to understand fundamental SDK objects
- Review Configuration Types for customizing SDK behavior
- Check Response Types for proper error handling

For practical examples, see the individual SDK component documentation:

- Tracer - For tracing and observability
- Dataset - For dataset management
- JudgmentClient - For evaluation operations