Datasets
Datasets group multiple examples for scalable evaluation workflows. Use the Dataset class to manage example collections, run batch evaluations, and sync your test data with the Judgment platform for team collaboration.
Quickstart
You can use the JudgmentClient to evaluate a collection of Examples using scorers.
from judgeval import JudgmentClient
from judgeval.data import Example
from judgeval.scorers.example_scorer import ExampleScorer
from judgeval.dataset import Dataset

client = JudgmentClient()

class CustomerRequest(Example):
    request: str
    response: str

class ResolutionScorer(ExampleScorer):
    name: str = "Resolution Scorer"

    async def a_score_example(self, example: CustomerRequest):
        # Replace this logic with your own scoring logic
        if "package" in example.response:
            self.reason = "The response contains the word 'package'"
            return 1
        else:
            self.reason = "The response does not contain the word 'package'"
            return 0

examples = [
    CustomerRequest(request="Where is my package?", response="Your P*CKAG* will arrive tomorrow at 10:00 AM."),  # failing example
    CustomerRequest(request="Where is my package?", response="Your package will arrive tomorrow at 10:00 AM."),  # passing example
]

# Create the dataset; it is saved to the Judgment platform automatically
Dataset.create(name="my_dataset", project_name="default_project", examples=examples)

# Fetch the dataset from the Judgment platform
dataset = Dataset.get(name="my_dataset", project_name="default_project")

res = client.run_evaluation(
    examples=dataset.examples,
    scorers=[ResolutionScorer()],
    project_name="default_project",
)
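Before running an evaluation against the platform, it can help to check the scoring rule on its own. The sketch below copies the substring check from ResolutionScorer into a plain function so it runs without judgeval or network access. The function name score_response is hypothetical and not part of judgeval.

```python
from typing import Tuple


def score_response(response: str) -> Tuple[int, str]:
    """Return (score, reason) using the same substring check as ResolutionScorer.

    Hypothetical helper for local testing; it is not part of judgeval.
    """
    if "package" in response:
        return 1, "The response contains the word 'package'"
    return 0, "The response does not contain the word 'package'"


responses = [
    "Your P*CKAG* will arrive tomorrow at 10:00 AM.",
    "Your package will arrive tomorrow at 10:00 AM.",
]

for response in responses:
    score, reason = score_response(response)
    print(score, reason)
```

Note that the `in` check is case-sensitive, so a response containing "Package" scores 0 under this rule.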
Creating a Dataset
Create a dataset by passing a list of examples to Dataset.create, which also saves it to the Judgment platform.
from judgeval.data import Example
from judgeval.dataset import Dataset

class CustomerRequest(Example):
    request: str
    response: str

examples = [
    CustomerRequest(request="Where is my package?", response="Your P*CKAG* will arrive tomorrow at 10:00 AM.")
]

dataset = Dataset.create(name="my_dataset", project_name="default_project", examples=examples)
You can also add Examples to an existing Dataset.
new_examples = [CustomerRequest(request="Where is my package?", response="Your package will arrive tomorrow at 10:00 AM.")]
dataset.add_examples(new_examples)
Loading a Dataset
From the Platform
Retrieve datasets you've already saved to the Judgment platform:
from judgeval.dataset import Dataset
# Get an existing dataset
dataset = Dataset.get(name="my_dataset", project_name="default_project")
From Local Files
Import datasets from JSON or YAML files on your local machine.
Your JSON file should contain an array of example objects:
[
  {
    "input": "Where is my package?",
    "actual_output": "Your package will arrive tomorrow."
  },
  {
    "input": "How do I return an item?",
    "actual_output": "You can return items within 30 days."
  }
]
Load the JSON file into a dataset:
from judgeval.dataset import Dataset
# Create new dataset and add examples from JSON
dataset = Dataset.create(name="my_dataset", project_name="default_project")
dataset.add_from_json("/path/to/examples.json")
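If your examples start out as Python dictionaries, a file in this shape can be produced with the standard library. The following is a minimal sketch, assuming the field names from the sample above; the helper write_examples_json is hypothetical and not part of judgeval, and it only checks that the top level is a list of objects, not which fields judgeval requires.

```python
import json
import tempfile
from pathlib import Path


def write_examples_json(records, path):
    """Write a list of example dicts to `path` as a JSON array.

    Hypothetical helper; it only checks the top-level shape
    (a list of dicts) before writing.
    """
    if not isinstance(records, list) or not all(isinstance(r, dict) for r in records):
        raise ValueError("records must be a list of dicts")
    path = Path(path)
    path.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return path


records = [
    {"input": "Where is my package?", "actual_output": "Your package will arrive tomorrow."},
    {"input": "How do I return an item?", "actual_output": "You can return items within 30 days."},
]

# Write to a temporary directory and read the file back to confirm the round trip
with tempfile.TemporaryDirectory() as tmp:
    json_path = write_examples_json(records, Path(tmp) / "examples.json")
    loaded = json.loads(json_path.read_text(encoding="utf-8"))
    print(f"wrote {len(loaded)} examples")
```

The resulting path is what you would then pass to dataset.add_from_json.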
Your YAML file should contain a list of example objects:
- input: "Where is my package?"
  actual_output: "Your package will arrive tomorrow."
  expected_output: "Your package will arrive tomorrow at 10:00 AM."
- input: "How do I return an item?"
  actual_output: "You can return items within 30 days."
  expected_output: "You can return items within 30 days of purchase."
Load the YAML file into a dataset:
from judgeval.dataset import Dataset
# Create new dataset and add examples from YAML
dataset = Dataset.create(name="my_dataset", project_name="default_project")
dataset.add_from_yaml("/path/to/examples.yaml")
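The YAML layout depends on indentation: each example starts with "- " and its remaining fields are indented two spaces to line up under the first key. As a rough sketch of that layout, the function below renders flat string-valued dicts without a YAML library. The name to_yaml_list is hypothetical and not part of judgeval, and it assumes keys are plain identifiers and values are strings; for anything more complex, a real YAML library is the safer choice.

```python
import json


def to_yaml_list(records):
    """Render flat dicts with string values as a YAML list of mappings.

    Hypothetical helper. Values are quoted with json.dumps, whose
    double-quoted strings are also valid YAML scalars.
    """
    lines = []
    for record in records:
        for index, (key, value) in enumerate(record.items()):
            # The first key of each record opens the list item; the rest are indented
            prefix = "- " if index == 0 else "  "
            lines.append(f"{prefix}{key}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"


records = [
    {
        "input": "Where is my package?",
        "actual_output": "Your package will arrive tomorrow.",
        "expected_output": "Your package will arrive tomorrow at 10:00 AM.",
    },
]

print(to_yaml_list(records), end="")
```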
Saving Datasets to Local Files
Export your datasets to local files for backup or sharing:
from judgeval.dataset import Dataset
dataset = Dataset.get(name="my_dataset", project_name="default_project")
# Save as JSON
dataset.save_as("json", "/path/to/save/dir", "my_dataset")
# Save as YAML
dataset.save_as("yaml", "/path/to/save/dir", "my_dataset")