Dataset
Overview
In most scenarios, you will have multiple Examples that you want to evaluate together. Both judgeval (Python) and judgeval-js (TypeScript) provide an EvalDataset class to manage collections of Examples. These classes allow you to scale evaluations and offer similar functionality for saving, loading, and synchronizing datasets with the Judgment platform.
Creating a Dataset
Creating an EvalDataset is straightforward in both languages. You can initialize it with a list (Python) or array (TypeScript) of Examples.
from judgeval.data import Example
from judgeval.data.datasets import EvalDataset
examples = [
    Example(input="...", actual_output="..."),
    Example(input="...", actual_output="..."),
    ...
]

dataset = EvalDataset(
    examples=examples
)
You can also add Examples to an existing EvalDataset.
from judgeval.data import Example
dataset.add_example(Example(input="Question 3?", actual_output="Answer 3."))
Saving/Loading Datasets
Both libraries support saving and loading EvalDataset objects locally and interacting with the Judgment Platform.
Local Formats:
- JSON
- CSV
- YAML
Remote:
- Judgment Platform
From Judgment Platform
You can push your local EvalDataset to the Judgment platform or pull an existing one.
from judgeval import JudgmentClient
from judgeval.data.datasets import EvalDataset

client = JudgmentClient()

# Push a local dataset to the platform
client.push_dataset(alias="my_dataset", dataset=dataset, project_name="my_project")

# Pull an existing dataset from the platform
pulled_dataset = client.pull_dataset(alias="my_dataset", project_name="my_project")
From JSON
Your JSON file should have a top-level examples key containing an array of example objects (using snake_case keys).
{
  "examples": [
    {
      "input": "...",
      "actual_output": "..."
    },
    ...
  ]
}
Here's how to save/load from JSON.
from judgeval.data.datasets import EvalDataset
dataset = EvalDataset(...)
dataset.save_as("json", "/path/to/save/dir", "save_name")
# loading
new_dataset = EvalDataset()
new_dataset.add_from_json("/path/to/your/json/file.json")
From CSV
Your CSV should contain rows that can be mapped to Examples via column names (typically snake_case). When loading, you may need to provide a mapping from your Example field names (camelCase in judgeval-js) to the CSV header names.
from judgeval.data.datasets import EvalDataset
dataset = EvalDataset(...)
dataset.save_as("csv", "/path/to/save/dir", "save_name")
# loading
new_dataset = EvalDataset()
new_dataset.add_from_csv("/path/to/your/csv/file.csv")
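If your CSV headers don't line up with the Example field names, one option is to build the dataset yourself with Python's standard csv module. The following is a minimal sketch, assuming a hypothetical CSV with question and answer columns that you map onto input and actual_output:

import csv

from judgeval.data import Example
from judgeval.data.datasets import EvalDataset

# Hypothetical mapping from CSV header names to Example field names
header_mapping = {"question": "input", "answer": "actual_output"}

dataset = EvalDataset()
with open("/path/to/your/csv/file.csv", newline="") as f:
    for row in csv.DictReader(f):
        # Rename each column according to the mapping, then build an Example
        fields = {field: row[column] for column, field in header_mapping.items()}
        dataset.add_example(Example(**fields))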
From YAML
Your YAML file should have a top-level examples key containing a list of example objects (using snake_case keys).
examples:
  - input: ...
    actual_output: ...
    expected_output: ...
from judgeval.data.datasets import EvalDataset

dataset = EvalDataset(...)
dataset.save_as("yaml", "/path/to/save/dir", "save_name")

# loading
new_dataset = EvalDataset()
new_dataset.add_from_yaml("/path/to/your/yaml/file.yaml")
Evaluate On Your Dataset / Examples
You can use the JudgmentClient
to evaluate a collection of Example
s using scorers. You can pass either an EvalDataset
object (Python) or an array of Example
objects (TypeScript) to the respective evaluation methods.
from judgeval import JudgmentClient
from judgeval.scorers import FaithfulnessScorer

client = JudgmentClient()
res = client.run_evaluation(
    examples=dataset.examples,
    scorers=[FaithfulnessScorer(threshold=0.9)],
    model="gpt-4.1",
)
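The same pattern works for datasets stored on the Judgment platform. As a sketch, assuming the "my_dataset" dataset pushed earlier exists in "my_project", you can pull it and score its examples directly:

from judgeval import JudgmentClient
from judgeval.scorers import FaithfulnessScorer

client = JudgmentClient()

# Pull the dataset from the Judgment platform, then evaluate its examples
remote_dataset = client.pull_dataset(alias="my_dataset", project_name="my_project")
res = client.run_evaluation(
    examples=remote_dataset.examples,
    scorers=[FaithfulnessScorer(threshold=0.9)],
    model="gpt-4.1",
)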
Exporting Datasets
You can export your datasets from the Judgment Platform UI for backup purposes or sharing with team members.
Export from Platform UI
- Navigate to your project in the Judgment Platform
- Select the dataset you want to export
- Click the "Download Dataset" button in the top right
- The dataset will be downloaded as a JSON file
The exported JSON file contains the complete dataset information, including metadata and examples:
{
  "dataset_id": "f852eeee-87fa-4430-9571-5784e693326e",
  "organization_id": "0fbb0aa8-a7b3-4108-b92a-cc6c6800d825",
  "dataset_alias": "QA-Pairs",
  "comments": null,
  "source_file": null,
  "created_at": "2025-04-23T22:38:11.709763+00:00",
  "is_sequence": false,
  "examples": [
    {
      "example_id": "119ee1f6-1046-41bc-bb89-d9fc704829dd",
      "input": "How can I start meditating?",
      "actual_output": null,
      "expected_output": "Meditation is a wonderful way to relax and focus...",
      "context": null,
      "retrieval_context": null,
      "additional_metadata": {
        "synthetic": true
      },
      "tools_called": null,
      "expected_tools": null,
      "name": null,
      "created_at": "2025-04-23T23:34:33.117479+00:00",
      "dataset_id": "f852eeee-87fa-4430-9571-5784e693326e",
      "eval_results_id": null,
      "sequence_id": null,
      "sequence_order": 0
    },
    // more examples...
  ]
}
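Since the export also uses a top-level examples key, you can rebuild a local dataset from it in Python. This is a minimal sketch using the standard json module, assuming you only need the input, actual_output, and expected_output fields shown above (the file path is a hypothetical placeholder):

import json

from judgeval.data import Example
from judgeval.data.datasets import EvalDataset

# Hypothetical path to the file downloaded from the platform UI
with open("/path/to/exported_dataset.json") as f:
    export = json.load(f)

dataset = EvalDataset()
for item in export["examples"]:
    # Keep only the Example fields we care about, skipping nulls
    fields = {
        key: item[key]
        for key in ("input", "actual_output", "expected_output")
        if item.get(key) is not None
    }
    dataset.add_example(Example(**fields))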
Conclusion
Congratulations! 🎉
You've now learned how to create, save, load, and evaluate datasets using judgeval.