Dataset
Overview
In most scenarios, you will have multiple Examples that you want to evaluate together. Both judgeval (Python) and judgeval-js (TypeScript) provide an EvalDataset class to manage collections of Examples. These classes allow you to scale evaluations and offer similar functionality for saving, loading, and synchronizing datasets with the Judgment platform.
Creating a Dataset
Creating an EvalDataset is straightforward in both languages. You can initialize it with a list (Python) or array (TypeScript) of Examples.
from judgeval.data import Example
from judgeval.data.datasets import EvalDataset
examples = [
    Example(input="...", actual_output="..."),
    Example(input="...", actual_output="..."),
    ...
]

dataset = EvalDataset(
    examples=examples
)
You can also add Examples to an existing EvalDataset.
from judgeval.data import Example
dataset.add_example(Example(input="Question 3?", actual_output="Answer 3."))
Saving/Loading Datasets
Both libraries support saving and loading EvalDataset objects locally and interacting with the Judgment Platform.
Local Formats:
- JSON
- CSV
- YAML
Remote:
- Judgment Platform
From Judgment Platform
You can push your local EvalDataset to the Judgment platform or pull an existing one.
from judgeval import JudgmentClient
from judgeval.data.datasets import EvalDataset

client = JudgmentClient()

# Push a local dataset to the platform
client.push_dataset(alias="my_dataset", dataset=dataset, project_name="my_project")

# Pull an existing dataset from the platform
pulled_dataset = client.pull_dataset(alias="my_dataset", project_name="my_project")
From JSON
Your JSON file should have a top-level examples key containing an array of example objects (using snake_case keys).
{
  "examples": [
    {
      "input": "...",
      "actual_output": "..."
    },
    ...
  ]
}
Here's how to save/load from JSON.
from judgeval.data.datasets import EvalDataset
dataset = EvalDataset(...)
dataset.save_as("json", "/path/to/save/dir", "save_name")
# loading
new_dataset = EvalDataset()
new_dataset.add_from_json("/path/to/your/json/file.json")
From CSV
Your CSV should contain rows that can be mapped to Examples via column names (typically snake_case). When loading, you may need to provide a mapping from your Example field names (camelCase in judgeval-js) to the CSV header names.
from judgeval.data.datasets import EvalDataset
dataset = EvalDataset(...)
dataset.save_as("csv", "/path/to/save/dir", "save_name")
# loading
new_dataset = EvalDataset()
new_dataset.add_from_csv("/path/to/your/csv/file.csv")
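If your CSV headers don't line up with the Example field names, one option is to build the dataset yourself with Python's standard csv module. The following is a minimal sketch, assuming a hypothetical CSV with question and answer columns that you map onto input and actual_output:

import csv

from judgeval.data import Example
from judgeval.data.datasets import EvalDataset

# Hypothetical mapping from CSV header names to Example field names
header_mapping = {"question": "input", "answer": "actual_output"}

dataset = EvalDataset()
with open("/path/to/your/csv/file.csv", newline="") as f:
    for row in csv.DictReader(f):
        # Rename each column according to the mapping, then build an Example
        fields = {field: row[column] for column, field in header_mapping.items()}
        dataset.add_example(Example(**fields))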
From YAML
Your YAML file should have a top-level examples key containing a list of example objects (using snake_case keys).
examples:
  - input: ...
    actual_output: ...
    expected_output: ...
from judgeval.data.datasets import EvalDataset

dataset = EvalDataset(...)
dataset.save_as("yaml", "/path/to/save/dir", "save_name")

# loading
new_dataset = EvalDataset()
new_dataset.add_from_yaml("/path/to/your/yaml/file.yaml")
Evaluate On Your Dataset / Examples
You can use the JudgmentClient
to evaluate a collection of Example
s using scorers. You can pass either an EvalDataset
object (Python) or an array of Example
objects (TypeScript) to the respective evaluation methods.
from judgeval import JudgmentClient
from judgeval.scorers import FaithfulnessScorer

client = JudgmentClient()
res = client.run_evaluation(
    examples=dataset.examples,
    scorers=[FaithfulnessScorer(threshold=0.9)],
    model="gpt-4.1",
)
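The same pattern works for datasets stored on the Judgment platform. As a sketch, assuming the "my_dataset" dataset pushed earlier exists in "my_project", you can pull it and score its examples directly:

from judgeval import JudgmentClient
from judgeval.scorers import FaithfulnessScorer

client = JudgmentClient()

# Pull the dataset from the Judgment platform, then evaluate its examples
remote_dataset = client.pull_dataset(alias="my_dataset", project_name="my_project")
res = client.run_evaluation(
    examples=remote_dataset.examples,
    scorers=[FaithfulnessScorer(threshold=0.9)],
    model="gpt-4.1",
)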
Exporting Datasets
You can export your datasets from the Judgment Platform UI for backup purposes or sharing with team members.
Export from Platform UI
- Navigate to your project in the Judgment Platform
- Select the dataset you want to export
- Click the "Download Dataset" button in the top right
- The dataset will be downloaded as a JSON file
The exported JSON file contains the complete dataset information, including metadata and examples:
{
  "dataset_id": "f852eeee-87fa-4430-9571-5784e693326e",
  "organization_id": "0fbb0aa8-a7b3-4108-b92a-cc6c6800d825",
  "dataset_alias": "QA-Pairs",
  "comments": null,
  "source_file": null,
  "created_at": "2025-04-23T22:38:11.709763+00:00",
  "is_sequence": false,
  "examples": [
    {
      "example_id": "119ee1f6-1046-41bc-bb89-d9fc704829dd",
      "input": "How can I start meditating?",
      "actual_output": null,
      "expected_output": "Meditation is a wonderful way to relax and focus...",
      "context": null,
      "retrieval_context": null,
      "additional_metadata": {
        "synthetic": true
      },
      "tools_called": null,
      "expected_tools": null,
      "name": null,
      "created_at": "2025-04-23T23:34:33.117479+00:00",
      "dataset_id": "f852eeee-87fa-4430-9571-5784e693326e",
      "eval_results_id": null,
      "sequence_id": null,
      "sequence_order": 0
    },
    // more examples...
  ]
}
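Since the export also uses a top-level examples key, you can rebuild a local dataset from it in Python. This is a minimal sketch using the standard json module, assuming you only need the input, actual_output, and expected_output fields shown above (the file path is a hypothetical placeholder):

import json

from judgeval.data import Example
from judgeval.data.datasets import EvalDataset

# Hypothetical path to the file downloaded from the platform UI
with open("/path/to/exported_dataset.json") as f:
    export = json.load(f)

dataset = EvalDataset()
for item in export["examples"]:
    # Keep only the Example fields we care about, skipping nulls
    fields = {
        key: item[key]
        for key in ("input", "actual_output", "expected_output")
        if item.get(key) is not None
    }
    dataset.add_example(Example(**fields))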
Conclusion
Congratulations! 🎉
You've now learned how to create, save, load, and evaluate datasets using judgeval.