Datasets

Datasets group multiple examples or traces for scalable evaluation workflows. Use the Dataset class to manage example collections, run batch evaluations, and sync your test data with the Judgment platform for team collaboration.

Quickstart

You can use the JudgmentClient to evaluate a collection of Examples using scorers.

evaluate_dataset.py

from judgeval import JudgmentClient
from judgeval.data import Example
from judgeval.scorers.example_scorer import ExampleScorer
from judgeval.dataset import Dataset

client = JudgmentClient()

class CustomerRequest(Example):
    request: str
    response: str

class ResolutionScorer(ExampleScorer):
    name: str = "Resolution Scorer"

    async def a_score_example(self, example: CustomerRequest):
        # Replace this logic with your own scoring logic
        if "package" in example.response:
            self.reason = "The response contains the word 'package'"
            return 1
        else:
            self.reason = "The response does not contain the word 'package'"
            return 0


examples = [
    CustomerRequest(request="Where is my package?", response="Your P*CKAG* will arrive tomorrow at 10:00 AM."), # failing example
    CustomerRequest(request="Where is my package?", response="Your package will arrive tomorrow at 10:00 AM.")  # passing example
]

# Create a dataset; it is saved to the Judgment platform automatically
Dataset.create(name="my_dataset", project_name="default_project", examples=examples)

# Fetch the dataset from the Judgment platform
dataset = Dataset.get(name="my_dataset", project_name="default_project")

res = client.run_evaluation(
    examples=dataset.examples,
    scorers=[ResolutionScorer()],
    project_name="default_project"
)
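
To see why the first example fails and the second passes without calling the platform, the substring check inside ResolutionScorer can be reproduced as a plain function. This is a minimal sketch: `score_response` is a hypothetical helper written for illustration and is not part of judgeval.

```python
def score_response(response: str) -> tuple[int, str]:
    """Mirror the substring check used by ResolutionScorer (hypothetical helper)."""
    if "package" in response:
        return 1, "The response contains the word 'package'"
    return 0, "The response does not contain the word 'package'"


responses = [
    "Your P*CKAG* will arrive tomorrow at 10:00 AM.",  # failing example
    "Your package will arrive tomorrow at 10:00 AM.",  # passing example
]

# Keep only the numeric score for each response
scores = [score_response(r)[0] for r in responses]
print(scores)  # [0, 1]
```

The check is case-sensitive, so a response containing only "Package" would also score 0 under this logic.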

Creating a Dataset

Create a dataset by passing a name, a project name, and a list of examples to Dataset.create.

dataset.py

from judgeval.data import Example
from judgeval.dataset import Dataset

class CustomerRequest(Example):
    request: str
    response: str

examples = [
    CustomerRequest(request="Where is my package?", response="Your P*CKAG* will arrive tomorrow at 10:00 AM.")
]

dataset = Dataset.create(name="my_dataset", project_name="default_project", examples=examples)

You can also add Examples to an existing Dataset.

new_examples = [CustomerRequest(request="Where is my package?", response="Your package will arrive tomorrow at 10:00 AM.")]

dataset.add_examples(new_examples)

We automatically save your Dataset to the Judgment Platform when you create it and when you append to it.

Loading a Dataset

From the Platform

Retrieve datasets you've already saved to the Judgment platform:

load_from_platform.py

from judgeval.dataset import Dataset

# Get an existing dataset
dataset = Dataset.get(name="my_dataset", project_name="default_project")

From Local Files

Import datasets from JSON or YAML files on your local machine.

Your JSON file should contain an array of example objects:

examples.json

[
    {
        "input": "Where is my package?",
        "actual_output": "Your package will arrive tomorrow."
    },
    {
        "input": "How do I return an item?",
        "actual_output": "You can return items within 30 days."
    }
]
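
Before loading, it can help to confirm that the file parses as an array of objects. The following is a stdlib-only sketch: `check_examples_file` is a hypothetical helper, not a judgeval function, and the shape it checks is taken only from the example file above.

```python
import json
import tempfile
from pathlib import Path


def check_examples_file(path):
    """Parse the file and raise ValueError unless it is a JSON array of objects."""
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(records, list):
        raise ValueError("top-level JSON value must be an array")
    for i, record in enumerate(records):
        if not isinstance(record, dict):
            raise ValueError(f"item {i} is not an object")
    return records


# Demo: write the sample from above to a temporary file, then check it
sample = [
    {"input": "Where is my package?",
     "actual_output": "Your package will arrive tomorrow."},
    {"input": "How do I return an item?",
     "actual_output": "You can return items within 30 days."},
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "examples.json"
    path.write_text(json.dumps(sample, indent=4), encoding="utf-8")
    records = check_examples_file(path)

print(len(records))  # 2
```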

Load the JSON file into a dataset:

load_json.py

from judgeval.dataset import Dataset

# Create new dataset and add examples from JSON
dataset = Dataset.create(name="my_dataset", project_name="default_project")
dataset.add_from_json("/path/to/examples.json")

Your YAML file should contain a list of example objects:

examples.yaml

- input: "Where is my package?"
  actual_output: "Your package will arrive tomorrow."
  expected_output: "Your package will arrive tomorrow at 10:00 AM."

- input: "How do I return an item?"
  actual_output: "You can return items within 30 days."
  expected_output: "You can return items within 30 days of purchase."

Load the YAML file into a dataset:

load_yaml.py

from judgeval.dataset import Dataset

# Create new dataset and add examples from YAML
dataset = Dataset.create(name="my_dataset", project_name="default_project")
dataset.add_from_yaml("/path/to/examples.yaml")

Saving Datasets to Local Files

Export your datasets to local files for backup or sharing:

export_dataset.py

from judgeval.dataset import Dataset

dataset = Dataset.get(name="my_dataset", project_name="default_project")

# Save as JSON
dataset.save_as("json", "/path/to/save/dir", "my_dataset")

# Save as YAML
dataset.save_as("yaml", "/path/to/save/dir", "my_dataset")
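
This page does not say whether save_as creates the target directory if it is missing. Creating it beforehand is harmless either way. The sketch below uses only the standard library; `ensure_export_dir` is a hypothetical helper, and a temporary location stands in for /path/to/save/dir.

```python
import tempfile
from pathlib import Path


def ensure_export_dir(path):
    """Create the export directory (and any missing parents); return it as a Path."""
    export_dir = Path(path)
    export_dir.mkdir(parents=True, exist_ok=True)
    return export_dir


# Demo: a temporary base directory stands in for /path/to/save/dir
base = tempfile.mkdtemp()
export_dir = ensure_export_dir(Path(base) / "exports" / "datasets")
print(export_dir.is_dir())  # True

# The export call from the example above would then be:
# dataset.save_as("json", str(export_dir), "my_dataset")
```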

Exporting Datasets

You can export your datasets from the Judgment Platform UI for backup purposes, sharing with team members, or publishing to HuggingFace Hub.

Export to HuggingFace

To export datasets directly to HuggingFace Hub, first configure the HUGGINGFACE_ACCESS_TOKEN secret in your organization settings.

Steps to set up HuggingFace export:

1. Navigate to your organization's Settings > Secrets.
2. Find the HUGGINGFACE_ACCESS_TOKEN secret and click the edit icon.
3. Enter your HuggingFace access token.
4. Once configured, navigate to your dataset in the platform.
5. Click the "Export Dataset to HF" button in the top right to export your dataset to HuggingFace Hub.

You can generate a HuggingFace access token from your HuggingFace settings. Make sure the token has write permissions to create and update datasets.
