Datasets
Group examples and traces for scalable evaluation workflows.
Datasets group multiple examples or traces so you can evaluate them at scale. Use datasets to manage example collections, run batch evaluations, and sync your test data with the Judgment platform for team collaboration.

Manage using the SDK
You can create and manage datasets with the Python SDK, which supports creating and retrieving datasets, adding examples to them, and exporting them.
Create using the Judgment Platform
Navigate to Datasets
Go to the Datasets tab in the sidebar.
Create your dataset
Click the New Dataset button and select the data type to store:
- Example datasets store key-value data pairs (e.g. input and output)
- Trace datasets store full trace data
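
To make the "key-value data pairs" idea concrete, an example dataset can be pictured as a list of small records. The sketch below is illustrative only: it uses plain Python dictionaries and the standard library, not the Judgment SDK, and the field names `input` and `output` are assumptions taken from the "e.g." above rather than a documented schema. The helper names `to_jsonl` and `from_jsonl` are hypothetical.

```python
import json

# Hypothetical example records. The "input"/"output" keys mirror the
# illustration in the text; they are not a confirmed platform schema.
examples = [
    {"input": "What is the capital of France?", "output": "Paris"},
    {"input": "What is 2 + 2?", "output": "4"},
]


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(record, sort_keys=True) for record in records)


def from_jsonl(text):
    """Parse JSON Lines text back into a list of dictionaries."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


jsonl_text = to_jsonl(examples)
restored = from_jsonl(jsonl_text)
```

Keeping each example as a flat record like this makes a collection easy to review and diff locally before you add it to a dataset. A trace dataset, by contrast, stores the full trace rather than a single input/output pair.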


Add data to the dataset
- You can add data to Example datasets from the Test page (Example Test type).
- You can add data to Trace datasets from the Test page (Trace Test type) or the Traces page in Monitoring.
Exporting Datasets
You can export your datasets from the Judgment Platform UI for backup purposes, sharing with team members, or publishing to HuggingFace Hub.
Export to HuggingFace
You can export your datasets directly to HuggingFace Hub by configuring the HUGGINGFACE_ACCESS_TOKEN secret in your organization settings.
Steps to set up HuggingFace export:
- Navigate to your organization's Settings > Secrets
- Find the HUGGINGFACE_ACCESS_TOKEN secret and click the edit icon
- Enter your HuggingFace access token
- Once configured, navigate to your dataset in the platform
- Click the "Export Dataset to HF" button in the top right to export your dataset to HuggingFace Hub


Next Steps
- SDK Reference - Complete API documentation for managing datasets programmatically
- Behaviors - Automatically tag traces based on agent behavior
- Custom Scorers - Create custom evaluation logic for your datasets