DatasetFactory
Create, retrieve, list, and delete datasets in your project.
Access this via client.datasets -- you don't instantiate it directly.
Datasets are schema-enforced: every dataset has a JSON Schema and all examples are validated against it server-side.
# Create a dataset with an explicit schema
dataset = client.datasets.create(
    name="golden-set",
    schema={
        "type": "object",
        "properties": {
            "input": {"type": "string"},
            "expected_output": {"type": "string"},
        },
    },
    examples=[
        Example.create(input="What is AI?", expected_output="Artificial Intelligence"),
    ],
)
# Retrieve an existing dataset
dataset = client.datasets.get(name="golden-set")
# List all datasets
for info in client.datasets.list():
    print(info.name, info.entries)
__init__()
def __init__(client, project_id, project_name):
Parameters
client
required:JudgmentSyncClient
project_id
required:Optional[str]
project_name
required:str
get()
Fetch an existing dataset with all its examples loaded.
dataset = client.datasets.get(name="golden-set")
print(len(dataset))  # number of examples
def get(name) -> typing.Optional:
Parameters
name
required:str
The dataset name (or dataset ID).
Returns
typing.Optional - A Dataset with examples populated, or None if the project
is not resolved.
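Since get() returns None when the project is not resolved, callers may prefer to fail loudly rather than pass None along. The following is a minimal sketch that assumes nothing about the SDK beyond that None return; require_resolved is a hypothetical helper, not part of the library.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def require_resolved(value: Optional[T], what: str) -> T:
    """Return `value` unchanged, or raise if the call returned None.

    Hypothetical helper for illustration; not part of the SDK.
    """
    if value is None:
        raise RuntimeError(f"{what} returned None; is the project resolved?")
    return value


# With a real client (assumed to exist as `client`), usage might look like:
# dataset = require_resolved(client.datasets.get(name="golden-set"), "datasets.get")
```

The same wrapper would apply to any factory method documented as returning None for an unresolved project.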
create()
Create a new dataset with a JSON Schema, optionally with initial examples.
Every dataset requires a JSON Schema (type: "object"); examples
are validated against it server-side. If schema is omitted, a
schema is inferred from the provided examples as a convenience --
passing an explicit schema is recommended.
Every example in a dataset must contain every declared schema field -- one shape per dataset. When inferring from examples, all examples must have identical non-None field sets.
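To make the "one shape per dataset" rule concrete, here is a rough sketch of how a schema could be inferred from examples. It is not the library's inference code: examples are modeled as plain dicts, infer_schema and json_type are hypothetical names, and property types are taken from the first row only.

```python
from typing import Any, Dict, List


def json_type(value: Any) -> str:
    """Map a Python value to a JSON Schema primitive type name."""
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise ValueError(f"unsupported value type: {type(value).__name__}")


def non_none_fields(row: Dict[str, Any]) -> set:
    """Field names whose value is not None."""
    return {key for key, value in row.items() if value is not None}


def infer_schema(rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical inference: every row must share the first row's field set."""
    if not rows:
        raise ValueError("cannot infer a schema without examples")
    expected = non_none_fields(rows[0])
    for index, row in enumerate(rows[1:], start=1):
        found = non_none_fields(row)
        if found != expected:
            raise ValueError(
                f"row {index} has fields {sorted(found)}, expected {sorted(expected)}"
            )
    return {
        "type": "object",
        "properties": {name: {"type": json_type(rows[0][name])} for name in sorted(expected)},
    }
```

A row that adds or drops a non-None field relative to the first row raises ValueError here, which mirrors why mixed-shape examples cannot share one inferred schema.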
A column may be declared with {"type": "trace"} (any name); its
value is a trace id rather than literal data. Trace columns must be
declared in an explicit schema (inference treats values as their
JSON primitive). At most one trace column is allowed per dataset.
An explicit schema is checked client-side (validate_dataset_schema)
before the request so obvious mistakes fail fast; the server performs
the full JSON Schema validation.
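As an illustration of what a fail-fast structural check might cover, the sketch below tests only the rules stated above: a top-level type of "object" and at most one trace column. It is not validate_dataset_schema itself; check_schema is a hypothetical name, and requiring "properties" to be a mapping is an assumption.

```python
from typing import Any, Dict, List


def check_schema(schema: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means this sketch found none."""
    problems: List[str] = []
    if schema.get("type") != "object":
        problems.append('top-level "type" must be "object"')

    properties = schema.get("properties")
    if not isinstance(properties, dict):
        # Assumption: a usable schema declares its columns under "properties".
        problems.append('"properties" must be a mapping of column name to spec')
        return problems

    # Columns declared as {"type": "trace"} hold a trace id, not literal data.
    trace_columns = [
        name
        for name, spec in properties.items()
        if isinstance(spec, dict) and spec.get("type") == "trace"
    ]
    if len(trace_columns) > 1:
        problems.append(f"at most one trace column is allowed, found: {trace_columns}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue at once; the real check is documented as raising ValueError.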
Raises
ValueError - If neither schema nor examples are provided, or if an explicit schema is structurally invalid.
JudgmentConflictError - If a dataset with this name exists and overwrite is False.
JudgmentValidationError - If the schema is invalid, examples fail validation, or overwrite is blocked by test configs.
dataset = client.datasets.create(
    name="qa-pairs",
    schema={
        "type": "object",
        "properties": {
            "input": {"type": "string"},
            "expected_output": {"type": "string"},
        },
    },
    examples=[
        Example.create(input="What is 2+2?", expected_output="4"),
    ],
)
A dataset with a trace column (declared explicitly; the value is the trace id):
dataset = client.datasets.create(
    name="transcripts",
    schema={
        "type": "object",
        "properties": {"transcript": {"type": "trace"}},
    },
    examples=[
        Example.create(transcript="<trace_id>"),
    ],
)
def create(name, schema=None, examples=[], overwrite=False) -> typing.Optional:
Parameters
name
required:str
Name for the dataset (unique within the project, case-sensitive).
schema
:Optional[DatasetSchema]
JSON Schema for the dataset's examples (a DatasetSchema
or a plain dict of the same shape). Required unless
examples are provided to infer one from.
None
examples
:Iterable[Example]
Examples to upload with the dataset.
[]
overwrite
:bool
Replace an existing dataset with the same name. Rejected by the server if the dataset has test configs.
False
Returns
typing.Optional - The new Dataset, or None if the project is not resolved.
list()
List all datasets in the project.
for info in client.datasets.list():
    print(f"{info.name}: {info.entries} examples")
def list() -> typing.Optional:
Returns
typing.Optional - A list of DatasetInfo summaries, or None if the project
is not resolved.
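Because list() may return None, iterating over its result directly would raise a TypeError when the project is not resolved. One defensive pattern, sketched here with a hypothetical helper (iter_or_empty is not part of the SDK), treats None as an empty sequence:

```python
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def iter_or_empty(items: Optional[Iterable[T]]) -> Iterator[T]:
    """Iterate over `items`, treating None as an empty sequence."""
    return iter(items if items is not None else ())


# With a real client (assumed to exist as `client`), usage might look like:
# for info in iter_or_empty(client.datasets.list()):
#     print(f"{info.name}: {info.entries} examples")
```

Whether silently skipping an unresolved project is appropriate depends on the caller; raising an error may be the better choice in scripts that must not proceed without data.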
versions()
List all versions of a dataset, newest first.
def versions(name) -> typing.Optional:
Parameters
name
required:str
The dataset name (or dataset ID).
Returns
typing.Optional - A list of DatasetVersion objects, or None if the project
is not resolved.
delete()
Delete a dataset from the platform.
Dependent test configs are deleted along with the dataset.
def delete(name) -> bool:
Parameters
name
required:str
The dataset name (or dataset ID).
Returns
bool - True if the dataset was deleted, False if the project is not
resolved.