Example

An Example is the basic unit of data in judgeval. Evaluation scorers run on Examples to evaluate your agents.

In general, an Example corresponds to a single span in an agent trace.

In the context of unit testing, an Example corresponds to a single test case.

An Example can contain any combination of the following fields, all of which are optional:

Field	Type	Description
`input`	`Optional[Union[str, Dict[str, Any]]]`	The input to your agent or task
`actual_output`	`Optional[Union[str, List[str]]]`	The output your agent produced for that input
`expected_output`	`Optional[Union[str, List[str]]]`	The ideal output of your agent
`retrieval_context`	`Optional[List[str]]`	Context retrieved from a vector database
`context`	`Optional[List[str]]`	Ground truth information supplied to the agent
`tools_called`	`Optional[List[str]]`	Tools that your agent actually invoked
`expected_tools`	`Optional[List[str]]`	Tools you expect your agent to use
`additional_metadata`	`Optional[Dict[str, Any]]`	Extra information attached to the example
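Every field is optional, and each one accepts only the types listed in the table. If you assemble Examples from loosely typed sources such as parsed logs, a small pre-check can catch mistakes like passing a single tool name as a string instead of a list. This is a hypothetical sketch and is not part of judgeval. `find_field_problems` encodes only the types in the table, flags any key the table does not list, and does not reproduce whatever validation judgeval's own `Example` class performs.

```python
from typing import Any, Callable, Dict, List

# Hypothetical pre-check, not part of judgeval. It encodes only the types
# listed in the field table; judgeval's own Example validation may differ.

def _is_str_list(value: Any) -> bool:
    """True for a list whose items are all strings."""
    return isinstance(value, list) and all(isinstance(v, str) for v in value)

def _is_str_keyed_dict(value: Any) -> bool:
    """True for a dict whose keys are all strings."""
    return isinstance(value, dict) and all(isinstance(k, str) for k in value)

def _is_str_or_str_list(value: Any) -> bool:
    """True for a string or a list of strings."""
    return isinstance(value, str) or _is_str_list(value)

# One type check per documented field
FIELD_CHECKS: Dict[str, Callable[[Any], bool]] = {
    "input": lambda v: isinstance(v, str) or _is_str_keyed_dict(v),
    "actual_output": _is_str_or_str_list,
    "expected_output": _is_str_or_str_list,
    "retrieval_context": _is_str_list,
    "context": _is_str_list,
    "tools_called": _is_str_list,
    "expected_tools": _is_str_list,
    "additional_metadata": _is_str_keyed_dict,
}

def find_field_problems(fields: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means every field looks valid."""
    problems: List[str] = []
    for name, value in fields.items():
        check = FIELD_CHECKS.get(name)
        if check is None:
            problems.append(f"unknown field: {name}")
        elif value is not None and not check(value):
            # None is accepted because every field in the table is Optional
            problems.append(f"unexpected type for {name}: {type(value).__name__}")
    return problems

print(find_field_problems({"input": "Who founded Microsoft?", "tools_called": "research_person"}))
# ['unexpected type for tools_called: str']
```

If the returned list is empty, the same dictionary can be passed to the constructor as keyword arguments.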

Here's how to create an Example:

example.py

from judgeval.data import Example

example = Example(
    input="Who founded Microsoft?",
    actual_output="Bill Gates and Paul Allen.",
    expected_output="Bill Gates and Paul Allen founded Microsoft in New Mexico in 1975.",
    retrieval_context=["Bill Gates co-founded Microsoft with Paul Allen in 1975."],
    context=["Bill Gates and Paul Allen are the founders of Microsoft."],
    tools_called=["research_person", "research_company"],
    expected_tools=["research_person", "research_company"],
    additional_metadata={"research_source": "Wikipedia"}
)

custom_example.py

from judgeval.data import Example

example = Example(
    input={  # input can also be a dictionary, e.g. when your agent takes JSON input
        "question": "Who founded Microsoft?",
        "tool_preferences": ["research_person", "research_company"]
    },
    actual_output="Bill Gates and Paul Allen.",
    expected_output="Bill Gates and Paul Allen founded Microsoft in New Mexico in 1975.",
    retrieval_context=["Bill Gates co-founded Microsoft with Paul Allen in 1975."],
    context=["Bill Gates and Paul Allen are the founders of Microsoft."],
    tools_called=["research_person", "research_company"],
    expected_tools=["research_person", "research_company"],
    additional_metadata={"research_source": "Wikipedia"}
)

It is often most useful to create an Example from the direct outputs of your agent system:

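from judgeval.data import Example

# `agent` stands for your own agent object; `research` is a placeholder
# for whatever method runs your agent on an input
input_q = "Who founded Microsoft?"
example = Example(
    input=input_q,
    actual_output=agent.research(input_q),  # capture the agent's real output
)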

Because `actual_output` comes from a real run of your agent, you can use the Example as a test case for your agent system in your CI pipeline.
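Fields such as `tools_called` are only meaningful if they are captured from the same run that produced `actual_output`. The following is a minimal sketch of collecting both in one call. It assumes a hypothetical `research_agent` function that returns `(answer, tools_used)` and always gives a canned answer. `build_example_fields` is also hypothetical. Neither is part of judgeval, so replace the stub with your real agent entry point.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical stand-in for your agent: always returns the same canned answer
# plus the names of the tools it "invoked". Replace with your real agent.
def research_agent(question: str) -> Tuple[str, List[str]]:
    tools_used = ["research_person", "research_company"]
    return "Bill Gates and Paul Allen.", tools_used

def build_example_fields(
    question: str,
    expected_output: str,
    expected_tools: List[str],
) -> Dict[str, Any]:
    """Run the agent once and collect keyword arguments for an Example."""
    answer, tools_used = research_agent(question)
    return {
        "input": question,
        "actual_output": answer,       # captured directly from the agent run
        "tools_called": tools_used,    # tools the agent actually invoked
        "expected_output": expected_output,
        "expected_tools": expected_tools,
    }

fields = build_example_fields(
    "Who founded Microsoft?",
    "Bill Gates and Paul Allen founded Microsoft in New Mexico in 1975.",
    ["research_person", "research_company"],
)
```

The dictionary keys match the field names in the table, so you can pass the result straight to the constructor as `Example(**fields)`.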


Learn More

Example objects are the building blocks of evals in judgeval. However, they are rarely used in isolation.

Examples are most powerful when combined to form datasets, which can be used to scale evaluations and testing.

Learn about Datasets: combine multiple examples into datasets for scaled evaluations and testing.
