Example
Overview
An Example is the basic unit of data in judgeval that allows you to run evaluation scorers on your agents.
An Example can be composed of a mixture of the following fields:
input: Optional[Union[str, Dict[str, Any]]]
actual_output: Optional[Union[str, List[str]]]
expected_output: Optional[Union[str, List[str]]]
retrieval_context: Optional[List[str]]
context: Optional[List[str]]
tools_called: Optional[List[str]]
expected_tools: Optional[List[str]]
additional_metadata: Optional[Dict[str, Any]]
Here's a sample of creating an Example:
from judgeval.data import Example
example = Example(
    input="Who founded Microsoft?",
    actual_output="Bill Gates and Paul Allen.",
    expected_output="Bill Gates and Paul Allen founded Microsoft in New Mexico in 1975.",
    retrieval_context=["Bill Gates co-founded Microsoft with Paul Allen in 1975."],
    context=["Bill Gates and Paul Allen are the founders of Microsoft."],
    tools_called=["research_person", "research_company"],
    expected_tools=["research_person", "research_company"],
    additional_metadata={"research_source": "Wikipedia"}
)
Example Fields
Here, we cover the possible fields that make up an Example.
Input
The input field represents a sample input to your agent/task.
from judgeval.data import Example
example = Example(input="Is sparkling water healthy?")
Actual Output
The actual_output field represents what your agent outputs based on the input.
This is the actual output of your agent system, produced either live at evaluation time or loaded from saved answers.
from judgeval.data import Example
example = Example(
    input="Is sparkling water healthy?",
    actual_output="Sparkling water is neither healthy nor unhealthy."
)
Expected Output
The expected_output field is Optional[Union[str, List[str]]] and represents the ideal output of your agent.
from judgeval.data import Example
example = Example(
    input="Is sparkling water healthy?",
    actual_output="Sparkling water is neither healthy nor unhealthy.",
    expected_output="Sparkling water is neither healthy nor unhealthy."
)
Retrieval Context
The retrieval_context field is Optional[List[str]] and represents the context that is actually retrieved from a vector database at runtime.
from judgeval.data import Example
example = Example(
    input="Is sparkling water healthy?",
    actual_output="Sparkling water is neither healthy nor unhealthy.",
    expected_output="Sparkling water is neither healthy nor unhealthy.",
    retrieval_context=["Sparkling water is carbonated and has no calories."]
)
Context
The context field is Optional[List[str]] and represents information that is supplied to the agent system as ground truth.
For instance, context could be a list of facts that the agent is aware of. However, context should not be confused with retrieval_context: context is ground-truth information supplied to the agent up front, while retrieval_context is whatever your retrieval step actually returns at runtime.
# Sample app implementation
import medical_chatbot
from judgeval.data import Example
question = "Is sparkling water healthy?"
example = Example(
    input=question,
    actual_output=medical_chatbot.chat(question),
    expected_output="Sparkling water is neither healthy nor unhealthy.",
    retrieval_context=["Sparkling water is carbonated and has no calories."],
    context=["Sparkling water is a type of water that is carbonated."]
)
Tools Called
The tools_called field is Optional[List[str]] and represents the list of tools your agent actually invoked while generating the output. This is useful for evaluating tool-using agents or agents that interact with external APIs.
from judgeval.data import Example
example = Example(
    input="What is the weather in Paris?",
    actual_output="The weather in Paris is sunny and 25°C.",
    tools_called=["weather_api"]
)
Expected Tools
The expected_tools field is Optional[List[str]] and represents the list of tools that you expect your agent to use to answer the input. It can be compared against tools_called to evaluate whether the agent used the correct tools, as sketched after the example below.
from judgeval.data import Example
example = Example(
    input="What is the weather in Paris?",
    actual_output="The weather in Paris is sunny and 25°C.",
    tools_called=["weather_api"],
    expected_tools=["weather_api"]
)
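For instance, you can sanity-check the two fields directly in plain Python. This is a minimal, framework-agnostic sketch (the helper check_tools is hypothetical and not part of judgeval); when you run an evaluation, judgeval scorers can consume these fields for you.
def check_tools(example):
    # Hypothetical helper, not part of judgeval: reports tools that were
    # expected but never called, and tools that were called unexpectedly.
    called = set(example.tools_called or [])
    expected = set(example.expected_tools or [])
    return {
        "missing_tools": sorted(expected - called),
        "unexpected_tools": sorted(called - expected),
        "exact_match": called == expected,
    }

print(check_tools(example))  # {'missing_tools': [], 'unexpected_tools': [], 'exact_match': True}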
Additional Metadata
The additional_metadata field is Optional[Dict[str, Any]] and allows you to attach any extra information to the example. This can be useful for storing custom tags, sources, or any other data relevant to your evaluation or analysis.
from judgeval.data import Example
example = Example(
    input="What is the weather in Paris?",
    actual_output="The weather in Paris is sunny and 25°C.",
    tools_called=["weather_api"],
    expected_tools=["weather_api"],
    additional_metadata={"source": "OpenWeatherMap", "confidence": 0.95}
)
Conclusion
Congratulations! 🎉
You've learned how to create an Example and can begin using examples to execute evaluations or create datasets.
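To go from an Example to an actual evaluation, here's a minimal sketch based on judgeval's typical client-and-scorer pattern. The specific names (JudgmentClient, run_evaluation, FaithfulnessScorer, the model parameter) are assumptions about the current API and may differ in your version of judgeval, so check the judgeval documentation before running it.
from judgeval import JudgmentClient
from judgeval.data import Example
from judgeval.scorers import FaithfulnessScorer

# Assumes your Judgment API credentials are configured in the environment.
client = JudgmentClient()

example = Example(
    input="Is sparkling water healthy?",
    actual_output="Sparkling water is neither healthy nor unhealthy.",
    retrieval_context=["Sparkling water is carbonated and has no calories."]
)

# Score how faithful actual_output is to the retrieval_context.
results = client.run_evaluation(
    examples=[example],
    scorers=[FaithfulnessScorer(threshold=0.5)],
    model="gpt-4o"
)
print(results)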