Example
Overview
An Example is the basic unit of data in judgeval that allows you to run evaluation scorers on your agents.
An Example can be composed of a mixture of the following fields:
input: Optional[Union[str, Dict[str, Any]]]
actual_output: Optional[Union[str, List[str]]]
expected_output: Optional[Union[str, List[str]]]
retrieval_context: Optional[List[str]]
context: Optional[List[str]]
tools_called: Optional[List[str]]
expected_tools: Optional[List[str]]
additional_metadata: Optional[Dict[str, Any]]
Here's a sample of creating an Example:
from judgeval.data import Example
example = Example(
    input="Who founded Microsoft?",
    actual_output="Bill Gates and Paul Allen.",
    expected_output="Bill Gates and Paul Allen founded Microsoft in New Mexico in 1975.",
    retrieval_context=["Bill Gates co-founded Microsoft with Paul Allen in 1975."],
    context=["Bill Gates and Paul Allen are the founders of Microsoft."],
    tools_called=["research_person", "research_company"],
    expected_tools=["research_person", "research_company"],
    additional_metadata={"research_source": "Wikipedia"}
)
Example Fields
Here, we cover the possible fields that make up an Example.
Input
The input field represents a sample input to your agent/task.
from judgeval.data import Example
example = Example(input="Is sparkling water healthy?")
Actual Output
The actual_output field represents what your agent outputs based on the input.
This is the actual output of your agent system, produced either live at evaluation time or loaded from saved answers.
from judgeval.data import Example
example = Example(
    input="Is sparkling water healthy?",
    actual_output="Sparkling water is neither healthy nor unhealthy."
)
Expected Output
The expected_output field is Optional[Union[str, List[str]]] and represents the ideal output of your agent.
from judgeval.data import Example
example = Example(
    input="Is sparkling water healthy?",
    actual_output="Sparkling water is neither healthy nor unhealthy.",
    expected_output="Sparkling water is neither healthy nor unhealthy."
)
Retrieval Context
The retrieval_context field is Optional[List[str]] and represents the context that is actually retrieved from a vector database at runtime.
from judgeval.data import Example
example = Example(
    input="Is sparkling water healthy?",
    actual_output="Sparkling water is neither healthy nor unhealthy.",
    expected_output="Sparkling water is neither healthy nor unhealthy.",
    retrieval_context=["Sparkling water is carbonated and has no calories."]
)
Context
The context field is Optional[List[str]] and represents information that is supplied to the agent system as ground truth.
For instance, context could be a list of facts that the agent is aware of. However, context should not be confused with retrieval_context: context is ground-truth information supplied to the agent up front, while retrieval_context is whatever your retrieval step actually returns at runtime.
# Sample app implementation
import medical_chatbot
from judgeval.data import Example
question = "Is sparkling water healthy?"
example = Example(
    input=question,
    actual_output=medical_chatbot.chat(question),
    expected_output="Sparkling water is neither healthy nor unhealthy.",
    retrieval_context=["Sparkling water is carbonated and has no calories."],
    context=["Sparkling water is a type of water that is carbonated."]
)
Tools Called
The tools_called field is Optional[List[str]] and represents the list of tools your agent actually invoked while generating the output. This is useful for evaluating tool-using agents or agents that interact with external APIs.
from judgeval.data import Example
example = Example(
    input="What is the weather in Paris?",
    actual_output="The weather in Paris is sunny and 25°C.",
    tools_called=["weather_api"]
)
Expected Tools
The expected_tools field is Optional[List[str]] and represents the list of tools that you expect your agent to use to answer the input. It can be compared against tools_called to evaluate whether the agent used the correct tools, as sketched after the example below.
from judgeval.data import Example
example = Example(
    input="What is the weather in Paris?",
    actual_output="The weather in Paris is sunny and 25°C.",
    tools_called=["weather_api"],
    expected_tools=["weather_api"]
)
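For instance, you can sanity-check the two fields directly in plain Python. This is a minimal, framework-agnostic sketch (the helper check_tools is hypothetical and not part of judgeval); when you run an evaluation, judgeval scorers can consume these fields for you.
def check_tools(example):
    # Hypothetical helper, not part of judgeval: reports tools that were
    # expected but never called, and tools that were called unexpectedly.
    called = set(example.tools_called or [])
    expected = set(example.expected_tools or [])
    return {
        "missing_tools": sorted(expected - called),
        "unexpected_tools": sorted(called - expected),
        "exact_match": called == expected,
    }

print(check_tools(example))  # {'missing_tools': [], 'unexpected_tools': [], 'exact_match': True}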
Additional Metadata
The additional_metadata field is Optional[Dict[str, Any]] and allows you to attach any extra information to the example. This can be useful for storing custom tags, sources, or any other data relevant to your evaluation or analysis.
from judgeval.data import Example
example = Example(
    input="What is the weather in Paris?",
    actual_output="The weather in Paris is sunny and 25°C.",
    tools_called=["weather_api"],
    expected_tools=["weather_api"],
    additional_metadata={"source": "OpenWeatherMap", "confidence": 0.95}
)
Conclusion
Congratulations! 🎉
You've learned how to create an Example and can begin using examples to execute evaluations or create datasets.
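To go from an Example to an actual evaluation, here's a minimal sketch based on judgeval's typical client-and-scorer pattern. The specific names (JudgmentClient, run_evaluation, FaithfulnessScorer, the model parameter) are assumptions about the current API and may differ in your version of judgeval, so check the judgeval documentation before running it.
from judgeval import JudgmentClient
from judgeval.data import Example
from judgeval.scorers import FaithfulnessScorer

# Assumes your Judgment API credentials are configured in the environment.
client = JudgmentClient()

example = Example(
    input="Is sparkling water healthy?",
    actual_output="Sparkling water is neither healthy nor unhealthy.",
    retrieval_context=["Sparkling water is carbonated and has no calories."]
)

# Score how faithful actual_output is to the retrieval_context.
results = client.run_evaluation(
    examples=[example],
    scorers=[FaithfulnessScorer(threshold=0.5)],
    model="gpt-4o"
)
print(results)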