Unit Testing
Use evals as unit tests in your CI pipelines
judgeval
enables you to unit test your agent against predefined tasks/inputs, with built-in support for common
unit testing frameworks like pytest
.
Quickstart
You can formulate evals as unit tests by checking if scorers exceed or fall below threshold values on a set of examples (test cases).
The client.assert_test()
runs evaluations as unit tests, raising an exception if the score falls below the defined threshold.
from judgeval import JudgmentClient
from judgeval.data import Example
from judgeval.scorers import FaithfulnessScorer
client = JudgmentClient()
agent = ... # your agent
task = "What if these shoes don't fit?"
example = Example(
input=task,
actual_output=agent.run(task), # e.g. "We offer a 30-day full refund at no extra cost."
retrieval_context=["All customers are eligible for a 44 day full refund at no extra cost."],
)
scorer = FaithfulnessScorer(threshold=0.5)
client.assert_test(
examples=[example],
scorers=[scorer],
model="gpt-4.1",
project_name="my-first-test"
)
Let's break down how assert_test()
works:
assert_test()
will take each example from examples
and pass the input
to function
as kwargs (in this case, the function is generate_itinerary
) to generate an output.
Then, the scorers will be applied to the output and if the score falls below the defined threshold, the test will fail with an exception raised.
Pytest Integration
judgeval
integrates with pytest
so you don't have to write any additional scaffolding for your agent unit tests.
We'll reuse the code above and now expect a failure with pytest:
import pytest
...
with pytest.raises(AssertionError):
client.assert_test(
examples=[example],
scorers=[scorer],
model="gpt-4.1",
)
Test Suites From YAML
If you have a large number of tests, you may find it convenient to store your test cases in a YAML file:
# tests.yaml
examples:
- input:
"What if these shoes don't fit?"
expected_output:
"We offer a 30-day full refund at no extra cost."
retrieval_context:
- "All customers are eligible for a 44 day full refund at no extra cost."
Then run the evaluation using the YAML file:
from judgeval.utils.file_utils import get_examples_from_yaml
client.assert_test(
examples=get_examples_from_yaml("tests.yaml"),
scorers=[FaithfulnessScorer(threshold=0.5)],
model="gpt-4.1",
)
Advanced
Create unit tests for scorers that operate at the trace level
See the API reference for assert_test()