Regression Testing
Use evals as regression tests in your CI pipelines.
judgeval enables you to unit test your agent against predefined tasks/inputs, with built-in support for common
unit testing frameworks like pytest.

Quickstart
You can formulate evals as unit tests by checking whether scorer scores clear threshold values on a set of examples (test cases).
Setting assert_test=True in client.evaluation.create().run() runs the evaluation as a unit test, raising an exception if any scorer fails, i.e., its score falls below the defined threshold.
from judgeval import Judgeval
from judgeval.v1.data.example import Example

client = Judgeval(project_name="default_project")

# Retrieve a PromptScorer created on the platform
# Create the scorer on the Judgment platform first with your rubric
scorer = client.scorers.prompt_scorer.get(name="ResolutionScorer")

example = Example.create(
    input="Where is my package?",
    actual_output="Your P*CKAG* will arrive tomorrow at 10:00 AM."
)

res = client.evaluation.create().run(
    examples=[example],
    scorers=[scorer],
    eval_run_name="regression_test",
    assert_test=True
)

If an example fails, the test will report the failure like this:
title="Test output"
================================================================================
⚠️ TEST RESULTS: 0/1 passed (1 failed)
================================================================================
✗ Test 1: FAILED
Scorer: Resolution Scorer
Score: 0.0
Reason: The response does not contain the word 'package'
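
The exception raised on a failed assertion is JudgmentTestError, imported from judgeval.exceptions (see the pytest example below). Outside a test framework, you can catch it yourself and turn a failed eval into a non-zero exit code for a CI job. A minimal sketch, reusing the client, scorer, and example defined above:

import sys

from judgeval.exceptions import JudgmentTestError

try:
    client.evaluation.create().run(
        examples=[example],
        scorers=[scorer],
        eval_run_name="regression_test",
        assert_test=True
    )
except JudgmentTestError:
    # At least one scorer fell below its threshold: fail the CI job
    sys.exit(1)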
Pytest Integration
judgeval integrates with pytest, so you don't have to write any additional scaffolding for your agent unit tests.
We'll reuse the code above, this time expecting the failure through pytest. Save the file as unit_test.py and run uv run pytest unit_test.py:
from judgeval import Judgeval
from judgeval.v1.data.example import Example
from judgeval.exceptions import JudgmentTestError
import pytest

client = Judgeval(project_name="default_project")

# Retrieve a PromptScorer created on the platform
# Create the scorer on the Judgment platform first with your rubric
scorer = client.scorers.prompt_scorer.get(name="ResolutionScorer")

example = Example.create(
    input="Where is my package?",
    actual_output="Your P*CKAG* will arrive tomorrow at 10:00 AM."
)

def test_agent_behavior():
    # The garbled output should fail the scorer, so run() raises JudgmentTestError
    with pytest.raises(JudgmentTestError):
        client.evaluation.create().run(
            examples=[example],
            scorers=[scorer],
            eval_run_name="regression_test",
            assert_test=True
        )
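
To cover more regression cases without duplicating the test body, you can drive the same assertion with pytest.mark.parametrize. A sketch, assuming the client and scorer setup from above; the cases below (and whether each passes) are placeholders that depend entirely on your scorer's rubric and threshold:

import pytest

from judgeval.exceptions import JudgmentTestError
from judgeval.v1.data.example import Example

# Illustrative cases: (input, actual_output, should_pass) -- adjust to your rubric
CASES = [
    ("Where is my package?", "Your package will arrive tomorrow at 10:00 AM.", True),
    ("Where is my package?", "Your P*CKAG* will arrive tomorrow at 10:00 AM.", False),
]

@pytest.mark.parametrize("question,answer,should_pass", CASES)
def test_resolution_regressions(question, answer, should_pass):
    example = Example.create(input=question, actual_output=answer)

    def run_eval():
        client.evaluation.create().run(
            examples=[example],
            scorers=[scorer],
            eval_run_name="regression_test",
            assert_test=True
        )

    if should_pass:
        run_eval()
    else:
        with pytest.raises(JudgmentTestError):
            run_eval()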