CI pipelines are a core part of every mature software engineering practice.

When building with LLMs, developers should expect nothing less. With judgeval, you can unit test your LLM applications for consistency and quality on any metric you choose.

Unit testing is natively supported in judgeval through the client.assert_test (Python) or client.assertTest (TypeScript) method. It works with popular testing frameworks such as pytest (Python) and Jest or Vitest (TypeScript), so you don't have to learn a new testing framework.

Single Step Testing

```python
import pytest
from judgeval import JudgmentClient
from judgeval.data import Example
from judgeval.scorers import FaithfulnessScorer


def test_faithfulness():
    client = JudgmentClient()

    # The answer contradicts the retrieval context (Lyon vs. Paris): a hallucination
    hallucinated = Example(
        input="What is the capital of France?",
        actual_output="The capital of France is Lyon.",
        retrieval_context=["Come tour Paris' museums in the capital of France!"],
    )

    # A hallucinated answer cannot reach perfect faithfulness, so assert_test
    # is expected to raise an AssertionError at a threshold of 1.0
    with pytest.raises(AssertionError):
        client.assert_test(
            eval_run_name="test_faithfulness_fail",
            examples=[hallucinated],
            scorers=[FaithfulnessScorer(threshold=1.0)],
            model="gpt-4o",  # judge model used to compute the score
        )

    # An answer grounded in the retrieval context should pass
    grounded = Example(
        input="What is the capital of France?",
        actual_output="The capital of France is Paris.",
        retrieval_context=["Come tour Paris' museums in the capital of France!"],
    )
    client.assert_test(
        eval_run_name="test_faithfulness_pass",
        examples=[grounded],
        scorers=[FaithfulnessScorer(threshold=0.1)],
        model="gpt-4o",  # judge model used to compute the score
    )
```
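Conceptually, the pass/fail decision in the test above comes down to comparing each scorer's score against its threshold. The sketch below is not judgeval's implementation: ScorerResult and assert_all_passed are hypothetical names, and it assumes scores lie in [0, 1] and that a score greater than or equal to the threshold counts as a pass. It illustrates why an answer that scores near 0.0 on faithfulness fails a strict threshold.

```python
from dataclasses import dataclass


@dataclass
class ScorerResult:
    """One scorer's verdict on one example (hypothetical structure)."""
    scorer_name: str
    score: float      # assumed to lie in [0, 1]
    threshold: float  # minimum score required to pass

    @property
    def passed(self) -> bool:
        # Assumption: a score equal to the threshold counts as a pass.
        return self.score >= self.threshold


def assert_all_passed(results: list[ScorerResult]) -> None:
    """Raise AssertionError listing every scorer that fell below its threshold."""
    failures = [r for r in results if not r.passed]
    if failures:
        details = "; ".join(
            f"{r.scorer_name}: {r.score:.2f} < {r.threshold:.2f}" for r in failures
        )
        raise AssertionError(f"{len(failures)} scorer(s) failed: {details}")


# Illustrative scores only; a real run gets these from the judge model.
grounded = ScorerResult("faithfulness", score=1.0, threshold=1.0)
hallucinated = ScorerResult("faithfulness", score=0.0, threshold=1.0)

assert_all_passed([grounded])  # no exception: every scorer met its threshold
try:
    assert_all_passed([grounded, hallucinated])
except AssertionError as err:
    print(err)  # 1 scorer(s) failed: faithfulness: 0.00 < 1.00
```

Raising a plain AssertionError is what lets a framework like pytest report the failure as an ordinary failed test.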

Because these are ordinary pytest tests, judgeval fits naturally into your CI pipeline: the same test run that checks the rest of your codebase also evaluates your LLM application, so regressions are caught before they reach production.
