Instruction Adherence
The InstructionAdherence scorer is a default LLM judge scorer that measures how well the actual_output follows the instructions provided in the input.
Required Fields
To run the InstructionAdherence scorer, you must include the following fields in your Example:
input
actual_output
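For example, a minimal Example carrying just these two fields could be constructed as in the sketch below (the field values are purely illustrative):

from judgeval.data import Example

# Minimal Example with only the two required fields: input holds the
# instructions, actual_output holds the response being judged.
example = Example(
    input="Reply in exactly two sentences and mention the word 'deadline'.",
    actual_output="The deadline is Friday. Please submit your report before then.",
)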
Scorer Breakdown
InstructionAdherence scores are calculated by evaluating whether the actual_output fulfills the requirements, constraints, and steps specified in the input instructions.
The score is typically calculated as:
Instruction Adherence = (Number of Instructions Followed) / (Total Number of Instructions)
A higher score indicates better adherence to the instructions. The scorer may also provide a reason explaining which instructions were or were not followed.
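As a concrete illustration of that calculation (assuming the judge identifies the discrete instructions in the input and marks each one as followed or not), the arithmetic looks like this:

# Illustrative arithmetic only: if the judge extracts 3 instructions from the
# input and decides the actual_output follows 2 of them, the score is 2 / 3.
instructions_followed = 2
total_instructions = 3

score = instructions_followed / total_instructions  # ~0.67
passes = score >= 0.8  # compared against the scorer's threshold; False here
print(f"score={score:.2f}, passes={passes}")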
Sample Implementation
from judgeval import JudgmentClient
from judgeval.data import Example
from judgeval.scorers import InstructionAdherenceScorer
client = JudgmentClient()
example = Example(
    input="List three benefits of regular exercise. Then tell me which is most influential in the short-term.",
    actual_output="1. Improves cardiovascular health. 2. Boosts mental well-being. 3. Increases energy levels. The most influential of these in the short-term is an increase in energy levels.",
)
scorer = InstructionAdherenceScorer(threshold=0.8)
results = client.run_evaluation(
    examples=[example],
    scorers=[scorer],
    model="gpt-4.1",
)
print(results)