Instruction Adherence
The InstructionAdherence scorer is a default LLM judge scorer that measures how well the actual_output follows the instructions provided in the input.
Required Fields
To run the InstructionAdherence scorer, you must include the following fields in your Example:
input
actual_output
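For example, a minimal Example carrying just these two fields could be constructed as in the sketch below (the field values are purely illustrative):

from judgeval.data import Example

# Minimal Example with only the two required fields: input holds the
# instructions, actual_output holds the response being judged.
example = Example(
    input="Reply in exactly two sentences and mention the word 'deadline'.",
    actual_output="The deadline is Friday. Please submit your report before then.",
)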
Scorer Breakdown
InstructionAdherence scores are calculated by evaluating whether the actual_output fulfills the requirements, constraints, and steps specified in the input instructions.
The score is typically calculated as:
Instruction Adherence = (Number of Instructions Followed) / (Total Number of Instructions)
A higher score indicates better adherence to the instructions. The scorer may also provide a reason explaining which instructions were or were not followed.
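As a concrete illustration of that calculation (assuming the judge identifies the discrete instructions in the input and marks each one as followed or not), the arithmetic looks like this:

# Illustrative arithmetic only: if the judge extracts 3 instructions from the
# input and decides the actual_output follows 2 of them, the score is 2 / 3.
instructions_followed = 2
total_instructions = 3

score = instructions_followed / total_instructions  # ~0.67
passes = score >= 0.8  # compared against the scorer's threshold; False here
print(f"score={score:.2f}, passes={passes}")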
Sample Implementation
from judgeval import JudgmentClient
from judgeval.data import Example
from judgeval.scorers import InstructionAdherenceScorer
client = JudgmentClient()
example = Example(
    input="List three benefits of regular exercise. Then tell me which is most influential in the short-term.",
    actual_output="1. Improves cardiovascular health. 2. Boosts mental well-being. 3. Increases energy levels. The most influential of these in the short-term is an increase in energy levels.",
)
scorer = InstructionAdherenceScorer(threshold=0.8)
results = client.run_evaluation(
    examples=[example],
    scorers=[scorer],
    model="gpt-4.1",
)
print(results)