Execution Order
The ExecutionOrder scorer is a default scorer that checks whether your LLM agent traversed the correct path in your execution graph.
This scorer is an algorithm-based scorer that does not rely on an LLM judge.
Required Fields
To run the ExecutionOrder scorer, you must include the following fields in your Example:
actual_output
expected_output
Scorer Breakdown
The execution order score is calculated in different ways depending on how you configure the scorer.
Set Match (Default)
Calculates the score based on the intersection of the actual and expected actions. The score is the size of the intersection divided by the total number of expected actions.
scorer = ExecutionOrderScorer(threshold=0.8)
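To make the set-match arithmetic concrete, here is a minimal standalone sketch of the calculation described above. It is not the library's implementation: the function name `set_match_score` is hypothetical, and the handling of duplicate or empty expected actions is an assumption.

```python
def set_match_score(actual: list[str], expected: list[str]) -> float:
    """Fraction of expected actions that also appear in the actual actions.

    Order is ignored. Assumption: each entry of `expected` counts once,
    and an empty `expected` list scores 1.0 only if `actual` is also empty.
    """
    if not expected:
        return 1.0 if not actual else 0.0
    actual_set = set(actual)
    matched = sum(1 for action in expected if action in actual_set)
    return matched / len(expected)


# Only "GoogleSearch" is shared, so 1 of 2 expected actions is matched.
score = set_match_score(
    actual=["GoogleSearch", "Perplexity"],
    expected=["DBQuery", "GoogleSearch"],
)
print(score)  # 0.5
```

Under this reading, extra actions in the actual output do not lower the score; only missing expected actions do.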
Ordering Match
Uses the Longest Common Subsequence (LCS) of the actual and expected outputs to calculate the score. The score is the length of the LCS divided by the length of the expected output, so it reaches 1.0 only when the LCS is the same as the expected output.
scorer = ExecutionOrderScorer(threshold=0.8, should_consider_ordering=True)
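The following sketch illustrates the LCS-based calculation, assuming the score is the ratio of LCS length to expected length. The helper names `lcs_length` and `ordering_match_score` are hypothetical and are not part of judgeval; the library's actual implementation may differ.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def ordering_match_score(actual: list[str], expected: list[str]) -> float:
    """LCS length divided by the number of expected actions.

    Assumption: an empty `expected` list scores 1.0 only if `actual` is empty.
    """
    if not expected:
        return 1.0 if not actual else 0.0
    return lcs_length(actual, expected) / len(expected)


# Same actions in the wrong order: the LCS has length 1, so the score is 0.5.
print(ordering_match_score(["GoogleSearch", "DBQuery"], ["DBQuery", "GoogleSearch"]))

# Correct order with an extra step in between: the LCS covers all of expected.
print(ordering_match_score(["DBQuery", "Perplexity", "GoogleSearch"], ["DBQuery", "GoogleSearch"]))
```

The first call shows how this mode differs from set match: both expected actions are present, but the swapped order halves the score.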
Exact Match
Checks that the actual output matches the expected output exactly. Returns a score of 1.0 if they match, otherwise 0.0.
scorer = ExecutionOrderScorer(threshold=0.8, should_exact_match=True)
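As a rough illustration of the exact-match rule, the check reduces to an element-by-element list comparison. The function name `exact_match_score` is hypothetical and the sketch is not taken from the library.

```python
def exact_match_score(actual: list[str], expected: list[str]) -> float:
    """1.0 if both lists contain the same actions in the same order, else 0.0."""
    return 1.0 if list(actual) == list(expected) else 0.0


# Identical sequences match.
print(exact_match_score(["DBQuery", "GoogleSearch"], ["DBQuery", "GoogleSearch"]))

# One extra action is enough to fail, even though every expected action is present.
print(exact_match_score(["DBQuery", "Perplexity", "GoogleSearch"], ["DBQuery", "GoogleSearch"]))
```

Unlike the other two modes, this one gives no partial credit: any extra, missing, or reordered action produces 0.0.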
Sample Implementation
from judgeval import JudgmentClient
from judgeval.data import Example
from judgeval.scorers import ExecutionOrderScorer

client = JudgmentClient()

# actual_output is the path the agent took; expected_output is the path it should have taken
example = Example(
    actual_output=["GoogleSearch", "Perplexity"],
    expected_output=["DBQuery", "GoogleSearch"],
)

# supply your own threshold
scorer = ExecutionOrderScorer(threshold=0.8)

results = client.run_evaluation(
    examples=[example],
    scorers=[scorer],
)
print(results)