Prompt Scorers
A `PromptScorer` is a powerful tool for evaluating your LLM system using natural language criteria. Prompt scorers are great for quick prototyping: you can easily set up new evaluation criteria, test them on a few examples, and then benchmark all of your agents' test cases from production.
judgeval SDK
Creating a Prompt Scorer
You can create a `PromptScorer` by providing a natural language description of your evaluation task/criteria and a set of choices that an LLM judge can choose from when evaluating an example.
Specifically, you need to provide a `prompt` that describes the task/criteria. You can also use custom fields in your prompt with the mustache `{{variable_name}}` syntax (more details in the section below).
Here's an example of creating a `PromptScorer` that determines if a response is relevant to a request:
```python
from judgeval.scorers import PromptScorer

relevance_scorer = PromptScorer.create(
    name="Relevance Scorer",
    prompt="Is the request relevant to the response? The request is {{request}} and the response is {{response}}."
)
```
Options
You can also provide an `options` dictionary that maps each choice the scorer can select to a numeric score. Here's the relevance scorer from above, created with an options dictionary where a "Yes" judgment scores 1 and a "No" scores 0:
```python
from judgeval.scorers import PromptScorer

relevance_scorer = PromptScorer.create(
    name="Relevance Scorer",
    prompt="Is the request relevant to the response? The request is {{request}} and the response is {{response}}.",
    options={"Yes": 1, "No": 0}
)
```
Retrieving a Prompt Scorer
Once a prompt scorer has been created, you can retrieve it by name using the `get` class method. For example, if you had already created the Relevance Scorer from above, you can fetch it with the code below:
```python
from judgeval.scorers import PromptScorer

relevance_scorer = PromptScorer.get(
    name="Relevance Scorer",
)
```
Editing a Prompt Scorer
You can also edit a prompt scorer that you have already created. Use the `get_name`, `get_prompt`, and `get_options` methods to read the scorer's current fields, and the `set_prompt`, `set_options`, and `set_threshold` methods to update them. In addition, you can extend the existing prompt with the `append_to_prompt` method.
```python
from judgeval.scorers import PromptScorer

relevance_scorer = PromptScorer.get(
    name="Relevance Scorer",
)

# Add another sentence to the relevance scorer's prompt
relevance_scorer.append_to_prompt("Consider whether the response directly addresses the main topic, intent, or question presented in the request.")

# Update the options by reading them, modifying them, and setting them back
options = relevance_scorer.get_options()
options["Maybe"] = 0.5
relevance_scorer.set_options(options)

# Set the success threshold for the scorer
relevance_scorer.set_threshold(0.7)
```
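Continuing from the snippet above, you can read the updated fields back with the getter methods mentioned earlier; the values in the comments assume the scorer was created with the Yes/No options from the previous section:

```python
# Inspect the updated scorer with the getters described above
print(relevance_scorer.get_name())     # "Relevance Scorer"
print(relevance_scorer.get_prompt())   # the prompt, including the appended sentence
print(relevance_scorer.get_options())  # {"Yes": 1, "No": 0, "Maybe": 0.5}
```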
Defining Custom Fields
You can create your own custom fields by defining an example class that inherits from the base `Example` object. This lets you configure any fields you want to score. The field names must match the mustache variables in your prompt: to use the relevance scorer from above, you would define a custom `Example` subclass with `request` and `response` fields.
```python
from judgeval.data import Example

class CustomerRequest(Example):
    request: str
    response: str

example = CustomerRequest(
    request="Where is my package?",
    response="Your package will arrive tomorrow at 10:00 AM.",
)
```
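The same pattern works for any criteria. As a sketch, here's a hypothetical tone scorer (the `Tone Scorer` name, the `{{message}}` variable, and the `SupportMessage` class are all illustrative) showing that the subclass field names must mirror the mustache variables in the prompt:

```python
from judgeval.data import Example
from judgeval.scorers import PromptScorer

# Hypothetical scorer for illustration: {{message}} in the prompt
# must correspond to a `message` field on the Example subclass.
tone_scorer = PromptScorer.create(
    name="Tone Scorer",
    prompt="Is the tone of this support message polite and professional? The message is {{message}}.",
    options={"Yes": 1, "No": 0},
)

class SupportMessage(Example):
    message: str

example = SupportMessage(message="Your package will arrive tomorrow at 10:00 AM.")
```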
Using a Prompt Scorer
Prompt scorers can be used in the same way as any other scorer in `judgeval`. They can also be run alongside other scorers in a single evaluation run!
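For example, here's a minimal sketch of a batched evaluation that pairs the prompt scorer with a built-in scorer. It assumes the `JudgmentClient.run_evaluation` entry point and the `AnswerRelevancyScorer` built-in; check the SDK reference for the exact names and signatures in your version:

```python
from judgeval import JudgmentClient
from judgeval.scorers import AnswerRelevancyScorer, PromptScorer

# Assumed entry point for batched, client-side evaluation runs
client = JudgmentClient()

relevance_scorer = PromptScorer.get(name="Relevance Scorer")

# `example` is the CustomerRequest instance defined in the previous section
results = client.run_evaluation(
    examples=[example],
    scorers=[relevance_scorer, AnswerRelevancyScorer(threshold=0.5)],
    model="gpt-4.1",
)
```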
Putting it all together, you can create the prompt scorer, define your custom fields, and run the prompt scorer within your agentic system as shown below:
```python
from judgeval.tracer import Tracer
from judgeval.data import Example
from judgeval.scorers import PromptScorer

judgment = Tracer(project_name="prompt_scorer_test_project")

# Define the scorer
relevance_scorer = PromptScorer.create(
    name="Relevance Scorer",
    prompt="Is the request relevant to the response? The request is {{request}} and the response is {{response}}.",
    options={"Yes": 1, "No": 0}
)

# Define the custom Example class
class CustomerRequest(Example):
    request: str
    response: str

@judgment.observe(span_type="tool")
def llm_call(request: str):
    # Call your LLM here to get a response
    response = "Your package will arrive tomorrow at 10:00 AM."

    # Build the example from the request/response pair
    example = CustomerRequest(
        request=request,
        response=response,
    )

    # Then run your prompt scorer to evaluate the response
    judgment.async_evaluate(
        scorer=relevance_scorer,
        example=example,
        model="gpt-4.1",
    )
    return response

@judgment.observe(span_type="function")
def main():
    request = "Where is my package?"
    response = llm_call(request)
    print(response)

if __name__ == "__main__":
    main()
```
Judgment Platform
Navigate to the PromptScorer tab in the Judgment platform; you'll find it via the sidebar on the left. Here you can manage the scorers you have created, as well as create new ones.
Creating a Scorer
- Click the Create Scorer button in the top right corner. Enter a name and hit the Next button to go to the next page.
- On this page, you can create a prompt scorer by writing your criteria in natural language and supplying the custom fields from your custom `Example` class. You can then optionally supply a set of choices the scorer can select from when evaluating an example. Once you have provided these fields, hit the Create Scorer button to finish creating your scorer!
You can now use the scorer in your evaluation runs just like any other scorer in `judgeval`.
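A scorer created in the platform can be fetched in code exactly like one created via the SDK, using the `PromptScorer.get` call shown earlier (the scorer name below is a placeholder for whatever name you entered):

```python
from judgeval.scorers import PromptScorer

# Placeholder name: use the name you entered on the Create Scorer page
my_scorer = PromptScorer.get(name="My Platform Scorer")
```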
Scorer Playground
While creating a new scorer or editing an existing one, it can be helpful to get a general sense of how your scorer behaves. The scorer playground helps you test your `PromptScorer` with custom inputs.
On the page for the scorer you would like to test, select a model from the dropdown, enter custom inputs for the fields, and click the Run Scorer button. The LLM judge will then run an evaluation; once the results are ready, you will see the score, reason, and choice given by the judge.