PromptScorer
Evaluate agent behavior based on a rubric you define and iterate on the platform.
Overview
A PromptScorer is a powerful tool for evaluating your LLM system with use-case-specific, natural-language rubrics.
PromptScorers make it easy to prototype your evaluation rubrics: you can quickly set up new criteria, test them on a few examples in the scorer playground, and then evaluate your agents' behavior in production with real customer usage.
Quick Start Example
from openai import OpenAI
from judgeval.scorers import PromptScorer
from judgeval.tracer import Tracer, wrap
from judgeval.data import Example

# Initialize tracer
judgment = Tracer(project_name="default_project")

# Auto-trace LLM calls
client = wrap(OpenAI())

# Initialize PromptScorer
scorer = PromptScorer.create(
    name="PositivityScorer",
    prompt="Is the response positive or negative? Question: {{input}}, response: {{actual_output}}",
    options={"positive": 1, "negative": 0}
)

class QAAgent:
    def __init__(self, client):
        self.client = client

    @judgment.observe(span_type="tool")
    def process_query(self, query):
        response = self.client.chat.completions.create(
            model="gpt-5",
            messages=[
                {"role": "system", "content": "You are a helpful assistant"},
                {"role": "user", "content": f"I have a query: {query}"}
            ]
        )  # Automatically traced
        return f"Response: {response.choices[0].message.content}"

    # Trace the agent entry point and evaluate its output asynchronously
    @judgment.agent()
    @judgment.observe(span_type="agent")
    def invoke_agent(self, query):
        result = self.process_query(query)
        judgment.async_evaluate(
            scorer=scorer,
            example=Example(input=query, actual_output=result),
            model="gpt-5"
        )
        return result

if __name__ == "__main__":
    agent = QAAgent(client)
    print(agent.invoke_agent("What is the capital of the United States?"))
Authentication
Set up your credentials using environment variables:
export JUDGMENT_API_KEY="your_key_here"
export JUDGMENT_ORG_ID="your_org_id_here"
# Add to your .env file
JUDGMENT_API_KEY="your_key_here"
JUDGMENT_ORG_ID="your_org_id_here"
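If you keep credentials in a .env file, load them into the process environment before initializing the SDK. A minimal sketch using the python-dotenv package (an optional dependency, not part of judgeval):
from dotenv import load_dotenv  # python-dotenv; an extra dependency, not part of judgeval
from judgeval.tracer import Tracer

load_dotenv()  # exports JUDGMENT_API_KEY and JUDGMENT_ORG_ID from .env into the environment

judgment = Tracer(project_name="default_project")  # credentials are read from the environment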
PromptScorer Creation & Retrieval
PromptScorer.create() / TracePromptScorer.create()
Initialize a PromptScorer or TracePromptScorer object.
Parameters
name (str, Required): The name of the PromptScorer
prompt (str, Required): The prompt used by the LLM judge to make an evaluation
options (dict, Optional): If specified, the LLM judge will pick one of the given choices, and the score will be the value corresponding to that choice
judgment_api_key (str, Optional): Recommended to set via the JUDGMENT_API_KEY environment variable
organization_id (str, Optional): Recommended to set via the JUDGMENT_ORG_ID environment variable
Returns
PromptScorer: A PromptScorer instance
Example Code
from judgeval.scorers import PromptScorer
scorer = PromptScorer.create(
    name="Test Scorer",
    prompt="Is the response positive or negative? Response: {{actual_output}}",
    options={"positive": 1, "negative": 0}
)
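TracePromptScorer.create() takes the same arguments. A minimal sketch, assuming TracePromptScorer is also exported from judgeval.scorers (verify the import path against your SDK version):
from judgeval.scorers import TracePromptScorer  # import path assumed; check your SDK version

trace_scorer = TracePromptScorer.create(
    name="Trace Positivity Scorer",
    prompt="Considering the full trace, did the agent respond positively to the user?",
    options={"positive": 1, "negative": 0}
)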
PromptScorer.get() / TracePromptScorer.get()
Retrieve a PromptScorer or TracePromptScorer object that has already been created for your organization.
Parameters
name (str, Required): The name of the PromptScorer you would like to retrieve
judgment_api_key (str, Optional): Recommended to set via the JUDGMENT_API_KEY environment variable
organization_id (str, Optional): Recommended to set via the JUDGMENT_ORG_ID environment variable
Returns
PromptScorer: A PromptScorer instance
Example Code
from judgeval.scorers import PromptScorer
scorer = PromptScorer.get(name="Test Scorer")
PromptScorer Management
scorer.append_to_prompt()
Append text to the existing prompt of your PromptScorer
Parameters
prompt_addition (str, Required): This string will be appended to the scorer's existing prompt.
Returns
None
Example Code
from judgeval.scorers import PromptScorer
scorer = PromptScorer.get(name="Test Scorer")
scorer.append_to_prompt("Consider the overall tone, word choice, and emotional sentiment when making your determination.")
scorer.set_threshold()
Update the threshold for your PromptScorer
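A minimal usage sketch, assuming set_threshold takes a single float (the same type returned by get_threshold below):
from judgeval.scorers import PromptScorer
scorer = PromptScorer.get(name="Test Scorer")
scorer.set_threshold(0.5)  # assumed signature: a single float threshold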
scorer.set_prompt()
Update the prompt for your PromptScorer
Parameters
prompt (str, Required): The new prompt you would like the PromptScorer to use
Returns
None
Example Code
from judgeval.scorers import PromptScorer
scorer = PromptScorer.get(name="Test Scorer")
scorer.set_prompt("Is the response helpful to the question? Question: {{input}}, response: {{actual_output}}")
scorer.set_options()
Update the options for your PromptScorer
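A minimal usage sketch, assuming set_options accepts a dict of choice labels mapped to scores, in the same shape as the options argument of PromptScorer.create():
from judgeval.scorers import PromptScorer
scorer = PromptScorer.get(name="Test Scorer")
# Assumed to accept a dict of choice labels to scores, like the options argument of create()
scorer.set_options({"positive": 1, "neutral": 0.5, "negative": 0})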
scorer.get_threshold()
Retrieve the threshold for your PromptScorer
Parameters
None
Returns
float: The threshold value for the PromptScorer
Example Code
from judgeval.scorers import PromptScorer
scorer = PromptScorer.get(name="Test Scorer")
threshold = scorer.get_threshold()
scorer.get_prompt()
Retrieve the prompt for your PromptScorer
Parameters
None
Returns
str: The prompt string for the PromptScorer
Example Code
from judgeval.scorers import PromptScorer
scorer = PromptScorer.get(name="Test Scorer")
prompt = scorer.get_prompt()
scorer.get_options()
Retrieve the options for your PromptScorer
Parameters
None
Returns
dict: The options dictionary for the PromptScorer
Example Code
from judgeval.scorers import PromptScorer
scorer = PromptScorer.get(name="Test Scorer")
options = scorer.get_options()
scorer.get_name()
Retrieve the name for your PromptScorer
Parameters
None
Returns
str: The name of the PromptScorer
Example Code
from judgeval.scorers import PromptScorer
scorer = PromptScorer.get(name="Test Scorer")
name = scorer.get_name()
scorer.get_config()
Retrieve the name, prompt, options, and threshold for your PromptScorer as a dictionary
Parameters
None
Returns
dict: Dictionary containing the name, prompt, options, and threshold for the PromptScorer
Example Code
from judgeval.scorers import PromptScorer
scorer = PromptScorer.get(name="Test Scorer")
config = scorer.get_config()
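The returned dictionary contains the fields listed above. A sketch of inspecting it (the exact key names are an assumption for illustration):
print(config.get("name"))       # scorer name, e.g. "Test Scorer"
print(config.get("prompt"))     # the current prompt template
print(config.get("options"))    # e.g. {"positive": 1, "negative": 0}
print(config.get("threshold"))  # e.g. 0.5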