PromptScorer
Evaluate agent behavior based on a rubric you define and iterate on in the platform.
A PromptScorer is a powerful tool for evaluating your agent's behavior in production with use-case-specific, natural-language rubrics.
import { Judgeval } from "judgeval";

const client = Judgeval.create();

const tracer = await client.nodeTracer.create({
  projectName: "qa_assistant",
  enableEvaluation: true,
});

const scorer = await client.scorers.tracePromptScorer.get(
  "QA Answer Quality Scorer",
);

const processQuery = tracer.observe(async function (query: string) {
  // generateResponse stands in for your own agent or LLM call.
  const result = await generateResponse(query);
  // Queue an asynchronous evaluation of the current trace with the fetched scorer.
  tracer.asyncTraceEvaluate(scorer, "gpt-4");
  return result;
});

client.scorers.promptScorer.get() | client.scorers.tracePromptScorer.get()
Fetches a Prompt Scorer or Trace Prompt Scorer configuration from the Judgment platform.
async get(name: string): Promise<PromptScorer>

Parameters
name: The name of the PromptScorer you would like to retrieve from the platform
Returns
Promise<PromptScorer> - The fetched PromptScorer instance
Throws
Error if the scorer is a TracePromptScorer instead of a PromptScorer
Error if the scorer is not found or the API call fails
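Because get() rejects when the scorer is missing or was created as the other scorer type, it is often worth failing fast with a clear message when fetching scorers at startup. A minimal sketch of that pattern (the try/catch wrapper and log message are illustrative, not part of the SDK):

import { Judgeval } from "judgeval";

const client = Judgeval.create();

let scorer;
try {
  // Rejects if "My Prompt Scorer" does not exist on the platform or is a TracePromptScorer.
  scorer = await client.scorers.promptScorer.get("My Prompt Scorer");
} catch (error) {
  console.error("Could not load 'My Prompt Scorer'; check the scorer name and type on the platform.", error);
  throw error;
}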
Example
import { Judgeval } from "judgeval";
const client = Judgeval.create();
const scorer = await client.scorers.promptScorer.get("My Prompt Scorer");

import { Judgeval } from "judgeval";

const client = Judgeval.create();
const scorer = await client.scorers.tracePromptScorer.get("My Trace Prompt Scorer");

client.scorers.promptScorer.create() | client.scorers.tracePromptScorer.create()
Creates a new PromptScorer or Trace Prompt Scorer with a custom configuration.
create(config: PromptScorerConfig): PromptScorer

Parameters
prompt: The prompt used by the LLM judge to make an evaluation
threshold: Threshold value for success (typically 0-1)
options: If specified, the LLM judge will pick from one of the choices, and the score will be the one corresponding to that choice
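Because options only applies "if specified," a scorer can also be created with just a prompt and a threshold, in which case the judge's numeric score is compared against the threshold to decide success. A minimal sketch under that assumption (the scorer name and prompt below are invented for illustration):

import { Judgeval } from "judgeval";

const client = Judgeval.create();

// No options map: the judge returns a score (typically 0-1) and the result
// passes when the score meets the 0.7 threshold. (Assumed behavior based on
// the parameter descriptions above.)
const concisenessScorer = client.scorers.promptScorer.create({
  name: "Conciseness Scorer",
  prompt: "Rate how concise and direct this answer is: {{answer}}",
  threshold: 0.7,
  model: "gpt-5",
  description: "Checks that answers are concise and to the point",
});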
Returns
PromptScorer - The created PromptScorer instance
Throws
Error if name or prompt is not provided
Example
import { Judgeval } from "judgeval";
const client = Judgeval.create();
const scorer = client.scorers.promptScorer.create({
  name: "Rhyme Scorer",
  prompt: "Evaluate whether the two inputs rhyme: {{word_1}}, {{word_2}}",
  threshold: 0.5,
  model: "gpt-5",
  options: {
    "does not rhyme": 0,
    "nearly rhymes": 0.75,
    "rhymes": 1,
  },
  description: "Evaluates whether the two words rhyme",
});

import { Judgeval } from "judgeval";
const client = Judgeval.create();
const scorer = client.scorers.tracePromptScorer.create({
  name: "Workflow Coherence",
  prompt: `
    Evaluate the coherence of this multi-step workflow.
    Consider: logical flow, error handling, and completeness.
  `,
  threshold: 0.7,
  model: "gpt-4",
  options: {
    "poor coherence": 0,
    "acceptable coherence": 0.5,
    "good coherence": 0.8,
    "excellent coherence": 1,
  },
  description: "Evaluates overall workflow execution quality",
});

Complete Usage Example
Here's a complete example showing how to use scorers with tracing and evaluation:
import { Judgeval } from "judgeval";
import { OpenAIInstrumentation } from "@opentelemetry/instrumentation-openai";
import OpenAI from "openai";
const client = Judgeval.create();
const tracer = await client.nodeTracer.create({
  projectName: "research_assistant",
  enableEvaluation: true,
  enableMonitoring: true,
  instrumentations: [new OpenAIInstrumentation()],
  resourceAttributes: {
    "service.name": "research-api",
    "service.version": "1.0.0",
  },
});

const scorer = await client.scorers.tracePromptScorer.get("answer-quality");

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

const processQuery = tracer.observe(async function (query: string) {
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: query }],
  });
  const result = response.choices[0].message.content || "";
  tracer.asyncTraceEvaluate(scorer, "gpt-4");
  return result;
}, "llm");
const result = await processQuery("What is machine learning?");
console.log(result);
await tracer.shutdown();