JudgmentClient
Complete reference for the JudgmentClient Python SDK
JudgmentClient API Reference
The JudgmentClient is your primary interface for interacting with the Judgment platform. It provides methods for running evaluations, managing datasets, handling traces, and more.
Authentication
Set up your credentials using environment variables:
export JUDGMENT_API_KEY="your_api_key_here"
export JUDGMENT_ORG_ID="your_organization_id_here"
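If you prefer to manage credentials from Python, the sketch below is one option; it assumes python-dotenv (optional, not required by judgeval) and simply checks that both variables are present before you construct a client.
import os

# Optional: load a local .env file (requires the python-dotenv package).
from dotenv import load_dotenv
load_dotenv()

# Sanity check: these are the variables the client reads by default.
for var in ("JUDGMENT_API_KEY", "JUDGMENT_ORG_ID"):
    if not os.environ.get(var):
        raise RuntimeError(f"Missing environment variable: {var}")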
JudgmentClient()
Initialize a JudgmentClient object.
Parameters
Parameter | Type | Required | Description
---|---|---|---
api_key | str | Optional | Recommended - set using the JUDGMENT_API_KEY environment variable
organization_id | str | Optional | Recommended - set using the JUDGMENT_ORG_ID environment variable
Example Code
from judgeval import JudgmentClient

# Reads JUDGMENT_API_KEY and JUDGMENT_ORG_ID from your environment
client = JudgmentClient()

# Or pass credentials explicitly
client = JudgmentClient(api_key="your_api_key", organization_id="your_organization_id")
client.run_evaluation()
Execute an evaluation of examples using one or more scorers to measure performance and quality of your AI models.
Parameters
Parameter | Type | Required | Description
---|---|---|---
scorers | List[APIJudgmentScorer] | Required | List of scorers to use for evaluation, e.g. [APIJudgmentScorer(...)]
examples | List[Example] | Required | Examples to evaluate, e.g. [Example(...)]
model | str | Optional (default: "gpt-4.1") | Model used as judge when using LLM as a Judge, e.g. "gpt-4o-mini"
project_name | str | Optional (default: "default_project") | Name of the project for organization, e.g. "my_qa_project"
eval_run_name | str | Optional (default: "default_eval_run") | Name for the evaluation run, e.g. "experiment_v1"
async_execution | bool | Optional (default: False) | Whether to execute the evaluation asynchronously
Example Code
from judgeval import JudgmentClient
from judgeval.data import Example
from judgeval.scorers import AnswerRelevancyScorer

client = JudgmentClient()

examples = [
    Example(
        input="What is the capital of France?",
        actual_output="Paris is the capital of France.",
        expected_output="Paris"
    )
]

results = client.run_evaluation(
    examples=examples,
    scorers=[AnswerRelevancyScorer(threshold=0.9)],
    project_name="default_project"
)
Returns
A list of ScoringResult objects.
Example Return Value
[
    ScoringResult(
        success=False,
        scorers_data=[ScorerData(...)],
        name=None,
        data_object=Example(...),
        trace_id=None,
        run_duration=None,
        evaluation_cost=None
    )
]
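Each ScoringResult exposes the fields shown above, so a common follow-up is to filter for failures and inspect the attached scorer data. A minimal sketch using only those fields:
# Inspect evaluation results returned by run_evaluation()
for result in results:
    if not result.success:
        print(f"Failed: {result.data_object}")
        for scorer_data in result.scorers_data:
            print(scorer_data)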
client.run_trace_evaluation()
Execute trace-based evaluation using function calls and tracing to evaluate agent behavior and execution flows.
Parameters
Parameter | Type | Required | Description
---|---|---|---
scorers | List[APIJudgmentScorer] | Required | List of scorers to use for evaluation, e.g. [APIJudgmentScorer(...)]
examples | List[Example] | Optional | Examples to run through the function (required if using function), e.g. [Example(...)]
function | Callable | Optional | Function to execute and trace for evaluation
tracer | Union[Tracer, BaseCallbackHandler] | Optional | The tracer object used in tracing your agent
traces | List[Trace] | Optional | Pre-existing traces to evaluate instead of generating new ones
project_name | str | Optional (default: "default_project") | Name of the project for organization, e.g. "agent_evaluation"
eval_run_name | str | Optional (default: "default_eval_run") | Name for the trace evaluation run, e.g. "agent_trace_v1"
Example Code
from judgeval import JudgmentClient
from judgeval.data import Example
from judgeval.scorers import ToolOrderScorer
from judgeval.tracer import Tracer

client = JudgmentClient()
tracer = Tracer()

def my_agent_function(query: str) -> str:
    """Your agent function to be traced and evaluated"""
    response = f"Processing query: {query}"
    return response

examples = [
    Example(
        input={"query": "What is the weather like?"},
        expected_output="I'll help you check the weather."
    )
]

results = client.run_trace_evaluation(
    scorers=[ToolOrderScorer()],
    examples=examples,
    function=my_agent_function,
    tracer=tracer,
    project_name="default_project"
)
Returns
A list of ScoringResult objects.
Example Return Value
[
    ScoringResult(
        success=False,
        scorers_data=[ScorerData(...)],
        name=None,
        data_object=Example(...),
        trace_id=None,
        run_duration=None,
        evaluation_cost=None
    )
]
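To score traces you have already collected instead of re-running the agent, pass them via the traces parameter and omit function/examples. A sketch, assuming existing_traces is a List[Trace] you obtained elsewhere (collecting Trace objects is outside the scope of this example):
# `existing_traces` is a placeholder for a List[Trace] collected earlier.
results = client.run_trace_evaluation(
    scorers=[ToolOrderScorer()],
    traces=existing_traces,
    project_name="default_project",
    eval_run_name="agent_trace_v1"
)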
client.assert_test()
Runs evaluations as unit tests, raising an exception if the score falls below the defined threshold.
Parameters
Parameter | Type | Required | Description
---|---|---|---
scorers | List[APIJudgmentScorer] | Required | List of scorers to use for evaluation, e.g. [APIJudgmentScorer(...)]
examples | List[Example] | Required | Examples to evaluate, e.g. [Example(...)]
model | str | Optional (default: "gpt-4.1") | Model used as judge when using LLM as a Judge, e.g. "gpt-4o-mini"
project_name | str | Optional (default: "default_project") | Name of the project for organization, e.g. "my_qa_project"
eval_run_name | str | Optional (default: "default_eval_run") | Name for the evaluation run, e.g. "experiment_v1"
async_execution | bool | Optional (default: False) | Whether to execute the evaluation asynchronously
Example Code
from judgeval import JudgmentClient
from judgeval.data import Example
from judgeval.scorers import FaithfulnessScorer

client = JudgmentClient()

example = Example(
    input="What if these shoes don't fit?",
    actual_output="We offer a 30-day full refund...",
    retrieval_context=[...],
)

client.assert_test(
    examples=[example],
    scorers=[FaithfulnessScorer(threshold=0.5)]
)
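Because assert_test raises when a score falls below its threshold, it drops straight into a test runner such as pytest. A sketch of that pattern follows; the file name and test name are illustrative, not part of the SDK:
# test_faithfulness.py -- illustrative pytest wrapper, not part of the SDK
from judgeval import JudgmentClient
from judgeval.data import Example
from judgeval.scorers import FaithfulnessScorer

def test_return_policy_faithfulness():
    client = JudgmentClient()
    example = Example(
        input="What if these shoes don't fit?",
        actual_output="We offer a 30-day full refund...",
        retrieval_context=[...],  # fill in the retrieved documents for your test
    )
    # pytest reports a failure if assert_test raises on a low score
    client.assert_test(
        examples=[example],
        scorers=[FaithfulnessScorer(threshold=0.5)]
    )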
client.assert_trace_test()
Runs trace-based evaluations as unit tests, raising an exception if the score falls below the defined threshold.
Parameters
Parameter | Type | Required | Description
---|---|---|---
scorers | List[APIJudgmentScorer] | Required | List of scorers to use for evaluation, e.g. [APIJudgmentScorer(...)]
examples | List[Example] | Optional | Examples to run through the function (required if using function), e.g. [Example(...)]
function | Callable | Optional | Function to execute and trace for evaluation
tracer | Union[Tracer, BaseCallbackHandler] | Optional | The tracer object used in tracing your agent
traces | List[Trace] | Optional | Pre-existing traces to evaluate instead of generating new ones
project_name | str | Optional (default: "default_project") | Name of the project for organization, e.g. "agent_evaluation"
eval_run_name | str | Optional (default: "default_eval_run") | Name for the trace evaluation run, e.g. "agent_trace_v1"
Example Code
from judgeval import JudgmentClient
from judgeval.data import Example
from judgeval.scorers import ToolOrderScorer
from judgeval.tracer import Tracer

client = JudgmentClient()
tracer = Tracer()

def my_agent_function(query: str) -> str:
    return f"Processing query: {query}"

client.assert_trace_test(
    scorers=[ToolOrderScorer()],
    examples=[Example(...)],
    function=my_agent_function,
    tracer=tracer,
    project_name="default_project"
)
Error Handling
The JudgmentClient raises specific exceptions for different error conditions:
Exception | Description
---|---
JudgmentAPIError | API request failures or server errors
ValueError | Invalid parameters or configuration
FileNotFoundError | Missing test files or datasets
from judgeval.common.exceptions import JudgmentAPIError
try:
    results = client.run_evaluation(examples, scorers)
except JudgmentAPIError as e:
    print(f"API Error: {e}")
except ValueError as e:
    print(f"Invalid parameters: {e}")