v0.6 Release Notes (Aug 14, 2025)
New Features
Server-Hosted Custom Scorers
- CLI for Custom Scorer Upload: New judgeval CLI with upload_scorer command for submitting custom Python scorer files and dependencies to the backend for hosted execution
- Hosted vs Local Scorer Support: Clear differentiation between locally executed and server-hosted custom scorers through the e2b_enabled flag
- Enhanced API Client: Updated client with custom scorer upload endpoint and extended timeout for file transfers
Enhanced Prompt Scorer Capabilities
- Threshold Configuration: Added a threshold parameter (0-1 scale) to prompt scorers for defining success criteria, with getter functions for controlled access
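To illustrate how a 0-1 threshold with a getter might behave, here is a minimal sketch. The class ThresholdedScorer and its methods get_threshold and is_success are hypothetical names invented for this example and are not the judgeval API; the rule that a score at or above the threshold counts as a success is also an assumption.

```python
class ThresholdedScorer:
    """Hypothetical stand-in for a prompt scorer; not the judgeval class."""

    def __init__(self, name: str, threshold: float = 0.5) -> None:
        # Reject thresholds outside the documented 0-1 scale.
        if not 0.0 <= threshold <= 1.0:
            raise ValueError("threshold must be between 0 and 1")
        self.name = name
        self._threshold = threshold

    def get_threshold(self) -> float:
        # Getter: callers read the value without touching the attribute directly.
        return self._threshold

    def is_success(self, score: float) -> bool:
        # Assumption: a score at or above the threshold counts as a success.
        return score >= self.get_threshold()


scorer = ThresholdedScorer(name="helpfulness", threshold=0.8)
print(scorer.is_success(0.85))  # True
print(scorer.is_success(0.6))   # False
```

Validating the range at construction time keeps an out-of-scale threshold from silently making every evaluation pass or fail.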
Rules and Custom Scorers
- Custom Score Rules: Integration of custom score names in rule configuration, allowing rules to trigger on metrics beyond the predefined options
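The idea of a rule keyed on an arbitrary score name can be sketched as follows. ScoreRule, its fields, and the score names used here are hypothetical and chosen only for illustration; the actual rule configuration format is not shown in these notes.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ScoreRule:
    """Hypothetical rule: fires when a named score drops below a limit."""

    score_name: str   # any score name, predefined or custom
    min_value: float  # the rule fires when the score is below this value

    def triggered(self, scores: Dict[str, float]) -> bool:
        # A rule can only fire if its score is present in the results.
        value = scores.get(self.score_name)
        return value is not None and value < self.min_value


# "brand_tone" stands in for a custom score produced by a custom scorer.
scores = {"faithfulness": 0.92, "brand_tone": 0.41}
rule = ScoreRule(score_name="brand_tone", min_value=0.5)
print(rule.triggered(scores))  # True
```

Looking the score up by name, rather than by a fixed enumeration, is what lets a rule refer to metrics that were not predefined.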
Advanced Dashboard Features
- Scores Dashboard: New dedicated dashboard for visualizing evaluation scores over time with comprehensive percentile data tables
- Rules Dashboard: Interactive dashboard for tracking rule invocations with detailed charts and statistics
- Test Comparison Tool: Side-by-side comparison of test runs with detailed metric visualization and output-level diffing
Real-Time Monitoring Enhancements
- Live Trace Status: Real-time polling for trace and span execution status with visual indicators for running operations
- Class Name Visualization: Color-coded badges for class names in trace spans for improved observability and navigation
Improvements
Evaluation System Refinements
- Simplified API Management: Evaluation runs now automatically handle result management with unique IDs and timestamps, eliminating the need to manage append and override parameters
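As a rough illustration of why unique IDs and timestamps remove the need for append and override flags, the sketch below stores every run as its own record. The function new_run_record and its field names are hypothetical; the notes do not describe how the backend generates or stores run identifiers.

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, List


def new_run_record(results: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap results in a record that cannot collide with an earlier run."""
    return {
        # Random 128-bit identifier, so runs never overwrite each other.
        "run_id": uuid.uuid4().hex,
        # Timezone-aware UTC timestamp for ordering runs over time.
        "created_at": datetime.now(timezone.utc).isoformat(),
        "results": results,
    }


first = new_run_record([{"score": 0.9}])
second = new_run_record([{"score": 0.7}])
print(first["run_id"] != second["run_id"])  # True
```

Because each run gets a fresh identifier, there is no existing record to append to or override, so the caller has nothing to decide.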