v0.6 Release Notes (Aug 14, 2025)

2025-08-11
v0.6.0

New Features

Server-Hosted Custom Scorers

  • CLI for Custom Scorer Upload: New judgeval CLI with upload_scorer command for submitting custom Python scorer files and dependencies to the backend for hosted execution
  • Hosted vs Local Scorer Support: Clear differentiation between locally executed and server-hosted custom scorers through the server_hosted flag
  • Enhanced API Client: Updated client with custom scorer upload endpoint and extended timeout for file transfers

Enhanced Prompt Scorer Capabilities

  • Threshold Configuration: Added a threshold parameter (0-1 scale) to prompt scorers for defining success criteria, with getter functions for controlled access
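
As a rough illustration of the threshold idea, the sketch below shows a scorer that stores a threshold on the 0-1 scale and exposes it through a getter. The class name ThresholdScorer, the get_threshold and is_success methods, and the "score >= threshold" success rule are all assumptions made for this example; they are not taken from the judgeval API.

```python
class ThresholdScorer:
    """Hypothetical scorer used only for illustration (not a judgeval class)."""

    def __init__(self, name: str, threshold: float = 0.5) -> None:
        # The release note describes a 0-1 scale, so reject anything outside it.
        if not 0.0 <= threshold <= 1.0:
            raise ValueError("threshold must be between 0 and 1")
        self._name = name
        self._threshold = threshold

    def get_threshold(self) -> float:
        # Getter: read-only access to the configured threshold.
        return self._threshold

    def is_success(self, score: float) -> bool:
        # Assumed success criterion: a score at or above the threshold passes.
        return score >= self._threshold


scorer = ThresholdScorer("helpfulness", threshold=0.7)
print(scorer.get_threshold(), scorer.is_success(0.8), scorer.is_success(0.6))
```

Keeping the threshold behind a getter means callers can read the success criterion without being able to change it after the scorer is created.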

Rules and Custom Scorers

  • Custom Score Rules: Rule configurations now accept custom score names, so rules can trigger on metrics beyond the predefined options
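
To make the idea of a rule keyed on a custom score name concrete, here is a minimal sketch. ScoreCondition, rule_triggers, the "lt"/"gt" comparator strings, and the score name tone_match are hypothetical and exist only for this example; the actual rule configuration format is not shown in these notes.

```python
import operator
from dataclasses import dataclass

# Assumed comparator vocabulary for this sketch.
_COMPARATORS = {"lt": operator.lt, "gt": operator.gt}


@dataclass(frozen=True)
class ScoreCondition:
    """Hypothetical rule condition that refers to a score by name."""
    score_name: str   # any score name, including a custom one
    comparator: str   # "lt" or "gt"
    limit: float


def rule_triggers(condition: ScoreCondition, scores: dict[str, float]) -> bool:
    # A rule can only fire if the named score was actually produced.
    value = scores.get(condition.score_name)
    if value is None:
        return False
    return _COMPARATORS[condition.comparator](value, condition.limit)


condition = ScoreCondition("tone_match", "lt", 0.5)
print(rule_triggers(condition, {"tone_match": 0.3}))
```

Because the condition looks a score up by name rather than against a fixed list, any scorer's output can act as a trigger.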

Advanced Dashboard Features

  • Scores Dashboard: New dedicated dashboard for visualizing evaluation scores over time with comprehensive percentile data tables
  • Rules Dashboard: Interactive dashboard for tracking rule invocations with detailed charts and statistics
  • Test Comparison Tool: Side-by-side comparison of test runs with detailed metric visualization and output-level diffing

Real-Time Monitoring Enhancements

  • Live Trace Status: Real-time polling for trace and span execution status with visual indicators for running operations
  • Class Name Visualization: Color-coded badges for class names in trace spans for improved observability and navigation
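
The live trace status described above relies on polling. The following is a generic sketch of that pattern and not the dashboard's implementation: poll_until_done, the fetch_status callable, and the status strings "running", "completed", and "error" are assumptions chosen for the example.

```python
import time
from typing import Callable

# Assumed terminal states; the real status vocabulary may differ.
TERMINAL_STATUSES = {"completed", "error"}


def poll_until_done(
    fetch_status: Callable[[], str],
    interval_s: float = 1.0,
    max_polls: int = 60,
) -> list[str]:
    """Call fetch_status repeatedly until a terminal status or max_polls."""
    seen: list[str] = []
    for _ in range(max_polls):
        status = fetch_status()
        seen.append(status)
        if status in TERMINAL_STATUSES:
            break
        time.sleep(interval_s)  # wait before asking again
    return seen


# Usage with a stand-in status source instead of a real backend call.
fake_statuses = iter(["running", "running", "completed"])
print(poll_until_done(lambda: next(fake_statuses), interval_s=0))
```

Capping the number of polls keeps a stuck trace from being polled indefinitely.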

Fixes

No bug fixes in this release.

Improvements

Evaluation System Refinements

  • Simplified API Management: Evaluation runs now automatically handle result management with unique IDs and timestamps, eliminating the need to manage append and override parameters
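
One way to see why unique IDs and timestamps make append and override parameters unnecessary: if every run gets its own identifier, two runs with the same name never collide, so there is nothing to append to or overwrite. The sketch below illustrates this under stated assumptions; new_run_record and its field names are hypothetical and do not describe how judgeval stores runs.

```python
import uuid
from datetime import datetime, timezone


def new_run_record(run_name: str) -> dict:
    """Build a hypothetical evaluation-run record with its own identity."""
    return {
        "name": run_name,
        "id": uuid.uuid4().hex,  # random 32-character hex identifier
        "created_at": datetime.now(timezone.utc).isoformat(),
    }


first = new_run_record("nightly-eval")
second = new_run_record("nightly-eval")
# Same name, different IDs: both results can be kept side by side.
print(first["id"] != second["id"])
```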