v0.5 Release Notes (Aug 4, 2025)
New Features
Annotation Queue System
- Automated Queue Management: Failed traces are automatically added to an annotation queue for manual review and scoring
- Human Evaluation Workflow: Add comments and scores to queued traces; a trace is removed from the queue automatically once its annotation is complete
- Dataset Integration: Export annotated traces to datasets for long-term storage and analysis
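The queue lifecycle described above (failed traces enter the queue, a reviewer annotates them, annotated traces are exported) can be pictured with a minimal sketch. This is illustrative only: `Trace`, `AnnotationQueue`, and their methods are hypothetical names invented for this example, not the platform's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Trace:
    # Hypothetical trace record; the real trace schema is not shown in these notes.
    trace_id: str
    failed: bool
    comment: Optional[str] = None
    score: Optional[float] = None


@dataclass
class AnnotationQueue:
    # Hypothetical in-memory stand-in for the annotation queue and a dataset.
    pending: Dict[str, Trace] = field(default_factory=dict)
    dataset: List[Trace] = field(default_factory=list)

    def ingest(self, trace: Trace) -> None:
        """Only failed traces are queued for manual review."""
        if trace.failed:
            self.pending[trace.trace_id] = trace

    def annotate(self, trace_id: str, comment: str, score: float) -> Trace:
        """Record the reviewer's comment and score, then drop the trace from the queue."""
        trace = self.pending.pop(trace_id)
        trace.comment = comment
        trace.score = score
        return trace

    def export(self, trace: Trace) -> None:
        """Keep the annotated trace in a dataset for later analysis."""
        self.dataset.append(trace)


annotation_queue = AnnotationQueue()
annotation_queue.ingest(Trace("t1", failed=True))
annotation_queue.ingest(Trace("t2", failed=False))  # passing trace, never queued

reviewed = annotation_queue.annotate("t1", "Hallucinated a citation", 0.0)
annotation_queue.export(reviewed)
```

After the annotation step the queue is empty again, and the only record of the review lives in the dataset.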
Enhanced Async Evaluations
- Sampling Control: Added a sampling rate parameter to async evaluations, allowing you to control how frequently evaluations run on your production data (e.g., evaluate 5% of production traces for hallucinations)
- Easier Async Evaluations: Simplified the async evaluation interface for running evaluations on live traces
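To make the effect of a sampling rate concrete, here is a minimal sketch of per-trace sampling. It assumes the rate is a fraction between 0.0 and 1.0 and that each trace is sampled independently; the function name `should_evaluate` is hypothetical, and the platform's actual parameter name and sampling strategy may differ.

```python
import random


def should_evaluate(sampling_rate: float, rng: random.Random) -> bool:
    """Return True for roughly `sampling_rate` of calls (hypothetical helper)."""
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be between 0.0 and 1.0")
    # rng.random() is uniform on [0.0, 1.0), so a rate of 0.0 never fires
    # and a rate of 1.0 always fires.
    return rng.random() < sampling_rate


rng = random.Random(0)  # seeded only to keep the example reproducible
sampled = sum(should_evaluate(0.05, rng) for _ in range(10_000))
print(f"evaluated {sampled} of 10000 traces")
```

With a rate of 0.05, about 500 of 10,000 traces are evaluated; the exact count varies from run to run when the generator is not seeded.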
Local Scorer Execution
- Local Execution: Custom scorers for online evaluations now run locally with asynchronous background processing, so evaluation results arrive faster without slowing down your application's critical path
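The general pattern behind this kind of background scoring can be sketched with a worker thread and a queue. This is an assumption about the shape of the technique, not the SDK's implementation: `BackgroundScorer` and `length_scorer` are hypothetical names used only for illustration.

```python
import queue
import threading
from typing import Callable, Dict, Optional


def length_scorer(output: str) -> float:
    """Hypothetical custom scorer: 1.0 if the output is non-empty and short."""
    return 1.0 if 0 < len(output) <= 200 else 0.0


class BackgroundScorer:
    """Runs a scorer on a daemon thread so callers never wait for scoring."""

    def __init__(self, scorer: Callable[[str], float]) -> None:
        self._scorer = scorer
        self._jobs: queue.Queue = queue.Queue()
        self.results: Dict[str, Optional[float]] = {}
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, trace_id: str, output: str) -> None:
        # Enqueueing returns immediately; the request path is not blocked.
        self._jobs.put((trace_id, output))

    def _run(self) -> None:
        while True:
            trace_id, output = self._jobs.get()
            try:
                self.results[trace_id] = self._scorer(output)
            except Exception:
                # A failing scorer must not kill the worker; record no score.
                self.results[trace_id] = None
            finally:
                self._jobs.task_done()

    def wait(self) -> None:
        """Block until every submitted job has been scored."""
        self._jobs.join()


background = BackgroundScorer(length_scorer)
background.submit("trace-1", "Paris is the capital of France.")
background.submit("trace-2", "")
background.wait()  # only needed here so the example can read the results
```

In a live application the caller would not wait; results would be reported whenever the worker finishes each job.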
PromptScorer Website Management
- Platform-Based PromptScorer Creation: Create, edit, and delete custom prompt-based evaluation scorers on the platform, with an interactive playground to test configurations in real time before deployment
Improvements
Platform Reliability
- Improved Data Serialization: Standardized JSON encoding across the platform using FastAPI's serialization utilities for more reliable trace data handling and API communication
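As a rough illustration of why a single encoding path helps, the sketch below converts values that `json` cannot handle natively (dataclasses, datetimes, UUIDs, sets) through one function. It uses only the Python standard library and is not FastAPI's encoder or the platform's code; `to_jsonable` and `Span` are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass, is_dataclass
from datetime import datetime, timezone
from uuid import UUID


def to_jsonable(value):
    """Fallback used by json.dumps for types it cannot encode itself."""
    if is_dataclass(value) and not isinstance(value, type):
        return asdict(value)
    if isinstance(value, datetime):
        return value.isoformat()
    if isinstance(value, UUID):
        return str(value)
    if isinstance(value, (set, frozenset)):
        # Sort for a stable, reproducible encoding.
        return sorted(value, key=str)
    raise TypeError(f"Cannot serialize {type(value).__name__}")


@dataclass
class Span:
    # Hypothetical trace span; the real trace schema is not shown in these notes.
    span_id: UUID
    started_at: datetime
    tags: set


span = Span(UUID(int=1), datetime(2025, 8, 4, tzinfo=timezone.utc), {"prod"})
encoded = json.dumps(span, default=to_jsonable)
```

Routing every payload through the same fallback means a given type is always encoded the same way, wherever it appears in a trace.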
Community Contributions
- Special thanks to @dedsec995 and our other community contributors for helping improve the platform's data serialization capabilities