v0.14 Release Notes (Sep 28, 2025)
New Features
Work with trace datasets in the SDK
The Dataset class now supports trace datasets. Use Dataset.get() to retrieve trace datasets with full OpenTelemetry structure, including spans, scores, and triggered rules. This makes it easy to export production traces for optimization (e.g. SFT, DPO, RFT) or to create test datasets from real agent executions for sanity-checking agent updates.
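As a rough illustration of the export-for-SFT workflow, the sketch below turns a list of trace records into prompt/completion pairs. It does not call the SDK: the plain dicts stand in for whatever Dataset.get() returns, and the field names used here ("spans", "span_type", "input", "output", "scores", "success") are assumptions made for the example, not the documented trace schema.

```python
import json


def traces_to_sft_records(traces):
    """Turn trace dicts into prompt/completion records for SFT.

    Assumed (hypothetical) shape: each trace has a "spans" list whose items
    carry "span_type", "input", and "output", plus a "scores" list whose
    items carry a boolean "success".
    """
    records = []
    for trace in traces:
        # Skip traces with any failed score. Traces with no scores are kept.
        scores = trace.get("scores", [])
        if not all(score.get("success") for score in scores):
            continue
        for span in trace.get("spans", []):
            # Only LLM spans become training examples; tool spans are ignored.
            if span.get("span_type") == "llm":
                records.append(
                    {"prompt": span["input"], "completion": span["output"]}
                )
    return records


def to_jsonl(records):
    """Serialize records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(record) for record in records)


# Hand-written sample data standing in for an exported trace dataset.
sample_traces = [
    {
        "spans": [
            {"span_type": "llm", "input": "What is 2 + 2?", "output": "4"},
            {"span_type": "tool", "input": "calc", "output": "4"},
        ],
        "scores": [{"name": "correctness", "success": True}],
    },
    {
        "spans": [
            {"span_type": "llm", "input": "Capital of France?", "output": "Berlin"},
        ],
        "scores": [{"name": "correctness", "success": False}],
    },
]

sft_records = traces_to_sft_records(sample_traces)
print(to_jsonl(sft_records))
```

Filtering on passing scores is one possible design choice. For preference-based methods such as DPO, you would more likely keep the failed traces as well and pair them with passing ones.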
Export datasets and traces
Export datasets and traces for data portability, offline analysis, or integration with external tools. This gives you full control over your evaluation data and production traces.
Fixes
Cumulative cost tracking issues
Fixed issues with cumulative cost tracking for better billing insights.
Column rendering in example datasets
Fixed column rendering in example datasets.
Improvements
Accurate, up-to-date LLM cost tracking
LLM costs are now calculated server-side with the latest pricing information, ensuring accurate cost tracking as providers update their rates.
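To show why central pricing helps, here is a minimal sketch of cost calculation from token counts and a price table. The table layout, the model name, and the rates are all hypothetical placeholders, not actual provider prices or the platform's implementation.

```python
from decimal import Decimal

# Hypothetical price table in USD per 1M tokens. The model name and rates
# are illustrative only.
PRICING = {
    "example-model": {"input": Decimal("1.00"), "output": Decimal("2.00")},
}

TOKENS_PER_UNIT = Decimal(1_000_000)


def llm_cost_usd(model, prompt_tokens, completion_tokens, pricing=PRICING):
    """Compute the cost of one call from token counts and per-1M-token rates."""
    rates = pricing[model]
    total = rates["input"] * prompt_tokens + rates["output"] * completion_tokens
    return total / TOKENS_PER_UNIT


cost = llm_cost_usd("example-model", prompt_tokens=1500, completion_tokens=500)

# When a provider changes its rates, only the table changes. The recorded
# token counts stay the same and the cost is recomputed from them.
updated_pricing = {
    "example-model": {"input": Decimal("0.50"), "output": Decimal("1.00")},
}
updated_cost = llm_cost_usd("example-model", 1500, 500, pricing=updated_pricing)
```

Because the table lives in one place, clients only need to report token usage. No SDK upgrade is needed when rates change.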
Simpler rule configuration
Rules now trigger based on whether scores pass or fail, replacing the previous custom threshold system. This makes it easier to set up alerts without tuning specific score values.
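The difference between the two approaches can be sketched as follows. The Score class, its fields, and the rule_triggers helper are hypothetical names invented for this example. The sketch assumes that each scorer reports its own pass/fail verdict and that a rule fires when a watched scorer fails.

```python
from dataclasses import dataclass


@dataclass
class Score:
    name: str
    value: float
    success: bool  # pass/fail verdict, assumed to be set by the scorer itself


def rule_triggers(scores, watched_scorers):
    """Return True when any watched scorer reports a failing score.

    No threshold is configured on the rule. It relies only on each
    score's pass/fail verdict.
    """
    return any(
        score.name in watched_scorers and not score.success for score in scores
    )


scores = [
    Score(name="faithfulness", value=0.42, success=False),
    Score(name="relevancy", value=0.91, success=True),
]

fires_on_faithfulness = rule_triggers(scores, {"faithfulness"})
fires_on_relevancy = rule_triggers(scores, {"relevancy"})
```

Under the previous threshold-based approach, the rule itself would have compared value against a hand-tuned cutoff. Here the pass/fail decision belongs to the scorer and the rule only reads it.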
Better multimodal content display
Improved the display of multimodal OpenAI chat completions, with proper formatting for images and text. Added a fullscreen view for large content, with scroll-to-bottom support.
Configure models per scorer
Trace prompt scorers now include model configuration, making it visible which model evaluates each trace. This gives you more control over scorer quality and cost tradeoffs.
Improved form validation
Annotation forms now make comments optional while requiring at least one scorer. Clear error messages and visual indicators guide you when required fields are missing.
Performance and visual polish
Optimized keyboard navigation for traces and improved span loading states with better icons.