v0.7 Release Notes (Aug 16, 2025)

2025-08-16
v0.7.0

New Features

Reinforcement learning now available

Train custom models on your own data with our new reinforcement learning framework, powered by Fireworks AI. You can now iteratively improve model performance using reward-based learning workflows: capture traces from production, generate training datasets, and deploy refined model snapshots, all within Judgment. This makes it easier to build agents that learn from real-world usage and improve over time.

Export datasets at scale

Export large datasets directly from the UI for model training or offline analysis. Both example and trace datasets can be exported in multiple formats, making it simple to integrate Judgment data into your ML pipelines or share results with your team.

Histogram visualization for test results

The test page now displays score distributions as histograms instead of simple averages. Scores are grouped into 10 buckets, so you can quickly spot patterns, outliers, and performance trends that a single average would hide.
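To illustrate why a distribution can say more than an average, here is a minimal sketch of bucketing scores into 10 bins. It assumes scores lie in the range 0 to 1 and that buckets are equal-width; neither detail is stated in these notes, and the function name `bucket_scores` is hypothetical rather than part of Judgment.

```python
from typing import List, Sequence


def bucket_scores(
    scores: Sequence[float],
    num_buckets: int = 10,
    low: float = 0.0,
    high: float = 1.0,
) -> List[int]:
    """Count how many scores fall into each equal-width bucket over [low, high].

    Assumption: scores are bounded by [low, high]; the real UI may bucket differently.
    """
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    if high <= low:
        raise ValueError("high must be greater than low")

    counts = [0] * num_buckets
    for score in scores:
        if not low <= score <= high:
            raise ValueError(f"score {score} is outside [{low}, {high}]")
        # Scale the score to a bucket index; the maximum score goes in the last bucket.
        index = int((score - low) / (high - low) * num_buckets)
        counts[min(index, num_buckets - 1)] += 1
    return counts


# Two runs with similar averages but very different shapes.
steady = bucket_scores([0.55, 0.55, 0.55, 0.55])
polarized = bucket_scores([0.05, 0.15, 0.95, 1.0])
```

Here `steady` puts all four scores in one middle bucket, while `polarized` splits them between the lowest and highest buckets, a difference that an average alone would not reveal.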

Faster navigation and better feedback

Navigate between examples using arrow keys (Up/Down), close views with Escape, and get instant feedback with our new toast notification system. We've also added hover cards on table headers that explain metrics like LLM cost calculations. Plus, the Monitoring section now opens directly to your dashboard, getting you to your metrics faster.

Fixes

No bug fixes in this release.

Improvements

More collaborative permissions

Annotation and trace span endpoints are now accessible to Viewers (previously required Developer permissions). This makes it easier for team members to contribute insights and annotations without needing elevated access.
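The change can be pictured as lowering the minimum role required for two groups of endpoints. The sketch below is illustrative only: it assumes roles are ordered so that Developer includes everything Viewer can do, and the endpoint keys (`annotations`, `trace_spans`) and helper `can_access` are hypothetical names, not Judgment's actual API.

```python
from enum import IntEnum
from typing import Dict


class Role(IntEnum):
    # Assumed ordering: a higher value implies more access.
    VIEWER = 1
    DEVELOPER = 2


# Hypothetical minimum-role policies before and after this release.
POLICY_BEFORE: Dict[str, Role] = {
    "annotations": Role.DEVELOPER,
    "trace_spans": Role.DEVELOPER,
}
POLICY_AFTER: Dict[str, Role] = {
    "annotations": Role.VIEWER,
    "trace_spans": Role.VIEWER,
}


def can_access(role: Role, endpoint: str, policy: Dict[str, Role]) -> bool:
    """Return True if the role meets the endpoint's minimum required role."""
    return role >= policy[endpoint]


# A Viewer was blocked under the old policy and is allowed under the new one.
viewer_before = can_access(Role.VIEWER, "annotations", POLICY_BEFORE)
viewer_after = can_access(Role.VIEWER, "annotations", POLICY_AFTER)
```

Developers keep their access in both cases, since their role sits above the new minimum.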

Better error handling across the platform

Query timeouts now show clear, actionable error messages instead of generic failures.

Polish and refinements

Cost and token badges now appear only on LLM spans, reducing visual clutter. Score details are expandable for deeper inspection of structured data. We've also refreshed the onboarding experience with tabbed code snippets and improved dark mode styling.