June 9, 2025
Updates: Weeks of June 2 and June 9
This combined changelog covers two weeks of updates focused on enhancing tracing capabilities, improving error handling, and expanding our monitoring tools.
Enhanced Warning System for Missing Scorer Data
judgeval #301
If any example is missing data that a scorer requires, the system now prints a clear warning that identifies the example and the missing parameters. After displaying these warnings, it prompts you to confirm whether you want to continue with the evaluation despite the potential issues.
This change aims to prevent users from unknowingly running evaluations that might fail or produce incomplete results due to malformed or incomplete input data, saving them time and providing better feedback upfront.
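To illustrate the check-then-confirm flow, here is a minimal, self-contained sketch. It is not judgeval's implementation: the `Example` class, the `find_missing` and `check_before_eval` helpers, and the `input`/`actual_output` field names are hypothetical. A field counts as missing when it is absent or `None`.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Example:
    # Hypothetical example record; a field that is absent or None counts as missing.
    name: str
    fields: dict = field(default_factory=dict)


def find_missing(examples: list[Example], required: list[str]) -> dict[str, list[str]]:
    """Map each example name to the required parameters it lacks."""
    missing = {}
    for ex in examples:
        absent = [p for p in required if ex.fields.get(p) is None]
        if absent:
            missing[ex.name] = absent
    return missing


def check_before_eval(
    examples: list[Example],
    required: list[str],
    ask: Callable[[str], str] = input,
) -> bool:
    """Warn about each incomplete example, then ask whether to continue.

    Returns True when the evaluation should proceed.
    """
    missing = find_missing(examples, required)
    if not missing:
        return True
    for name, params in missing.items():
        print(f"WARNING: example {name!r} is missing: {', '.join(params)}")
    answer = ask("Continue with evaluation anyway? [y/N] ")
    return answer.strip().lower() in ("y", "yes")
```

Passing `ask` as a parameter keeps the prompt testable. The default, `input`, reads the answer from the terminal, and anything other than "y" or "yes" cancels the run.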
Improved Agent State Visibility
judgeval #295
We've improved visibility into how an agent's state changes during execution. This helps with debugging complex agent behavior and gives developers deeper insight into how their agents reach decisions.
Agent Execution Storage
judgeval #291
You can now store past Agent executions in a dataset, which lets you:
- Run metrics on Agent paths
- Export execution data for analysis on third-party platforms
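Once executions are stored, both uses reduce to working with a list of recorded paths. The sketch below shows that idea with plain Python and does not use judgeval's dataset API. The record shape (`id`, `path`) and the two metrics are illustrative assumptions. JSON Lines is one common export format that many analysis tools accept.

```python
import json
from collections import Counter

# Hypothetical stored executions: each run records the ordered steps
# (graph nodes or tool names) the agent took.
executions = [
    {"id": "run-1", "path": ["plan", "search", "answer"]},
    {"id": "run-2", "path": ["plan", "search", "search", "answer"]},
    {"id": "run-3", "path": ["plan", "answer"]},
]


def path_metrics(runs: list[dict]) -> dict:
    """Two simple path metrics: mean path length and how often each step occurs."""
    lengths = [len(r["path"]) for r in runs]
    step_counts = Counter(step for r in runs for step in r["path"])
    return {
        "mean_path_length": sum(lengths) / len(lengths),
        "step_counts": dict(step_counts),
    }


def to_jsonl(runs: list[dict]) -> str:
    """Serialize runs as JSON Lines, one JSON object per line."""
    return "".join(json.dumps(r) + "\n" for r in runs)
```

For the three sample runs, `path_metrics(executions)["mean_path_length"]` is 3.0. You can write the output of `to_jsonl` to a `.jsonl` file for use in another platform.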
Exception Tracing
Users can now trace their Agents even when an exception occurs. The exception is handled gracefully: the trace is saved, and the error message and error type are logged. You can view your agents' errors on our dashboard, stratified by error type.
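Here is a simplified sketch of the general pattern. It records a trace for every call, captures the error type and message when the call fails, and then groups failures by type, similar to a dashboard view. It does not use the judgeval tracer. The `traced` decorator, the in-memory `TRACES` list, and `errors_by_type` are hypothetical stand-ins.

```python
import functools
import time
import traceback
from collections import defaultdict

TRACES: list[dict] = []  # hypothetical in-memory trace store


def traced(fn):
    """Record a trace for every call; on failure, attach error type and message."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        trace = {"name": fn.__name__, "start": time.time(), "error": None}
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            trace["error"] = {
                "type": type(exc).__name__,
                "message": str(exc),
                "traceback": traceback.format_exc(),
            }
            raise  # the caller still sees the original exception
        finally:
            # Runs on success and on failure, so the trace is always saved.
            trace["end"] = time.time()
            TRACES.append(trace)
    return wrapper


def errors_by_type(traces: list[dict]) -> dict[str, list[str]]:
    """Group failed traces by exception type."""
    groups: dict[str, list[str]] = defaultdict(list)
    for t in traces:
        if t["error"]:
            groups[t["error"]["type"]].append(t["name"])
    return dict(groups)
```

In this sketch, saving the trace in `finally` and re-raising afterward keeps the agent's control flow unchanged. The failure is still recorded for later inspection.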
Multi-Agent Tool Call Data Export
You can now export tool call data from multi-agent systems, with results stratified by agent. When a tool call fails, the response now includes the error information. This gives a more complete picture of the execution trace for downstream uses such as Reinforcement Learning (RL) training.
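The sketch below shows the shape of this kind of export: each tool call is recorded with its agent, and a failure is stored as an error instead of an output. It is not judgeval's export format. The `ToolCall` fields, `call_tool`, and `export_by_agent` are hypothetical.

```python
from collections import defaultdict
from dataclasses import asdict, dataclass
from typing import Any, Callable, Optional


@dataclass
class ToolCall:
    agent: str
    tool: str
    args: dict
    output: Any = None
    error: Optional[str] = None  # set instead of output when the tool raised


def call_tool(agent: str, tool: Callable, **args) -> ToolCall:
    """Run a tool and record the result, capturing the error on failure."""
    try:
        return ToolCall(agent, tool.__name__, args, output=tool(**args))
    except Exception as exc:
        return ToolCall(agent, tool.__name__, args, error=f"{type(exc).__name__}: {exc}")


def export_by_agent(calls: list[ToolCall]) -> dict[str, list[dict]]:
    """Stratify recorded calls by agent, e.g. for building RL training data."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for c in calls:
        grouped[c.agent].append(asdict(c))
    return dict(grouped)
```

Keeping failed calls in the export, each with its error string, preserves the negative examples that are often useful when you turn traces into training signal.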
Live and Async Trace Tracking
We've implemented support for both live and asynchronous trace tracking, enabling users to track long-horizon agents and view progress intermittently. This feature is fully supported for both vanilla Python and Langgraph implementations.
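Live tracking rests on one idea: a long-running agent appends spans as each step finishes, so an observer can read partial progress before the run completes. The following asyncio sketch shows only that idea. `LiveTrace`, `long_agent`, and `watch` are hypothetical, and the sleeps stand in for slow model or tool calls.

```python
import asyncio
import time


class LiveTrace:
    """Hypothetical trace whose spans are readable while the agent is still running."""

    def __init__(self) -> None:
        self.spans: list[dict] = []
        self.done = False

    def record(self, step: str) -> None:
        self.spans.append({"step": step, "t": time.monotonic()})


async def long_agent(trace: LiveTrace, steps: int) -> None:
    for i in range(steps):
        await asyncio.sleep(0.01)  # stand-in for a slow LLM or tool call
        trace.record(f"step-{i}")
    trace.done = True


async def watch(trace: LiveTrace, interval: float) -> list[int]:
    """Poll the trace periodically, noting how many spans exist at each check."""
    seen = []
    while not trace.done:
        seen.append(len(trace.spans))
        await asyncio.sleep(interval)
    seen.append(len(trace.spans))
    return seen


async def main() -> list[int]:
    trace = LiveTrace()
    _, seen = await asyncio.gather(long_agent(trace, 5), watch(trace, 0.015))
    return seen
```

`asyncio.run(main())` returns the span counts the watcher observed, ending at 5. In a real system the watcher would be a dashboard reading from persisted trace storage, not a coroutine in the same process.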
Langgraph Metadata Display
judgeval #317
Our platform now parses and displays Langgraph metadata on the website in a dedicated metadata section, giving you deeper insights into your graph-based agents.
Customizable Alert Rules

Users can now set up alert rules based on conditions such as trace latency or scorer failures, and receive notifications through these channels:
- Slack
- PagerDuty
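You can think of an alert rule as a condition on a trace paired with a list of channels to notify. The following sketch evaluates such rules against one trace. It does not use the platform's rule configuration. The `AlertRule` class, the trace fields (`latency_s`, `scores`, `passed`), and the 10-second threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AlertRule:
    name: str
    condition: Callable[[dict], bool]
    channels: list[str]  # e.g. ["slack", "pagerduty"]


RULES = [
    AlertRule("slow trace", lambda t: t["latency_s"] > 10.0, ["slack"]),
    AlertRule(
        "scorer failure",
        lambda t: any(not s["passed"] for s in t["scores"]),
        ["slack", "pagerduty"],
    ),
]


def evaluate(trace: dict, rules: list[AlertRule]) -> list[tuple[str, str]]:
    """Return a (channel, rule name) pair for every channel of every triggered rule."""
    return [(ch, r.name) for r in rules if r.condition(trace) for ch in r.channels]
```

A trace with 12.5 s latency and a failed score triggers both rules. The result is one Slack notification per rule plus a PagerDuty page for the scorer failure.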
Coming Soon
We're working on additional features to improve tracing for Langgraph agents, with special handling for async workflows. Stay tuned for tutorials and documentation updates!