v0.9 Release Notes (Sep 2, 2025)
Major Release: OpenTelemetry (OTEL) Integration
We've migrated the entire tracing system to OpenTelemetry, the industry-standard observability framework. This brings better compatibility with existing monitoring tools, more robust telemetry collection, and a cleaner SDK architecture. The SDK now uses auto-generated API clients from our OpenAPI specification, includes comprehensive support for LLM streaming responses, and provides enhanced span management with specialized exporters. This foundation sets us up for deeper integrations with the broader observability ecosystem.
New Features
Trace prompt scorers and evaluation improvements
Evaluate traces using prompt-based scoring with the new TracePromptScorer. This enables you to score entire trace sequences based on custom criteria, making it easier to catch complex agent misbehaviors that span multiple operations. We've also added clear separation between example-based and trace-based evaluations with distinct configuration classes, and Examples now automatically generate unique IDs and timestamps.
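To illustrate what automatic ID and timestamp generation means in practice, here is a minimal sketch using only the Python standard library. The SketchExample class and its field names are hypothetical stand-ins invented for illustration; they are not the SDK's actual Example class, whose fields and ID format may differ.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SketchExample:
    """Hypothetical stand-in for an evaluation example (not the SDK's class)."""

    input: str
    actual_output: str
    # Both fields below are filled in at construction time when the caller
    # does not supply them, so every instance gets its own values.
    example_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


# Two examples with identical content still receive distinct IDs.
first = SketchExample(input="What is 2 + 2?", actual_output="4")
second = SketchExample(input="What is 2 + 2?", actual_output="4")
print(first.example_id != second.example_id)
```

Using default_factory rather than a plain default matters here: a plain default would be evaluated once and shared by every instance.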
Command palette for faster navigation
Press Cmd+K to open the navigation and search palette. Quickly jump to any page on the platform or search our documentation for answers while using Judgment.
Better trace views and UI polish
Trace views now include input/output previews and smoother navigation between traces. Dashboard cards use consistent expand/collapse behavior, annotation tabs show proper empty states, and custom scorer pages display read-only badges when appropriate.
Fixes
Trace navigation issues
Fixed trace navigation from the first row.
UI revalidation after test deletion
The UI now revalidates automatically after a test is deleted.
Improvements
Better LLM streaming support
Token usage and cost tracking now work across streaming responses from all major LLM providers, including specific support for Anthropic's client.messages.stream method. This keeps cost tracking accurate even when using streaming APIs.
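As a rough illustration of why streaming needs dedicated handling, the sketch below accumulates token usage from a sequence of stream events and turns it into a cost estimate. It is not the SDK's implementation. The events are plain dictionaries loosely modeled on the shape of Anthropic's streaming events (usage reported on message_start and updated on message_delta), and the function names and per-token prices are placeholder assumptions.

```python
from dataclasses import dataclass


@dataclass
class UsageTotals:
    input_tokens: int = 0
    output_tokens: int = 0

    def cost_usd(self, input_price_per_mtok: float, output_price_per_mtok: float) -> float:
        """Estimate cost from prices quoted per million tokens (placeholders)."""
        return (
            self.input_tokens * input_price_per_mtok
            + self.output_tokens * output_price_per_mtok
        ) / 1_000_000


def accumulate_usage(events) -> UsageTotals:
    """Collect token counts from simplified, dict-shaped stream events."""
    totals = UsageTotals()
    for event in events:
        if event["type"] == "message_start":
            usage = event["message"]["usage"]
            totals.input_tokens = usage.get("input_tokens", 0)
            totals.output_tokens = usage.get("output_tokens", 0)
        elif event["type"] == "message_delta":
            # Assumption: output_tokens on a delta is a running total,
            # so it replaces the previous value instead of being added.
            totals.output_tokens = event["usage"].get(
                "output_tokens", totals.output_tokens
            )
    return totals


# Hypothetical event sequence for a single streamed response.
events = [
    {"type": "message_start", "message": {"usage": {"input_tokens": 25, "output_tokens": 1}}},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hello"}},
    {"type": "message_delta", "usage": {"output_tokens": 12}},
    {"type": "message_stop"},
]
totals = accumulate_usage(events)
print(totals)
```

Because usage arrives in pieces over the life of the stream, a tracker that reads only the first event would undercount output tokens; reading through to the final delta avoids that.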
Improved skeleton loading states
Improved skeleton loading states to reduce layout shift.