v0.9 Release Notes (Sep 2, 2025)

2025-09-02
v0.9.0

Major Release: OpenTelemetry (OTEL) Integration

We've migrated the entire tracing system to OpenTelemetry, the industry-standard observability framework. This brings better compatibility with existing monitoring tools, more robust telemetry collection, and a cleaner SDK architecture. The SDK now uses auto-generated API clients from our OpenAPI specification, includes comprehensive support for LLM streaming responses, and provides enhanced span management with specialized exporters. This foundation sets us up for deeper integrations with the broader observability ecosystem.

New Features

Trace prompt scorers and evaluation improvements

Evaluate traces using prompt-based scoring with the new TracePromptScorer. This enables you to score entire trace sequences based on custom criteria, making it easier to catch complex agent misbehaviors that span multiple operations. We've also added clear separation between example-based and trace-based evaluations with distinct configuration classes, and Examples now automatically generate unique IDs and timestamps.
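To illustrate what automatic ID and timestamp generation means in practice, here is a minimal sketch using only the Python standard library. The class name `SketchExample` and its fields are hypothetical and chosen for illustration. This is not the SDK's actual Example class, whose fields and defaults may differ.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SketchExample:
    """Hypothetical stand-in for an evaluation example (not the SDK class)."""

    input: str
    actual_output: str
    # Generated per instance, so callers never have to supply them.
    example_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


a = SketchExample(input="What is 2 + 2?", actual_output="4")
b = SketchExample(input="What is 2 + 2?", actual_output="4")

# Two examples with identical content still receive distinct IDs.
print(a.example_id != b.example_id)
print(a.created_at)
```

Using `default_factory` rather than a plain default matters here: a plain default would be evaluated once at class definition time, so every instance would share the same ID and timestamp.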

Command palette for faster navigation

Press Cmd+K to open the navigation and search palette. Quickly jump to any page on the platform or search our documentation for answers while using Judgment.

Better trace views and UI polish

Trace views now include input/output previews and smoother navigation between traces. Dashboard cards use consistent expand/collapse behavior, annotation tabs show proper empty states, and custom scorer pages display read-only badges when appropriate.

Fixes

Trace navigation issues

Fixed an issue with trace navigation when starting from the first row.

UI revalidation after test deletion

The UI now revalidates automatically after a test is deleted.

Improvements

Better LLM streaming support

Token usage and cost tracking now work across streaming responses from all major LLM providers, including specific support for Anthropic's client.messages.stream method, so reported costs stay accurate when you use streaming APIs.
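As a rough illustration of why streaming needs dedicated handling, the sketch below accumulates token usage from a simulated event stream. It assumes events shaped like Anthropic's streaming events, where `message_start` carries the input token count and `message_delta` carries a cumulative output token count. The events here are plain dicts rather than SDK objects, the prices are hypothetical, and the helper names are invented for this example. It does not show how the SDK itself implements tracking.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens; real prices depend on the model.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


@dataclass
class StreamUsage:
    input_tokens: int = 0
    output_tokens: int = 0

    def cost_usd(self) -> float:
        total = (
            self.input_tokens * INPUT_PRICE_PER_MTOK
            + self.output_tokens * OUTPUT_PRICE_PER_MTOK
        )
        return total / 1_000_000


def usage_from_events(events) -> StreamUsage:
    """Collect token usage from an iterable of stream events (dicts)."""
    usage = StreamUsage()
    for event in events:
        if event["type"] == "message_start":
            start_usage = event["message"]["usage"]
            usage.input_tokens = start_usage["input_tokens"]
            usage.output_tokens = start_usage.get("output_tokens", 0)
        elif event["type"] == "message_delta":
            # Assumed cumulative, so overwrite instead of adding.
            usage.output_tokens = event["usage"]["output_tokens"]
    return usage


# Simulated stream standing in for a real provider response.
simulated_events = [
    {"type": "message_start",
     "message": {"usage": {"input_tokens": 25, "output_tokens": 1}}},
    {"type": "content_block_delta",
     "delta": {"type": "text_delta", "text": "Hello"}},
    {"type": "message_delta",
     "delta": {"stop_reason": "end_turn"},
     "usage": {"output_tokens": 12}},
    {"type": "message_stop"},
]

usage = usage_from_events(simulated_events)
print(usage.input_tokens, usage.output_tokens, usage.cost_usd())
```

The point of the sketch is that usage arrives in pieces across the stream, so a tracker that only inspects a single final response object would miss it. Overwriting rather than summing the output count avoids double counting when the reported value is cumulative.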

Improved skeleton loading states

Improved skeleton loading states to reduce layout shift.