
LiteLLM

If you use LiteLLM within your application, you can trace, monitor, and analyze all LLM calls with Judgment. LiteLLM provides a unified interface to call 100+ LLM providers using a consistent API.

Because LiteLLM exposes a functional API (litellm.completion), the standard wrap() approach does not apply. Instead, register an OpenTelemetry instrumentor that automatically captures every LiteLLM call, including the model name, token usage, and cost.

Install Dependencies

Using uv:
uv add judgeval opentelemetry-instrumentation-litellm litellm

Using pip:
pip install judgeval opentelemetry-instrumentation-litellm litellm

Initialize Tracing

setup.py
from judgeval import Tracer
from opentelemetry.instrumentation.litellm import LiteLLMInstrumentor

Tracer.init(project_name="litellm_project")
Tracer.registerOTELInstrumentation(LiteLLMInstrumentor())

LiteLLMInstrumentor monkey-patches litellm.completion and related functions so that every call emits an OpenTelemetry (OTEL) span carrying the model name, prompt and completion token counts, and cost.
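
To make the patching idea concrete, here is a minimal, self-contained sketch of the general technique. It is not the actual LiteLLMInstrumentor implementation: fake_litellm, the dict-shaped response, instrument, and recorded_spans are all hypothetical stand-ins, chosen so the example runs without network access or API keys.

```python
import functools
import time
import types

# Hypothetical stand-in for the litellm module (assumption, not the real library).
fake_litellm = types.SimpleNamespace()


def _fake_completion(model, messages):
    # Hypothetical response shape; real LiteLLM responses are richer objects.
    return {
        "model": model,
        "content": "ok",
        "usage": {"prompt_tokens": 5, "completion_tokens": 2},
    }


fake_litellm.completion = _fake_completion

# A real instrumentor would export OTEL spans; this sketch just collects dicts.
recorded_spans = []


def instrument(module):
    """Replace module.completion with a wrapper that records one span per call."""
    original = module.completion

    @functools.wraps(original)
    def traced_completion(*args, **kwargs):
        start = time.perf_counter()
        response = original(*args, **kwargs)
        recorded_spans.append({
            "name": "completion",
            "model": response["model"],
            "prompt_tokens": response["usage"]["prompt_tokens"],
            "completion_tokens": response["usage"]["completion_tokens"],
            "duration_s": time.perf_counter() - start,
        })
        return response

    # The patch itself: callers keep calling module.completion unchanged.
    module.completion = traced_completion


instrument(fake_litellm)
reply = fake_litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hi"}],
)
```

Because the wrapper is installed on the module attribute, calling code does not change at all, which is why a single registration step is enough to cover every call site.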

Use LiteLLM as Normal

app.py
import litellm
from judgeval import Tracer
from opentelemetry.instrumentation.litellm import LiteLLMInstrumentor

# Initialize tracing and register the instrumentor before making any LiteLLM calls.
Tracer.init(project_name="litellm_project")
Tracer.registerOTELInstrumentation(LiteLLMInstrumentor())

response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello, world!"}]
)

print(response.choices[0].message.content)

All LiteLLM calls are automatically traced and exported to the Judgment platform.

Multi-Agent / Swarm Use Case

LiteLLM is commonly used inside multi-agent frameworks (e.g., swarm orchestrators) where each sub-agent calls litellm.completion directly. The instrumentor attributes cost to the spans where inference actually happens, not to the parent orchestrator.

swarm_example.py
import litellm
from judgeval import Tracer
from opentelemetry.instrumentation.litellm import LiteLLMInstrumentor

Tracer.init(project_name="swarm_project")
Tracer.registerOTELInstrumentation(LiteLLMInstrumentor())

@Tracer.observe(span_type="agent")
def data_worker(query: str) -> str:
    response = litellm.completion(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Process this data: {query}"}]
    )
    return response.choices[0].message.content

@Tracer.observe(span_type="agent")
def review_worker(data: str) -> str:
    response = litellm.completion(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Review this output: {data}"}]
    )
    return response.choices[0].message.content

@Tracer.observe(span_type="agent")
def orchestrator(query: str) -> str:
    # Makes no model calls itself; it only delegates to the two workers.
    data = data_worker(query)
    review = review_worker(data)
    return review

result = orchestrator("Analyze Q4 revenue trends")

In the Judgment trace view, each data_worker and review_worker span carries its own cost (from the underlying litellm.completion call), while the orchestrator span shows $0.00 since it makes no direct model calls.
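
The attribution described above can be pictured as a small span tree. The sketch below is illustrative only: the Span class, the own_cost field, the total_cost helper, and the dollar amounts are all hypothetical and do not reflect Judgment's data model or real pricing.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    """Hypothetical span record: own_cost covers only calls made directly in this span."""
    name: str
    own_cost: float = 0.0
    children: List["Span"] = field(default_factory=list)

    def total_cost(self) -> float:
        # Illustrative roll-up: this span's own cost plus all descendants' costs.
        return self.own_cost + sum(child.total_cost() for child in self.children)


# Made-up costs for the two worker spans; the orchestrator makes no model calls.
data_span = Span("data_worker", own_cost=0.003)
review_span = Span("review_worker", own_cost=0.002)
orchestrator_span = Span("orchestrator", children=[data_span, review_span])

for span in (orchestrator_span, data_span, review_span):
    print(f"{span.name}: own=${span.own_cost:.4f}")
```

Keeping cost on the span that made the call means it is never double-counted when a parent's figures are later summed with its children's.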