# Cursor Rules
A Cursor rules file for integrating Judgment with your codebase.
When building agents and LLM workflows in Cursor, providing proper context to your coding assistant helps ensure seamless integration with Judgment. This rule file supplies the essential context your coding assistant needs for successful implementation.
## Cursor Rules File
To use this rule file, copy the text below and save it as a `.mdc` file (for example, `.cursor/rules/judgment.mdc`) inside the `.cursor/rules` directory at your project's root.
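If you prefer to script the setup, here is a minimal sketch; the `judgment.mdc` filename and the `RULE_TEXT` variable are illustrative placeholders, not required names:

```python
# Minimal sketch: put the rule text where Cursor discovers project rules.
from pathlib import Path

RULE_TEXT = "..."  # placeholder: paste the full rule text shown below

rules_dir = Path(".cursor/rules")
rules_dir.mkdir(parents=True, exist_ok=True)  # create the directory if missing
(rules_dir / "judgment.mdc").write_text(RULE_TEXT)
```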
---
You are an expert in helping users integrate Judgment with their codebase. When you are helping someone integrate Judgment tracing or evaluations with their agents/workflows, refer to this file.
---
# Common Questions You May Get from the User (and How to Handle These Cases):
## Sample Agent 1:
```
from uuid import uuid4
import openai
import os
import asyncio
from tavily import TavilyClient
from dotenv import load_dotenv
import chromadb
from chromadb.utils import embedding_functions
destinations_data = [
{
"destination": "Paris, France",
"information": """
Paris is the capital city of France and a global center for art, fashion, and culture.
Key Information:
- Best visited during spring (March-May) or fall (September-November)
- Famous landmarks: Eiffel Tower, Louvre Museum, Notre-Dame Cathedral, Arc de Triomphe
- Known for: French cuisine, café culture, fashion, art galleries
- Local transportation: Metro system is extensive and efficient
- Popular neighborhoods: Le Marais, Montmartre, Latin Quarter
- Cultural tips: Basic French phrases are appreciated; many restaurants close between lunch and dinner
- Must-try experiences: Seine River cruise, visiting local bakeries, Luxembourg Gardens
"""
},
{
"destination": "Tokyo, Japan",
"information": """
Tokyo is Japan's bustling capital, blending ultramodern and traditional elements.
Key Information:
- Best visited during spring (cherry blossoms) or fall (autumn colors)
- Famous areas: Shibuya, Shinjuku, Harajuku, Akihabara
- Known for: Technology, anime culture, sushi, efficient public transport
- Local transportation: Extensive train and subway network
- Cultural tips: Bow when greeting, remove shoes indoors, no tipping
- Must-try experiences: Robot Restaurant, teamLab Borderless, Tsukiji Outer Market
- Popular day trips: Mount Fuji, Kamakura, Nikko
"""
},
{
"destination": "New York City, USA",
"information": """
New York City is a global metropolis known for its diversity, culture, and iconic skyline.
Key Information:
- Best visited during spring (April-June) or fall (September-November)
- Famous landmarks: Statue of Liberty, Times Square, Central Park, Empire State Building
- Known for: Broadway shows, diverse cuisine, shopping, museums
- Local transportation: Extensive subway system, yellow cabs, ride-sharing
- Popular areas: Manhattan, Brooklyn, Queens
- Cultural tips: Fast-paced environment, tipping expected (15-20%)
- Must-try experiences: Broadway show, High Line walk, food tours
"""
},
{
"destination": "Barcelona, Spain",
"information": """
Barcelona is a vibrant city known for its art, architecture, and Mediterranean culture.
Key Information:
- Best visited during spring and fall for mild weather
- Famous landmarks: Sagrada Familia, Park Güell, Casa Batlló
- Known for: Gaudi architecture, tapas, beach culture, FC Barcelona
- Local transportation: Metro, buses, and walkable city center
- Popular areas: Gothic Quarter, Eixample, La Barceloneta
- Cultural tips: Late dinner times (after 8 PM), siesta tradition
- Must-try experiences: La Rambla walk, tapas crawl, local markets
"""
},
{
"destination": "Bangkok, Thailand",
"information": """
Bangkok is Thailand's capital city, famous for its temples, street food, and vibrant culture.
Key Information:
- Best visited during November to February (cool and dry season)
- Famous sites: Grand Palace, Wat Phra Kaew, Wat Arun
- Known for: Street food, temples, markets, nightlife
- Local transportation: BTS Skytrain, MRT, tuk-tuks, river boats
- Popular areas: Sukhumvit, Old City, Chinatown
- Cultural tips: Dress modestly at temples, respect royal family
- Must-try experiences: Street food tours, river cruises, floating markets
"""
}
]
client = openai.Client(api_key=os.getenv("OPENAI_API_KEY"))
def populate_vector_db(collection, destinations_data):
"""
Populate the vector DB with travel information.
destinations_data should be a list of dictionaries with 'destination' and 'information' keys
"""
for data in destinations_data:
collection.add(
documents=[data['information']],
metadatas=[{"destination": data['destination']}],
ids=[f"destination_{data['destination'].lower().replace(' ', '_')}"]
)
def search_tavily(query):
"""Fetch travel data using Tavily API."""
API_KEY = os.getenv("TAVILY_API_KEY")
client = TavilyClient(api_key=API_KEY)
    results = client.search(query, max_results=3)
return results
async def get_attractions(destination):
"""Search for top attractions in the destination."""
prompt = f"Best tourist attractions in {destination}"
attractions_search = search_tavily(prompt)
return attractions_search
async def get_hotels(destination):
"""Search for hotels in the destination."""
prompt = f"Best hotels in {destination}"
hotels_search = search_tavily(prompt)
return hotels_search
async def get_flights(destination):
"""Search for flights to the destination."""
prompt = f"Flights to {destination} from major cities"
flights_search = search_tavily(prompt)
return flights_search
async def get_weather(destination, start_date, end_date):
"""Search for weather information."""
prompt = f"Weather forecast for {destination} from {start_date} to {end_date}"
weather_search = search_tavily(prompt)
return weather_search
def initialize_vector_db():
"""Initialize ChromaDB with OpenAI embeddings."""
client = chromadb.Client()
embedding_fn = embedding_functions.OpenAIEmbeddingFunction(
api_key=os.getenv("OPENAI_API_KEY"),
model_name="text-embedding-3-small"
)
res = client.get_or_create_collection(
"travel_information",
embedding_function=embedding_fn
)
populate_vector_db(res, destinations_data)
return res
def query_vector_db(collection, destination, k=3):
"""Query the vector database for existing travel information."""
try:
results = collection.query(
query_texts=[destination],
n_results=k
)
return results['documents'][0] if results['documents'] else []
except Exception:
return []
async def research_destination(destination, start_date, end_date):
"""Gather all necessary travel information for a destination."""
# First, check the vector database
collection = initialize_vector_db()
existing_info = query_vector_db(collection, destination)
# Get real-time information from Tavily
tavily_data = {
"attractions": await get_attractions(destination),
"hotels": await get_hotels(destination),
"flights": await get_flights(destination),
"weather": await get_weather(destination, start_date, end_date)
}
return {
"vector_db_results": existing_info,
**tavily_data
}
async def create_travel_plan(destination, start_date, end_date, research_data):
"""Generate a travel itinerary using the researched data."""
vector_db_context = "\n".join(research_data['vector_db_results']) if research_data['vector_db_results'] else "No pre-stored information available."
prompt = f"""
Create a structured travel itinerary for a trip to {destination} from {start_date} to {end_date}.
Pre-stored destination information:
{vector_db_context}
Current travel data:
- Attractions: {research_data['attractions']}
- Hotels: {research_data['hotels']}
- Flights: {research_data['flights']}
- Weather: {research_data['weather']}
"""
response = client.chat.completions.create(
model="gpt-4.1",
messages=[
{"role": "system", "content": "You are an expert travel planner. Combine both historical and current information to create the best possible itinerary."},
{"role": "user", "content": prompt}
]
).choices[0].message.content
return response
async def generate_itinerary(destination, start_date, end_date):
"""Main function to generate a travel itinerary."""
research_data = await research_destination(destination, start_date, end_date)
res = await create_travel_plan(destination, start_date, end_date, research_data)
return res
if __name__ == "__main__":
load_dotenv()
destination = input("Enter your travel destination: ")
start_date = input("Enter start date (YYYY-MM-DD): ")
end_date = input("Enter end date (YYYY-MM-DD): ")
itinerary = asyncio.run(generate_itinerary(destination, start_date, end_date))
print("\nGenerated Itinerary:\n", itinerary)
```
## Sample Query 1:
Can you add Judgment tracing to my file?
## Example of Modified Code after Query 1:
```
from uuid import uuid4
import openai
import os
import asyncio
from tavily import TavilyClient
from dotenv import load_dotenv
import chromadb
from chromadb.utils import embedding_functions
from judgeval.tracer import Tracer, wrap
from judgeval.scorers import AnswerRelevancyScorer, FaithfulnessScorer
from judgeval.data import Example
destinations_data = [
{
"destination": "Paris, France",
"information": """
Paris is the capital city of France and a global center for art, fashion, and culture.
Key Information:
- Best visited during spring (March-May) or fall (September-November)
- Famous landmarks: Eiffel Tower, Louvre Museum, Notre-Dame Cathedral, Arc de Triomphe
- Known for: French cuisine, café culture, fashion, art galleries
- Local transportation: Metro system is extensive and efficient
- Popular neighborhoods: Le Marais, Montmartre, Latin Quarter
- Cultural tips: Basic French phrases are appreciated; many restaurants close between lunch and dinner
- Must-try experiences: Seine River cruise, visiting local bakeries, Luxembourg Gardens
"""
},
{
"destination": "Tokyo, Japan",
"information": """
Tokyo is Japan's bustling capital, blending ultramodern and traditional elements.
Key Information:
- Best visited during spring (cherry blossoms) or fall (autumn colors)
- Famous areas: Shibuya, Shinjuku, Harajuku, Akihabara
- Known for: Technology, anime culture, sushi, efficient public transport
- Local transportation: Extensive train and subway network
- Cultural tips: Bow when greeting, remove shoes indoors, no tipping
- Must-try experiences: Robot Restaurant, teamLab Borderless, Tsukiji Outer Market
- Popular day trips: Mount Fuji, Kamakura, Nikko
"""
},
{
"destination": "New York City, USA",
"information": """
New York City is a global metropolis known for its diversity, culture, and iconic skyline.
Key Information:
- Best visited during spring (April-June) or fall (September-November)
- Famous landmarks: Statue of Liberty, Times Square, Central Park, Empire State Building
- Known for: Broadway shows, diverse cuisine, shopping, museums
- Local transportation: Extensive subway system, yellow cabs, ride-sharing
- Popular areas: Manhattan, Brooklyn, Queens
- Cultural tips: Fast-paced environment, tipping expected (15-20%)
- Must-try experiences: Broadway show, High Line walk, food tours
"""
},
{
"destination": "Barcelona, Spain",
"information": """
Barcelona is a vibrant city known for its art, architecture, and Mediterranean culture.
Key Information:
- Best visited during spring and fall for mild weather
- Famous landmarks: Sagrada Familia, Park Güell, Casa Batlló
- Known for: Gaudi architecture, tapas, beach culture, FC Barcelona
- Local transportation: Metro, buses, and walkable city center
- Popular areas: Gothic Quarter, Eixample, La Barceloneta
- Cultural tips: Late dinner times (after 8 PM), siesta tradition
- Must-try experiences: La Rambla walk, tapas crawl, local markets
"""
},
{
"destination": "Bangkok, Thailand",
"information": """
Bangkok is Thailand's capital city, famous for its temples, street food, and vibrant culture.
Key Information:
- Best visited during November to February (cool and dry season)
- Famous sites: Grand Palace, Wat Phra Kaew, Wat Arun
- Known for: Street food, temples, markets, nightlife
- Local transportation: BTS Skytrain, MRT, tuk-tuks, river boats
- Popular areas: Sukhumvit, Old City, Chinatown
- Cultural tips: Dress modestly at temples, respect royal family
- Must-try experiences: Street food tours, river cruises, floating markets
"""
}
]
client = wrap(openai.Client(api_key=os.getenv("OPENAI_API_KEY")))
judgment = Tracer(api_key=os.getenv("JUDGMENT_API_KEY"), project_name="travel_agent_demo")
def populate_vector_db(collection, destinations_data):
"""
Populate the vector DB with travel information.
destinations_data should be a list of dictionaries with 'destination' and 'information' keys
"""
for data in destinations_data:
collection.add(
documents=[data['information']],
metadatas=[{"destination": data['destination']}],
ids=[f"destination_{data['destination'].lower().replace(' ', '_')}"]
)
@judgment.observe(span_type="search_tool")
def search_tavily(query):
"""Fetch travel data using Tavily API."""
API_KEY = os.getenv("TAVILY_API_KEY")
client = TavilyClient(api_key=API_KEY)
    results = client.search(query, max_results=3)
return results
@judgment.observe(span_type="tool")
async def get_attractions(destination):
"""Search for top attractions in the destination."""
prompt = f"Best tourist attractions in {destination}"
attractions_search = search_tavily(prompt)
return attractions_search
@judgment.observe(span_type="tool")
async def get_hotels(destination):
"""Search for hotels in the destination."""
prompt = f"Best hotels in {destination}"
hotels_search = search_tavily(prompt)
return hotels_search
@judgment.observe(span_type="tool")
async def get_flights(destination):
"""Search for flights to the destination."""
prompt = f"Flights to {destination} from major cities"
flights_search = search_tavily(prompt)
example = Example(
input=prompt,
actual_output=str(flights_search["results"])
)
judgment.async_evaluate(
scorers=[AnswerRelevancyScorer(threshold=0.5)],
example=example,
model="gpt-4.1"
)
return flights_search
@judgment.observe(span_type="tool")
async def get_weather(destination, start_date, end_date):
"""Search for weather information."""
prompt = f"Weather forecast for {destination} from {start_date} to {end_date}"
weather_search = search_tavily(prompt)
example = Example(
input=prompt,
actual_output=str(weather_search["results"])
)
judgment.async_evaluate(
scorers=[AnswerRelevancyScorer(threshold=0.5)],
example=example,
model="gpt-4.1"
)
return weather_search
def initialize_vector_db():
"""Initialize ChromaDB with OpenAI embeddings."""
client = chromadb.Client()
embedding_fn = embedding_functions.OpenAIEmbeddingFunction(
api_key=os.getenv("OPENAI_API_KEY"),
model_name="text-embedding-3-small"
)
res = client.get_or_create_collection(
"travel_information",
embedding_function=embedding_fn
)
populate_vector_db(res, destinations_data)
return res
@judgment.observe(span_type="retriever")
def query_vector_db(collection, destination, k=3):
"""Query the vector database for existing travel information."""
try:
results = collection.query(
query_texts=[destination],
n_results=k
)
return results['documents'][0] if results['documents'] else []
except Exception:
return []
@judgment.observe(span_type="Research")
async def research_destination(destination, start_date, end_date):
"""Gather all necessary travel information for a destination."""
# First, check the vector database
collection = initialize_vector_db()
existing_info = query_vector_db(collection, destination)
# Get real-time information from Tavily
tavily_data = {
"attractions": await get_attractions(destination),
"hotels": await get_hotels(destination),
"flights": await get_flights(destination),
"weather": await get_weather(destination, start_date, end_date)
}
return {
"vector_db_results": existing_info,
**tavily_data
}
@judgment.observe(span_type="function")
async def create_travel_plan(destination, start_date, end_date, research_data):
"""Generate a travel itinerary using the researched data."""
vector_db_context = "\n".join(research_data['vector_db_results']) if research_data['vector_db_results'] else "No pre-stored information available."
prompt = f"""
Create a structured travel itinerary for a trip to {destination} from {start_date} to {end_date}.
Pre-stored destination information:
{vector_db_context}
Current travel data:
- Attractions: {research_data['attractions']}
- Hotels: {research_data['hotels']}
- Flights: {research_data['flights']}
- Weather: {research_data['weather']}
"""
response = client.chat.completions.create(
model="gpt-4.1",
messages=[
{"role": "system", "content": "You are an expert travel planner. Combine both historical and current information to create the best possible itinerary."},
{"role": "user", "content": prompt}
]
).choices[0].message.content
example = Example(
input=prompt,
actual_output=str(response),
retrieval_context=[str(vector_db_context), str(research_data)]
)
judgment.async_evaluate(
scorers=[FaithfulnessScorer(threshold=0.5)],
example=example,
model="gpt-4.1"
)
return response
@judgment.observe(span_type="function")
async def generate_itinerary(destination, start_date, end_date):
"""Main function to generate a travel itinerary."""
research_data = await research_destination(destination, start_date, end_date)
res = await create_travel_plan(destination, start_date, end_date, research_data)
return res
if __name__ == "__main__":
load_dotenv()
destination = input("Enter your travel destination: ")
start_date = input("Enter start date (YYYY-MM-DD): ")
end_date = input("Enter end date (YYYY-MM-DD): ")
itinerary = asyncio.run(generate_itinerary(destination, start_date, end_date))
print("\nGenerated Itinerary:\n", itinerary)
```
## Sample Agent 2:
```
from langchain_openai import ChatOpenAI
import asyncio
import os
import chromadb
from chromadb.utils import embedding_functions
from vectordbdocs import financial_data
from typing import Optional
from langchain_core.messages import BaseMessage, HumanMessage, SystemMessage, ChatMessage
from typing_extensions import TypedDict
from langgraph.graph import StateGraph
# Define our state type
class AgentState(TypedDict):
messages: list[BaseMessage]
category: Optional[str]
    documents: Optional[list[str]]
def populate_vector_db(collection, raw_data):
"""
Populate the vector DB with financial information.
"""
for data in raw_data:
collection.add(
documents=[data['information']],
metadatas=[{"category": data['category']}],
ids=[f"category_{data['category'].lower().replace(' ', '_')}_{os.urandom(4).hex()}"]
)
# Define a ChromaDB collection for document storage
client = chromadb.Client()
collection = client.get_or_create_collection(
name="financial_docs",
embedding_function=embedding_functions.OpenAIEmbeddingFunction(api_key=os.getenv("OPENAI_API_KEY"))
)
populate_vector_db(collection, financial_data)
def pnl_retriever(state: AgentState) -> AgentState:
query = state["messages"][-1].content
results = collection.query(
query_texts=[query],
where={"category": "pnl"},
n_results=3
)
documents = []
for document in results["documents"]:
documents += document
return {"messages": state["messages"], "documents": documents}
def balance_sheet_retriever(state: AgentState) -> AgentState:
query = state["messages"][-1].content
results = collection.query(
query_texts=[query],
where={"category": "balance_sheets"},
n_results=3
)
documents = []
for document in results["documents"]:
documents += document
return {"messages": state["messages"], "documents": documents}
def stock_retriever(state: AgentState) -> AgentState:
query = state["messages"][-1].content
results = collection.query(
query_texts=[query],
where={"category": "stocks"},
n_results=3
)
documents = []
for document in results["documents"]:
documents += document
return {"messages": state["messages"], "documents": documents}
async def bad_classifier(state: AgentState) -> AgentState:
return {"messages": state["messages"], "category": "stocks"}
async def bad_classify(state: AgentState) -> AgentState:
category = await bad_classifier(state)
return {"messages": state["messages"], "category": category["category"]}
async def bad_sql_generator(state: AgentState) -> AgentState:
ACTUAL_OUTPUT = "SELECT * FROM pnl WHERE stock_symbol = 'GOOGL'"
return {"messages": state["messages"] + [ChatMessage(content=ACTUAL_OUTPUT, role="text2sql")]}
# Create the classifier node with a system prompt
async def classify(state: AgentState) -> AgentState:
messages = state["messages"]
input_msg = [
SystemMessage(content="""You are a financial query classifier. Your job is to classify user queries into one of three categories:
- 'pnl' for Profit and Loss related queries
- 'balance_sheets' for Balance Sheet related queries
- 'stocks' for Stock market related queries
Respond ONLY with the category name in lowercase, nothing else."""),
*messages
]
response = ChatOpenAI(model="gpt-4.1", temperature=0).invoke(
input=input_msg
)
return {"messages": state["messages"], "category": response.content}
# Add router node to direct flow based on classification
def router(state: AgentState) -> str:
return state["category"]
async def generate_response(state: AgentState) -> AgentState:
messages = state["messages"]
    documents = state.get("documents", [])
OUTPUT = """
SELECT
stock_symbol,
SUM(CASE WHEN transaction_type = 'buy' THEN quantity ELSE -quantity END) AS total_shares,
SUM(CASE WHEN transaction_type = 'buy' THEN quantity * price_per_share ELSE -quantity * price_per_share END) AS total_cost,
MAX(CASE WHEN transaction_type = 'buy' THEN price_per_share END) AS current_market_price
FROM
stock_transactions
WHERE
stock_symbol = 'META'
GROUP BY
stock_symbol;
"""
return {"messages": messages + [ChatMessage(content=OUTPUT, role="text2sql")], "documents": documents}
async def main():
# Initialize the graph
graph_builder = StateGraph(AgentState)
# Add classifier node
# For failure test, pass in bad_classifier
graph_builder.add_node("classifier", classify)
# graph_builder.add_node("classifier", bad_classify)
# Add conditional edges based on classification
graph_builder.add_conditional_edges(
"classifier",
router,
{
"pnl": "pnl_retriever",
"balance_sheets": "balance_sheet_retriever",
"stocks": "stock_retriever"
}
)
# Add retriever nodes (placeholder functions for now)
graph_builder.add_node("pnl_retriever", pnl_retriever)
graph_builder.add_node("balance_sheet_retriever", balance_sheet_retriever)
graph_builder.add_node("stock_retriever", stock_retriever)
# Add edges from retrievers to response generator
graph_builder.add_node("response_generator", generate_response)
# graph_builder.add_node("response_generator", bad_sql_generator)
graph_builder.add_edge("pnl_retriever", "response_generator")
graph_builder.add_edge("balance_sheet_retriever", "response_generator")
graph_builder.add_edge("stock_retriever", "response_generator")
graph_builder.set_entry_point("classifier")
graph_builder.set_finish_point("response_generator")
# Compile the graph
graph = graph_builder.compile()
response = await graph.ainvoke({
"messages": [HumanMessage(content="Please calculate our PNL on Apple stock. Refer to table information from documents provided.")],
"category": None,
})
print(f"Response: {response['messages'][-1].content}")
if __name__ == "__main__":
asyncio.run(main())
```
## Sample Query 2:
Can you add Judgment tracing to my file?
## Example of Modified Code after Query 2:
```
from langchain_openai import ChatOpenAI
import asyncio
import os
import chromadb
from chromadb.utils import embedding_functions
from vectordbdocs import financial_data
from typing import Optional
from langchain_core.messages import BaseMessage, HumanMessage, SystemMessage, ChatMessage
from typing_extensions import TypedDict
from langgraph.graph import StateGraph
from judgeval.common.tracer import Tracer
from judgeval.integrations.langgraph import JudgevalCallbackHandler
from judgeval.scorers import AnswerCorrectnessScorer, FaithfulnessScorer
from judgeval.data import Example
judgment = Tracer(project_name="FINANCIAL_AGENT")
# Define our state type
class AgentState(TypedDict):
messages: list[BaseMessage]
category: Optional[str]
    documents: Optional[list[str]]
def populate_vector_db(collection, raw_data):
"""
Populate the vector DB with financial information.
"""
for data in raw_data:
collection.add(
documents=[data['information']],
metadatas=[{"category": data['category']}],
ids=[f"category_{data['category'].lower().replace(' ', '_')}_{os.urandom(4).hex()}"]
)
# Define a ChromaDB collection for document storage
client = chromadb.Client()
collection = client.get_or_create_collection(
name="financial_docs",
embedding_function=embedding_functions.OpenAIEmbeddingFunction(api_key=os.getenv("OPENAI_API_KEY"))
)
populate_vector_db(collection, financial_data)
@judgment.observe(name="pnl_retriever", span_type="retriever")
def pnl_retriever(state: AgentState) -> AgentState:
query = state["messages"][-1].content
results = collection.query(
query_texts=[query],
where={"category": "pnl"},
n_results=3
)
documents = []
for document in results["documents"]:
documents += document
return {"messages": state["messages"], "documents": documents}
@judgment.observe(name="balance_sheet_retriever", span_type="retriever")
def balance_sheet_retriever(state: AgentState) -> AgentState:
query = state["messages"][-1].content
results = collection.query(
query_texts=[query],
where={"category": "balance_sheets"},
n_results=3
)
documents = []
for document in results["documents"]:
documents += document
return {"messages": state["messages"], "documents": documents}
@judgment.observe(name="stock_retriever", span_type="retriever")
def stock_retriever(state: AgentState) -> AgentState:
query = state["messages"][-1].content
results = collection.query(
query_texts=[query],
where={"category": "stocks"},
n_results=3
)
documents = []
for document in results["documents"]:
documents += document
return {"messages": state["messages"], "documents": documents}
@judgment.observe(name="bad_classifier", span_type="llm")
async def bad_classifier(state: AgentState) -> AgentState:
return {"messages": state["messages"], "category": "stocks"}
@judgment.observe(name="bad_classify")
async def bad_classify(state: AgentState) -> AgentState:
category = await bad_classifier(state)
example = Example(
input=state["messages"][-1].content,
actual_output=category["category"],
expected_output="pnl"
)
judgment.async_evaluate(
scorers=[AnswerCorrectnessScorer(threshold=1)],
example=example,
model="gpt-4.1"
)
return {"messages": state["messages"], "category": category["category"]}
@judgment.observe(name="bad_sql_generator", span_type="llm")
async def bad_sql_generator(state: AgentState) -> AgentState:
ACTUAL_OUTPUT = "SELECT * FROM pnl WHERE stock_symbol = 'GOOGL'"
example = Example(
input=state["messages"][-1].content,
actual_output=ACTUAL_OUTPUT,
retrieval_context=state.get("documents", []),
expected_output="""
SELECT
SUM(CASE
WHEN transaction_type = 'sell' THEN (price_per_share - (SELECT price_per_share FROM stock_transactions WHERE stock_symbol = 'GOOGL' AND transaction_type = 'buy' LIMIT 1)) * quantity
ELSE 0
END) AS realized_pnl
FROM
stock_transactions
WHERE
stock_symbol = 'META';
"""
)
judgment.async_evaluate(
scorers=[AnswerCorrectnessScorer(threshold=1), FaithfulnessScorer(threshold=1)],
example=example,
model="gpt-4.1"
)
return {"messages": state["messages"] + [ChatMessage(content=ACTUAL_OUTPUT, role="text2sql")]}
# Create the classifier node with a system prompt
@judgment.observe(name="classify")
async def classify(state: AgentState) -> AgentState:
messages = state["messages"]
input_msg = [
SystemMessage(content="""You are a financial query classifier. Your job is to classify user queries into one of three categories:
- 'pnl' for Profit and Loss related queries
- 'balance_sheets' for Balance Sheet related queries
- 'stocks' for Stock market related queries
Respond ONLY with the category name in lowercase, nothing else."""),
*messages
]
response = ChatOpenAI(model="gpt-4.1", temperature=0).invoke(
input=input_msg
)
example = Example(
input=str(input_msg),
actual_output=response.content,
expected_output="pnl"
)
judgment.async_evaluate(
scorers=[AnswerCorrectnessScorer(threshold=1)],
example=example,
model="gpt-4.1"
)
return {"messages": state["messages"], "category": response.content}
# Add router node to direct flow based on classification
def router(state: AgentState) -> str:
return state["category"]
@judgment.observe(name="generate_response")
async def generate_response(state: AgentState) -> AgentState:
messages = state["messages"]
    documents = state.get("documents", [])
OUTPUT = """
SELECT
stock_symbol,
SUM(CASE WHEN transaction_type = 'buy' THEN quantity ELSE -quantity END) AS total_shares,
SUM(CASE WHEN transaction_type = 'buy' THEN quantity * price_per_share ELSE -quantity * price_per_share END) AS total_cost,
MAX(CASE WHEN transaction_type = 'buy' THEN price_per_share END) AS current_market_price
FROM
stock_transactions
WHERE
stock_symbol = 'META'
GROUP BY
stock_symbol;
"""
example = Example(
input=messages[-1].content,
actual_output=OUTPUT,
retrieval_context=documents,
expected_output="""
SELECT
stock_symbol,
SUM(CASE WHEN transaction_type = 'buy' THEN quantity ELSE -quantity END) AS total_shares,
SUM(CASE WHEN transaction_type = 'buy' THEN quantity * price_per_share ELSE -quantity * price_per_share END) AS total_cost,
MAX(CASE WHEN transaction_type = 'buy' THEN price_per_share END) AS current_market_price
FROM
stock_transactions
WHERE
stock_symbol = 'META'
GROUP BY
stock_symbol;
"""
)
judgment.async_evaluate(
scorers=[AnswerCorrectnessScorer(threshold=1), FaithfulnessScorer(threshold=1)],
example=example,
model="gpt-4.1"
)
return {"messages": messages + [ChatMessage(content=OUTPUT, role="text2sql")], "documents": documents}
async def main():
with judgment.trace(
"run_1",
project_name="FINANCIAL_AGENT",
overwrite=True
) as trace:
# Initialize the graph
graph_builder = StateGraph(AgentState)
# Add classifier node
# For failure test, pass in bad_classifier
graph_builder.add_node("classifier", classify)
# graph_builder.add_node("classifier", bad_classify)
# Add conditional edges based on classification
graph_builder.add_conditional_edges(
"classifier",
router,
{
"pnl": "pnl_retriever",
"balance_sheets": "balance_sheet_retriever",
"stocks": "stock_retriever"
}
)
# Add retriever nodes (placeholder functions for now)
graph_builder.add_node("pnl_retriever", pnl_retriever)
graph_builder.add_node("balance_sheet_retriever", balance_sheet_retriever)
graph_builder.add_node("stock_retriever", stock_retriever)
# Add edges from retrievers to response generator
graph_builder.add_node("response_generator", generate_response)
# graph_builder.add_node("response_generator", bad_sql_generator)
graph_builder.add_edge("pnl_retriever", "response_generator")
graph_builder.add_edge("balance_sheet_retriever", "response_generator")
graph_builder.add_edge("stock_retriever", "response_generator")
graph_builder.set_entry_point("classifier")
graph_builder.set_finish_point("response_generator")
# Compile the graph
graph = graph_builder.compile()
handler = JudgevalCallbackHandler(trace)
response = await graph.ainvoke({
"messages": [HumanMessage(content="Please calculate our PNL on Apple stock. Refer to table information from documents provided.")],
"category": None,
}, config=dict(callbacks=[handler]))
trace.save()
print(f"Response: {response['messages'][-1].content}")
if __name__ == "__main__":
asyncio.run(main())
```
# Official Judgment SDK Documentation
---
title: JudgmentClient
description: Complete reference for the JudgmentClient Python SDK
---
import { APIEndpoint } from '@/components/api';
# JudgmentClient API Reference
The JudgmentClient is your primary interface for interacting with the Judgment platform. It provides methods for running evaluations, managing datasets, handling traces, and more.
## Authentication
Set up your credentials using environment variables:
```bash
export JUDGMENT_API_KEY="your_api_key_here"
export JUDGMENT_ORG_ID="your_organization_id_here"
```
<APIEndpoint
title="Initialize Client"
description="Initialize a JudgmentClient object."
parameters={[
{
name: "judgment_api_key",
type: "str",
required: false,
description: "Recommended - set using the JUDGMENT_API_KEY environment variable",
},
{
name: "judgment_org_id",
type: "str",
required: false,
description: "Recommended - set using the JUDGMENT_ORG_ID environment variable",
},
]}
codeExamples={[
{
language: "python",
label: "Python",
code: `from judgeval import JudgmentClient
client = JudgmentClient()`
}
]}
/>
<APIEndpoint
title="client.run_evaluation()"
description="Execute an evaluation of examples using one or more scorers to measure performance and quality of your AI models."
parameters={[
{
name: "examples",
type: "List[Example]",
required: true,
description: "The examples to evaluate against your AI model",
example: "[Example(...)]",
},
{
name: "scorers",
type: "List[APIJudgmentScorer]",
required: true,
description: "List of scorers to use for evaluation",
example: "[APIJudgmentScorer(...)]"
},
{
name: "model",
type: "str",
required: false,
description: "Model used as judge when using LLM as a Judge",
example: '"gpt-4o-mini"',
default: "gpt-4.1"
},
{
name: "project_name",
type: "str",
required: false,
description: "Name of the project for organization",
example: '"my_qa_project"',
default: "default_project"
},
{
name: "eval_run_name",
type: "str",
required: false,
description: "Unique name for this evaluation run",
example: '"experiment_v1"',
default: "default_eval_run"
},
{
name: "override",
type: "bool",
required: false,
description: "Whether to override an existing evaluation run with the same name",
default: "False"
},
{
name: "append",
type: "bool",
required: false,
description: "Whether to append to an existing evaluation run with the same name",
default: "False"
},
{
name: "async_execution",
type: "bool",
required: false,
description: "Whether to execute the evaluation asynchronously",
default: "False"
}
]}
codeExamples={[
{
language: "python",
label: "Python",
code: `from judgeval import JudgmentClient
from judgeval.data import Example
client = JudgmentClient()
examples = [
Example(
input="What is the capital of France?",
actual_output="Paris is the capital of France.",
expected_output="Paris"
)
]
from judgeval.scorers import AnswerRelevancyScorer
results = client.run_evaluation(
examples=examples,
scorers=[AnswerRelevancyScorer(threshold=0.9)],
project_name="geography_qa"
)`
}
]}
responses={[
{
status: 200,
description: "List[ScoringResult]",
example: `[
ScoringResult(
success=False,
scorers_data=[ScorerData(...)],
name=None,
data_object=Example(...),
trace_id=None,
run_duration=None,
evaluation_cost=None
)
]`
}
]}
/>
<APIEndpoint
title="client.run_trace_evaluation()"
description="Execute trace-based evaluation using function calls and tracing to evaluate agent behavior and execution flows."
parameters={[
{
name: "scorers",
type: "List[APIJudgmentScorer]",
required: true,
description: "List of scorers to use for evaluation",
example: "[APIJudgmentScorer(...)]"
},
{
name: "examples",
type: "List[Example]",
required: false,
description: "Examples to run through the function (required if using function)",
example: "[Example(...)]"
},
{
name: "function",
type: "Callable",
required: false,
description: "Function to execute and trace for evaluation"
},
{
name: "tracer",
type: "Union[Tracer, BaseCallbackHandler]",
required: false,
description: "The tracer object used in tracing your agent"
},
{
name: "traces",
type: "List[Trace]",
required: false,
description: "Pre-existing traces to evaluate instead of generating new ones"
},
{
name: "project_name",
type: "str",
required: false,
description: "Name of the project for organization",
default: "default_project",
example: '"agent_evaluation"'
},
{
name: "eval_run_name",
type: "str",
required: false,
description: "Unique name for this trace evaluation run",
default: "default_eval_run",
example: '"agent_trace_v1"'
},
{
name: "override",
type: "bool",
required: false,
description: "Whether to override an existing evaluation run with the same name",
default: "False"
},
{
name: "append",
type: "bool",
required: false,
description: "Whether to append to an existing evaluation run with the same name",
default: "False"
},
]}
note="You either need to provide 'examples', 'function' and 'tracer' OR 'traces'"
codeExamples={[
{
language: "python",
label: "Python",
code: `from judgeval import JudgmentClient
from judgeval.tracer import Tracer
from judgeval.data import Example
client = JudgmentClient()
tracer = Tracer()
def my_agent_function(query: str) -> str:
"""Your agent function to be traced and evaluated"""
response = f"Processing query: {query}"
return response
examples = [
Example(
input={"query": "What is the weather like?"},
expected_output="I'll help you check the weather."
)
]
from judgeval.scorers import ToolOrderScorer
results = client.run_trace_evaluation(
scorers=[ToolOrderScorer()],
examples=examples,
function=my_agent_function,
tracer=tracer,
project_name="agent_evaluation"
)`
}
]}
responses={[
{
status: 200,
description: "List[ScoringResult]",
example: `[
ScoringResult(
success=False,
scorers_data=[ScorerData(...)],
name=None,
data_object=Example(...),
trace_id=None,
run_duration=None,
evaluation_cost=None
)
]`
}
]}
/>
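For the traces-based form, here is a minimal sketch; `existing_traces` is an assumed placeholder for `Trace` objects you have already collected from earlier runs:

```python
from judgeval import JudgmentClient
from judgeval.scorers import ToolOrderScorer

client = JudgmentClient()

# Placeholder: a List[Trace] collected from prior agent runs
existing_traces = [...]

# Evaluate the pre-existing traces instead of running a function
results = client.run_trace_evaluation(
    scorers=[ToolOrderScorer()],
    traces=existing_traces,
    project_name="agent_evaluation",
)
```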
<APIEndpoint
title="client.create_dataset()"
description="Create a new evaluation dataset for storage and reuse across multiple evaluation runs."
parameters={[
]}
/>
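A minimal sketch of the intended flow, mirroring the `push_dataset` example below (where `create_dataset()` returns an `EvalDataset` that you can add examples to):

```python
from judgeval import JudgmentClient
from judgeval.data import Example

client = JudgmentClient()

dataset = client.create_dataset()  # empty EvalDataset to populate
dataset.add_examples([
    Example(
        input="What is machine learning?",
        actual_output="Machine learning is a subset of AI...",
        expected_output="Machine learning is a method of data analysis..."
    )
])
```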
<APIEndpoint
title="client.push_dataset()"
description="Upload an evaluation dataset to the Judgment platform for storage and reuse across multiple evaluation runs."
parameters={[
{
name: "alias",
type: "str",
required: true,
description: "Unique name for the dataset within the project",
example: '"qa_dataset_v1"'
},
{
name: "dataset",
type: "EvalDataset",
required: true,
description: "Dataset object containing examples and metadata"
},
{
name: "project_name",
type: "str",
required: true,
description: "Project name where the dataset will be stored",
example: '"question_answering"'
},
{
name: "overwrite",
type: "bool",
required: false,
description: "Whether to overwrite existing dataset with same alias",
default: "False"
}
]}
codeExamples={[
{
language: "python",
label: "Python",
code: `from judgeval.data import Example
dataset = client.create_dataset()
dataset.add_examples([
Example(
input="What is machine learning?",
actual_output="Machine learning is a subset of AI...",
expected_output="Machine learning is a method of data analysis..."
)
])
success = client.push_dataset(
alias="ml_qa_dataset_v2",
dataset=dataset,
project_name="machine_learning_qa",
overwrite=True
)`
}
]}
responses={[
{
status: 200,
description: "bool",
example: `True`
}
]}
/>
<APIEndpoint
title="client.pull_dataset()"
description="Retrieve a saved dataset from the Judgment platform to use in evaluations or analysis."
parameters={[
{
name: "alias",
type: "str",
required: true,
description: "The alias of the dataset to retrieve",
example: '"qa_dataset_v1"'
},
{
name: "project_name",
type: "str",
required: true,
description: "Project name where the dataset is stored",
example: '"question_answering"'
}
]}
codeExamples={[
{
language: "python",
label: "Python",
code: `dataset = client.pull_dataset(
alias="qa_dataset_v1",
project_name="question_answering"
)
print(f"Dataset has {len(dataset.examples)} examples")
results = client.run_evaluation(
examples=dataset.examples,
scorers=my_scorers,
project_name="question_answering"
)`
}
]}
responses={[
{
status: 200,
description: "EvalDataset",
example: `EvalDataset(
examples=[
Example(
input="What is the capital of France?",
actual_output="Paris",
expected_output="Paris"
)
],
metadata={
"created_at": "2024-01-15T10:30:00Z",
"examples_count": 1
}
)`
}
]}
/>
<APIEndpoint
title="client.append_dataset()"
description="Append examples to an existing dataset."
parameters={[
{
name: "alias",
type: "str",
required: true,
description: "Unique name for the dataset within the project",
example: '"qa_dataset_v1"'
},
{
name: "examples",
type: "List[Example]",
required: true,
description: "List of examples to append to the dataset",
example: "[Example(...)]"
},
{
name: "project_name",
type: "str",
required: true,
description: "Project name where the dataset will be stored",
example: '"question_answering"'
},
]}
codeExamples={[
{
language: "python",
label: "Python",
code: `from judgeval.data import Example
dataset = client.pull_dataset(
alias="qa_dataset_v1",
project_name="question_answering"
)
examples = [
Example(
input="What is the capital of France?",
actual_output="Paris",
expected_output="Paris"
)
]
results = client.append_dataset(
alias="qa_dataset_v1",
examples=examples,
project_name="question_answering"
)`
}
]}
responses={[
{
status: 200,
description: "bool",
example: `True`
}
]}
/>
<APIEndpoint
title="client.assert_test()"
description="Runs evaluations as unit tests, raising an exception if the score falls below the defined threshold."
parameters={[
{
name: "examples",
type: "List[Example]",
required: true,
description: "The examples to evaluate against your AI model",
example: "[Example(...)]",
},
{
name: "scorers",
type: "List[APIJudgmentScorer]",
required: true,
description: "List of scorers to use for evaluation",
example: "[APIJudgmentScorer(...)]"
},
{
name: "model",
type: "str",
required: false,
description: "Model used as judge when using LLM as a Judge",
example: '"gpt-4o-mini"',
default: "gpt-4.1"
},
{
name: "project_name",
type: "str",
required: false,
description: "Name of the project for organization",
example: '"my_qa_project"',
default: "default_project"
},
{
name: "eval_run_name",
type: "str",
required: false,
description: "Unique name for this evaluation run",
example: '"experiment_v1"',
default: "default_eval_run"
},
{
name: "override",
type: "bool",
required: false,
description: "Whether to override an existing evaluation run with the same name",
default: "False"
},
{
name: "append",
type: "bool",
required: false,
description: "Whether to append to an existing evaluation run with the same name",
default: "False"
},
{
name: "async_execution",
type: "bool",
required: false,
description: "Whether to execute the evaluation asynchronously",
default: "False"
}
]}
codeExamples={[
{
language: "python",
label: "Python",
code: `from judgeval import JudgmentClient
from judgeval.data import Example
from judgeval.scorers import FaithfulnessScorer
client = JudgmentClient()
example = Example(
input="What if these shoes don't fit?",
actual_output="We offer a 30-day full refund at no extra cost.",
retrieval_context=["All customers are eligible for a 44 day full refund at no extra cost."],
)
scorer = FaithfulnessScorer(threshold=0.5)
client.assert_test(
examples=[example],
scorers=[scorer],
)`
}
]}
/>
<APIEndpoint
title="client.assert_trace_test()"
description="Runs trace-based evaluations as unit tests, raising an exception if the score falls below the defined threshold."
parameters={[
{
name: "scorers",
type: "List[APIJudgmentScorer]",
required: true,
description: "List of scorers to use for evaluation",
example: "[APIJudgmentScorer(...)]"
},
{
name: "examples",
type: "List[Example]",
required: false,
description: "Examples to run through the function (required if using function)",
example: "[Example(...)]"
},
{
name: "function",
type: "Callable",
required: false,
description: "Function to execute and trace for evaluation"
},
{
name: "tracer",
type: "Union[Tracer, BaseCallbackHandler]",
required: false,
description: "The tracer object used in tracing your agent"
},
{
name: "traces",
type: "List[Trace]",
required: false,
description: "Pre-existing traces to evaluate instead of generating new ones"
},
{
name: "project_name",
type: "str",
required: false,
description: "Name of the project for organization",
default: "default_project",
example: '"agent_evaluation"'
},
{
name: "eval_run_name",
type: "str",
required: false,
description: "Unique name for this trace evaluation run",
default: "default_eval_run",
example: '"agent_trace_v1"'
},
{
name: "override",
type: "bool",
required: false,
description: "Whether to override an existing evaluation run with the same name",
default: "False"
},
{
name: "append",
type: "bool",
required: false,
description: "Whether to append to an existing evaluation run with the same name",
default: "False"
},
]}
note="You either need to provide 'examples', 'function' and 'tracer' OR 'traces'"
codeExamples={[
{
language: "python",
label: "Python",
code: `from judgeval import JudgmentClient
from judgeval.tracer import Tracer
from judgeval.data import Example
client = JudgmentClient()
tracer = Tracer()
def my_agent_function(query: str) -> str:
"""Your agent function to be traced and evaluated"""
response = f"Processing query: {query}"
return response
examples = [
Example(
input={"query": "What is the weather like?"},
expected_output="I'll help you check the weather."
)
]
from judgeval.scorers import ToolOrderScorer
results = client.assert_trace_test(
scorers=[ToolOrderScorer()],
examples=examples,
function=my_agent_function,
tracer=tracer,
project_name="agent_evaluation"
)`
}
]}
/>
## Error Handling
The JudgmentClient raises specific exceptions for different error conditions:
<div className="overflow-x-auto">
<table className="min-w-full">
<thead>
<tr className="border-b border-gray-200 dark:border-gray-700">
<th className="text-left py-3 text-sm font-medium text-gray-900 dark:text-gray-100">Exception</th>
<th className="text-left py-3 text-sm font-medium text-gray-900 dark:text-gray-100">Description</th>
</tr>
</thead>
<tbody className="divide-y divide-gray-200 dark:divide-gray-700">
<tr>
<td className="py-3 text-sm font-mono text-gray-900 dark:text-gray-100">JudgmentAPIError</td>
<td className="py-3 text-sm text-gray-600 dark:text-gray-400">API request failures or server errors</td>
</tr>
<tr>
<td className="py-3 text-sm font-mono text-gray-900 dark:text-gray-100">ValueError</td>
<td className="py-3 text-sm text-gray-600 dark:text-gray-400">Invalid parameters or configuration</td>
</tr>
<tr>
<td className="py-3 text-sm font-mono text-gray-900 dark:text-gray-100">FileNotFoundError</td>
<td className="py-3 text-sm text-gray-600 dark:text-gray-400">Missing test files or datasets</td>
</tr>
</tbody>
</table>
</div>
```python
from judgeval.common.exceptions import JudgmentAPIError
try:
results = client.run_evaluation(examples, scorers)
except JudgmentAPIError as e:
print(f"API Error: {e}")
except ValueError as e:
print(f"Invalid parameters: {e}")
```
---
title: Tracer
description: Complete reference for the Tracer Python SDK
---
import { APIEndpoint } from '@/components/api';
# Tracer API Reference
The Tracer is your primary interface for adding observability to your AI agents. It provides methods for tracing function execution, evaluating performance, and collecting comprehensive environment interaction data.
<APIEndpoint
title="Initializing Tracer"
description="Initialize a Tracer object."
parameters={[
{
name: "api_key",
type: "str",
required: false,
description: "Recommended - set using the JUDGMENT_API_KEY environment variable",
},
{
name: "organization_id",
type: "str",
required: false,
description: "Recommended - set using the JUDGMENT_ORG_ID environment variable",
},
{
name: "project_name",
type: "str",
required: false,
description: "Optional project name override",
default: "default_project"
},
{
name: "deep_tracing",
type: "bool",
required: false,
description: "Whether to enable deep tracing, which will trace all nested function calls without the need to decorate each function.",
default: "False"
},
{
name: "enable_monitoring",
type: "bool",
required: false,
description: "If you need to toggle monitoring on and off",
default: "True"
},
{
name: "enable_evaluations",
type: "bool",
required: false,
description: "If you need to toggle evaluations on and off for async_evaluate()",
default: "True"
},
{
name: "use_s3",
type: "bool",
required: false,
description: "Whether to use S3 for storage",
default: "False"
},
{
name: "s3_bucket_name",
type: "str",
required: false,
description: "Name of the S3 bucket to use",
default: "None"
},
{
name: "s3_aws_access_key_id",
type: "str",
required: false,
description: "AWS access key ID for S3",
default: "None"
},
{
name: "s3_aws_secret_access_key",
type: "str",
required: false,
description: "AWS secret access key for S3",
default: "None"
},
{
name: "s3_region_name",
type: "str",
required: false,
description: "AWS region name for S3",
default: "None"
},
{
name: "trace_across_async_contexts",
type: "bool",
required: false,
description: "Whether to trace across async contexts",
default: "False"
},
{
name: "span_batch_size",
type: "int",
required: false,
description: "Number of spans to batch before sending",
default: "50"
},
{
name: "span_flush_interval",
type: "float",
required: false,
description: "Time in seconds between automatic flushes",
default: "1.0"
},
{
name: "span_num_workers",
type: "int",
required: false,
description: "Number of worker threads for span processing",
default: "10"
}
]}
codeExamples={[
{
language: "python",
label: "Python",
code: `from judgeval import Tracer
tracer = Tracer()`
}
]}
/>
<APIEndpoint
title="tracer.observe()"
description="Decorator to trace function execution with detailed entry/exit information."
parameters={[
{
name: "func",
type: "Callable",
required: true,
description: "The function to decorate (automatically provided when used as decorator)",
},
{
name: "name",
type: "str",
required: false,
description: "Optional custom name for the span (defaults to function name)",
default: "None",
example: '"custom_span_name"'
},
{
name: "span_type",
type: "str",
required: false,
description: "Label for the span. Use 'tool' for functions that should be tracked and exported as agent tools",
default: '"span"',
example: '"tool"'
},
{
name: "project_name",
type: "str",
required: false,
description: "Optional project name override",
default: "None",
example: '"my_project"'
},
{
name: "overwrite",
type: "bool",
required: false,
description: "Whether to overwrite existing traces",
default: "False",
example: "False"
},
{
name: "deep_tracing",
type: "bool",
required: false,
description: "Whether to enable deep tracing for this function and all nested calls. If None, uses the tracer's default setting.",
default: "False",
example: "True"
}
]}
codeExamples={[
{
language: "python",
label: "Function Decorator",
code: `from openai import OpenAI
from judgeval.common.tracer import Tracer
client = OpenAI()
tracer = Tracer(project_name='simple-agent', deep_tracing=False)
@tracer.observe(span_type="tool")
def search_web(query):
return f"Results for: {query}"
@tracer.observe(span_type="retriever")
def get_database(query):
return f"Database results for: {query}"
@tracer.observe(span_type="function")
def run_agent(user_query):
# Use tools based on query
if "database" in user_query:
info = get_database(user_query)
else:
info = search_web(user_query)
prompt = f"Context: {info}, Question: {user_query}"
# Generate response
response = client.chat.completions.create(
model="gpt-4.1",
messages=[{"role": "user", "content": prompt}]
)
return response.choices[0].message.content`
}
]}
/>
<APIEndpoint
title="tracer.observe_tools()"
description="Automatically adds @observe(span_type='tool') to all methods in a class."
parameters={[
{
name: "cls",
type: "type",
required: true,
description: "The class to decorate (automatically provided when used as decorator)",
},
{
name: "exclude_methods",
type: "List[str]",
required: false,
description: "List of method names to skip decorating. Defaults to common magic methods",
default: '["__init__", "__new__", "__del__", "__str__", "__repr__"]',
example: '["__init__", "private_method"]'
},
{
name: "include_private",
type: "bool",
required: false,
description: "Whether to decorate methods starting with underscore. Defaults to False",
default: "False",
example: "False"
},
{
name: "warn_on_double_decoration",
type: "bool",
required: false,
description: "Whether to print warnings when skipping already-decorated methods. Defaults to True",
default: "True",
example: "True"
}
]}
codeExamples={[
{
language: "python",
label: "Class Decorator",
code: `@tracer.observe_tools()
class SearchTool:
def search_web(self, query):
return f"Web results for: {query}"
def search_docs(self, query):
return f"Document results for: {query}"
def _private_helper(self):
# This won't be traced by default
return "helper"
class MyAgent(SearchTool):
@tracer.observe(span_type="function")
def run_agent(self, user_query):
# Use inherited tools
if "docs" in user_query:
info = self.search_docs(user_query)
else:
info = self.search_web(user_query)
return f"Agent response based on: {info}"
# All public methods from SearchTool are automatically traced
agent = MyAgent()
result = agent.run_agent("Find web results") # Both calls are traced`
}
]}
/>
<APIEndpoint
title="wrap()"
description="Wraps an API client to add tracing capabilities. Supports OpenAI, Together, Anthropic, and Google GenAI clients. Patches both '.create' and Anthropic's '.stream' methods using a wrapper class."
parameters={[
{
name: "client",
type: "Any",
required: true,
description: "API client to wrap (OpenAI, Anthropic, Together, Google GenAI)",
example: "OpenAI()"
},
{
name: "trace_across_async_contexts",
type: "bool",
required: false,
description: "Whether to trace across async contexts",
default: "False",
example: "True"
}
]}
codeExamples={[
{
language: "python",
label: "Auto-trace LLM Calls",
code: `from openai import OpenAI
from judgeval import wrap
client = OpenAI()
wrapped_client = wrap(client)
# All API calls are now automatically traced
response = wrapped_client.chat.completions.create(
model="gpt-4.1",
messages=[{"role": "user", "content": "Hello"}]
)
# Streaming calls are also traced
stream = wrapped_client.chat.completions.create(
model="gpt-4.1",
messages=[{"role": "user", "content": "Hello"}],
stream=True
)`
}
]}
/>
## Evaluation & Logging
<APIEndpoint
title="tracer.async_evaluate()"
description="Runs quality evaluations on the current trace/span using specified scorers. You can provide either an Example object or individual evaluation parameters (input, actual_output, etc.)."
parameters={[
{
name: "scorers",
type: "List[Union[APIJudgmentScorer, JudgevalScorer]]",
required: true,
description: "List of evaluation scorers to run",
example: "[FaithfulnessScorer()]"
},
{
name: "example",
type: "Example",
required: false,
description: "Example object containing evaluation data",
default: "None"
},
{
name: "input",
type: "str",
required: false,
description: "Input text to evaluate",
default: "None",
example: '"What is the capital of France?"'
},
{
name: "actual_output",
type: "Union[str, List[str]]",
required: false,
description: "Actual output from your system",
default: "None",
example: '"Paris is the capital of France"'
},
{
name: "expected_output",
type: "Union[str, List[str]]",
required: false,
description: "Expected/reference output",
default: "None",
example: '"Paris"'
},
{
name: "context",
type: "List[str]",
required: false,
description: "Context information for evaluation",
default: "None",
example: '["France is a country in Europe"]'
},
{
name: "retrieval_context",
type: "List[str]",
required: false,
description: "Retrieved documents for RAG evaluation",
default: "None"
},
{
name: "tools_called",
type: "List[str]",
required: false,
description: "Tools that were actually called",
default: "None",
example: '["search", "calculate"]'
},
{
name: "expected_tools",
type: "List[str]",
required: false,
description: "Tools that should have been called",
default: "None",
example: '["search"]'
},
{
name: "additional_metadata",
type: "Dict[str, Any]",
required: false,
description: "Additional metadata for the evaluation",
default: "None"
},
{
name: "model",
type: "str",
required: false,
description: "Model name for evaluation",
default: "None",
example: '"gpt-4.1"'
},
{
name: "span_id",
type: "str",
required: false,
description: "Specific span ID to attach evaluation to",
default: "None"
},
{
name: "log_results",
type: "bool",
required: false,
description: "Whether to log results to the Judgment platform",
default: "True"
}
]}
codeExamples={[
{
language: "python",
label: "Using Example Object",
code: `from judgeval.scorers import FaithfulnessScorer
from judgeval.data import Example
@tracer.observe(span_type="function")
def answer_question(question):
    answer = "Paris is the capital of France"
    # Create example object
    example = Example(
        input=question,
        actual_output=answer,
        expected_output="Paris",
        context=["France is a country in Europe"]
    )
    # Evaluate using Example
    tracer.async_evaluate(
        scorers=[FaithfulnessScorer()],
        example=example
    )
    return answer`
},
{
language: "python",
label: "Individual Parameters",
code: `from judgeval.scorers import FaithfulnessScorer
@tracer.observe(span_type="function")
def answer_question(question):
    answer = "Paris is the capital of France"
    # Evaluate the current span
    tracer.async_evaluate(
        scorers=[FaithfulnessScorer()],
        input=question,
        actual_output=answer,
        expected_output="Paris",
        context=["France is a country in Europe"]
    )
    return answer`
}
]}
/>
<APIEndpoint
title="tracer.log()"
description="Log a message with the current span context"
parameters={[
{
name: "msg",
type: "str",
required: true,
description: "Message to log",
example: '"Starting web search"'
},
{
name: "label",
type: "str",
required: false,
description: "Label/category for the log entry",
default: '"log"',
example: '"debug"'
},
{
name: "score",
type: "int",
required: false,
description: "Numeric score associated with the log",
default: "1",
example: "1"
}
]}
codeExamples={[
{
language: "python",
label: "Logging Within Traced Functions",
code: `def search_process(query):
    tracer.log("Starting search", label="info")
    try:
        results = perform_search(query)
        tracer.log(f"Found {len(results)} results", label="success", score=1)
        return results
    except Exception as e:
        tracer.log(f"Search failed: {e}", label="error", score=0)
        raise`
}
]}
/>
## Metadata & Organization
<APIEndpoint
title="tracer.set_metadata()"
description="Set metadata for the current trace."
parameters={[
{
name: "**kwargs",
type: "Any",
required: true,
description: "Key-value pairs to set as metadata for the current trace. Each keyword argument becomes a metadata field.",
}
]}
codeExamples={[
{
language: "python",
label: "Adding Trace Metadata",
code: `def process_user_request(user_id, request):
    # Add metadata to the current trace
    tracer.set_metadata(
        user_id=user_id,
        environment="production",
        experiment_id="exp_456",
        version="1.2.3"
    )
    return handle_request(request)`
}
]}
/>
<APIEndpoint
title="tracer.set_customer_id()"
description="Set the customer ID for the current trace."
parameters={[
{
name: "customer_id",
type: "str",
required: true,
description: "The customer ID to set",
example: '"customer_123"'
}
]}
codeExamples={[
{
language: "python",
label: "Customer Tracking",
code: `def handle_customer_request(customer_id, request):
    tracer.set_customer_id(customer_id)
    return process_request(request)`
}
]}
/>
<APIEndpoint
title="tracer.set_tags()"
description="Set the tags for the current trace."
parameters={[
{
name: "tags",
type: "List[str]",
required: true,
description: "List of tags to set",
example: '["experiment", "production", "v2"]'
}
]}
codeExamples={[
{
language: "python",
label: "Tagging Traces",
code: `def experimental_feature(data):
    tracer.set_tags(["experiment", "feature_v2", "production"])
    return new_algorithm(data)`
}
]}
/>
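These three calls compose naturally at the entry point of a traced workflow. A minimal sketch combining them (the handler and `run_agent` names are illustrative):
```python
@tracer.observe(span_type="agent")
def handle_request(customer_id, request):
    # Attach organizational context to the current trace up front
    tracer.set_customer_id(customer_id)
    tracer.set_tags(["production", "v2"])
    tracer.set_metadata(
        request_type=request.get("type"),
        environment="production"
    )
    return run_agent(request)  # illustrative downstream call
```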
## Advanced Features
<APIEndpoint
title="tracer.identify()"
description="Class decorator for multi-agent systems that assigns a unique identifier to agent and enables tracking of their internal state variables. Essential for monitoring and debugging complex multi-agent workflows where multiple agents interact and you need to track each agent's behavior and state separately." parameters={[
{
name: "identifier",
type: "str",
required: true,
description: "The identifier to associate with the decorated class. This will be used as the instance name in traces.",
example: '"user_agent"'
},
{
name: "track_state",
type: "bool",
required: false,
description: "Whether to automatically capture the state (attributes) of instances before and after function execution. Defaults to False.",
default: "False",
example: "True"
},
{
name: "track_attributes",
type: "List[str]",
required: false,
description: "Optional list of specific attribute names to track. If None, all non-private attributes (not starting with '_') will be tracked when track_state=True.",
default: "None",
example: '["memory", "goals"]'
},
{
name: "field_mappings",
type: "Dict[str, str]",
required: false,
description: "Optional dictionary mapping internal attribute names to display names in the captured state. For example: {\"system_prompt\": \"instructions\"} will capture the 'instructions' attribute as 'system_prompt' in the state.",
default: "None",
example: '{"system_prompt": "instructions"}'
}
]}
codeExamples={[
{
language: "python",
label: "State Tracking",
code: `@judgment.identify(identifier="name", track_state=True)
class Agent(AgentTools, AgentBase):
    """An AI agent."""
    def __init__(self, name):
        self.name = name
        self.function_map = {
            "func": self.function,
            # ... additional tool mappings
        }

    @judgment.observe(span_type="function")
    def process_request(self, user_request):
        """Process a user request using all available tools."""
        pass`
}
]}
/>
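Neither `track_attributes` nor `field_mappings` appears in the example above. A sketch of how they might combine, using a hypothetical `PlannerAgent` whose `instructions` attribute should appear as `system_prompt` in the captured state:
```python
@judgment.identify(
    identifier="name",
    track_state=True,
    track_attributes=["memory", "goals", "instructions"],  # only these attributes are captured
    field_mappings={"system_prompt": "instructions"}       # 'instructions' shows up as 'system_prompt'
)
class PlannerAgent:
    def __init__(self, name, instructions):
        self.name = name                  # used as the instance name in traces
        self.instructions = instructions
        self.memory = []
        self.goals = []

    @judgment.observe(span_type="function")
    def plan(self, task):
        self.goals.append(task)
        return f"Plan for {task}"
```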
## Current Span Access
<APIEndpoint
title="tracer.get_current_span()"
description="Returns the current span object for direct access to span properties and methods, useful for debugging and inspection."
fullWidth={true}
/>
### Available Span Properties
The current span object provides these properties for inspection and debugging:
<div className="overflow-x-auto">
<table className="min-w-full">
<thead>
<tr className="border-b border-gray-200 dark:border-gray-700">
<th className="text-left py-3 text-sm font-medium text-gray-900 dark:text-gray-100">Property</th>
<th className="text-left py-3 text-sm font-medium text-gray-900 dark:text-gray-100">Type</th>
<th className="text-left py-3 text-sm font-medium text-gray-900 dark:text-gray-100">Description</th>
</tr>
</thead>
<tbody className="divide-y divide-gray-200 dark:divide-gray-700">
<tr>
<td className="py-3 text-sm font-mono text-gray-900 dark:text-gray-100">span_id</td>
<td className="py-3 text-sm text-gray-600 dark:text-gray-400">str</td>
<td className="py-3 text-sm text-gray-600 dark:text-gray-400">Unique identifier for this span</td>
</tr>
<tr>
<td className="py-3 text-sm font-mono text-gray-900 dark:text-gray-100">trace_id</td>
<td className="py-3 text-sm text-gray-600 dark:text-gray-400">str</td>
<td className="py-3 text-sm text-gray-600 dark:text-gray-400">ID of the parent trace</td>
</tr>
<tr>
<td className="py-3 text-sm font-mono text-gray-900 dark:text-gray-100">function</td>
<td className="py-3 text-sm text-gray-600 dark:text-gray-400">str</td>
<td className="py-3 text-sm text-gray-600 dark:text-gray-400">Name of the function being traced</td>
</tr>
<tr>
<td className="py-3 text-sm font-mono text-gray-900 dark:text-gray-100">span_type</td>
<td className="py-3 text-sm text-gray-600 dark:text-gray-400">str</td>
<td className="py-3 text-sm text-gray-600 dark:text-gray-400">Type of span ("span", "tool", "llm", "evaluation", "chain")</td>
</tr>
<tr>
<td className="py-3 text-sm font-mono text-gray-900 dark:text-gray-100">inputs</td>
<td className="py-3 text-sm text-gray-600 dark:text-gray-400">dict</td>
<td className="py-3 text-sm text-gray-600 dark:text-gray-400">Input parameters for this span</td>
</tr>
<tr>
<td className="py-3 text-sm font-mono text-gray-900 dark:text-gray-100">output</td>
<td className="py-3 text-sm text-gray-600 dark:text-gray-400">Any</td>
<td className="py-3 text-sm text-gray-600 dark:text-gray-400">Output/result of the span execution</td>
</tr>
<tr>
<td className="py-3 text-sm font-mono text-gray-900 dark:text-gray-100">duration</td>
<td className="py-3 text-sm text-gray-600 dark:text-gray-400">float</td>
<td className="py-3 text-sm text-gray-600 dark:text-gray-400">Execution time in seconds</td>
</tr>
<tr>
<td className="py-3 text-sm font-mono text-gray-900 dark:text-gray-100">depth</td>
<td className="py-3 text-sm text-gray-600 dark:text-gray-400">int</td>
<td className="py-3 text-sm text-gray-600 dark:text-gray-400">Nesting depth in the trace hierarchy</td>
</tr>
<tr>
<td className="py-3 text-sm font-mono text-gray-900 dark:text-gray-400">parent_span_id</td>
<td className="py-3 text-sm text-gray-600 dark:text-gray-400">str | None</td>
<td className="py-3 text-sm text-gray-600 dark:text-gray-400">ID of the parent span (if nested)</td>
</tr>
<tr>
<td className="py-3 text-sm font-mono text-gray-900 dark:text-gray-100">agent_name</td>
<td className="py-3 text-sm text-gray-600 dark:text-gray-400">str | None</td>
<td className="py-3 text-sm text-gray-600 dark:text-gray-400">Name of the agent executing this span</td>
</tr>
<tr>
<td className="py-3 text-sm font-mono text-gray-900 dark:text-gray-100">has_evaluation</td>
<td className="py-3 text-sm text-gray-600 dark:text-gray-400">bool</td>
<td className="py-3 text-sm text-gray-600 dark:text-gray-400">Whether this span has evaluation runs</td>
</tr>
<tr>
<td className="py-3 text-sm font-mono text-gray-900 dark:text-gray-100">evaluation_runs</td>
<td className="py-3 text-sm text-gray-600 dark:text-gray-400">List[EvaluationRun]</td>
<td className="py-3 text-sm text-gray-600 dark:text-gray-400">List of evaluations run on this span</td>
</tr>
<tr>
<td className="py-3 text-sm font-mono text-gray-900 dark:text-gray-100">usage</td>
<td className="py-3 text-sm text-gray-600 dark:text-gray-400">TraceUsage | None</td>
<td className="py-3 text-sm text-gray-600 dark:text-gray-400">Token usage and cost information</td>
</tr>
<tr>
<td className="py-3 text-sm font-mono text-gray-900 dark:text-gray-100">error</td>
<td className="py-3 text-sm text-gray-600 dark:text-gray-400">Dict[str, Any] | None</td>
<td className="py-3 text-sm text-gray-600 dark:text-gray-400">Error information if span failed</td>
</tr>
<tr>
<td className="py-3 text-sm font-mono text-gray-900 dark:text-gray-100">state_before</td>
<td className="py-3 text-sm text-gray-600 dark:text-gray-400">dict | None</td>
<td className="py-3 text-sm text-gray-600 dark:text-gray-400">Agent state before execution</td>
</tr>
<tr>
<td className="py-3 text-sm font-mono text-gray-900 dark:text-gray-100">state_after</td>
<td className="py-3 text-sm text-gray-600 dark:text-gray-400">dict | None</td>
<td className="py-3 text-sm text-gray-600 dark:text-gray-400">Agent state after execution</td>
</tr>
</tbody>
</table>
</div>
### Example Usage
```python
@tracer.observe(span_type="tool")
def debug_tool(query):
    span = tracer.get_current_span()
    if span:
        # Access span properties for debugging
        print(f"🔧 Executing {span.function} (ID: {span.span_id})")
        print(f"📊 Depth: {span.depth}, Type: {span.span_type}")
        print(f"📥 Inputs: {span.inputs}")
        # Check parent relationship
        if span.parent_span_id:
            print(f"👆 Parent span: {span.parent_span_id}")
        # Monitor execution state
        if span.agent_name:
            print(f"🤖 Agent: {span.agent_name}")
    result = perform_search(query)
    # Check span after execution
    if span:
        print(f"📤 Output: {span.output}")
        print(f"⏱️ Duration: {span.duration}s")
        if span.has_evaluation:
            print(f"✅ Has {len(span.evaluation_runs)} evaluations")
        if span.error:
            print(f"❌ Error: {span.error}")
    return result
```
## Getting Started
```python
from judgeval import Tracer

# Initialize tracer
tracer = Tracer(
    api_key="your_api_key",
    project_name="my_agent_project"
)

# Basic function tracing
@tracer.observe(span_type="agent")
def my_agent(query):
    tracer.set_metadata(user_query=query)
    result = process_query(query)
    tracer.log("Processing completed", label="info")
    return result

# Auto-trace LLM calls
from openai import OpenAI
from judgeval import wrap

client = wrap(OpenAI())
response = client.chat.completions.create(...)  # Automatically traced
```
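To close the loop, an online evaluation can run inside the same traced function. A minimal sketch tying the pieces together; `process_query` and the scorer choice are illustrative, and FaithfulnessScorer typically also expects retrieval context:
```python
from judgeval.scorers import FaithfulnessScorer

@tracer.observe(span_type="agent")
def my_agent(query):
    result = process_query(query)  # illustrative downstream call
    # Score this span online as part of the trace
    tracer.async_evaluate(
        scorers=[FaithfulnessScorer()],
        input=query,
        actual_output=result
    )
    tracer.log("Evaluation queued", label="info")
    return result
```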