
Tracing Concepts for AI Applications

Understanding traces, spans, and observability fundamentals for LLM applications, RAG systems, and AI agents

Understanding the fundamentals of tracing is essential for getting the most out of Noveum.ai. This guide explains key concepts specifically in the context of AI applications, helping you design effective observability strategies for your LLM applications, RAG systems, and AI agents.

🎯 What is Tracing?

Tracing is the practice of tracking requests as they flow through your system, creating a detailed map of what happened, when, and how long each operation took. For AI applications, tracing provides crucial insights into:

  • ๐Ÿ” Request Flow: How user queries move through your AI pipeline
  • โฑ๏ธ Performance: Where time is spent in your AI operations
  • ๐Ÿ’ฐ Costs: Which operations drive your AI spending
  • ๐Ÿ› Debugging: What went wrong when errors occur
  • ๐Ÿ“Š Quality: How well your AI system is performing

🌟 Core Concepts

1. Traces

A trace represents a single journey through your system: a user asking a question and getting an answer. Think of it as the complete story of one request.

# Example: A complete RAG query trace
@noveum_trace.trace("rag-query")
def answer_question(question: str) -> str:
    # This creates a trace that contains all the operations below
    embeddings = generate_embeddings(question)      # Span 1
    documents = retrieve_documents(embeddings)      # Span 2
    answer = generate_answer(question, documents)   # Span 3
    return answer

Trace Characteristics:

  • 🆔 Unique ID: Every trace has a unique identifier
  • ⏰ Timeline: Start and end timestamps
  • 🌐 Distributed: Can span multiple services
  • 📊 Hierarchical: Contains multiple related spans
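These characteristics can be modeled as a small data structure. The sketch below is illustrative only, not the Noveum.ai SDK's internal representation; all names are assumptions:

```python
import time
import uuid
from dataclasses import dataclass, field

@dataclass
class Span:
    # A single operation recorded inside a trace
    name: str
    start: float
    end: float = 0.0
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)  # spans nest hierarchically

@dataclass
class Trace:
    # One complete request journey through the system
    name: str
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)  # unique ID
    start: float = field(default_factory=time.time)                  # timeline start
    spans: list = field(default_factory=list)                        # related spans

trace = Trace(name="rag-query")
trace.spans.append(Span(name="generate-embeddings", start=time.time()))
```

Every new `Trace` gets its own identifier, so two traces of the same pipeline can always be told apart.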

2. Spans

A span represents a single operation within a trace. Each span has a clear start and end time and represents work being done.

// Example: Individual spans within a trace
const result = await trace('llm-completion', async () => {
 
  // Span 1: Prompt preparation
  const prompt = await span('prepare-prompt', async () => {
    return buildPromptFromTemplate(userInput, context);
  });
 
  // Span 2: LLM API call
  const response = await span('openai-call', async () => {
    return await openai.chat.completions.create({
      model: 'gpt-4',
      messages: [{ role: 'user', content: prompt }]
    });
  });
 
  // Span 3: Response processing
  return await span('process-response', async () => {
    return parseAndValidateResponse(response);
  });
 
});

Span Characteristics:

  • 📛 Name: Descriptive name of the operation
  • ⏱️ Duration: How long the operation took
  • 👥 Parent-Child: Spans can contain other spans
  • 🏷️ Attributes: Key-value metadata about the operation
  • 📝 Events: Point-in-time occurrences during the span

3. Attributes

Attributes are key-value pairs that provide context about what happened during a span. They're crucial for understanding and filtering your traces.

@noveum_trace.trace("llm-call")
def call_llm(model: str, prompt: str, user_id: str):
    # Add attributes for context
    noveum_trace.set_attribute("llm.model", model)
    noveum_trace.set_attribute("llm.provider", "openai")
    noveum_trace.set_attribute("user.id", user_id)
    noveum_trace.set_attribute("prompt.length", len(prompt))
    noveum_trace.set_attribute("prompt.type", "user_query")
 
    response = openai.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}]
    )
 
    # Add response attributes
    noveum_trace.set_attribute("llm.tokens.input", response.usage.prompt_tokens)
    noveum_trace.set_attribute("llm.tokens.output", response.usage.completion_tokens)
    noveum_trace.set_attribute("llm.cost.estimated", calculate_cost(response.usage))
 
    return response.choices[0].message.content

Common AI Attribute Categories:

  • 🤖 LLM Attributes: llm.model, llm.provider, llm.temperature
  • 💰 Cost Attributes: llm.tokens.input, llm.tokens.output, llm.cost
  • 👤 User Attributes: user.id, user.plan, user.location
  • 📄 Content Attributes: prompt.length, response.length, content.type
  • 🔍 Quality Attributes: relevance.score, confidence.level, accuracy.rating

4. Events

Events represent things that happened at a specific point in time during a span. They're perfect for capturing important moments or milestones.

@noveum_trace.trace("document-processing")
def process_document(doc_id: str):
    noveum_trace.add_event("processing.started", {
        "document.id": doc_id,
        "timestamp": datetime.now().isoformat()
    })
 
    try:
        # Processing logic
        chunks = split_document(doc_id)
        noveum_trace.add_event("document.chunked", {
            "chunks.count": len(chunks),
            "chunks.avg_size": sum(len(c) for c in chunks) / len(chunks)
        })
 
        embeddings = generate_embeddings(chunks)
        noveum_trace.add_event("embeddings.generated", {
            "embeddings.count": len(embeddings),
            "embeddings.model": "text-embedding-ada-002"
        })
 
        return embeddings
 
    except Exception as e:
        noveum_trace.add_event("processing.failed", {
            "error.type": type(e).__name__,
            "error.message": str(e)
        })
        raise

🧠 AI-Specific Tracing Patterns

RAG Pipeline Tracing

RAG (Retrieval-Augmented Generation) systems have distinct phases that should be traced separately:

@noveum_trace.trace("rag-pipeline")
def rag_query(question: str) -> str:
    # Phase 1: Query understanding
    with noveum_trace.trace_step("query-analysis") as step:
        intent = analyze_query_intent(question)
        step.set_attribute("query.intent", intent)
        step.set_attribute("query.complexity", get_complexity_score(question))
 
    # Phase 2: Retrieval (timings measured explicitly; requires `import time`)
    with noveum_trace.trace_step("document-retrieval") as step:
        start = time.perf_counter()
        embeddings = generate_embeddings(question)
        embedding_time = time.perf_counter() - start
 
        start = time.perf_counter()
        documents = vector_search(embeddings, k=5)
        search_time = time.perf_counter() - start
 
        step.set_attribute("retrieval.query_embedding_time", embedding_time)
        step.set_attribute("retrieval.search_time", search_time)
        step.set_attribute("retrieval.documents_found", len(documents))
        step.set_attribute("retrieval.avg_similarity", avg_similarity(documents))
 
    # Phase 3: Generation
    with noveum_trace.trace_step("answer-generation") as step:
        context = build_context(documents)
        answer = generate_answer_with_context(question, context)
 
        step.set_attribute("generation.context_length", len(context))
        step.set_attribute("generation.answer_length", len(answer))
        step.set_attribute("generation.model", "gpt-4")
 
    return answer
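The `avg_similarity` helper above assumes each retrieved document carries the similarity score returned by the vector search. A sketch, with the `score` field name being an assumption about your retrieval results:

```python
def avg_similarity(documents):
    # Mean of the vector-search similarity scores; 0.0 for an empty result set
    if not documents:
        return 0.0
    return sum(doc["score"] for doc in documents) / len(documents)

docs = [{"score": 0.91}, {"score": 0.85}, {"score": 0.80}]
```

A falling average similarity across traces is often the earliest signal that retrieval quality, rather than the LLM, is degrading answers.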

Multi-Agent Tracing

When dealing with multiple AI agents, trace their interactions and coordination:

const multiAgentTask = trace('multi-agent-task', async (task: string) => {
 
  // Agent coordination
  const plan = await span('task-planning', async (spanInstance) => {
    spanInstance.setAttribute('task.type', classifyTask(task));
    spanInstance.setAttribute('agents.required', ['researcher', 'writer', 'reviewer']);
    return await planningAgent.createPlan(task);
  });
 
  // Individual agent execution
  const results = [];
  for (const step of plan.steps) {
    const result = await span(`agent-${step.agent}`, async (spanInstance) => {
      spanInstance.setAttribute('agent.name', step.agent);
      spanInstance.setAttribute('agent.task', step.task);
      spanInstance.setAttribute('agent.tools', step.tools);
 
      const agentResult = await executeAgentStep(step);
 
      spanInstance.setAttribute('agent.success', agentResult.success);
      spanInstance.setAttribute('agent.confidence', agentResult.confidence);
 
      return agentResult;
    });
 
    results.push(result);
  }
 
  // Final synthesis
  return await span('result-synthesis', async () => {
    return synthesizeResults(results);
  });
});

📊 Observability Best Practices

1. Meaningful Span Names

Use descriptive, consistent naming conventions:

# ✅ Good span names
"llm-completion"
"document-retrieval"
"user-authentication"
"payment-processing"
 
# โŒ Poor span names
"function1"
"process"
"api_call"
"step"

2. Rich Attributes

Include context that helps with debugging and analysis:

# ✅ Rich attributes
noveum_trace.set_attribute("user.id", user_id)
noveum_trace.set_attribute("user.plan", "premium")
noveum_trace.set_attribute("llm.model", "gpt-4")
noveum_trace.set_attribute("llm.temperature", 0.7)
noveum_trace.set_attribute("prompt.category", "technical_question")
noveum_trace.set_attribute("response.confidence", 0.92)
 
# โŒ Minimal attributes
noveum_trace.set_attribute("status", "ok")

3. Error Handling

Always capture error details:

try:
    result = expensive_ai_operation()
    noveum_trace.set_attribute("operation.success", True)
    noveum_trace.set_attribute("operation.result_quality", assess_quality(result))
except Exception as e:
    noveum_trace.set_attribute("operation.success", False)
    noveum_trace.set_attribute("error.type", type(e).__name__)
    noveum_trace.set_attribute("error.message", str(e))
    noveum_trace.add_event("operation.failed", {
        "error.timestamp": datetime.now().isoformat(),
        "error.recoverable": is_recoverable_error(e)
    })
    raise
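The `is_recoverable_error` helper referenced above is not defined in the snippet; one simple heuristic treats transient network and rate-limit failures as recoverable. The exact exception classes you match will depend on your client library:

```python
# Transient failures that are usually worth retrying
RECOVERABLE_TYPES = (TimeoutError, ConnectionError)

def is_recoverable_error(exc):
    # Network timeouts and dropped connections: retry with backoff
    if isinstance(exc, RECOVERABLE_TYPES):
        return True
    # Rate limits are usually recoverable after waiting
    return "rate limit" in str(exc).lower()
```

Tagging errors with this flag lets you separate "retry and move on" noise from genuine failures when you filter traces later.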

4. Performance Context

Include performance-relevant attributes:

const performanceSpan = span('expensive-operation', async (spanInstance) => {
  const startMemory = process.memoryUsage();
  const startTime = performance.now();
 
  try {
    const result = await expensiveOperation();
 
    const endTime = performance.now();
    const endMemory = process.memoryUsage();
 
    spanInstance.setAttribute('performance.duration_ms', endTime - startTime);
    spanInstance.setAttribute('performance.memory_delta_mb',
      (endMemory.heapUsed - startMemory.heapUsed) / 1024 / 1024);
    spanInstance.setAttribute('performance.cpu_intensive', true);
 
    return result;
  } catch (error) {
    spanInstance.setAttribute('performance.failed', true);
    throw error;
  }
});

๐Ÿ” Using Traces for Debugging

Common Debugging Scenarios

1. Slow Response Times

Look for spans with high duration:
- Is the LLM call taking too long?
- Is document retrieval the bottleneck?
- Are there unnecessary sequential operations?
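If you export trace data, for example as a list of span dicts, ranking spans by duration surfaces the bottleneck immediately. The field names here are assumptions about the export format:

```python
spans = [
    {"name": "prepare-prompt", "duration_ms": 12},
    {"name": "openai-call", "duration_ms": 2450},
    {"name": "process-response", "duration_ms": 8},
]

def slowest(spans, n=1):
    # Sort spans by duration, longest first, and keep the top n
    return sorted(spans, key=lambda s: s["duration_ms"], reverse=True)[:n]
```

Here the LLM call dwarfs everything else, which tells you to look at model choice, prompt size, or parallelism rather than your own code.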

2. High Costs

Analyze cost-related attributes:
- Which models are being used?
- How many tokens are being consumed?
- Are there redundant API calls?
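Aggregating token attributes across spans shows which models drive spend. Again a sketch over an assumed export format, using the attribute names from earlier examples:

```python
from collections import defaultdict

spans = [
    {"llm.model": "gpt-4", "llm.tokens.input": 900, "llm.tokens.output": 300},
    {"llm.model": "gpt-4", "llm.tokens.input": 400, "llm.tokens.output": 150},
    {"llm.model": "gpt-3.5-turbo", "llm.tokens.input": 1200, "llm.tokens.output": 600},
]

def tokens_by_model(spans):
    # Sum input + output tokens per model across all spans
    totals = defaultdict(int)
    for s in spans:
        totals[s["llm.model"]] += s["llm.tokens.input"] + s["llm.tokens.output"]
    return dict(totals)
```

Multiplying each total by the model's per-token price turns this into a cost breakdown per model.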

3. Quality Issues

Examine quality attributes:
- What's the confidence score of responses?
- How relevant are retrieved documents?
- Are there patterns in failed operations?

4. Error Patterns

Filter by error events and attributes:
- What types of errors are most common?
- Do errors correlate with specific users/inputs?
- Are errors happening at specific times?

🎯 Next Steps

Now that you understand tracing concepts, you're ready to:

  1. Implement SDK Integration - Add tracing to your application
  2. Explore Framework Integrations - Framework-specific guidance
  3. Learn Advanced Patterns - Custom instrumentation techniques
  4. Master the Dashboard - Analyze your traces effectively

Remember: Good observability is not about collecting all possible data, but about collecting the right data that helps you understand, debug, and optimize your AI applications.
