NovaPilot - Intelligent Analysis Orchestrator

Automated analysis of AI agent and conversational datasets, using specialized AI agents and comprehensive reporting

NovaPilot is Noveum's intelligent orchestrator for automated analysis of AI agent and conversational datasets. It uses specialized AI agents to identify issues, validate reasoning, and generate actionable insights from evaluation scores.

What is NovaPilot?

NovaPilot is an advanced analysis engine that automatically examines evaluation results from NovaEval to identify patterns, issues, and optimization opportunities. It acts as an AI analyst that understands your agent's behavior and provides detailed recommendations for improvement.

🤖 Specialized AI Agents

Four specialized agents analyze different aspects: flow logic, prompts, tools, and general patterns

🎯 Automatic Detection

Automatically detects dataset type (agent vs conversational) and applies appropriate analysis strategies

📊 Streaming Statistics

Memory-efficient analysis of large datasets using streaming algorithms

✅ Reasoning Validation

Validates and extracts meaningful insights from evaluation reasoning
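
The streaming-statistics feature above can be illustrated with Welford's online algorithm, which maintains a running mean and variance in O(1) memory regardless of dataset size. This is a standalone sketch of the technique, not NovaPilot's actual internals:

```python
import math

class StreamingStats:
    """Running count/mean/variance via Welford's online algorithm (O(1) memory)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        return self.m2 / self.n if self.n else 0.0

    @property
    def stddev(self) -> float:
        return math.sqrt(self.variance)

# Scores are folded in one at a time, so the full dataset never needs to fit in memory
stats = StreamingStats()
for score in [5.0, 7.5, 6.0, 8.0]:
    stats.update(score)
```

The same pattern extends to per-scorer summaries: keep one `StreamingStats` per `scorer_id` and update it as items stream past.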

Key Features

Specialized Analysis Agents

NovaPilot employs four specialized AI agents, each focusing on different aspects of your AI system:

1. Flow Analyzer Agent

  • Analyzes agent execution flow and decision-making patterns
  • Identifies issues with state transitions and control flow
  • Detects loops, dead ends, and inefficient paths

2. Prompt Analyzer Agent

  • Examines prompt quality and effectiveness
  • Identifies ambiguous or problematic prompts
  • Suggests prompt improvements for better performance

3. Tool Analyzer Agent

  • Evaluates tool usage patterns and effectiveness
  • Identifies tool selection issues and misuse patterns
  • Recommends tool configuration improvements

4. General Analyzer Agent

  • Performs comprehensive cross-cutting analysis
  • Identifies systemic issues and patterns
  • Provides holistic recommendations

Automatic Dataset Type Detection

from agents.novapilot import NovaPilot
 
# Automatically detects whether your dataset is agent-based or conversational
pilot = NovaPilot(
    threshold=6.0,
    dataset_type="auto",  # Automatic detection
    enable_pre_analysis=True
)
 
# Analyze your evaluation results
report = pilot.analyze("path/to/evaluation_results.json")
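
Under the hood, detection can be as simple as inspecting the identifier keys present in the dataset. The sketch below is a hypothetical heuristic (NovaPilot's actual logic may differ); the `trace_id`/`conversation_id` keys come from the dataset formats documented under "Dataset Format Support":

```python
def detect_dataset_type(items: list[dict]) -> str:
    """Guess dataset type from the identifier keys of the first item (hypothetical heuristic)."""
    if not items:
        return "unknown"
    first = items[0]
    if "trace_id" in first:
        return "agent"
    if "conversation_id" in first:
        return "conversational"
    return "unknown"

print(detect_dataset_type([{"trace_id": "trace_123", "scorer_results": []}]))  # -> agent
```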

Batch Processing with Parallel Execution

NovaPilot efficiently processes large datasets using intelligent batching and parallel execution:

pilot = NovaPilot(
    batch_size=50,              # Analysis batch size
    validation_batch_size=20,   # Validation batch size
    max_concurrent_batches=4    # Parallel processing
)
 
report = pilot.analyze(
    dataset_path="large_evaluation_results.json",
    output_dir="analysis_reports"
)
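
One way to realize batching with bounded concurrency, independent of NovaPilot's internals, is a thread pool capped at `max_concurrent_batches` workers. The `analyze_batch` stand-in below is a placeholder for one agent-analysis call:

```python
from concurrent.futures import ThreadPoolExecutor

def chunk(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def analyze_batch(batch):
    # Stand-in for one agent-analysis call; returns per-item results
    return [{"id": item, "analyzed": True} for item in batch]

def analyze_all(items, batch_size=50, max_concurrent_batches=4):
    batches = chunk(items, batch_size)
    with ThreadPoolExecutor(max_workers=max_concurrent_batches) as pool:
        per_batch = list(pool.map(analyze_batch, batches))
    # Flatten per-batch results back into one list
    return [r for batch in per_batch for r in batch]

out = analyze_all(list(range(120)), batch_size=50)  # 3 batches: 50, 50, 20
```

Capping the worker count keeps memory and API rate usage bounded while still overlapping independent batches.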

Comprehensive Reporting

NovaPilot generates detailed JSON reports with:

  • Pre-analysis Statistics: Score distributions, pass/fail rates, statistical summaries per scorer
  • Bad Score Identification: Automatically filters and prioritizes low-scoring items
  • Agent Analysis: Detailed insights from each specialized agent
  • Reasoning Validation: Extracted and validated reasoning from evaluation scores
  • Actionable Recommendations: Concrete steps to improve your AI system

Analysis Workflow

NovaPilot follows a systematic four-stage analysis process:

1. Load and Pre-analyze

# Load dataset and compute streaming statistics
pilot = NovaPilot(enable_pre_analysis=True)
stats = pilot._load_and_preanalyze(dataset_path)
 
# Access pre-analysis insights
print(f"Total items: {stats['total_items']}")
print(f"Bad score items: {stats['bad_score_count']}")

2. Filter and Validate

# Filter bad scores and validate reasoning
bad_scores = pilot._filter_and_validate_bad_scores(
    dataset=data,
    threshold=6.0
)
 
# Get validated scores with extracted reasoning
for score in bad_scores:
    print(f"Score: {score['score']}, Reasoning: {score['reasoning']}")
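
The filtering step itself boils down to a threshold comparison over scorer results. A library-independent sketch, using the field names from the dataset formats documented below:

```python
DEFAULT_BAD_SCORE = 6.0  # mirrors the documented default threshold

def filter_bad_scores(items: list[dict], threshold: float = DEFAULT_BAD_SCORE) -> list[dict]:
    """Collect scorer results scoring below the threshold, keeping their source id."""
    bad = []
    for item in items:
        for result in item.get("scorer_results", []):
            if result["score"] < threshold:
                bad.append({
                    "source_id": item.get("trace_id") or item.get("conversation_id"),
                    "scorer_id": result["scorer_id"],
                    "score": result["score"],
                    "reasoning": result.get("reasoning", ""),
                })
    return bad

items = [
    {"trace_id": "t1", "scorer_results": [
        {"scorer_id": "tool_correctness", "score": 5.0, "reasoning": "wrong tool"},
        {"scorer_id": "relevancy", "score": 8.0, "reasoning": "on topic"},
    ]},
]
flagged = filter_bad_scores(items)  # only the 5.0 result falls below 6.0
```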

3. Analyze with Agents

# Run specialized agent analysis
analyses = pilot._analyze_with_agents(bad_scores)
 
# Access agent-specific insights
for agent_name, analysis in analyses.items():
    print(f"{agent_name}: {analysis['summary']}")

4. Generate Report

# Generate comprehensive JSON report
report = pilot._generate_report(
    pre_analysis=stats,
    agent_analyses=analyses,
    dataset_name="my_agent_eval"
)
 
# Report saved to output_reports/my_agent_eval/<timestamp>/final_report.json

Configuration Options

Score Thresholds

from agents.novapilot.utils import ScoreThreshold
 
pilot = NovaPilot(
    threshold=ScoreThreshold.DEFAULT_BAD_SCORE  # 6.0
)
 
# ...or pass a custom threshold directly
pilot = NovaPilot(threshold=7.5)

Batch Sizes

from agents.novapilot.utils import BatchSize
 
pilot = NovaPilot(
    batch_size=BatchSize.ANALYSIS,          # 50 for analysis
    validation_batch_size=BatchSize.VALIDATION  # 20 for validation
)

Custom Model Configuration

# Use custom LLM for analysis
from agents.novapilot.model_factory import ModelFactory
 
model = ModelFactory.create_model(
    provider="openai",
    model_name="gpt-4"
)
 
pilot = NovaPilot(model=model)

Dataset Format Support

Agent Datasets

NovaPilot automatically detects agent datasets with this structure:

{
  "trace_id": "trace_123",
  "scorer_results": [
    {
      "scorer_id": "tool_correctness",
      "score": 5.0,
      "passed": 0,
      "reasoning": "Tool selection was incorrect...",
      "metadata": {...}
    }
  ]
}
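
Before handing a file to NovaPilot, it can be worth checking that items match this shape. A minimal, illustrative validator (field names taken from the example above; not part of the NovaPilot API):

```python
import json

REQUIRED_SCORER_FIELDS = {"scorer_id", "score", "passed", "reasoning"}

def validate_agent_item(item: dict) -> list[str]:
    """Return a list of problems with one agent-dataset item (empty list = valid)."""
    problems = []
    if "trace_id" not in item:
        problems.append("missing trace_id")
    for i, result in enumerate(item.get("scorer_results", [])):
        missing = REQUIRED_SCORER_FIELDS - result.keys()
        if missing:
            problems.append(f"scorer_results[{i}] missing {sorted(missing)}")
    return problems

item = json.loads(
    '{"trace_id": "trace_123", "scorer_results": '
    '[{"scorer_id": "tool_correctness", "score": 5.0, "passed": 0, "reasoning": "..."}]}'
)
```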

Conversational Datasets

For conversational datasets:

{
  "conversation_id": "conv_123",
  "scorer_results": [
    {
      "scorer_id": "conversation_relevancy",
      "score": 7.5,
      "passed": 1,
      "reasoning": "Response addresses user query...",
      "metadata": {...}
    }
  ]
}

Error Handling

NovaPilot provides robust error handling with custom exceptions:

from agents.novapilot.exceptions import (
    NovaPilotError,
    DatasetLoadError,
    ConfigurationError
)
 
try:
    pilot = NovaPilot(threshold=15.0)  # Invalid threshold
except ConfigurationError as e:
    print(f"Configuration error: {e}")
 
try:
    report = pilot.analyze("invalid_path.json")
except DatasetLoadError as e:
    print(f"Dataset load error: {e}")
except NovaPilotError as e:
    print(f"Analysis error: {e}")

Example Reports

NovaPilot generates comprehensive JSON reports saved to your output directory:

output_reports/
├── my_agent_eval/
│   ├── 2025-01-12_10-30-45/
│   │   ├── detailed_report.json
│   │   └── final_report.json

Report Structure

{
  "dataset_name": "my_agent_eval",
  "analysis_timestamp": "2025-01-12T10:30:45",
  "pre_analysis": {
    "total_items": 1000,
    "bad_score_count": 127,
    "scorer_statistics": {...}
  },
  "agent_analyses": {
    "flow_analyzer": {
      "summary": "Identified 15 flow issues...",
      "recommendations": [...]
    },
    "tool_analyzer": {
      "summary": "Found 8 tool misuse patterns...",
      "recommendations": [...]
    }
  },
  "summary": {
    "key_findings": [...],
    "critical_issues": [...],
    "next_steps": [...]
  }
}

Integration with NovaEval

NovaPilot works seamlessly with NovaEval evaluation results:

from agents.novapilot import NovaPilot
 
# 1. Run evaluations with NovaEval
# (Your NovaEval evaluation code here)
 
# 2. Analyze results with NovaPilot
pilot = NovaPilot(
    threshold=6.0,
    enable_pre_analysis=True
)
 
report = pilot.analyze(
    dataset_path="novaeval_results.json",
    output_dir="pilot_reports"
)
 
# 3. Review generated insights
print(f"Critical issues found: {len(report['summary']['critical_issues'])}")

Best Practices

1. Start with Pre-analysis

Always enable pre-analysis to understand your dataset before deep analysis:

pilot = NovaPilot(enable_pre_analysis=True)

2. Use Appropriate Thresholds

Adjust thresholds based on your quality requirements:

# Stricter threshold for production systems
pilot = NovaPilot(threshold=7.5)
 
# More lenient for development
pilot = NovaPilot(threshold=5.0)

3. Optimize Batch Sizes

Tune batch sizes based on your dataset size and available memory:

# Large datasets
pilot = NovaPilot(
    batch_size=100,
    max_concurrent_batches=8
)
 
# Smaller datasets or limited memory
pilot = NovaPilot(
    batch_size=25,
    max_concurrent_batches=2
)

4. Organize Reports

Use descriptive output directories:

from datetime import date
 
project_name = "my_agent"           # your project identifier
run_date = date.today().isoformat()
 
report = pilot.analyze(
    dataset_path="eval_results.json",
    output_dir=f"reports/{project_name}/{run_date}"
)

Next Steps


Ready to get automated insights from your AI evaluations? Integrate NovaPilot with your NovaEval workflow!
