NovaPilot - Intelligent Analysis Orchestrator

Automated analysis of AI agent and conversational datasets, using specialized AI agents and comprehensive reporting

NovaPilot is Noveum's intelligent orchestrator for automated analysis of AI agent and conversational datasets. It uses specialized AI agents to identify issues, validate reasoning, and generate actionable insights from evaluation scores.

What is NovaPilot?

NovaPilot is an advanced analysis engine that automatically examines evaluation results from NovaEval to identify patterns, issues, and optimization opportunities. It acts as an AI analyst that understands your agent's behavior and provides detailed recommendations for improvement.

🤖 Specialized AI Agents

Four specialized agents analyze different aspects: flow logic, prompts, tools, and general patterns

🎯 Automatic Detection

Automatically detects dataset type (agent vs conversational) and applies appropriate analysis strategies

📊 Streaming Statistics

Memory-efficient analysis of large datasets using streaming algorithms

✅ Reasoning Validation

Validates and extracts meaningful insights from evaluation reasoning
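
The streaming-statistics feature above can be illustrated with Welford's online algorithm, which maintains a running mean and variance in O(1) memory regardless of dataset size. This is a standalone sketch of the technique, not NovaPilot's actual internals:

```python
import math

class StreamingStats:
    """Running count/mean/variance via Welford's online algorithm (O(1) memory)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        return self.m2 / self.n if self.n else 0.0

    @property
    def stddev(self) -> float:
        return math.sqrt(self.variance)

# Scores are folded in one at a time, so the full dataset never needs to fit in memory
stats = StreamingStats()
for score in [5.0, 7.5, 6.0, 8.0]:
    stats.update(score)
```

The same pattern extends to per-scorer summaries: keep one `StreamingStats` per `scorer_id` and update it as items stream past.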

Key Features

Specialized Analysis Agents

NovaPilot employs four specialized AI agents, each focusing on different aspects of your AI system:

1. Flow Analyzer Agent

  • Analyzes agent execution flow and decision-making patterns
  • Identifies issues with state transitions and control flow
  • Detects loops, dead ends, and inefficient paths

2. Prompt Analyzer Agent

  • Examines prompt quality and effectiveness
  • Identifies ambiguous or problematic prompts
  • Suggests prompt improvements for better performance

3. Tool Analyzer Agent

  • Evaluates tool usage patterns and effectiveness
  • Identifies tool selection issues and misuse patterns
  • Recommends tool configuration improvements

4. General Analyzer Agent

  • Performs comprehensive cross-cutting analysis
  • Identifies systemic issues and patterns
  • Provides holistic recommendations

Automatic Dataset Type Detection

from agents.novapilot import NovaPilot
 
# Automatically detects whether your dataset is agent-based or conversational
pilot = NovaPilot(
    threshold=6.0,
    dataset_type="auto",  # Automatic detection
    enable_pre_analysis=True
)
 
# Analyze your evaluation results
report = pilot.analyze("path/to/evaluation_results.json")
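
Under the hood, detection can be as simple as inspecting the identifier keys present in the dataset. The sketch below is a hypothetical heuristic (NovaPilot's actual logic may differ); the `trace_id`/`conversation_id` keys come from the dataset formats documented under "Dataset Format Support":

```python
def detect_dataset_type(items: list[dict]) -> str:
    """Guess dataset type from the identifier keys of the first item (hypothetical heuristic)."""
    if not items:
        return "unknown"
    first = items[0]
    if "trace_id" in first:
        return "agent"
    if "conversation_id" in first:
        return "conversational"
    return "unknown"

print(detect_dataset_type([{"trace_id": "trace_123", "scorer_results": []}]))  # -> agent
```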

Batch Processing with Parallel Execution

NovaPilot efficiently processes large datasets using intelligent batching and parallel execution:

pilot = NovaPilot(
    batch_size=50,              # Analysis batch size
    validation_batch_size=20,   # Validation batch size
    max_concurrent_batches=4    # Parallel processing
)
 
report = pilot.analyze(
    dataset_path="large_evaluation_results.json",
    output_dir="analysis_reports"
)
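
One way to realize batching with bounded concurrency, independent of NovaPilot's internals, is a thread pool capped at `max_concurrent_batches` workers. The `analyze_batch` stand-in below is a placeholder for one agent-analysis call:

```python
from concurrent.futures import ThreadPoolExecutor

def chunk(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def analyze_batch(batch):
    # Stand-in for one agent-analysis call; returns per-item results
    return [{"id": item, "analyzed": True} for item in batch]

def analyze_all(items, batch_size=50, max_concurrent_batches=4):
    batches = chunk(items, batch_size)
    with ThreadPoolExecutor(max_workers=max_concurrent_batches) as pool:
        per_batch = list(pool.map(analyze_batch, batches))
    # Flatten per-batch results back into one list
    return [r for batch in per_batch for r in batch]

out = analyze_all(list(range(120)), batch_size=50)  # 3 batches: 50, 50, 20
```

Capping the worker count keeps memory and API rate usage bounded while still overlapping independent batches.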

Comprehensive Reporting

NovaPilot generates detailed JSON reports with:

  • Pre-analysis Statistics: Score distributions, pass/fail rates, statistical summaries per scorer
  • Bad Score Identification: Automatically filters and prioritizes low-scoring items
  • Agent Analysis: Detailed insights from each specialized agent
  • Reasoning Validation: Extracted and validated reasoning from evaluation scores
  • Actionable Recommendations: Concrete steps to improve your AI system

Analysis Workflow

NovaPilot follows a systematic four-stage analysis process:

1. Load and Pre-analyze

# Load dataset and compute streaming statistics
pilot = NovaPilot(enable_pre_analysis=True)
stats = pilot._load_and_preanalyze(dataset_path)
 
# Access pre-analysis insights
print(f"Total items: {stats['total_items']}")
print(f"Bad score items: {stats['bad_score_count']}")

2. Filter and Validate

# Filter bad scores and validate reasoning
bad_scores = pilot._filter_and_validate_bad_scores(
    dataset=data,
    threshold=6.0
)
 
# Get validated scores with extracted reasoning
for score in bad_scores:
    print(f"Score: {score['score']}, Reasoning: {score['reasoning']}")
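
The filtering step itself boils down to a threshold comparison over scorer results. A library-independent sketch, using the field names from the dataset formats documented below:

```python
DEFAULT_BAD_SCORE = 6.0  # mirrors the documented default threshold

def filter_bad_scores(items: list[dict], threshold: float = DEFAULT_BAD_SCORE) -> list[dict]:
    """Collect scorer results scoring below the threshold, keeping their source id."""
    bad = []
    for item in items:
        for result in item.get("scorer_results", []):
            if result["score"] < threshold:
                bad.append({
                    "source_id": item.get("trace_id") or item.get("conversation_id"),
                    "scorer_id": result["scorer_id"],
                    "score": result["score"],
                    "reasoning": result.get("reasoning", ""),
                })
    return bad

items = [
    {"trace_id": "t1", "scorer_results": [
        {"scorer_id": "tool_correctness", "score": 5.0, "reasoning": "wrong tool"},
        {"scorer_id": "relevancy", "score": 8.0, "reasoning": "on topic"},
    ]},
]
flagged = filter_bad_scores(items)  # only the 5.0 result falls below 6.0
```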

3. Analyze with Agents

# Run specialized agent analysis
analyses = pilot._analyze_with_agents(bad_scores)
 
# Access agent-specific insights
for agent_name, analysis in analyses.items():
    print(f"{agent_name}: {analysis['summary']}")

4. Generate Report

# Generate comprehensive JSON report
report = pilot._generate_report(
    pre_analysis=stats,
    agent_analyses=analyses,
    dataset_name="my_agent_eval"
)
 
# Report saved to output_reports/my_agent_eval/<timestamp>/final_report.json

Configuration Options

Score Thresholds

from agents.novapilot.utils import ScoreThreshold
 
pilot = NovaPilot(
    threshold=ScoreThreshold.DEFAULT_BAD_SCORE  # 6.0
)
 
# ...or pass a custom threshold directly
pilot = NovaPilot(threshold=7.5)

Batch Sizes

from agents.novapilot.utils import BatchSize
 
pilot = NovaPilot(
    batch_size=BatchSize.ANALYSIS,          # 50 for analysis
    validation_batch_size=BatchSize.VALIDATION  # 20 for validation
)

Custom Model Configuration

# Use custom LLM for analysis
from agents.novapilot.model_factory import ModelFactory
 
model = ModelFactory.create_model(
    provider="openai",
    model_name="gpt-4"
)
 
pilot = NovaPilot(model=model)

Dataset Format Support

Agent Datasets

NovaPilot automatically detects agent datasets with this structure:

{
  "trace_id": "trace_123",
  "scorer_results": [
    {
      "scorer_id": "tool_correctness",
      "score": 5.0,
      "passed": 0,
      "reasoning": "Tool selection was incorrect...",
      "metadata": {...}
    }
  ]
}
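
Before handing a file to NovaPilot, it can be worth checking that items match this shape. A minimal, illustrative validator (field names taken from the example above; not part of the NovaPilot API):

```python
import json

REQUIRED_SCORER_FIELDS = {"scorer_id", "score", "passed", "reasoning"}

def validate_agent_item(item: dict) -> list[str]:
    """Return a list of problems with one agent-dataset item (empty list = valid)."""
    problems = []
    if "trace_id" not in item:
        problems.append("missing trace_id")
    for i, result in enumerate(item.get("scorer_results", [])):
        missing = REQUIRED_SCORER_FIELDS - result.keys()
        if missing:
            problems.append(f"scorer_results[{i}] missing {sorted(missing)}")
    return problems

item = json.loads(
    '{"trace_id": "trace_123", "scorer_results": '
    '[{"scorer_id": "tool_correctness", "score": 5.0, "passed": 0, "reasoning": "..."}]}'
)
```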

Conversational Datasets

For conversational datasets:

{
  "conversation_id": "conv_123",
  "scorer_results": [
    {
      "scorer_id": "conversation_relevancy",
      "score": 7.5,
      "passed": 1,
      "reasoning": "Response addresses user query...",
      "metadata": {...}
    }
  ]
}

Error Handling

NovaPilot provides robust error handling with custom exceptions:

from agents.novapilot.exceptions import (
    NovaPilotError,
    DatasetLoadError,
    ConfigurationError
)
 
try:
    pilot = NovaPilot(threshold=15.0)  # Invalid threshold
except ConfigurationError as e:
    print(f"Configuration error: {e}")
 
try:
    report = pilot.analyze("invalid_path.json")
except DatasetLoadError as e:
    print(f"Dataset load error: {e}")
except NovaPilotError as e:
    print(f"Analysis error: {e}")

Example Reports

NovaPilot generates comprehensive JSON reports saved to your output directory:

output_reports/
├── my_agent_eval/
│   ├── 2025-01-12_10-30-45/
│   │   ├── detailed_report.json
│   │   └── final_report.json

Report Structure

{
  "dataset_name": "my_agent_eval",
  "analysis_timestamp": "2025-01-12T10:30:45",
  "pre_analysis": {
    "total_items": 1000,
    "bad_score_count": 127,
    "scorer_statistics": {...}
  },
  "agent_analyses": {
    "flow_analyzer": {
      "summary": "Identified 15 flow issues...",
      "recommendations": [...]
    },
    "tool_analyzer": {
      "summary": "Found 8 tool misuse patterns...",
      "recommendations": [...]
    }
  },
  "summary": {
    "key_findings": [...],
    "critical_issues": [...],
    "next_steps": [...]
  }
}

Integration with NovaEval

NovaPilot works seamlessly with NovaEval evaluation results:

from agents.novapilot import NovaPilot
 
# 1. Run evaluations with NovaEval
# (Your NovaEval evaluation code here)
 
# 2. Analyze results with NovaPilot
pilot = NovaPilot(
    threshold=6.0,
    enable_pre_analysis=True
)
 
report = pilot.analyze(
    dataset_path="novaeval_results.json",
    output_dir="pilot_reports"
)
 
# 3. Review generated insights
print(f"Critical issues found: {len(report['summary']['critical_issues'])}")

Best Practices

1. Start with Pre-analysis

Always enable pre-analysis to understand your dataset before deep analysis:

pilot = NovaPilot(enable_pre_analysis=True)

2. Use Appropriate Thresholds

Adjust thresholds based on your quality requirements:

# Stricter threshold for production systems
pilot = NovaPilot(threshold=7.5)
 
# More lenient for development
pilot = NovaPilot(threshold=5.0)

3. Optimize Batch Sizes

Tune batch sizes based on your dataset size and available memory:

# Large datasets
pilot = NovaPilot(
    batch_size=100,
    max_concurrent_batches=8
)
 
# Smaller datasets or limited memory
pilot = NovaPilot(
    batch_size=25,
    max_concurrent_batches=2
)

4. Organize Reports

Use descriptive output directories:

from datetime import date
 
project_name = "my_agent"           # your project identifier
run_date = date.today().isoformat()
 
report = pilot.analyze(
    dataset_path="eval_results.json",
    output_dir=f"reports/{project_name}/{run_date}"
)

Next Steps


Ready to get automated insights from your AI evaluations? Integrate NovaPilot with your NovaEval workflow!
