NovaPilot

NovaPilot analyzes your eval job results and automatically generates recommendations, reports, and scheduled quality monitoring for your AI agents.

What is NovaPilot?

NovaPilot is the AI analyst layer on top of your evaluation results. After NovaEval scores your dataset, NovaPilot examines those scores, identifies failure patterns, and tells you exactly what to fix — with specific prompt suggestions, priority rankings, and impact estimates.

NovaPilot is fully dashboard-driven. There's nothing to configure or install — it reads from your eval jobs and acts automatically.


The NovaPilot dashboard

When you navigate to Project → NovaPilot, you land directly on the Recommendations page. The sidebar gives you access to:

NovaPilot
├── Recommendations   ← default landing page
├── Reports
├── Cron Jobs
└── Chat

Recommendations

The Recommendations page shows AI-generated, prioritized fixes based on your most recent eval run results.

Each recommendation includes:

Element                     Description
Issue summary               What the problem is in plain language (e.g., "Agent is hallucinating product names")
Affected scorers            Which scorers are failing and by how much
Affected items              How many dataset items exhibit this issue
Priority                    High / Medium / Low — based on failure severity and frequency
Suggested fix               Specific prompt or instruction change to address the issue
System prompt suggestion    A revised system prompt block you can copy and test

Recommendations are regenerated automatically after each new eval run.
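
If it helps to picture what a recommendation contains, the fields above map roughly onto a record like the sketch below. The TypedDict and field names are illustrative only; NovaPilot is dashboard-driven and does not expose recommendations as a programmatic API.

```python
from typing import List, Literal, TypedDict

# Illustrative shape of a single recommendation, mirroring the elements listed above.
# This is a mental model, not an actual NovaPilot data structure or API.
class Recommendation(TypedDict):
    issue_summary: str                          # e.g. "Agent is hallucinating product names"
    affected_scorers: List[str]                 # scorers that are failing, e.g. ["faithfulness"]
    affected_items: int                         # number of dataset items exhibiting the issue
    priority: Literal["High", "Medium", "Low"]  # based on failure severity and frequency
    suggested_fix: str                          # specific prompt or instruction change to try
    system_prompt_suggestion: str               # revised system prompt block to copy and test
```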


Reports

Reports are auto-generated documents that give you a full picture of your agent's evaluation health. A new report is created after every NovaPilot run — either triggered manually from the NovaPilot dashboard or automatically by a Cron Job.

Report sections

Overall health card

A summary of the eval run:

  • Total items evaluated
  • Overall pass rate (%)
  • Change vs. previous run (↑ / ↓ / →)
  • Timestamp and dataset version

Scorer performance accordion

An expandable section for each scorer in the eval job:

  • Current mean score
  • Pass/fail breakdown
  • Trend chart (last N runs)
  • Worst-performing items (sample)

Failure patterns card

NovaPilot groups failing items by common characteristics:

  • "17 items failed faithfulness — all had retrieval_query about pricing"
  • "11 items failed instruction_adherence — all calls longer than 8 minutes"

System prompt suggestions card

Specific prompt blocks that NovaPilot recommends adding or modifying, ranked by estimated impact. Each suggestion shows:

  • The original prompt section (if applicable)
  • The proposed replacement
  • The scorer(s) it is expected to improve

Sharing and exporting

From any report, you can:

  • Download PDF — formatted report for offline sharing and archiving
  • Download Markdown — portable plain-text report (.md file)
  • Email report — send to teammates via the email preferences menu

Cron Jobs

Cron Jobs let you schedule recurring eval runs, each of which automatically generates a new NovaPilot report.

Creating a cron job

1. Navigate to NovaPilot → Cron Jobs and click Create Cron Job.
2. Select the Eval Job to run on a schedule.
3. Set the schedule using a cron expression (e.g., 0 9 * * 1 for every Monday at 9am) or use a preset (daily, weekly, monthly). See the sketch after these steps for a quick way to check an expression.
4. Configure notifications — email on failure, email on report generation, or both.
5. Click Create.
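
If you want to double-check a cron expression before saving the job, a general-purpose library such as croniter (a third-party Python package, unrelated to NovaPilot) can preview the upcoming run times locally; this is just a convenience, not part of the dashboard flow.

```python
# pip install croniter
from datetime import datetime
from croniter import croniter

schedule = "0 9 * * 1"  # every Monday at 09:00

# Preview the next three run times from a fixed starting point.
it = croniter(schedule, datetime(2025, 1, 1))
for _ in range(3):
    print(it.get_next(datetime))
```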

Cron job detail

The cron job detail page shows:

  • Next scheduled run time
  • Run history — list of all past runs with status (success / failed) and links to the generated reports
  • Settings — edit schedule, dataset, scorers, or notification preferences
  • Run now button — trigger an immediate run outside the schedule

Cron job runs

Each cron job run has its own detail page showing:

  • Start/end time and duration
  • Items evaluated and pass rate
  • Link to the generated NovaPilot report
  • Error details if the run failed

Chat

The Chat interface lets you ask NovaPilot questions about your agent's performance in natural language.

Example questions:

  • "What's been causing the most failures this week?"
  • "Which scorer has the worst trend?"
  • "Show me items where the agent hallucinated"
  • "What would happen if I improved my system prompt?"
  • "Compare this week's results to last week"

The chat has full context on your eval results, dataset items, and historical reports.


How NovaPilot analyzes results

NovaPilot uses four specialized AI agents internally:

Agent               Focus
Flow Analyzer       Agent execution flow, decision patterns, state transitions
Prompt Analyzer     Prompt quality, ambiguities, suggested improvements
Tool Analyzer       Tool selection, parameter quality, result handling
General Analyzer    Cross-cutting patterns, statistical anomalies, summary

These agents collaborate on each eval run to produce the recommendations, failure patterns, and system prompt suggestions shown in the dashboard.
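
NovaPilot's internals aren't exposed, but the collaboration can be pictured as a fan-out of the same eval results to each analyzer, with the findings merged into one report. The sketch below is purely illustrative; the function names and return types are assumptions, not NovaPilot code.

```python
from typing import Callable, Dict, List

# Stand-ins for the four analyzers; in NovaPilot these are AI agents, not plain functions.
def flow_analyzer(results: List[dict]) -> List[str]:
    return ["example flow finding"]            # execution flow, decision patterns, state transitions

def prompt_analyzer(results: List[dict]) -> List[str]:
    return ["example prompt finding"]          # prompt quality, ambiguities, suggested improvements

def tool_analyzer(results: List[dict]) -> List[str]:
    return ["example tool finding"]            # tool selection, parameter quality, result handling

def general_analyzer(results: List[dict]) -> List[str]:
    return ["example cross-cutting finding"]   # statistical anomalies, overall summary

ANALYZERS: Dict[str, Callable[[List[dict]], List[str]]] = {
    "flow": flow_analyzer,
    "prompt": prompt_analyzer,
    "tool": tool_analyzer,
    "general": general_analyzer,
}

def analyze(eval_results: List[dict]) -> Dict[str, List[str]]:
    # Fan the same eval results out to every analyzer and collect their findings,
    # which feed the recommendations, failure patterns, and prompt suggestions.
    return {name: analyzer(eval_results) for name, analyzer in ANALYZERS.items()}
```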


Full workflow

1. SDK captures traces from your agent
2. ETL Job transforms traces → Dataset items
3. Eval Job scores items with your chosen scorers
4. NovaPilot analyzes scores → Report + Recommendations
5. Apply suggested prompt changes → re-run → compare (see the comparison sketch below)
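
Step 5 comes down to comparing pass rates between the run before and the run after a prompt change, the same ↑ / ↓ / → comparison shown on the report's health card. A tiny sketch with made-up numbers:

```python
# Compare overall pass rate before and after applying a suggested prompt change.
# The counts are made up for illustration.
before = {"passed": 68, "total": 100}
after = {"passed": 81, "total": 100}

def pass_rate(run: dict) -> float:
    return 100.0 * run["passed"] / run["total"]

delta = pass_rate(after) - pass_rate(before)
trend = "↑" if delta > 0 else "↓" if delta < 0 else "→"
print(f"Pass rate: {pass_rate(before):.1f}% -> {pass_rate(after):.1f}% ({trend} {delta:+.1f} pts)")
```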
