NovaPilot
NovaPilot analyzes your eval job results and automatically generates recommendations, reports, and scheduled quality monitoring for your AI agents.
What is NovaPilot?
NovaPilot is the AI analyst layer on top of your evaluation results. After NovaEval scores your dataset, NovaPilot examines those scores, identifies failure patterns, and tells you exactly what to fix — with specific prompt suggestions, priority rankings, and impact estimates.
NovaPilot is fully dashboard-driven. There's nothing to configure or install — it reads from your eval jobs and acts automatically.
The NovaPilot dashboard
When you navigate to Project → NovaPilot, you land directly on the Recommendations page. The sidebar gives you access to:
- Recommendations
- Reports
- Cron Jobs
- Chat
Recommendations
The Recommendations page shows AI-generated, prioritized fixes based on your most recent eval run results.
Each recommendation includes:
| Element | Description |
|---|---|
| Issue summary | What the problem is in plain language (e.g., "Agent is hallucinating product names") |
| Affected scorers | Which scorers are failing and by how much |
| Affected items | How many dataset items exhibit this issue |
| Priority | High / Medium / Low — based on failure severity and frequency |
| Suggested fix | Specific prompt or instruction change to address the issue |
| System prompt suggestion | A revised system prompt block you can copy and test |
Recommendations are regenerated automatically after each new eval run.
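The High/Medium/Low priority combines failure severity and frequency. As a rough mental model, one plausible ranking rule looks like the sketch below — the thresholds and field names are illustrative assumptions, not the actual NovaPilot logic:

```python
def priority(severity: float, frequency: float) -> str:
    """Rank a recommendation from failure severity (0-1, how far below
    the pass threshold failing items score) and frequency (0-1, share of
    dataset items affected). Thresholds here are assumptions."""
    impact = severity * frequency
    if impact >= 0.25:
        return "High"
    if impact >= 0.05:
        return "Medium"
    return "Low"

# e.g. 40% of items failing badly -> High
print(priority(severity=0.8, frequency=0.4))  # High
```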
Reports
Reports are auto-generated documents that give you a full picture of your agent's evaluation health. A new report is created after every NovaPilot run — either triggered manually from the NovaPilot dashboard or automatically by a Cron Job.
Report sections
Overall health card
A summary of the eval run:
- Total items evaluated
- Overall pass rate (%)
- Change vs. previous run (↑ / ↓ / →)
- Timestamp and dataset version
Scorer performance accordion
An expandable section for each scorer in the eval job:
- Current mean score
- Pass/fail breakdown
- Trend chart (last N runs)
- Worst-performing items (sample)
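Each accordion panel's stats amount to simple per-scorer aggregates. A sketch of the idea, with hypothetical item tuples and an assumed 0.7 pass threshold:

```python
def scorer_panel(items: list[tuple[str, float]], threshold: float = 0.7,
                 worst_n: int = 3) -> dict:
    """Per-scorer stats like the accordion shows: mean score, pass/fail
    breakdown, and a sample of the worst-performing items.
    Field names and the 0.7 threshold are assumptions."""
    scores = [s for _, s in items]
    failed = [(i, s) for i, s in items if s < threshold]
    return {
        "mean": sum(scores) / len(scores),
        "passed": len(items) - len(failed),
        "failed": len(failed),
        "worst": sorted(items, key=lambda x: x[1])[:worst_n],
    }

panel = scorer_panel([("a", 0.9), ("b", 0.4), ("c", 0.65), ("d", 0.8)])
print(panel["failed"], panel["worst"][0])  # 2 ('b', 0.4)
```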
Failure patterns card
NovaPilot groups failing items by common characteristics:
- "17 items failed `faithfulness` — all had `retrieval_query` about pricing"
- "11 items failed `instruction_adherence` — all calls longer than 8 minutes"
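Grouping like this is essentially bucketing failing items by a shared attribute and surfacing the largest buckets. A stdlib sketch — the `scorer` and `topic` fields are hypothetical item attributes, not a documented schema:

```python
from collections import Counter

def failure_patterns(failing_items: list[dict], key: str) -> list[str]:
    """Group failing items by a shared attribute and describe the
    largest groups, similar to the failure patterns card."""
    counts = Counter((item["scorer"], item[key]) for item in failing_items)
    return [
        f"{n} items failed {scorer} — all had {key} about {value}"
        for (scorer, value), n in counts.most_common()
    ]

items = [
    {"scorer": "faithfulness", "topic": "pricing"},
    {"scorer": "faithfulness", "topic": "pricing"},
    {"scorer": "instruction_adherence", "topic": "refunds"},
]
for line in failure_patterns(items, "topic"):
    print(line)
```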
System prompt suggestions card
Specific prompt blocks that NovaPilot recommends adding or modifying, ranked by estimated impact. Each suggestion shows:
- The original prompt section (if applicable)
- The proposed replacement
- The scorer(s) it is expected to improve
Sharing and exporting
From any report, you can:
- Download PDF — formatted report for offline sharing and archiving
- Download Markdown — portable plain-text report (`.md` file)
- Email report — send to teammates via the email preferences menu
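The Markdown export is, at its core, a serialization of the report's summary fields. A minimal sketch of such a renderer — the headings and field names are illustrative, not the exact export format:

```python
def report_markdown(card: dict) -> str:
    """Render a minimal Markdown report body from health-card fields.
    Layout and keys are assumptions, not NovaPilot's actual format."""
    return "\n".join([
        "# Eval Report",
        f"- Total items: {card['total_items']}",
        f"- Pass rate: {card['pass_rate']}%",
        f"- Trend: {card['trend']}",
    ])

print(report_markdown({"total_items": 4, "pass_rate": 75.0, "trend": "↓"}))
```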
Cron Jobs
Cron Jobs let you schedule recurring eval runs that automatically generate a new NovaPilot report on a cadence.
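Schedules use standard five-field cron syntax (minute, hour, day, month, weekday). To make an expression like `0 9 * * 1` concrete, here is a minimal matcher supporting only `*` and plain numbers — a sketch for reading expressions, not NovaPilot's scheduler:

```python
from datetime import datetime

def cron_matches(expr: str, dt: datetime) -> bool:
    """Check whether a datetime matches a 5-field cron expression
    (minute hour day month weekday). Supports '*' and plain numbers
    only; ranges, lists, and steps are omitted for brevity."""
    fields = expr.split()
    # cron weekdays count Sunday as 0; Python counts Monday as 0
    actual = [dt.minute, dt.hour, dt.day, dt.month, (dt.weekday() + 1) % 7]
    return all(f == "*" or int(f) == a for f, a in zip(fields, actual))

# '0 9 * * 1' = every Monday at 09:00; 2024-01-01 was a Monday
print(cron_matches("0 9 * * 1", datetime(2024, 1, 1, 9, 0)))  # True
```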
Creating a cron job
Set the schedule with a cron expression (e.g., `0 9 * * 1` for every Monday at 9am) or use a preset (daily, weekly, monthly).
Cron job detail
The cron job detail page shows:
- Next scheduled run time
- Run history — list of all past runs with status (success / failed) and links to the generated reports
- Settings — edit schedule, dataset, scorers, or notification preferences
- Run now button — trigger an immediate run outside the schedule
Cron job runs
Each cron job run has its own detail page showing:
- Start/end time and duration
- Items evaluated and pass rate
- Link to the generated NovaPilot report
- Error details if the run failed
Chat
The Chat interface lets you ask NovaPilot questions about your agent's performance in natural language.
Example questions:
- "What's been causing the most failures this week?"
- "Which scorer has the worst trend?"
- "Show me items where the agent hallucinated"
- "What would happen if I improved my system prompt?"
- "Compare this week's results to last week"
The chat has full context on your eval results, dataset items, and historical reports.
How NovaPilot analyzes results
NovaPilot uses four specialized AI agents internally:
| Agent | Focus |
|---|---|
| Flow Analyzer | Agent execution flow, decision patterns, state transitions |
| Prompt Analyzer | Prompt quality, ambiguities, suggested improvements |
| Tool Analyzer | Tool selection, parameter quality, result handling |
| General Analyzer | Cross-cutting patterns, statistical anomalies, summary |
These agents collaborate on each eval run to produce the recommendations, failure patterns, and system prompt suggestions shown in the dashboard.
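One way to picture the collaboration is a fan-out/merge: each analyzer inspects the run independently, and their findings are pooled and ranked by estimated impact. A hypothetical sketch — the analyzer stubs and `impact` field are stand-ins, not the real agents:

```python
def run_analyzers(eval_run: dict) -> list[dict]:
    """Fan eval results out to four analyzers and merge their findings
    into one impact-ranked list. Stub analyzers for illustration only."""
    analyzers = {
        "flow": lambda run: [{"source": "flow", "issue": "loop detected", "impact": 0.4}],
        "prompt": lambda run: [{"source": "prompt", "issue": "ambiguous instruction", "impact": 0.7}],
        "tool": lambda run: [{"source": "tool", "issue": "wrong parameter", "impact": 0.2}],
        "general": lambda run: [{"source": "general", "issue": "score variance spike", "impact": 0.5}],
    }
    findings = [f for analyze in analyzers.values() for f in analyze(eval_run)]
    return sorted(findings, key=lambda f: f["impact"], reverse=True)

top = run_analyzers({"items": []})[0]
print(top["source"], top["issue"])  # prompt ambiguous instruction
```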
Full workflow
Next steps
- Running Evaluations — set up the eval job that NovaPilot reads from
- Scorers Reference — understand what the scorers measure
- Datasets — manage the dataset your evals run against
- NovaSynth — generate synthetic test data for your agent