NovaPilot
NovaPilot analyzes your eval job results and automatically generates recommendations, reports, and scheduled quality monitoring for your AI agents.
What is NovaPilot?
NovaPilot is the AI analyst layer on top of your evaluation results. After NovaEval scores your dataset, NovaPilot examines those scores, identifies failure patterns, and tells you exactly what to fix — with specific prompt suggestions, priority rankings, and impact estimates.
NovaPilot is fully dashboard-driven. There's nothing to configure or install — it reads from your eval jobs and acts automatically.
The NovaPilot dashboard
When you navigate to Project → NovaPilot, you land directly on the Recommendations page. The sidebar gives you access to:
- Recommendations
- Reports
- Cron Jobs
- Chat
Recommendations
The Recommendations page shows AI-generated, prioritized fixes based on your most recent eval run results.
Each recommendation includes:
| Element | Description |
|---|---|
| Issue summary | What the problem is in plain language (e.g., "Agent is hallucinating product names") |
| Affected scorers | Which scorers are failing and by how much |
| Affected items | How many dataset items exhibit this issue |
| Priority | High / Medium / Low — based on failure severity and frequency |
| Suggested fix | Specific prompt or instruction change to address the issue |
| System prompt suggestion | A revised system prompt block you can copy and test |
Recommendations are regenerated automatically after each new eval run.
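The High/Medium/Low priority combines failure severity and frequency. As a rough mental model, one plausible ranking rule looks like the sketch below — the thresholds and field names are illustrative assumptions, not the actual NovaPilot logic:

```python
def priority(severity: float, frequency: float) -> str:
    """Rank a recommendation from failure severity (0-1, how far below
    the pass threshold failing items score) and frequency (0-1, share of
    dataset items affected). Thresholds here are assumptions."""
    impact = severity * frequency
    if impact >= 0.25:
        return "High"
    if impact >= 0.05:
        return "Medium"
    return "Low"

# e.g. 40% of items failing badly -> High
print(priority(severity=0.8, frequency=0.4))  # High
```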
Reports
Reports are auto-generated documents that give you a full picture of your agent's evaluation health. A new report is created after every NovaPilot run — either triggered manually from the NovaPilot dashboard or automatically by a Cron Job.
Report sections
Overall health card
A summary of the eval run:
- Total items evaluated
- Overall pass rate (%)
- Change vs. previous run (↑ / ↓ / →)
- Timestamp and dataset version
Scorer performance accordion
An expandable section for each scorer in the eval job:
- Current mean score
- Pass/fail breakdown
- Trend chart (last N runs)
- Worst-performing items (sample)
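Each accordion panel's stats amount to simple per-scorer aggregates. A sketch of the idea, with hypothetical item tuples and an assumed 0.7 pass threshold:

```python
def scorer_panel(items: list[tuple[str, float]], threshold: float = 0.7,
                 worst_n: int = 3) -> dict:
    """Per-scorer stats like the accordion shows: mean score, pass/fail
    breakdown, and a sample of the worst-performing items.
    Field names and the 0.7 threshold are assumptions."""
    scores = [s for _, s in items]
    failed = [(i, s) for i, s in items if s < threshold]
    return {
        "mean": sum(scores) / len(scores),
        "passed": len(items) - len(failed),
        "failed": len(failed),
        "worst": sorted(items, key=lambda x: x[1])[:worst_n],
    }

panel = scorer_panel([("a", 0.9), ("b", 0.4), ("c", 0.65), ("d", 0.8)])
print(panel["failed"], panel["worst"][0])  # 2 ('b', 0.4)
```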
Failure patterns card
NovaPilot groups failing items by common characteristics:
- "17 items failed `faithfulness` — all had `retrieval_query` about pricing"
- "11 items failed `instruction_adherence` — all calls longer than 8 minutes"
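Grouping like this is essentially bucketing failing items by a shared attribute and surfacing the largest buckets. A stdlib sketch — the `scorer` and `topic` fields are hypothetical item attributes, not a documented schema:

```python
from collections import Counter

def failure_patterns(failing_items: list[dict], key: str) -> list[str]:
    """Group failing items by a shared attribute and describe the
    largest groups, similar to the failure patterns card."""
    counts = Counter((item["scorer"], item[key]) for item in failing_items)
    return [
        f"{n} items failed {scorer} — all had {key} about {value}"
        for (scorer, value), n in counts.most_common()
    ]

items = [
    {"scorer": "faithfulness", "topic": "pricing"},
    {"scorer": "faithfulness", "topic": "pricing"},
    {"scorer": "instruction_adherence", "topic": "refunds"},
]
for line in failure_patterns(items, "topic"):
    print(line)
```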
System prompt suggestions card
Specific prompt blocks that NovaPilot recommends adding or modifying, ranked by estimated impact. Each suggestion shows:
- The original prompt section (if applicable)
- The proposed replacement
- The scorer(s) it is expected to improve
Sharing and exporting
From any report, you can:
- Download PDF — formatted report for offline sharing and archiving
- Download Markdown — portable plain-text report (`.md` file)
- Email report — send to teammates via the email preferences menu
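The Markdown export is, at its core, a serialization of the report's summary fields. A minimal sketch of such a renderer — the headings and field names are illustrative, not the exact export format:

```python
def report_markdown(card: dict) -> str:
    """Render a minimal Markdown report body from health-card fields.
    Layout and keys are assumptions, not NovaPilot's actual format."""
    return "\n".join([
        "# Eval Report",
        f"- Total items: {card['total_items']}",
        f"- Pass rate: {card['pass_rate']}%",
        f"- Trend: {card['trend']}",
    ])

print(report_markdown({"total_items": 4, "pass_rate": 75.0, "trend": "↓"}))
```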
Cron Jobs
Cron Jobs let you schedule recurring eval runs that automatically generate a new NovaPilot report on a cadence.
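Schedules use standard five-field cron syntax (minute, hour, day, month, weekday). To make an expression like `0 9 * * 1` concrete, here is a minimal matcher supporting only `*` and plain numbers — a sketch for reading expressions, not NovaPilot's scheduler:

```python
from datetime import datetime

def cron_matches(expr: str, dt: datetime) -> bool:
    """Check whether a datetime matches a 5-field cron expression
    (minute hour day month weekday). Supports '*' and plain numbers
    only; ranges, lists, and steps are omitted for brevity."""
    fields = expr.split()
    # cron weekdays count Sunday as 0; Python counts Monday as 0
    actual = [dt.minute, dt.hour, dt.day, dt.month, (dt.weekday() + 1) % 7]
    return all(f == "*" or int(f) == a for f, a in zip(fields, actual))

# '0 9 * * 1' = every Monday at 09:00; 2024-01-01 was a Monday
print(cron_matches("0 9 * * 1", datetime(2024, 1, 1, 9, 0)))  # True
```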
Creating a cron job
Set the schedule with a cron expression (e.g., `0 9 * * 1` for every Monday at 9am) or use a preset (daily, weekly, monthly).
Cron job detail
The cron job detail page shows:
- Next scheduled run time
- Run history — list of all past runs with status (success / failed) and links to the generated reports
- Settings — edit schedule, dataset, scorers, or notification preferences
- Run now button — trigger an immediate run outside the schedule
Cron job runs
Each cron job run has its own detail page showing:
- Start/end time and duration
- Items evaluated and pass rate
- Link to the generated NovaPilot report
- Error details if the run failed
Chat
The Chat interface lets you ask NovaPilot questions about your agent's performance in natural language.
Example questions:
- "What's been causing the most failures this week?"
- "Which scorer has the worst trend?"
- "Show me items where the agent hallucinated"
- "What would happen if I improved my system prompt?"
- "Compare this week's results to last week"
The chat has full context on your eval results, dataset items, and historical reports.
How NovaPilot analyzes results
NovaPilot uses four specialized AI agents internally:
| Agent | Focus |
|---|---|
| Flow Analyzer | Agent execution flow, decision patterns, state transitions |
| Prompt Analyzer | Prompt quality, ambiguities, suggested improvements |
| Tool Analyzer | Tool selection, parameter quality, result handling |
| General Analyzer | Cross-cutting patterns, statistical anomalies, summary |
These agents collaborate on each eval run to produce the recommendations, failure patterns, and system prompt suggestions shown in the dashboard.
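One way to picture the collaboration is a fan-out/merge: each analyzer inspects the run independently, and their findings are pooled and ranked by estimated impact. A hypothetical sketch — the analyzer stubs and `impact` field are stand-ins, not the real agents:

```python
def run_analyzers(eval_run: dict) -> list[dict]:
    """Fan eval results out to four analyzers and merge their findings
    into one impact-ranked list. Stub analyzers for illustration only."""
    analyzers = {
        "flow": lambda run: [{"source": "flow", "issue": "loop detected", "impact": 0.4}],
        "prompt": lambda run: [{"source": "prompt", "issue": "ambiguous instruction", "impact": 0.7}],
        "tool": lambda run: [{"source": "tool", "issue": "wrong parameter", "impact": 0.2}],
        "general": lambda run: [{"source": "general", "issue": "score variance spike", "impact": 0.5}],
    }
    findings = [f for analyze in analyzers.values() for f in analyze(eval_run)]
    return sorted(findings, key=lambda f: f["impact"], reverse=True)

top = run_analyzers({"items": []})[0]
print(top["source"], top["issue"])  # prompt ambiguous instruction
```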
Full workflow
Next steps
- Running Evaluations — set up the eval job that NovaPilot reads from
- Scorers Reference — understand what the scorers measure
- Datasets — manage the dataset your evals run against
- NovaSynth — generate synthetic test data for your agent