What is a Dataset
Datasets are versioned collections of evaluation items that power NovaEval scoring and NovaPilot analysis. Learn how they are created, structured, and managed.
Overview
A Dataset in Noveum is a versioned collection of evaluation items, each one a structured snapshot of an AI interaction. Datasets are the bridge between your live application traces and the evaluation pipeline: you capture real (or synthetic) conversations and agent runs, transform them into a standardized format, and then run scorers against them to measure quality, safety, and performance.
Every item in a dataset conforms to the StandardData schema, which means scorers know exactly which fields to read regardless of the original trace format.
How Datasets Are Created
There are three ways to populate a dataset:
1. Manual selection from Traces
From the Datasets section of your project, you can select traces and convert them directly into dataset items. This is useful when you want to build a curated golden set for regression testing.
2. ETL Job (automated transformation)
An ETL Job continuously watches a trace environment and automatically transforms new spans into dataset items using an AI-generated Python mapper. This is the recommended approach for production monitoring and continuous evaluation.
See ETL Jobs for the full setup guide.
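To make the idea of a mapper concrete, here is a minimal sketch of a function that turns one span into one dataset item. It is not the mapper Noveum generates: the span layout and every field name except novaeval_item_type are assumptions made for illustration, and the real target fields are defined by the StandardData schema.

```python
from typing import Any


def map_span_to_item(span: dict[str, Any]) -> dict[str, Any]:
    """Turn one trace span into one dataset item (hypothetical field names)."""
    attributes = span.get("attributes", {})
    return {
        # novaeval_item_type is the only field name taken from this page;
        # all other keys below are illustrative placeholders.
        "novaeval_item_type": "agent",
        "agent_name": attributes.get("agent.name", "unknown"),
        "input": attributes.get("input", ""),
        "output": attributes.get("output", ""),
        "trace_id": span.get("trace_id"),
    }


example_span = {
    "trace_id": "trace-001",
    "attributes": {"agent.name": "support-bot", "input": "Hi", "output": "Hello!"},
}
item = map_span_to_item(example_span)
```

Falling back to defaults for missing attributes keeps the mapper from failing on spans that lack a field, which matters when a job processes new spans continuously.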
3. NovaSynth synthetic runs
When you run NovaSynth tests, each synthetic session automatically generates dataset items, complete with audio metrics, STT/TTS data, and conversation turns. These are ideal for pre-production quality gates.
See NovaSynth for details.
Dataset Types
Every dataset item has a novaeval_item_type field that tells scorers how to evaluate it:
| Type | Description | Typical use case |
|---|---|---|
| agent | Single-turn or multi-step agentic interaction with tool calls, RAG retrieval, and exit status | LLM agents, function-calling pipelines, RAG systems |
| conversational | Multi-turn dialogue with speaker-tagged messages | Chatbots, voice assistants, customer support bots |
Items can be mixed within a dataset, and scorers will automatically apply the correct evaluation logic based on the type.
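The type-based selection described above can be pictured as a lookup keyed on novaeval_item_type. The sketch below is only an illustration of that pattern, not NovaEval's implementation: the evaluator functions and the EVALUATORS table are hypothetical names.

```python
from typing import Any, Callable

Item = dict[str, Any]


def evaluate_agent(item: Item) -> str:
    # Placeholder for agent-specific logic (tool calls, retrieval, exit status).
    return "agent logic"


def evaluate_conversational(item: Item) -> str:
    # Placeholder for dialogue-specific logic (speaker-tagged turns).
    return "conversational logic"


# Hypothetical dispatch table: one entry per documented item type.
EVALUATORS: dict[str, Callable[[Item], str]] = {
    "agent": evaluate_agent,
    "conversational": evaluate_conversational,
}


def evaluate(item: Item) -> str:
    item_type = item.get("novaeval_item_type")
    if item_type not in EVALUATORS:
        raise ValueError(f"Unsupported novaeval_item_type: {item_type!r}")
    return EVALUATORS[item_type](item)


mixed_dataset = [
    {"novaeval_item_type": "agent"},
    {"novaeval_item_type": "conversational"},
]
results = [evaluate(item) for item in mixed_dataset]
```

Because the choice is made per item, a dataset that mixes both types needs no special handling.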
Dataset Versions
Datasets are versioned so you can evolve your evaluation set without losing history.
Draft and Published states
Every dataset starts in a draft state. While in draft, items can be freely added, edited, or removed. Once you're satisfied, you publish the dataset, creating an immutable snapshot that eval jobs can run against.
The dashboard shows an "Unreleased changes" banner whenever a dataset has unpublished modifications since the last publish.
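The draft and published lifecycle can be modeled in a few lines. This is a sketch under stated assumptions, not Noveum's data model: the Dataset class, its method names, and the item IDs are hypothetical. Publishing here takes a deep copy of the draft, which approximates an immutable snapshot.

```python
import copy
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Dataset:
    # Hypothetical in-memory model: a mutable draft plus a list of snapshots.
    draft_items: dict[str, dict[str, Any]] = field(default_factory=dict)
    published_versions: list[dict[str, dict[str, Any]]] = field(default_factory=list)

    def upsert(self, item_id: str, item: dict[str, Any]) -> None:
        self.draft_items[item_id] = item

    def remove(self, item_id: str) -> None:
        self.draft_items.pop(item_id, None)

    def publish(self) -> int:
        # Deep copy so later draft edits cannot alter the published snapshot.
        self.published_versions.append(copy.deepcopy(self.draft_items))
        return len(self.published_versions)

    def has_unreleased_changes(self) -> bool:
        # Mirrors the idea behind the "Unreleased changes" banner.
        if not self.published_versions:
            return bool(self.draft_items)
        return self.draft_items != self.published_versions[-1]


dataset = Dataset()
dataset.upsert("item-1", {"novaeval_item_type": "agent"})
version = dataset.publish()
clean_after_publish = not dataset.has_unreleased_changes()
dataset.upsert("item-2", {"novaeval_item_type": "conversational"})
```

After the second upsert the draft differs from the last snapshot, so has_unreleased_changes() reports a pending change while version 1 stays untouched.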
Version diff
When reviewing a new draft, the version diff view shows exactly which items were added, modified, or removed since the last published version. This makes it easy to audit changes before committing them.
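A diff of this kind reduces to comparing two collections of items by identifier. The following sketch assumes each item has a stable ID and that both versions are available as dictionaries keyed by it; the function name and the sample data are hypothetical.

```python
from typing import Any

Items = dict[str, dict[str, Any]]


def diff_versions(published: Items, draft: Items) -> dict[str, list[str]]:
    """Compare two versions keyed by item ID (the ID key is an assumption)."""
    added = sorted(draft.keys() - published.keys())
    removed = sorted(published.keys() - draft.keys())
    # An item counts as modified if it exists in both versions but differs.
    modified = sorted(
        item_id
        for item_id in draft.keys() & published.keys()
        if draft[item_id] != published[item_id]
    )
    return {"added": added, "modified": modified, "removed": removed}


published = {
    "item-1": {"novaeval_item_type": "agent", "note": "v1"},
    "item-2": {"novaeval_item_type": "conversational"},
}
draft = {
    "item-1": {"novaeval_item_type": "agent", "note": "v2"},
    "item-3": {"novaeval_item_type": "agent"},
}
changes = diff_versions(published, draft)
```

In this example item-3 is reported as added, item-1 as modified, and item-2 as removed.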
The Datasets UI
The Datasets interface is a three-pane layout.
Item detail tabs
When you select a dataset item, the right pane shows structured tabs:
| Tab | What it shows |
|---|---|
| agent-info | Agent name, role, task description, exit status |
| conversation | Multi-turn dialogue with speaker labels |
| execution | Tool calls made, parameters passed, tool results |
| system-prompt | The system prompt used for this interaction |
| tools | Available tools and their schemas |
| retrieval | RAG queries issued and retrieved context chunks |
| response-analysis | Agent response, ground truth, extracted content |
| evaluation-context | Custom context fields for scorer input |
| audio-metrics | STT/TTS latency, MOS (mean opinion score), audio quality signals |
| score-details | Per-scorer results with pass/fail and raw scores |
Dependency checking
Before you can delete a dataset, Noveum checks whether any Eval Jobs or NovaPilot Cron Jobs depend on it. If dependencies exist, a warning dialog lists them so you can update those jobs before proceeding.
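The check amounts to looking up every job that references the dataset before removing it. The sketch below is an illustration only: the Job class, the kind labels, and the choice to skip deletion while dependents exist are assumptions, not documented Noveum behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    kind: str  # hypothetical labels: "eval_job" or "novapilot_cron_job"
    dataset_id: str


def find_dependents(dataset_id: str, jobs: list[Job]) -> list[Job]:
    return [job for job in jobs if job.dataset_id == dataset_id]


def delete_dataset(dataset_id: str, datasets: set[str], jobs: list[Job]) -> list[Job]:
    """Delete the dataset only if nothing depends on it.

    Returns the dependent jobs; an empty list means the dataset was deleted.
    """
    dependents = find_dependents(dataset_id, jobs)
    if dependents:
        # Assumed behavior: surface the dependents instead of deleting.
        return dependents
    datasets.discard(dataset_id)
    return []


datasets = {"golden-set", "scratch"}
jobs = [
    Job("nightly-regression", "eval_job", "golden-set"),
    Job("weekly-analysis", "novapilot_cron_job", "golden-set"),
]
blocked_by = delete_dataset("golden-set", datasets, jobs)
deleted = delete_dataset("scratch", datasets, jobs)
```

Returning the dependent jobs gives the caller what it needs to list them in a warning.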
Next steps
- StandardData Schema: full field reference
- ETL Jobs: automate trace-to-dataset transformation
- Running Evaluations: score your dataset with NovaEval
- NovaPilot: AI-powered analysis and recommendations