Using the AI Mapper
How to generate, review, and improve the mapper code that converts your traces into evaluation items — all from the Noveum UI.
What the AI mapper does
When you create an ETL Job, Noveum includes a Mapper tab where you define how your traces get converted into dataset items.
Clicking Generate Code in the Mapper tab tells Noveum to read a sample of your recent traces and write mapper code for you automatically. The code appears in an editor where you can review it, tweak it, and then apply it to the job.
You never need to write this code from scratch — the AI handles it. Your job is to review the output, check that items look right, and use Improve Code if anything needs adjusting.
The key concept: one item per LLM call
Every LLM call in a trace becomes one dataset item.
If your agent handles a user conversation with 5 back-and-forth turns, that trace contains 5 LLM calls. The mapper extracts each one into its own item, so NovaEval can score every individual decision your agent made — not just the final answer.
Example:
| Trace | LLM calls | Dataset items created |
|---|---|---|
| A chatbot conversation with 5 turns | 5 | 5 |
| A RAG pipeline (retrieve + generate) | 2 | 2 |
| A single-turn question-answer agent | 1 | 1 |
| A multi-step research agent with 8 LLM calls | 8 | 8 |
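The one-item-per-LLM-call rule can be sketched in code. This is a hypothetical illustration of the idea, not the actual mapper Noveum generates; the trace structure, span `type` values, and field names are all assumptions.

```python
# Illustrative sketch of the one-item-per-LLM-call rule.
# The trace/span layout and field names below are assumptions,
# not the real Noveum trace schema.

def map_trace_to_items(trace):
    """Emit one dataset item per LLM-call span in a trace."""
    items = []
    for span in trace["spans"]:
        if span["type"] != "llm_call":
            continue  # skip retrieval, tool, and other non-LLM spans
        items.append({
            "input": span["input"],    # the user's request for this turn
            "output": span["output"],  # the LLM's response
        })
    return items

# A trace with two LLM calls and one retrieval span yields two items,
# so each individual LLM decision can be scored separately.
trace = {"spans": [
    {"type": "llm_call", "input": "Hi", "output": "Hello!"},
    {"type": "retrieval", "input": "Hi", "output": "docs"},
    {"type": "llm_call", "input": "Bye", "output": "Goodbye!"},
]}
print(len(map_trace_to_items(trace)))  # 2
```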
This granularity is what makes evaluation useful. Instead of knowing "my agent is 70% accurate overall," you can see exactly which type of question it gets wrong, at which turn, with which system prompt.
What each item contains
After the mapper runs, open your Dataset to inspect the items. Each item contains the fields the scorers need to evaluate it:
| Field | What it represents |
|---|---|
| Input | The user's question or request for this turn |
| Output | What the LLM responded |
| System prompt | The instructions the LLM was operating under |
| Conversation history | All prior turns (important for conversational scorers) |
| Tool calls | Any tools the LLM invoked and the results they returned |
| Item type | conversational or agent — controls which scorers are available |
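The field table above can be pictured as a simple record. This is only a sketch of an item's shape; the real mapper output may use different keys and types.

```python
# Illustrative shape of a dataset item based on the field table above.
# Field names are assumptions for illustration only.
from dataclasses import dataclass, field

@dataclass
class DatasetItem:
    input: str                          # the user's question for this turn
    output: str                         # what the LLM responded
    system_prompt: str = ""             # instructions the LLM operated under
    conversation_history: list = field(default_factory=list)  # prior turns
    tool_calls: list = field(default_factory=list)  # tools invoked + results
    item_type: str = "conversational"   # "conversational" or "agent"

item = DatasetItem(input="What is ETL?", output="Extract, transform, load.")
```

Because `item_type` controls which scorers are available, it is worth checking first when the expected scorers do not appear.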
Generating mapper code (step by step)
1. Open the Mapper tab on your ETL Job.
2. Click Generate Code. Noveum reads a sample of your recent traces and writes the mapper for you.
3. Review the generated code in the editor and tweak it if needed.
4. Click Run to preview the items the mapper produces from a sample trace.
5. Apply the mapper to the job once the preview looks right.
Reviewing the output
After clicking Run, the preview panel shows:
- How many items were produced from the sample trace
- Each item's content — expand an item to see all its fields
- Validation warnings — if a required field is missing or a field type is wrong, it appears here
Check that:
- The number of items matches the number of LLM calls you expect
- Each item has a non-empty input and output
- The item type (conversational or agent) matches your use case
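The checks above can also be run mechanically. This is a minimal sketch assuming items are dicts with the fields described earlier; the field names are illustrative, not the platform's API.

```python
# Minimal sketch of the manual review checks, assuming items are dicts
# with illustrative field names (not a real Noveum API).

def validate_items(items, expected_llm_calls):
    """Return a list of human-readable problems found in the items."""
    problems = []
    if len(items) != expected_llm_calls:
        problems.append(
            f"expected {expected_llm_calls} items, got {len(items)}")
    for i, item in enumerate(items):
        if not item.get("input"):
            problems.append(f"item {i}: empty input")
        if not item.get("output"):
            problems.append(f"item {i}: empty output")
        if item.get("item_type") not in ("conversational", "agent"):
            problems.append(f"item {i}: unexpected item_type")
    return problems

# One item where two were expected, and its output is empty:
items = [{"input": "Hi", "output": "", "item_type": "conversational"}]
print(validate_items(items, expected_llm_calls=2))  # two problems reported
```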
Improving the mapper
If the preview does not look right, use Improve Code rather than editing the code manually.
Examples of useful "Improve Code" prompts:
- "The dataset only shows one item per trace. I have a chatbot with multiple turns — each turn should be its own item."
- "Items are missing the conversation history. Include all prior turns in the conversation context field."
- "The item type is set to 'agent' but this is a conversational chatbot. Set it to 'conversational'."
- "The output field is empty. The LLM's response is in the
output.response.contentfield of the span."
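When a field lives at a nested path like output.response.content, mapper code typically walks that path defensively. The helper below is a hedged sketch; the span layout is an assumption based on the example prompt above, not the actual trace schema.

```python
# Hedged sketch: pulling a value out of a nested span payload such as
# output.response.content. The span layout here is an assumption.

def get_path(obj, dotted_path, default=""):
    """Walk a dot-separated path through nested dicts, with a fallback."""
    for key in dotted_path.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return default
        obj = obj[key]
    return obj

span = {"output": {"response": {"content": "Hello!"}}}
print(get_path(span, "output.response.content"))  # Hello!
print(get_path(span, "output.missing.field"))     # "" (safe fallback)
```

A safe fallback like this keeps one malformed span from breaking the whole mapping run, which matters when the ETL job processes every new trace automatically.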
Common problems and how to describe them
| What you see | What to tell the AI mapper |
|---|---|
| Dataset shows 1 item per trace instead of one per LLM call | "Generate a unique item for each LLM turn. Each turn should be its own separate item." |
| Items are missing the system prompt | "Include the system prompt from the span attributes in each item." |
| Input field is empty | "The user's message is in [path to field]. Map it to the input field." |
| Conversational scorers are not showing up | "This is a conversational chatbot — set the item type to conversational." |
| Items look the same across different conversations | "Each item should have the conversation history from that specific trace, not a shared one." |
After the mapper is set up
Once the mapper is saved, it runs automatically on every new trace the ETL job processes. You do not need to regenerate it unless your trace schema changes.
To see items being created:
- Go to the Dataset linked to this ETL job
- Check the Items tab — new items appear as traces are processed
- If items stop appearing, go back to the Mapper tab and click Run against a recent trace to verify the mapper still works
Next steps
- ETL Jobs Overview — full guide to creating and running ETL jobs
- What is a Dataset — understanding the items the mapper produces
- Running Evaluations — scoring your dataset items with NovaEval
- The Evaluation Pipeline — the full end-to-end flow