Documentation

Using the AI Mapper

How to generate, review, and improve the mapper code that converts your traces into evaluation items — all from the Noveum UI.

What the AI mapper does

When you create an ETL Job, Noveum includes a Mapper tab where you define how your traces get converted into dataset items.

Clicking Generate Code in the Mapper tab tells Noveum to read a sample of your recent traces and write mapper code for you automatically. The code appears in an editor where you can review it, tweak it, and then apply it to the job.

You never need to write this code from scratch — the AI handles it. Your job is to review the output, check that items look right, and use Improve Code if anything needs adjusting.


The key concept: one item per LLM call

Every LLM call in a trace becomes one dataset item.

If your agent handles a user conversation with 5 back-and-forth turns, that trace contains 5 LLM calls. The mapper extracts each one into its own item, so NovaEval can score every individual decision your agent made — not just the final answer.

Example:

TraceLLM callsDataset items created
A chatbot conversation with 5 turns55
A RAG pipeline (retrieve + generate)22
A single-turn question-answer agent11
A multi-step research agent with 8 LLM calls8+8+

This granularity is what makes evaluation useful. Instead of knowing "my agent is 70% accurate overall," you can see exactly which type of question it gets wrong, at which turn, with which system prompt.


What each item contains

After the mapper runs, open your Dataset to inspect the items. Each item contains the fields the scorers need to evaluate it:

FieldWhat it represents
InputThe user's question or request for this turn
OutputWhat the LLM responded
System promptThe instructions the LLM was operating under
Conversation historyAll prior turns (important for conversational scorers)
Tool callsAny tools the LLM invoked and the results they returned
Item typeconversational or agent — controls which scorers are available

Generating mapper code (step by step)

Open your ETL Job and click the Mapper tab.
Select the Test Trace you want to use as a sample — pick a representative trace from your project.
Click Generate Code. Noveum reads the trace and writes the mapper code. This takes a few seconds.
Review the generated code in the editor. You will see two sections: one that reads a single span and extracts the fields, and one that assembles those fields into dataset items.
Click Apply to Editor to load the generated code into the job's mapper.
Click Run to test the mapper against your sample trace. The preview panel shows the items that would be produced.
If the items look correct, click Save to activate the mapper on all future traces processed by this ETL job.

Reviewing the output

After clicking Run, the preview panel shows:

  • How many items were produced from the sample trace
  • Each item's content — expand an item to see all its fields
  • Validation warnings — if a required field is missing or a field type is wrong, it appears here

Check that:

  1. The number of items matches the number of LLM calls you expect
  2. Each item has a non-empty input and output
  3. The item type (conversational or agent) matches your use case

Improving the mapper

If the preview does not look right, use Improve Code rather than editing the code manually.

Click Improve Code in the mapper toolbar.
Describe the problem in plain English. Be specific about what is wrong and what you want instead.
Click Apply. The AI rewrites the relevant part of the mapper and shows you the updated code.
Click Run again to verify the fix, then Save when satisfied.

Examples of useful "Improve Code" prompts:

  • "The dataset only shows one item per trace. I have a chatbot with multiple turns — each turn should be its own item."
  • "Items are missing the conversation history. Include all prior turns in the conversation context field."
  • "The item type is set to 'agent' but this is a conversational chatbot. Set it to 'conversational'."
  • "The output field is empty. The LLM's response is in the output.response.content field of the span."

Common problems and how to describe them

What you seeWhat to tell the AI mapper
Dataset shows 1 item per trace instead of one per LLM call"Generate a unique item for each LLM turn. Each turn should be its own separate item."
Items are missing the system prompt"Include the system prompt from the span attributes in each item."
Input field is empty"The user's message is in [path to field]. Map it to the input field."
Conversational scorers are not showing up"This is a conversational chatbot — set the item type to conversational."
Items look the same across different conversations"Each item should have the conversation history from that specific trace, not a shared one."

After the mapper is set up

Once the mapper is saved, it runs automatically on every new trace the ETL job processes. You do not need to regenerate it unless your trace schema changes.

To see items being created:

  1. Go to the Dataset linked to this ETL job
  2. Check the Items tab — new items appear as traces are processed
  3. If items stop appearing, go back to the Mapper tab and click Run against a recent trace to verify the mapper still works

Next steps

Exclusive Early Access

Get Early Access to Noveum.ai Platform

Be the first one to get notified when we open Noveum Platform to more users. All users get access to Observability suite for free, early users get free eval jobs and premium support for the first year.

Sign up now. We send access to new batch every week.

Early access members receive premium onboarding support and influence our product roadmap. Limited spots available.

On this page

Using the AI Mapper | Documentation | Noveum.ai