StandardData Schema
Complete field reference for the StandardData model — the unified evaluation item format used across all NovaEval scorers.
Overview
StandardData is the canonical Pydantic model that every evaluation item must conform to. It uses flexible coercion so that messy trace payloads (JSON strings, numeric IDs, aliased field names) are automatically normalized into the correct Python types. All NovaEval scorers read from this schema, so the fields you populate determine which scorers can run.
Field Groups
Identifiers
| Field | Type | Description |
|---|---|---|
| user_id | str | Unique identifier for the end user |
| task_id | str | Unique identifier for this trace / task run |
| turn_id | str | Unique identifier for a specific span or turn |
| source_trace_id | str | Links the dataset item back to the originating trace in the traces table |
QA / Task
| Field | Type | Description |
|---|---|---|
| agent_task | str | The input question, task description, or user message. Synced with input_text. |
| input_text | str | Alias for agent_task. Both are kept in sync automatically. |
| agent_response | str | The model's output. Synced with output_text. |
| output_text | str | Alias for agent_response. Both are kept in sync automatically. |
| ground_truth | str | The expected correct answer for accuracy evaluation |
| novaeval_item_type | "agent" \| "conversational" | Tells scorers which evaluation mode to apply |
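The alias pairs above can be pictured with a small sketch. This is an illustrative normalizer, not the NovaEval implementation — the real model syncs these pairs during Pydantic validation:

```python
# Illustrative sketch: keep the agent_task/input_text and
# agent_response/output_text alias pairs in sync, as described above.
def sync_aliases(item: dict) -> dict:
    """Copy whichever member of each alias pair is present onto the other."""
    pairs = [("agent_task", "input_text"), ("agent_response", "output_text")]
    for primary, alias in pairs:
        value = item.get(primary) or item.get(alias)
        if value is not None:
            item[primary] = item[alias] = value
    return item

item = sync_aliases({"agent_task": "What is the capital of France?",
                     "agent_response": "Paris."})
# item["input_text"] and item["output_text"] are now populated too.
```

Populating either member of a pair is enough; scorers can read from whichever name they prefer.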
Agent
| Field | Type | Description |
|---|---|---|
| agent_name | str | Name of the AI agent |
| agent_role | str | Role description (e.g., "customer support agent", "coding assistant") |
| system_prompt | str | The system prompt provided to the model |
| trace | list[dict] | Full trace data for multi-step evaluation (goal achievement, coherence) |
| exit_status | str | How the agent exited (e.g., "success", "timeout", "error") |
| agent_exit | bool | Whether the agent has fully exited. Required by goal_achievement_scorer and conversation_coherence_scorer. |
Tools
| Field | Type | Description |
|---|---|---|
| tools_available | list[ToolSchema] | Tools the agent had access to (name, description, args/return schema) |
| tool_calls | list[ToolCall] | Tool calls actually made (tool_name, parameters, call_id) |
| parameters_passed | dict | Parameters passed in the most recent tool call |
| tool_call_results | list[ToolResult] | Results returned from tool calls (call_id, result, success, error_message) |
| expected_tool_call | ToolCall | The expected tool call for correctness evaluation |
Required by scorers:
- tool_correctness_scorer → expected_tool_call, tool_calls
- parameter_correctness_scorer → tool_calls, parameters_passed, tool_call_results
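A sketch of how expected_tool_call and tool_calls relate — illustrative only, since the actual tool_correctness_scorer logic may weigh matches differently:

```python
# Illustrative check: was the expected tool call actually made?
# Field names (tool_name, parameters, call_id) come from the table above;
# the matching logic here is an assumption, not the real scorer.
def expected_call_made(expected: dict, tool_calls: list[dict]) -> bool:
    """True if any actual call matches the expected tool name and parameters."""
    return any(
        call.get("tool_name") == expected.get("tool_name")
        and call.get("parameters") == expected.get("parameters")
        for call in tool_calls
    )

calls = [{"tool_name": "search", "parameters": {"q": "weather"}, "call_id": "c1"}]
expected = {"tool_name": "search", "parameters": {"q": "weather"}}
print(expected_call_made(expected, calls))  # prints True
```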
Retrieval / RAG
| Field | Type | Description |
|---|---|---|
| retrieval_query | list[str] | Queries sent to the vector database |
| retrieved_context | list[list[str]] | Retrieved chunks for each query (outer list = queries, inner list = K nearest neighbors) |
Required by scorers:
- answer_relevancy_scorer → agent_task, agent_response
- faithfulness_scorer → retrieved_context, agent_response
- contextual_precision_scorer / contextual_recall_scorer → retrieval_query, retrieved_context, ground_truth
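The nesting of retrieved_context is the part that most often trips up mappers. A hypothetical example (values invented) of the shape described above:

```python
# Hypothetical RAG fields showing the list[list[str]] nesting:
# one inner list of K retrieved chunks per query in retrieval_query.
retrieval_query = ["refund policy", "shipping times"]
retrieved_context = [
    ["Refunds are accepted within 30 days.", "Refunds require a receipt."],
    ["Standard shipping takes 3-5 business days."],
]

# The outer list length must match the number of queries.
assert len(retrieved_context) == len(retrieval_query)
```

If you only have a single query, still wrap its chunks in an outer list: `[["chunk 1", "chunk 2"]]`.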
Conversational
| Field | Type | Description |
|---|---|---|
| conversation_context | Conversation | Full multi-turn conversation history |
| speaker | str | Speaker label for this turn ("user", "assistant", "system") |
| message | str | Message content for this turn |
| turn_position | int | Position of this turn in the conversation (0-indexed) |
| context | str \| dict | Additional scorer-specific context |
Conversation model
Required by scorers:
- conversation_relevancy_scorer → conversation_context, agent_response
- intention_fulfillment_scorer → conversation_context, agent_task, agent_response
- knowledge_retention_scorer → conversation_context
- role_adherence_scorer → agent_role, conversation_context, agent_response
- instruction_adherence_scorer (telephony) → conversation_context, system_prompt
- sentiment_csat_scorer (telephony) → conversation_context
Format Validation
| Field | Type | Description |
|---|---|---|
| expected_format | str | Description of the expected output format (e.g., "JSON", "markdown table") |
| extracted_content | str | Content extracted from the response for format checking |
Voice / Audio
These fields are populated by LiveKit, Pipecat, or NovaSynth integrations.
| Field | Type | Description |
|---|---|---|
| stt_data | STTData \| dict[str, STTData] | Speech-to-text data — single object, or per-turn map (e.g., {"turn_1": {...}, "turn_2": {...}}) |
| tts_data | TTSData \| dict[str, TTSData] | Text-to-speech synthesis data — same flexible format as stt_data |
| raw_complete_audio | RawCompleteAudio | Complete session audio metadata |
| latency | dict | Aggregated latency metrics for the full session |
STTData fields
TTSData fields
RawCompleteAudio fields
Metrics
| Field | Type | Description |
|---|---|---|
| metrics_collected | list[VADMetrics \| STTMetrics \| TTSMetrics \| LLMMetrics \| EOUMetrics] | Flat list of per-component metrics. Also accepts a per-turn map {turn_k: [metric, ...]} which is normalized to a flat list with label set to the turn key. |
The list can contain any mix of VADMetrics, STTMetrics, TTSMetrics, LLMMetrics, and EOUMetrics objects. These are the source data for all latency and audio quality scorers.
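The per-turn normalization described above can be sketched as follows. This is an assumed reconstruction of the behavior, not the NovaEval code:

```python
# Illustrative sketch of metrics_collected normalization: a per-turn map
# {turn_key: [metric, ...]} flattens to one list, with each metric's
# "label" set to the turn key it came from. Flat lists pass through as-is.
def flatten_metrics(metrics) -> list[dict]:
    if isinstance(metrics, list):
        return metrics  # already flat
    flat = []
    for turn_key, turn_metrics in metrics.items():
        for metric in turn_metrics:
            flat.append({**metric, "label": turn_key})
    return flat

per_turn = {"turn_1": [{"type": "stt", "latency_ms": 120}],
            "turn_2": [{"type": "tts", "ttfb_ms": 80}]}
flat = flatten_metrics(per_turn)
# flat is a single list; flat[0]["label"] == "turn_1", flat[1]["label"] == "turn_2"
```

Either shape validates; scorers always see the flat form.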
Metadata
| Field | Type | Description |
|---|---|---|
| metadata | dict | Arbitrary key-value pairs for custom attributes, tags, or context |
Aliases
The schema automatically remaps these common field names so older or differently-structured payloads validate without modification:
| Incoming field | Maps to |
|---|---|
| trace_data | trace |
| expected_output | ground_truth |
| item_type | novaeval_item_type |
| conversation | conversation_context |
| question | agent_task + input_text |
| query | agent_task + input_text |
| answer | agent_response + output_text |
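A sketch of the remapping in the table above. The alias pairs come straight from the table; the function itself is illustrative, since the real remap happens inside Pydantic validation:

```python
# Alias remapping sketch (alias names from the table above).
ALIASES = {
    "trace_data": ["trace"],
    "expected_output": ["ground_truth"],
    "item_type": ["novaeval_item_type"],
    "conversation": ["conversation_context"],
    "question": ["agent_task", "input_text"],
    "query": ["agent_task", "input_text"],
    "answer": ["agent_response", "output_text"],
}

def remap_aliases(payload: dict) -> dict:
    """Rewrite incoming alias keys to their canonical field names."""
    out = dict(payload)
    for alias, targets in ALIASES.items():
        if alias in out:
            value = out.pop(alias)
            for target in targets:
                out.setdefault(target, value)
    return out

item = remap_aliases({"question": "2+2?", "answer": "4", "expected_output": "4"})
# item now carries agent_task/input_text, agent_response/output_text, ground_truth
```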
Scorer-to-field mapping
| Scorer | Required fields |
|---|---|
| accuracy / exact_match / f1 | agent_response, ground_truth |
| answer_relevancy | agent_task, agent_response |
| faithfulness | retrieved_context, agent_response |
| contextual_precision / contextual_recall | retrieval_query, retrieved_context, ground_truth |
| hallucination_detection / claim_verification / factual_accuracy | agent_task, agent_response |
| tool_correctness | expected_tool_call, tool_calls |
| parameter_correctness | tool_calls, parameters_passed, tool_call_results |
| task_progression | agent_task, agent_role, system_prompt, agent_response |
| context_relevancy | agent_task, agent_role, agent_response |
| agent_role_adherence | agent_role, agent_task, agent_response, tool_calls |
| goal_achievement / conversation_coherence | agent_exit (=True), trace |
| conversation_relevancy / intention_fulfillment | conversation_context, agent_task, agent_response |
| knowledge_retention / role_adherence | agent_role, conversation_context, agent_response |
| mos / tone_clarity / pronunciation_audio / gibberish / audio_breakage | raw_complete_audio |
| word_accuracy | tts_data, raw_complete_audio |
| assistant_average_pitch_hz / assistant_volume_rms | raw_complete_audio |
| assistant_latency / llm_ttft / llm_latency / e2e_latency | metrics_collected |
| stt_latency / stt_audio_duration / stt_processing_duration | metrics_collected (STTMetrics) |
| tts_latency / tts_ttfb / tts_duration / tts_audio_duration | metrics_collected (TTSMetrics) |
| end_of_turn_delay / on_user_turn_completed_delay | metrics_collected (EOUMetrics) |
| instruction_adherence / sentiment_csat / drop_off_node | conversation_context, system_prompt |
| conversation_context_coherence / appropriate_call_termination | conversation_context |
| g_eval / panel_judge | agent_task, agent_response (+ optional context) |
| custom_scorer | Any fields referenced in the template |
Example item
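A minimal illustrative item (field names come from the tables on this page; all values are invented):

```python
# Hypothetical StandardData payload, populated for QA and RAG evaluation.
example_item = {
    "user_id": "user-123",
    "task_id": "task-456",
    "turn_id": "turn-1",
    "novaeval_item_type": "agent",
    "agent_task": "What is the refund window?",
    "agent_response": "Refunds are accepted within 30 days of purchase.",
    "ground_truth": "30 days",
    "agent_name": "support-bot",
    "agent_role": "customer support agent",
    "system_prompt": "You are a helpful customer support agent.",
    "retrieval_query": ["refund policy"],
    "retrieved_context": [["Refunds are accepted within 30 days."]],
    "metadata": {"channel": "chat"},
}
```

Per the scorer-to-field mapping above, this item carries enough fields to run accuracy, answer_relevancy, and faithfulness; voice and tool scorers would need their respective field groups populated as well.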
Next steps
- ETL Jobs — how StandardData mappers are generated automatically
- Scorers Reference — every scorer with its required fields
- Running Evaluations — create and run an eval job