
StandardData Schema

Complete field reference for the StandardData model — the unified evaluation item format used across all NovaEval scorers.

Overview

StandardData is the canonical Pydantic model that every evaluation item must conform to. It uses flexible coercion so that messy trace payloads (JSON strings, numeric IDs, aliased field names) are automatically normalized into the correct Python types. All NovaEval scorers read from this schema, so the fields you populate determine which scorers can run.
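To make the coercion concrete, here is a dependency-free sketch of two such normalizations: a numeric ID becomes a string, and a JSON-encoded trace string is decoded into a list. It illustrates the idea under assumptions and is not the actual StandardData validators, which are Pydantic-based. The helper names coerce_str_id and coerce_json_field are hypothetical.

```python
import json
from typing import Any, Optional


def coerce_str_id(value: Any) -> Optional[str]:
    """Normalize an identifier such as task_id: numeric IDs become strings."""
    if value is None:
        return None
    if isinstance(value, bool):
        # bool is a subclass of int; reject it rather than produce "True"
        raise TypeError("a boolean is not a valid identifier")
    if isinstance(value, (int, float, str)):
        return str(value)
    raise TypeError(f"cannot coerce {type(value).__name__} to an identifier")


def coerce_json_field(value: Any) -> Any:
    """Decode a JSON-encoded list/object string (e.g. a serialized trace); pass other values through."""
    if isinstance(value, str) and value.strip().startswith(("[", "{")):
        return json.loads(value)
    return value


raw = {"task_id": 12345, "trace": '[{"step": 1, "action": "search"}]'}
normalized = {
    "task_id": coerce_str_id(raw["task_id"]),
    "trace": coerce_json_field(raw["trace"]),
}
print(normalized)
# {'task_id': '12345', 'trace': [{'step': 1, 'action': 'search'}]}
```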


Field Groups

Identifiers

| Field | Type | Description |
|---|---|---|
| user_id | str | Unique identifier for the end user |
| task_id | str | Unique identifier for this trace / task run |
| turn_id | str | Unique identifier for a specific span or turn |
| source_trace_id | str | Links the dataset item back to the originating trace in the traces table |

QA / Task

| Field | Type | Description |
|---|---|---|
| agent_task | str | The input question, task description, or user message. Synced with input_text. |
| input_text | str | Alias for agent_task. Both are kept in sync automatically. |
| agent_response | str | The model's output. Synced with output_text. |
| output_text | str | Alias for agent_response. Both are kept in sync automatically. |
| ground_truth | str | The expected correct answer for accuracy evaluation |
| novaeval_item_type | "agent" \| "conversational" | Tells scorers which evaluation mode to apply |

Agent

| Field | Type | Description |
|---|---|---|
| agent_name | str | Name of the AI agent |
| agent_role | str | Role description (e.g., "customer support agent", "coding assistant") |
| system_prompt | str | The system prompt provided to the model |
| trace | list[dict] | Full trace data for multi-step evaluation (goal achievement, coherence) |
| exit_status | str | How the agent exited (e.g., "success", "timeout", "error") |
| agent_exit | bool | Whether the agent has fully exited. Required by goal_achievement_scorer and conversation_coherence_scorer. |

Tools

| Field | Type | Description |
|---|---|---|
| tools_available | list[ToolSchema] | Tools the agent had access to (name, description, args/return schema) |
| tool_calls | list[ToolCall] | Tool calls actually made (tool_name, parameters, call_id) |
| parameters_passed | dict | Parameters passed in the most recent tool call |
| tool_call_results | list[ToolResult] | Results returned from tool calls (call_id, result, success, error_message) |
| expected_tool_call | ToolCall | The expected tool call for correctness evaluation |

Required by scorers:

  • tool_correctness_scorer → expected_tool_call, tool_calls
  • parameter_correctness_scorer → tool_calls, parameters_passed, tool_call_results
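ToolCall and ToolResult share call_id, so each call can be lined up with its outcome. The sketch below assumes both are represented as plain dicts carrying the fields listed in the table. pair_calls_with_results is a hypothetical helper, and the tool names and values are invented for illustration.

```python
from typing import Any, Dict, List, Optional, Tuple

Record = Dict[str, Any]


def pair_calls_with_results(
    tool_calls: List[Record], tool_call_results: List[Record]
) -> List[Tuple[Record, Optional[Record]]]:
    """Match each ToolCall to its ToolResult by call_id (None when no result was recorded)."""
    results_by_id = {result["call_id"]: result for result in tool_call_results}
    return [(call, results_by_id.get(call["call_id"])) for call in tool_calls]


tool_calls = [
    {"tool_name": "lookup_order", "parameters": {"order_id": "A-17"}, "call_id": "c1"},
    {"tool_name": "issue_refund", "parameters": {"order_id": "A-17", "amount": 20}, "call_id": "c2"},
]
tool_call_results = [
    {"call_id": "c1", "result": {"status": "shipped"}, "success": True, "error_message": None},
]
item = {
    "tool_calls": tool_calls,
    "tool_call_results": tool_call_results,
    # parameters_passed: the parameters of the most recent tool call
    "parameters_passed": tool_calls[-1]["parameters"],
}

for call, result in pair_calls_with_results(item["tool_calls"], item["tool_call_results"]):
    if result is None:
        status = "no result"
    else:
        status = "ok" if result["success"] else result["error_message"]
    print(call["tool_name"], "->", status)
# lookup_order -> ok
# issue_refund -> no result
```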

Retrieval / RAG

| Field | Type | Description |
|---|---|---|
| retrieval_query | list[str] | Queries sent to the vector database |
| retrieved_context | list[list[str]] | Retrieved chunks for each query (outer list = queries, inner list = K nearest neighbors) |

Required by scorers:

  • answer_relevancy_scorer → agent_task, agent_response
  • faithfulness_scorer → retrieved_context, agent_response
  • contextual_precision_scorer / contextual_recall_scorer → retrieval_query, retrieved_context, ground_truth
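The nested shape of retrieved_context is easiest to see when it is built from a retriever. In this sketch a toy keyword-overlap ranking stands in for a real vector-store K-nearest-neighbour lookup. CORPUS, search, and check_retrieval_shape are hypothetical names, and the one-list-per-query check follows the outer/inner description in the table rather than any documented validator.

```python
from typing import List

# Toy corpus and keyword-overlap retriever: a hypothetical stand-in for a real vector store.
CORPUS = [
    "To cancel, go to Settings > Billing and click Cancel plan.",
    "Refunds are issued within 5 business days.",
    "Our refund policy covers annual plans only.",
]


def search(query: str, k: int) -> List[str]:
    """Return the k chunks sharing the most words with the query (ties keep corpus order)."""
    words = set(query.lower().split())
    ranked = sorted(CORPUS, key=lambda chunk: -len(words & set(chunk.lower().split())))
    return ranked[:k]


def check_retrieval_shape(retrieval_query: List[str], retrieved_context: List[List[str]]) -> None:
    """Raise ValueError unless there is exactly one list of string chunks per query."""
    if len(retrieval_query) != len(retrieved_context):
        raise ValueError(f"{len(retrieval_query)} queries but {len(retrieved_context)} chunk lists")
    for query, chunks in zip(retrieval_query, retrieved_context):
        if not all(isinstance(chunk, str) for chunk in chunks):
            raise ValueError(f"non-string chunk retrieved for query {query!r}")


retrieval_query = ["cancel subscription", "refund policy"]
# Outer list: one entry per query. Inner list: that query's top-K chunks.
retrieved_context = [search(q, k=2) for q in retrieval_query]
check_retrieval_shape(retrieval_query, retrieved_context)
print(retrieved_context[1][0])
# Our refund policy covers annual plans only.
```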

Conversational

| Field | Type | Description |
|---|---|---|
| conversation_context | Conversation | Full multi-turn conversation history |
| speaker | str | Speaker label for this turn ("user", "assistant", "system") |
| message | str | Message content for this turn |
| turn_position | int | Position of this turn in the conversation (0-indexed) |
| context | str \| dict | Additional scorer-specific context |

Conversation model

Conversation:
  turns: list[ConversationTurn]  # ordered list of turns
  context: str                   # system-level context / topic
  topic: str                     # conversation topic
  metadata: dict
 
ConversationTurn:
  speaker: str    # "user" | "assistant" | "system"
  message: str    # message text
  timestamp: str  # ISO timestamp (optional)
  metadata: dict

Required by scorers:

  • conversation_relevancy_scorer → conversation_context, agent_response
  • intention_fulfillment_scorer → conversation_context, agent_task, agent_response
  • knowledge_retention_scorer → conversation_context
  • role_adherence_scorer → agent_role, conversation_context, agent_response
  • instruction_adherence_scorer (telephony) → conversation_context, system_prompt
  • sentiment_csat_scorer (telephony) → conversation_context
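The Conversation model can be mirrored with standard-library dataclasses to assemble a conversational item. This is a sketch, not the NovaEval classes: the empty-string defaults and the helper turn_item are assumptions. conversation_context is emitted as a plain dict, on the assumption that it coerces into Conversation the way Pydantic normally coerces nested models from dicts.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ConversationTurn:
    speaker: str                     # "user" | "assistant" | "system"
    message: str
    timestamp: Optional[str] = None  # ISO timestamp (optional)
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Conversation:
    turns: List[ConversationTurn]    # ordered list of turns
    context: str = ""
    topic: str = ""
    metadata: Dict[str, Any] = field(default_factory=dict)


def turn_item(conversation: Conversation, position: int) -> Dict[str, Any]:
    """Build the per-turn conversational fields for the turn at `position` (0-indexed)."""
    turn = conversation.turns[position]
    return {
        "novaeval_item_type": "conversational",
        "conversation_context": asdict(conversation),  # nested dataclasses become plain dicts
        "speaker": turn.speaker,
        "message": turn.message,
        "turn_position": position,
    }


conv = Conversation(
    turns=[
        ConversationTurn("user", "My invoice is wrong."),
        ConversationTurn("assistant", "Sorry about that. Which invoice number?"),
        ConversationTurn("user", "INV-2041."),
    ],
    topic="billing",
)
item = turn_item(conv, 1)
print(item["speaker"], item["turn_position"])
# assistant 1
```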

Format Validation

| Field | Type | Description |
|---|---|---|
| expected_format | str | Description of the expected output format (e.g., "JSON", "markdown table") |
| extracted_content | str | Content extracted from the response for format checking |

Voice / Audio

These fields are populated by LiveKit, Pipecat, or NovaSynth integrations.

| Field | Type | Description |
|---|---|---|
| stt_data | STTData \| dict[str, STTData] | Speech-to-text data: a single object, or a per-turn map (e.g., {"turn_1": {...}, "turn_2": {...}}) |
| tts_data | TTSData \| dict[str, TTSData] | Text-to-speech synthesis data, in the same flexible format as stt_data |
| raw_complete_audio | RawCompleteAudio | Complete session audio metadata |
| latency | dict | Aggregated latency metrics for the full session |
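Because stt_data and tts_data accept either a single record or a per-turn map, code that consumes them has to handle both shapes. A minimal sketch follows. iter_turn_records is a hypothetical helper, and its way of telling the two shapes apart (every value is a dict) is a heuristic assumption, not the schema's own rule.

```python
from typing import Any, Dict, List, Optional, Tuple

Record = Dict[str, Any]


def iter_turn_records(data: Record) -> List[Tuple[Optional[str], Record]]:
    """Return [(turn_key, record), ...] for stt_data or tts_data in either accepted shape.

    Heuristic (an assumption, not the schema's own rule): a non-empty mapping whose values
    are all dicts is a per-turn map; anything else is a single record, returned with key None.
    """
    if data and all(isinstance(value, dict) for value in data.values()):
        return list(data.items())  # keep insertion order; "turn_10" would sort before "turn_2"
    return [(None, data)]


single = {"provider": "deepgram", "transcript": "hello", "confidence": 0.93}
per_turn = {
    "turn_1": {"provider": "deepgram", "transcript": "hello", "confidence": 0.93},
    "turn_2": {"provider": "deepgram", "transcript": "book a table", "confidence": 0.88},
}
for key, record in iter_turn_records(per_turn):
    print(key, record["transcript"])
# turn_1 hello
# turn_2 book a table
```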

STTData fields

STTData:
  provider: str         # "deepgram", "whisper", etc.
  model: str            # model identifier
  transcript: str       # transcribed text
  confidence: float     # 0.0–1.0
  is_final: bool        # whether this is the final (not partial) transcript
  mode: str             # "streaming" or "batch"
  event_type: str       # "final_transcript", etc.
  audio_duration_ms: float
  language: str         # "en-US", "es-ES", etc.
  audio_url: str        # pre-signed URL to the audio file
  span_start_time: str  # ISO timestamp
  span_end_time: str
  span_duration_ms: float
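STTData and TTSData both carry span_start_time, span_end_time, and span_duration_ms. Assuming span_duration_ms is simply the difference between the two ISO timestamps (this reference does not state that explicitly), a consistency check might look like the sketch below. The function names and the 1 ms tolerance are illustrative choices.

```python
from datetime import datetime
from typing import Any, Dict


def parse_iso(timestamp: str) -> datetime:
    """Parse an ISO-8601 timestamp; a trailing "Z" is rewritten for Python < 3.11."""
    if timestamp.endswith("Z"):
        timestamp = timestamp[:-1] + "+00:00"
    return datetime.fromisoformat(timestamp)


def span_duration_ms(start_iso: str, end_iso: str) -> float:
    """Milliseconds from span_start_time to span_end_time."""
    return (parse_iso(end_iso) - parse_iso(start_iso)).total_seconds() * 1000.0


def span_is_consistent(record: Dict[str, Any], tolerance_ms: float = 1.0) -> bool:
    """Check that span_duration_ms agrees with the two timestamps within a tolerance."""
    computed = span_duration_ms(record["span_start_time"], record["span_end_time"])
    return abs(computed - record["span_duration_ms"]) <= tolerance_ms


stt = {
    "span_start_time": "2025-01-01T10:00:00.000Z",
    "span_end_time": "2025-01-01T10:00:01.250Z",
    "span_duration_ms": 1250.0,
}
print(span_duration_ms(stt["span_start_time"], stt["span_end_time"]))  # 1250.0
print(span_is_consistent(stt))  # True
```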

TTSData fields

TTSData:
  provider: str           # "elevenlabs", "azure", "google", etc.
  model: str
  input_text: str         # text that was synthesized
  audio_duration_ms: float
  audio_url: str          # pre-signed URL to the audio file
  span_start_time: str    # ISO timestamp
  span_end_time: str
  span_duration_ms: float
  speed: float            # speech speed multiplier (optional)
  pitch: float            # pitch adjustment (optional)

RawCompleteAudio fields

RawCompleteAudio:
  audio_duration: float      # total session duration in seconds
  audio_uuid: str
  sample_rate: int
  channels: int
  format: str                # "wav", "mp3", etc.
  complete_transcript: str   # full session transcript
  audio_url: str             # pre-signed URL
  timeline: dict             # speaker timeline
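RawCompleteAudio.audio_duration is in seconds, whereas the STT and TTS records use milliseconds (audio_duration_ms). The format-related fields can be filled from a WAV file with Python's standard wave module, as in this sketch. raw_audio_fields is a hypothetical helper; audio_uuid, complete_transcript, audio_url, and timeline would come from your integration.

```python
import io
import wave
from typing import Any, Dict


def raw_audio_fields(wav_bytes: bytes) -> Dict[str, Any]:
    """Fill the format-related RawCompleteAudio fields from an in-memory WAV file."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "audio_duration": frames / rate,  # seconds, not milliseconds
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "format": "wav",
        }


# Build one second of 16 kHz mono silence to demonstrate.
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)      # 16-bit samples
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 16000)

print(raw_audio_fields(buf.getvalue()))
# {'audio_duration': 1.0, 'sample_rate': 16000, 'channels': 1, 'format': 'wav'}
```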

Metrics

| Field | Type | Description |
|---|---|---|
| metrics_collected | list[VADMetrics \| STTMetrics \| TTSMetrics \| LLMMetrics \| EOUMetrics] | Flat list of per-component metrics. Also accepts a per-turn map {turn_k: [metric, ...]}, which is normalized to a flat list with each metric's label set to its turn key. |

The list can contain any mix of:

VADMetrics   — voice activity detection (idle_time, inference_count, speech_detected)
STTMetrics   — speech-to-text (duration, audio_duration, transcript, confidence)
TTSMetrics   — text-to-speech (ttfb, duration, audio_duration, speaking_duration, cancelled)
LLMMetrics   — language model (ttft, duration, completion_tokens, prompt_tokens, tokens_per_second)
EOUMetrics   — end-of-utterance (end_of_utterance_delay, transcription_delay, on_user_turn_completed_delay)

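These metrics are the source data for the latency scorers; the audio quality scorers (mos, tone_clarity, and so on) read raw_complete_audio instead, per the scorer-to-field mapping.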
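The per-turn normalization of metrics_collected can be sketched as follows. The function name flatten_metrics and the "type" key used to tell metric kinds apart are hypothetical; only the rule that each metric's label becomes its turn key comes from this reference.

```python
from typing import Any, Dict, List, Union

Metric = Dict[str, Any]


def flatten_metrics(metrics: Union[List[Metric], Dict[str, List[Metric]]]) -> List[Metric]:
    """Normalize metrics_collected to a flat list, labelling per-turn entries with their turn key."""
    if isinstance(metrics, list):
        return list(metrics)
    flat: List[Metric] = []
    for turn_key, turn_metrics in metrics.items():
        for metric in turn_metrics:
            flat.append({**metric, "label": turn_key})  # copy so the input is not mutated
    return flat


# "type" is a hypothetical discriminator key used only for this illustration.
per_turn = {
    "turn_1": [{"type": "llm", "ttft": 0.42}, {"type": "tts", "ttfb": 0.18}],
    "turn_2": [{"type": "llm", "ttft": 0.38}],
}
flat = flatten_metrics(per_turn)
print([m["label"] for m in flat])
# ['turn_1', 'turn_1', 'turn_2']
llm_ttfts = [m["ttft"] for m in flat if m["type"] == "llm"]
```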


Metadata

| Field | Type | Description |
|---|---|---|
| metadata | dict | Arbitrary key-value pairs for custom attributes, tags, or context |

Aliases

The schema automatically remaps these common field names so older or differently-structured payloads validate without modification:

| Incoming field | Maps to |
|---|---|
| trace_data | trace |
| expected_output | ground_truth |
| item_type | novaeval_item_type |
| conversation | conversation_context |
| question | agent_task + input_text |
| query | agent_task + input_text |
| answer | agent_response + output_text |
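The alias remapping, together with the agent_task/input_text and agent_response/output_text syncing, can be approximated with a plain function. This is a sketch, not the schema's implementation: remap_aliases is a hypothetical name, and its precedence rules (canonical keys win over aliases; among aliases, the first one seen wins) are assumptions.

```python
from typing import Any, Dict, Tuple

# Transcribed from the alias table.
ALIASES: Dict[str, Tuple[str, ...]] = {
    "trace_data": ("trace",),
    "expected_output": ("ground_truth",),
    "item_type": ("novaeval_item_type",),
    "conversation": ("conversation_context",),
    "question": ("agent_task", "input_text"),
    "query": ("agent_task", "input_text"),
    "answer": ("agent_response", "output_text"),
}
SYNCED_PAIRS = (("agent_task", "input_text"), ("agent_response", "output_text"))


def remap_aliases(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Rename aliased keys and keep agent_task/input_text and agent_response/output_text in sync."""
    # Canonical keys win: an alias is ignored if any of its targets is already set.
    out = {key: value for key, value in payload.items() if key not in ALIASES}
    for key, value in payload.items():
        targets = ALIASES.get(key)
        if targets and not any(target in out for target in targets):
            for target in targets:
                out[target] = value
    # Fill in whichever half of a synced pair is missing.
    for left, right in SYNCED_PAIRS:
        if left in out and right not in out:
            out[right] = out[left]
        elif right in out and left not in out:
            out[left] = out[right]
    return out


legacy = {"question": "How do I cancel?", "answer": "Settings > Billing.", "item_type": "agent"}
print(remap_aliases(legacy)["input_text"])
# How do I cancel?
```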

Scorer-to-field mapping

| Scorer | Required fields |
|---|---|
| accuracy / exact_match / f1 | agent_response, ground_truth |
| answer_relevancy | agent_task, agent_response |
| faithfulness | retrieved_context, agent_response |
| contextual_precision / contextual_recall | retrieval_query, retrieved_context, ground_truth |
| hallucination_detection / claim_verification / factual_accuracy | agent_task, agent_response |
| tool_correctness | expected_tool_call, tool_calls |
| parameter_correctness | tool_calls, parameters_passed, tool_call_results |
| task_progression | agent_task, agent_role, system_prompt, agent_response |
| context_relevancy | agent_task, agent_role, agent_response |
| agent_role_adherence | agent_role, agent_task, agent_response, tool_calls |
| goal_achievement / conversation_coherence | agent_exit (=True), trace |
| conversation_relevancy / intention_fulfillment | conversation_context, agent_task, agent_response |
| knowledge_retention / role_adherence | agent_role, conversation_context, agent_response |
| mos / tone_clarity / pronunciation_audio / gibberish / audio_breakage | raw_complete_audio |
| word_accuracy | tts_data, raw_complete_audio |
| assistant_average_pitch_hz / assistant_volume_rms | raw_complete_audio |
| assistant_latency / llm_ttft / llm_latency / e2e_latency | metrics_collected |
| stt_latency / stt_audio_duration / stt_processing_duration | metrics_collected (STTMetrics) |
| tts_latency / tts_ttfb / tts_duration / tts_audio_duration | metrics_collected (TTSMetrics) |
| end_of_turn_delay / on_user_turn_completed_delay | metrics_collected (EOUMetrics) |
| instruction_adherence / sentiment_csat / drop_off_node | conversation_context, system_prompt |
| conversation_context_coherence / appropriate_call_termination | conversation_context |
| g_eval / panel_judge | agent_task, agent_response (+ optional context) |
| custom_scorer | Any fields referenced in the template |
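The mapping lends itself to a quick pre-flight check of which scorers an item can feed. The sketch below copies a subset of the table. runnable_scorers and is_populated are hypothetical helpers, and treating empty strings, lists, and dicts as missing is an assumption about scorer behavior, not documented behavior.

```python
from typing import Any, Dict, List

# A hand-copied subset of the scorer-to-field table.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "accuracy": ["agent_response", "ground_truth"],
    "answer_relevancy": ["agent_task", "agent_response"],
    "faithfulness": ["retrieved_context", "agent_response"],
    "contextual_precision": ["retrieval_query", "retrieved_context", "ground_truth"],
    "tool_correctness": ["expected_tool_call", "tool_calls"],
    "goal_achievement": ["agent_exit", "trace"],
    "role_adherence": ["agent_role", "conversation_context", "agent_response"],
}


def is_populated(item: Dict[str, Any], name: str) -> bool:
    """Treat None and empty strings/lists/dicts as missing (an assumption about the scorers)."""
    value = item.get(name)
    if name == "agent_exit":
        return value is True  # the table requires agent_exit=True, not merely present
    return value is not None and value != "" and value != [] and value != {}


def runnable_scorers(item: Dict[str, Any]) -> List[str]:
    """List the scorers (from the subset above) whose required fields are all populated."""
    return sorted(
        scorer
        for scorer, fields in REQUIRED_FIELDS.items()
        if all(is_populated(item, name) for name in fields)
    )


example = {
    "agent_role": "Customer support specialist",
    "agent_task": "How do I cancel my subscription?",
    "agent_response": "You can cancel your subscription from Settings > Billing > Cancel plan.",
    "ground_truth": "Settings > Billing > Cancel plan",
    "tool_calls": [],
    "retrieval_query": ["cancel subscription"],
    "retrieved_context": [["To cancel, go to Settings > Billing and click Cancel plan."]],
}
print(runnable_scorers(example))
# ['accuracy', 'answer_relevancy', 'contextual_precision', 'faithfulness']
```

With fields taken from the example item, tool_correctness is excluded because expected_tool_call is absent, and role_adherence because there is no conversation_context.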

Example item

{
  "novaeval_item_type": "agent",
  "agent_name": "support-bot",
  "agent_role": "Customer support specialist",
  "agent_task": "How do I cancel my subscription?",
  "agent_response": "You can cancel your subscription from Settings > Billing > Cancel plan.",
  "ground_truth": "Settings > Billing > Cancel plan",
  "system_prompt": "You are a helpful customer support agent for Acme Corp.",
  "tool_calls": [],
  "retrieval_query": ["cancel subscription"],
  "retrieved_context": [["To cancel, go to Settings > Billing and click Cancel plan."]],
  "source_trace_id": "trace_abc123",
  "metadata": { "environment": "production", "model": "gpt-4o" }
}


StandardData Schema | Documentation | Noveum.ai