StandardData Schema
Complete field reference for the StandardData model — the unified evaluation item format used across all NovaEval scorers.
Overview
StandardData is the canonical Pydantic model that every evaluation item must conform to. It uses flexible coercion so that messy trace payloads (JSON strings, numeric IDs, aliased field names) are automatically normalized into the correct Python types. All NovaEval scorers read from this schema, so the fields you populate determine which scorers can run.
Field Groups
Identifiers
| Field | Type | Description |
|---|---|---|
| user_id | str | Unique identifier for the end user |
| task_id | str | Unique identifier for this trace / task run |
| turn_id | str | Unique identifier for a specific span or turn |
| source_trace_id | str | Links the dataset item back to the originating trace in the traces table |
QA / Task
| Field | Type | Description |
|---|---|---|
| agent_task | str | The input question, task description, or user message. Synced with input_text. |
| input_text | str | Alias for agent_task. Both are kept in sync automatically. |
| agent_response | str | The model's output. Synced with output_text. |
| output_text | str | Alias for agent_response. Both are kept in sync automatically. |
| ground_truth | str | The expected correct answer for accuracy evaluation |
| novaeval_item_type | "agent" \| "conversational" | Tells scorers which evaluation mode to apply |
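The alias pairs above can be pictured with a small sketch. This is an illustrative normalizer, not the NovaEval implementation — the real model syncs these pairs during Pydantic validation:

```python
# Illustrative sketch: keep the agent_task/input_text and
# agent_response/output_text alias pairs in sync, as described above.
def sync_aliases(item: dict) -> dict:
    """Copy whichever member of each alias pair is present onto the other."""
    pairs = [("agent_task", "input_text"), ("agent_response", "output_text")]
    for primary, alias in pairs:
        value = item.get(primary) or item.get(alias)
        if value is not None:
            item[primary] = item[alias] = value
    return item

item = sync_aliases({"agent_task": "What is the capital of France?",
                     "agent_response": "Paris."})
# item["input_text"] and item["output_text"] are now populated too.
```

Populating either member of a pair is enough; scorers can read from whichever name they prefer.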
Agent
| Field | Type | Description |
|---|---|---|
| agent_name | str | Name of the AI agent |
| agent_role | str | Role description (e.g., "customer support agent", "coding assistant") |
| system_prompt | str | The system prompt provided to the model |
| trace | list[dict] | Full trace data for multi-step evaluation (goal achievement, coherence) |
| exit_status | str | How the agent exited (e.g., "success", "timeout", "error") |
| agent_exit | bool | Whether the agent has fully exited. Required by goal_achievement_scorer and conversation_coherence_scorer. |
Tools
| Field | Type | Description |
|---|---|---|
| tools_available | list[ToolSchema] | Tools the agent had access to (name, description, args/return schema) |
| tool_calls | list[ToolCall] | Tool calls actually made (tool_name, parameters, call_id) |
| parameters_passed | dict | Parameters passed in the most recent tool call |
| tool_call_results | list[ToolResult] | Results returned from tool calls (call_id, result, success, error_message) |
| expected_tool_call | ToolCall | The expected tool call for correctness evaluation |
Required by scorers:
- tool_correctness_scorer → expected_tool_call, tool_calls
- parameter_correctness_scorer → tool_calls, parameters_passed, tool_call_results
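A sketch of how expected_tool_call and tool_calls relate — illustrative only, since the actual tool_correctness_scorer logic may weigh matches differently:

```python
# Illustrative check: was the expected tool call actually made?
# Field names (tool_name, parameters, call_id) come from the table above;
# the matching logic here is an assumption, not the real scorer.
def expected_call_made(expected: dict, tool_calls: list[dict]) -> bool:
    """True if any actual call matches the expected tool name and parameters."""
    return any(
        call.get("tool_name") == expected.get("tool_name")
        and call.get("parameters") == expected.get("parameters")
        for call in tool_calls
    )

calls = [{"tool_name": "search", "parameters": {"q": "weather"}, "call_id": "c1"}]
expected = {"tool_name": "search", "parameters": {"q": "weather"}}
print(expected_call_made(expected, calls))  # prints True
```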
Retrieval / RAG
| Field | Type | Description |
|---|---|---|
| retrieval_query | list[str] | Queries sent to the vector database |
| retrieved_context | list[list[str]] | Retrieved chunks for each query (outer list = queries, inner list = K nearest neighbors) |
Required by scorers:
- answer_relevancy_scorer → agent_task, agent_response
- faithfulness_scorer → retrieved_context, agent_response
- contextual_precision_scorer / contextual_recall_scorer → retrieval_query, retrieved_context, ground_truth
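The nesting of retrieved_context is the part that most often trips up mappers. A hypothetical example (values invented) of the shape described above:

```python
# Hypothetical RAG fields showing the list[list[str]] nesting:
# one inner list of K retrieved chunks per query in retrieval_query.
retrieval_query = ["refund policy", "shipping times"]
retrieved_context = [
    ["Refunds are accepted within 30 days.", "Refunds require a receipt."],
    ["Standard shipping takes 3-5 business days."],
]

# The outer list length must match the number of queries.
assert len(retrieved_context) == len(retrieval_query)
```

If you only have a single query, still wrap its chunks in an outer list: `[["chunk 1", "chunk 2"]]`.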
Conversational
| Field | Type | Description |
|---|---|---|
| conversation_context | Conversation | Full multi-turn conversation history |
| speaker | str | Speaker label for this turn ("user", "assistant", "system") |
| message | str | Message content for this turn |
| turn_position | int | Position of this turn in the conversation (0-indexed) |
| context | str \| dict | Additional scorer-specific context |
Conversation model
Required by scorers:
- conversation_relevancy_scorer → conversation_context, agent_response
- intention_fulfillment_scorer → conversation_context, agent_task, agent_response
- knowledge_retention_scorer → conversation_context
- role_adherence_scorer → agent_role, conversation_context, agent_response
- instruction_adherence_scorer (telephony) → conversation_context, system_prompt
- sentiment_csat_scorer (telephony) → conversation_context
Format Validation
| Field | Type | Description |
|---|---|---|
| expected_format | str | Description of the expected output format (e.g., "JSON", "markdown table") |
| extracted_content | str | Content extracted from the response for format checking |
Voice / Audio
These fields are populated by LiveKit, Pipecat, or NovaSynth integrations.
| Field | Type | Description |
|---|---|---|
| stt_data | STTData \| dict[str, STTData] | Speech-to-text data — single object, or per-turn map (e.g., {"turn_1": {...}, "turn_2": {...}}) |
| tts_data | TTSData \| dict[str, TTSData] | Text-to-speech synthesis data — same flexible format as stt_data |
| raw_complete_audio | RawCompleteAudio | Complete session audio metadata |
| latency | dict | Aggregated latency metrics for the full session |
STTData fields
TTSData fields
RawCompleteAudio fields
Metrics
| Field | Type | Description |
|---|---|---|
| metrics_collected | list[VADMetrics \| STTMetrics \| TTSMetrics \| LLMMetrics \| EOUMetrics] | Flat list of per-component metrics. Also accepts a per-turn map {turn_k: [metric, ...]} which is normalized to a flat list with label set to the turn key. |
The list can contain any mix of VADMetrics, STTMetrics, TTSMetrics, LLMMetrics, and EOUMetrics objects. These are the source data for all latency and audio quality scorers.
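The per-turn normalization described above can be sketched as follows. This is an assumed reconstruction of the behavior, not the NovaEval code:

```python
# Illustrative sketch of metrics_collected normalization: a per-turn map
# {turn_key: [metric, ...]} flattens to one list, with each metric's
# "label" set to the turn key it came from. Flat lists pass through as-is.
def flatten_metrics(metrics) -> list[dict]:
    if isinstance(metrics, list):
        return metrics  # already flat
    flat = []
    for turn_key, turn_metrics in metrics.items():
        for metric in turn_metrics:
            flat.append({**metric, "label": turn_key})
    return flat

per_turn = {"turn_1": [{"type": "stt", "latency_ms": 120}],
            "turn_2": [{"type": "tts", "ttfb_ms": 80}]}
flat = flatten_metrics(per_turn)
# flat is a single list; flat[0]["label"] == "turn_1", flat[1]["label"] == "turn_2"
```

Either shape validates; scorers always see the flat form.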
Metadata
| Field | Type | Description |
|---|---|---|
| metadata | dict | Arbitrary key-value pairs for custom attributes, tags, or context |
Aliases
The schema automatically remaps these common field names so older or differently-structured payloads validate without modification:
| Incoming field | Maps to |
|---|---|
| trace_data | trace |
| expected_output | ground_truth |
| item_type | novaeval_item_type |
| conversation | conversation_context |
| question | agent_task + input_text |
| query | agent_task + input_text |
| answer | agent_response + output_text |
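A sketch of the remapping in the table above. The alias pairs come straight from the table; the function itself is illustrative, since the real remap happens inside Pydantic validation:

```python
# Alias remapping sketch (alias names from the table above).
ALIASES = {
    "trace_data": ["trace"],
    "expected_output": ["ground_truth"],
    "item_type": ["novaeval_item_type"],
    "conversation": ["conversation_context"],
    "question": ["agent_task", "input_text"],
    "query": ["agent_task", "input_text"],
    "answer": ["agent_response", "output_text"],
}

def remap_aliases(payload: dict) -> dict:
    """Rewrite incoming alias keys to their canonical field names."""
    out = dict(payload)
    for alias, targets in ALIASES.items():
        if alias in out:
            value = out.pop(alias)
            for target in targets:
                out.setdefault(target, value)
    return out

item = remap_aliases({"question": "2+2?", "answer": "4", "expected_output": "4"})
# item now carries agent_task/input_text, agent_response/output_text, ground_truth
```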
Scorer-to-field mapping
| Scorer | Required fields |
|---|---|
| accuracy / exact_match / f1 | agent_response, ground_truth |
| answer_relevancy | agent_task, agent_response |
| faithfulness | retrieved_context, agent_response |
| contextual_precision / contextual_recall | retrieval_query, retrieved_context, ground_truth |
| hallucination_detection / claim_verification / factual_accuracy | agent_task, agent_response |
| tool_correctness | expected_tool_call, tool_calls |
| parameter_correctness | tool_calls, parameters_passed, tool_call_results |
| task_progression | agent_task, agent_role, system_prompt, agent_response |
| context_relevancy | agent_task, agent_role, agent_response |
| agent_role_adherence | agent_role, agent_task, agent_response, tool_calls |
| goal_achievement / conversation_coherence | agent_exit (=True), trace |
| conversation_relevancy / intention_fulfillment | conversation_context, agent_task, agent_response |
| knowledge_retention / role_adherence | agent_role, conversation_context, agent_response |
| mos / tone_clarity / pronunciation_audio / gibberish / audio_breakage | raw_complete_audio |
| word_accuracy | tts_data, raw_complete_audio |
| assistant_average_pitch_hz / assistant_volume_rms | raw_complete_audio |
| assistant_latency / llm_ttft / llm_latency / e2e_latency | metrics_collected |
| stt_latency / stt_audio_duration / stt_processing_duration | metrics_collected (STTMetrics) |
| tts_latency / tts_ttfb / tts_duration / tts_audio_duration | metrics_collected (TTSMetrics) |
| end_of_turn_delay / on_user_turn_completed_delay | metrics_collected (EOUMetrics) |
| instruction_adherence / sentiment_csat / drop_off_node | conversation_context, system_prompt |
| conversation_context_coherence / appropriate_call_termination | conversation_context |
| g_eval / panel_judge | agent_task, agent_response (+ optional context) |
| custom_scorer | Any fields referenced in the template |
Example item
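A minimal illustrative item (field names come from the tables on this page; all values are invented):

```python
# Hypothetical StandardData payload, populated for QA and RAG evaluation.
example_item = {
    "user_id": "user-123",
    "task_id": "task-456",
    "turn_id": "turn-1",
    "novaeval_item_type": "agent",
    "agent_task": "What is the refund window?",
    "agent_response": "Refunds are accepted within 30 days of purchase.",
    "ground_truth": "30 days",
    "agent_name": "support-bot",
    "agent_role": "customer support agent",
    "system_prompt": "You are a helpful customer support agent.",
    "retrieval_query": ["refund policy"],
    "retrieved_context": [["Refunds are accepted within 30 days."]],
    "metadata": {"channel": "chat"},
}
```

Per the scorer-to-field mapping above, this item carries enough fields to run accuracy, answer_relevancy, and faithfulness; voice and tool scorers would need their respective field groups populated as well.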
Next steps
- ETL Jobs — how StandardData mappers are generated automatically
- Scorers Reference — every scorer with its required fields
- Running Evaluations — create and run an eval job