Synthetic Voice Testing

Test your LiveKit voice agent at scale with NovaSynth: realistic AI-generated callers that join LiveKit rooms, with automatic tracing, datasets, and NovaEval evaluations.

NovaSynth joins your LiveKit room as a participant and conducts a realistic voice conversation with your agent. The synthetic caller is driven by a persona and scenario — your agent handles it as a normal LiveKit session. Your existing trace wrappers (LiveKitSTTWrapper, LiveKitTTSWrapper, setup_livekit_tracing) capture every STT, LLM, and TTS span automatically. The resulting traces are available in your Noveum dashboard for dataset creation and NovaEval evaluations.

Private Beta — NovaSynth synthetic voice testing is currently available to select customers. To enable it for your account, contact support@noveum.ai.

How It Works

┌──────────────────────────────────────────────────────────────┐
│  NovaSynth Synthetic Caller                                   │
│  Persona: goal, patience, tone, language, knowledge base      │
│  Scenario: conversation flow with fixed steps + branches      │
└────────────────────────┬─────────────────────────────────────┘
                         │  Real audio
              (LiveKit room join — NovaSynth as participant)
                         ↓
┌──────────────────────────────────────────────────────────────┐
│  Your LiveKit Voice Agent                                     │
│  AgentSession runs as normal                                  │
│  LiveKitSTTWrapper + LiveKitTTSWrapper capture every span     │
└────────────────────────┬─────────────────────────────────────┘
                         │  Traces
                         ↓
┌──────────────────────────────────────────────────────────────┐
│  Noveum Platform                                              │
│  Traces dashboard  →  per-turn audio, transcripts, latency   │
│  Datasets          →  curated trace collections              │
│  NovaEval          →  automated quality scoring + model      │
│                        comparison, regression detection       │
└──────────────────────────────────────────────────────────────┘

Why each layer matters:

Without the LiveKit trace wrappers: the session runs but produces no data.
Without datasets: traces exist but cannot be evaluated systematically.
Without NovaEval: no quality measurement, no regression detection, no model comparison.
With all three: every test run produces a scored, comparable, auditable result.

Before You Begin

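Work through these steps in order. The first two are prerequisites for any synthetic run; the last two use NovaSynth to produce your first dataset and quality baseline.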

Integrate Noveum Trace — follow the LiveKit Integration Overview and the Basic LiveKit Voice Agent guide. Confirm traces are appearing in your Noveum dashboard before proceeding.
Verify tracing works — run a few test conversations with your agent and confirm the traces appear in your dashboard with the expected STT, LLM, and TTS spans.
Use NovaSynth to build your initial dataset — run synthetic tests to generate your first batch of traced conversations. Create a dataset from those traces in the Noveum dashboard (Traces → select traces → Create Dataset).
Run a NovaEval evaluation — go to Evaluations → New Evaluation, select your dataset, and start an evaluation job to establish a quality baseline.

NovaSynth is designed to generate the conversations that make up your evaluation datasets — you do not need an existing dataset before your first run.

Step 1: Expose Your Agent's Audio Endpoint

NovaSynth joins your LiveKit room as a participant and speaks to your agent using AI-generated voice. Your agent processes it as a normal LiveKit job — no changes to your agent code are required.

# Your existing LiveKit agent setup — no changes needed
from noveum_trace.integrations.livekit import (
    LiveKitSTTWrapper,
    LiveKitTTSWrapper,
    setup_livekit_tracing,
    extract_job_context,
)
from livekit.agents import Agent, AgentSession, JobContext

The snippet shows only the imports required for Noveum tracing. Your LiveKit server URL, room credentials, and API key are configured in the Noveum dashboard, not in agent code.

Register your LiveKit server URL and API secret under Project Settings → Agent Endpoints in the Noveum dashboard. NovaSynth uses these credentials to generate room tokens and connect to your server.

Step 2: Create Personas

Note: All code and curl examples below use placeholder values — $NOVEUM_API_KEY, "my-voice-agent", "persona_abc123", "wss://yourapp.livekit.cloud", and other sample strings. Replace them with your actual API key, project name, IDs, and LiveKit server URL before running.

Personas can be created and managed from the Noveum dashboard under Synthetic Testing → Personas, or via the API examples below.

A persona is the synthetic caller's identity — who they are, how they speak, and what they want from the call. Realistic, diverse personas surface the failure modes that matter most.

Persona fields:

name, description — character identity
goal — what they want to accomplish in this session
patience_level — 0.0 (immediately frustrated) to 1.0 (very patient)
personality_traits — e.g. ["direct", "impatient", "tech-savvy"]
tone_preference — "casual", "formal", "friendly", "curt", "aggressive"
primary_language — supports multilingual callers, e.g. ["Hindi", "English"]
knowledge_base — what the caller already knows (menu items, pricing, account history)
Optional demographics: age, occupation, location
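
Checking a persona locally before sending it can catch simple mistakes such as an out-of-range patience_level. The following is a minimal sketch, not part of any Noveum SDK: the Persona class, its validate method, and the idea that values outside the documented tone list should be rejected are all assumptions made for illustration.

```python
from dataclasses import dataclass, field, asdict

# Tone values taken from the field list above. Whether the API rejects
# other values is an assumption of this sketch.
ALLOWED_TONES = {"casual", "formal", "friendly", "curt", "aggressive"}


@dataclass
class Persona:
    """Hypothetical local model of a persona payload (not an SDK class)."""
    project: str
    name: str
    description: str
    goal: str
    patience_level: float
    personality_traits: list = field(default_factory=list)
    tone_preference: str = "casual"
    primary_language: list = field(default_factory=lambda: ["English"])
    knowledge_base: list = field(default_factory=list)

    def validate(self) -> list:
        """Return a list of problems; an empty list means nothing was flagged."""
        problems = []
        if not 0.0 <= self.patience_level <= 1.0:
            problems.append("patience_level must be between 0.0 and 1.0")
        if self.tone_preference not in ALLOWED_TONES:
            problems.append(f"unknown tone_preference: {self.tone_preference!r}")
        if not self.primary_language:
            problems.append("primary_language needs at least one entry")
        if not self.goal.strip():
            problems.append("goal must not be empty")
        return problems

    def to_payload(self) -> dict:
        """Plain dict that can be serialized as the JSON request body."""
        return asdict(self)


alex = Persona(
    project="my-voice-agent",
    name="Alex Chen",
    description="Impatient professional on a short lunch break",
    goal="Order a burger and fries as fast as possible",
    patience_level=0.2,
    personality_traits=["direct", "impatient", "efficiency-focused"],
    tone_preference="curt",
)
print(alex.validate())
```

The dict returned by to_payload() has the same keys as the manual persona examples in this step, so it could be used as the JSON body of a create request.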

AI-generated personas (recommended)

Provide your agent's system prompt and Noveum automatically generates a diverse set of personas covering a range of personalities and goals:

curl -X POST https://api.noveum.ai/v1/synthetic/personas/generate \
  -H "Authorization: Bearer $NOVEUM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "project": "my-voice-agent",
    "system_prompt": "You are a friendly drive-thru order taker for BurgerPlace. Help customers place food and drink orders, answer questions about the menu, and confirm orders before finalizing.",
    "count": 5
  }'

Manual persona creation

Create specific personas to target known problem areas:

curl -X POST https://api.noveum.ai/v1/synthetic/personas \
  -H "Authorization: Bearer $NOVEUM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "project": "my-voice-agent",
    "name": "Alex Chen",
    "description": "Impatient professional on a short lunch break",
    "goal": "Order a burger and fries as fast as possible",
    "patience_level": 0.2,
    "personality_traits": ["direct", "impatient", "efficiency-focused"],
    "tone_preference": "curt",
    "primary_language": ["English"],
    "knowledge_base": ["knows the full menu", "ordered here many times before"]
  }'

A contrasting example, a patient first-timer who needs guidance:

{
  "project": "my-voice-agent",
  "name": "Meena Patel",
  "description": "Elderly first-time caller, unfamiliar with voice ordering",
  "goal": "Order food for a family of four",
  "patience_level": 0.9,
  "personality_traits": ["polite", "hesitant", "detail-oriented"],
  "tone_preference": "formal",
  "primary_language": ["English", "Gujarati"],
  "knowledge_base": ["has never ordered by voice before", "not sure what sizes are available"]
}

List all personas for a project:

curl "https://api.noveum.ai/v1/synthetic/personas?project=my-voice-agent" \
  -H "Authorization: Bearer $NOVEUM_API_KEY"

Step 3: Create Scenarios

Scenarios can be created and managed from the Noveum dashboard under Synthetic Testing → Scenarios, or via the API examples below.

A scenario is the conversation plan. It defines what the caller wants to accomplish, in what order, and how the conversation branches based on agent responses.

Scenario structure:

name, description — what this scenario tests
events — a tree of conversation steps:
- id — unique step identifier
- parent_id — which step this follows (null for the opening step)
- action — what the synthetic caller does at this step
- condition — optional: this step only fires if this condition is met in the conversation
- fixed: true — this step always happens regardless of what the agent says

Steps with fixed: true form the backbone of the conversation. Steps with a condition create branches — the synthetic caller responds adaptively, the same way a real person would.
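
Because events form a tree linked by parent_id, a quick structural check before upload can catch typos in step IDs. This is a sketch under stated assumptions: validate_events is a hypothetical helper, and the rules it enforces (exactly one opening step, no step that is both fixed and conditional) are this sketch's reading of the structure above, not documented API constraints.

```python
def validate_events(events):
    """Return a list of structural problems found in a scenario's events."""
    problems = []
    ids = [event.get("id") for event in events]
    known = set(ids)
    if len(known) != len(ids):
        problems.append("duplicate step ids")

    # Assumption: a scenario has exactly one opening step (parent_id is null).
    roots = [event for event in events if event.get("parent_id") is None]
    if len(roots) != 1:
        problems.append(f"expected exactly one opening step, found {len(roots)}")

    for event in events:
        step = event.get("id")
        parent = event.get("parent_id")
        if parent is not None and parent not in known:
            problems.append(f"{step}: unknown parent_id {parent!r}")
        if not event.get("action"):
            problems.append(f"{step}: missing action")
        # Assumption: a step is either always-on (fixed) or conditional, not both.
        if event.get("fixed") and event.get("condition"):
            problems.append(f"{step}: both fixed and condition are set")
    return problems


sample_events = [
    {"id": "e1", "action": "Greet the agent and say you want to place an order", "fixed": True},
    {"id": "e2", "parent_id": "e1", "action": "Order one burger", "fixed": True},
    {"id": "e3", "parent_id": "e2", "condition": "agent asks about sides or drinks",
     "action": "Order fries, decline a drink"},
]
print(validate_events(sample_events))
```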

AI-generated scenarios (recommended)

curl -X POST https://api.noveum.ai/v1/synthetic/scenarios/generate \
  -H "Authorization: Bearer $NOVEUM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "project": "my-voice-agent",
    "system_prompt": "You are a drive-thru order taker for BurgerPlace...",
    "count": 3,
    "focus": "include edge cases and failure modes"
  }'

Manual scenario — happy path

curl -X POST https://api.noveum.ai/v1/synthetic/scenarios \
  -H "Authorization: Bearer $NOVEUM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "project": "my-voice-agent",
    "name": "Quick single-item order",
    "description": "Customer places one item and confirms immediately",
    "events": [
      { "id": "e1", "action": "Greet the agent and say you want to place an order", "fixed": true },
      { "id": "e2", "parent_id": "e1", "action": "Order one burger", "fixed": true },
      { "id": "e3", "parent_id": "e2", "condition": "agent asks about sides or drinks", "action": "Order fries, decline a drink" },
      { "id": "e4", "parent_id": "e2", "condition": "agent confirms the order total", "action": "Confirm and end the session" }
    ]
  }'

Manual scenario — edge case (unavailable item)

{
  "project": "my-voice-agent",
  "name": "Unavailable menu item",
  "description": "Customer asks for an item not on the menu and handles the agent's response",
  "events": [
    { "id": "e1", "action": "Ask for the spicy chicken sandwich", "fixed": true },
    { "id": "e2", "parent_id": "e1", "condition": "agent says it is unavailable", "action": "Express disappointment, ask what chicken options are available" },
    { "id": "e3", "parent_id": "e2", "action": "Order the closest alternative the agent suggests" },
    { "id": "e4", "parent_id": "e1", "condition": "agent confirms the item without flagging it as unavailable", "action": "Place the order and end the session" }
  ]
}
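
To see which conversation branches a scenario can exercise, it can help to list every root-to-leaf path through the event tree. The sketch below does this for the edge-case scenario above; conversation_paths is a hypothetical helper written for illustration, and it assumes each parent_id refers to an existing step.

```python
from collections import defaultdict


def conversation_paths(events):
    """Return every root-to-leaf path through the event tree as lists of step ids."""
    children = defaultdict(list)
    for event in events:
        children[event.get("parent_id")].append(event)

    paths = []

    def walk(step, trail):
        trail = trail + [step["id"]]
        next_steps = children.get(step["id"], [])
        if not next_steps:
            paths.append(trail)  # leaf: one complete branch of the conversation
        for next_step in next_steps:
            walk(next_step, trail)

    for root in children.get(None, []):
        walk(root, [])
    return paths


unavailable_item_events = [
    {"id": "e1", "action": "Ask for the spicy chicken sandwich", "fixed": True},
    {"id": "e2", "parent_id": "e1", "condition": "agent says it is unavailable",
     "action": "Express disappointment, ask what chicken options are available"},
    {"id": "e3", "parent_id": "e2", "action": "Order the closest alternative the agent suggests"},
    {"id": "e4", "parent_id": "e1",
     "condition": "agent confirms the item without flagging it as unavailable",
     "action": "Place the order and end the session"},
]

for path in conversation_paths(unavailable_item_events):
    print(" -> ".join(path))
# e1 -> e2 -> e3
# e1 -> e4
```

Two paths means the scenario covers both outcomes: the agent correctly reports the item as unavailable, or it wrongly accepts the order.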

List all scenarios for a project:

curl "https://api.noveum.ai/v1/synthetic/scenarios?project=my-voice-agent" \
  -H "Authorization: Bearer $NOVEUM_API_KEY"

Step 4: Trigger a Synthetic Test Run

With a persona, a scenario, and a registered LiveKit endpoint, trigger a run. Noveum's infrastructure handles everything from here — joining the room, driving the conversation, and capturing the trace. Runs can also be started from the Noveum dashboard → Synthetic Testing → New Run.

curl -X POST https://api.noveum.ai/v1/synthetic/runs \
  -H "Authorization: Bearer $NOVEUM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "project": "my-voice-agent",
    "persona_id": "persona_abc123",
    "scenario_id": "scenario_def456",
    "agent_endpoint": {
      "type": "livekit",
      "livekit_url": "wss://yourapp.livekit.cloud",
      "room_name": "test-room-name"
    }
  }'

What happens after the request:

NovaSynth connects to your LiveKit server and joins the specified room as a participant.
The synthetic caller speaks naturally using the persona's voice, tone, and language, following the scenario's event tree.
Your LiveKit agent handles the session as a normal job — STT, LLM, and TTS run as normal.
LiveKitSTTWrapper, LiveKitTTSWrapper, and setup_livekit_tracing capture every span with full audio, transcripts, latency, and token data.
When the scenario completes or a natural goodbye occurs, the session ends.
The complete trace appears in your Noveum dashboard, tagged source: synthetic.

Step 5: Monitor a Run

curl "https://api.noveum.ai/v1/synthetic/runs/run_xyz789" \
  -H "Authorization: Bearer $NOVEUM_API_KEY"

Response fields:

| Field | Description |
| --- | --- |
| `id` | Run identifier |
| `status` | `pending` \| `running` \| `completed` \| `failed` |
| `trace_id` | Noveum trace ID for this run |
| `trace_url` | Direct link to the trace in the dashboard |
| `duration_seconds` | Total session duration |
| `turn_count` | Number of conversational turns |
| `persona.name` | Name of the persona used |
| `scenario.name` | Name of the scenario used |

Runs are also visible in Noveum dashboard → Synthetic Testing → Runs.
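
Since a run moves from pending through running to completed or failed, a script can poll the run endpoint until it reaches one of the two final states. The following is a minimal sketch: wait_for_run and its fetch_run callback are hypothetical names, and the poll interval and timeout are arbitrary defaults rather than recommended values. The fetcher is injected so the loop can be shown here without a network call.

```python
import time

# Final states from the status field in the response table above.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_run(fetch_run, poll_seconds=5.0, timeout_seconds=600.0, sleep=time.sleep):
    """Call fetch_run() repeatedly until the run completes, fails, or times out.

    fetch_run is any zero-argument callable returning the run as a dict, e.g.
    lambda: requests.get(f"{BASE}/synthetic/runs/{run_id}", headers=HEADERS, timeout=30).json()
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        run = fetch_run()
        if run["status"] in TERMINAL_STATUSES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still {run['status']} after {timeout_seconds} seconds")
        sleep(poll_seconds)


# Demo with canned responses instead of real HTTP calls.
_canned = iter([
    {"status": "pending"},
    {"status": "running"},
    {"status": "completed", "trace_id": "trace_demo"},
])
finished = wait_for_run(lambda: next(_canned), sleep=lambda seconds: None)
print(finished["status"])  # completed
```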

Step 6: View Traces

Synthetic traces appear in the Traces section alongside real conversations. The schema is identical to a real session — synthetic traces can be filtered by their tags and mixed freely with real traces in datasets.

livekit.session
│  source: synthetic
│  synthetic.persona: "Alex Chen"
│  synthetic.scenario: "Quick single-item order"
│  livekit.job.id, livekit.room.name, livekit.participant.identity
│  session.turn_count, session.total_cost
│
├── livekit.stt × N
│   ├── stt.text, stt.is_final, stt.language
│   ├── stt.model, stt.confidence
│   ├── stt.vad_to_final_ms, stt.first_text_latency_ms
│   └── stt.audio_uuid  (when record=True)
│
├── livekit.llm × N
│   ├── llm.model, llm.system_prompt, llm.input, llm.output
│   ├── llm.input_tokens, llm.output_tokens, llm.total_tokens
│   ├── llm.cost.input, llm.cost.output, llm.cost.total
│   ├── llm.time_to_first_token_ms
│   └── llm.function_calls[], llm.function_call_results[]
│
└── livekit.tts × N
    ├── tts.input_text, tts.voice, tts.model
    ├── tts.time_to_first_byte_ms, tts.characters
    └── tts.audio_uuid  (when record=True)

To show only synthetic traces in the dashboard, apply the source: synthetic filter in the Traces view.
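
The latency attributes in the span tree above lend themselves to a quick per-session summary. The sketch below averages one latency attribute per span type. The attribute names come from the tree, but the dict layout of the trace (name, attributes, children) is purely an assumption for illustration; the actual export format may differ.

```python
from statistics import mean

# Span name -> latency attribute, both taken from the span tree above.
LATENCY_ATTRIBUTE = {
    "livekit.stt": "stt.vad_to_final_ms",
    "livekit.llm": "llm.time_to_first_token_ms",
    "livekit.tts": "tts.time_to_first_byte_ms",
}


def latency_summary(trace):
    """Average each latency attribute across the child spans of one session."""
    samples = {attribute: [] for attribute in LATENCY_ATTRIBUTE.values()}
    for span in trace.get("children", []):
        attribute = LATENCY_ATTRIBUTE.get(span.get("name"))
        if attribute is None:
            continue  # not an STT, LLM, or TTS span
        value = span.get("attributes", {}).get(attribute)
        if value is not None:
            samples[attribute].append(value)
    return {attribute: mean(values) for attribute, values in samples.items() if values}


# Hypothetical trace shape with made-up numbers, for demonstration only.
demo_trace = {
    "name": "livekit.session",
    "attributes": {"source": "synthetic", "synthetic.persona": "Alex Chen"},
    "children": [
        {"name": "livekit.stt", "attributes": {"stt.vad_to_final_ms": 200}},
        {"name": "livekit.llm", "attributes": {"llm.time_to_first_token_ms": 400}},
        {"name": "livekit.tts", "attributes": {"tts.time_to_first_byte_ms": 150}},
        {"name": "livekit.stt", "attributes": {"stt.vad_to_final_ms": 300}},
        {"name": "livekit.llm", "attributes": {"llm.time_to_first_token_ms": 600}},
    ],
}
summary = latency_summary(demo_trace)
for attribute, value in summary.items():
    print(f"{attribute}: {value} ms")
```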

Step 7: Build Datasets and Run NovaEval Evaluations

Creating a dataset

Go to Traces in the Noveum dashboard.
Filter by source: synthetic and/or date range.
Select the traces to include.
Click Create Dataset.

You can mix synthetic and real traces in the same dataset. A dataset that combines both gives the most representative evaluation results.

Running a NovaEval evaluation

Go to Evaluations → New Evaluation.
Select your dataset.
Noveum's NovaEval engine recommends scorers based on your agent type. For voice agents, this includes:
- Conversational metrics: knowledge retention, conversation relevancy, role adherence
- Task completion: goal achievement, tool relevancy, task progression
- Quality: conversation completeness, response clarity
Optionally, select model variants to compare if you are evaluating a model swap.
Start the evaluation — Noveum runs it and presents results in the dashboard.

Results show per-scenario quality scores, aggregate metrics across the full dataset, and side-by-side model comparisons. Edge-case scenarios built with impatient or confused personas reveal where your agent fails before real users encounter those failures.

Batch Testing

Run every persona × scenario combination for full matrix coverage. After all runs complete, select the traces, create a single dataset, and run one NovaEval evaluation across the entire matrix.

import itertools
import os

import requests

BASE = "https://api.noveum.ai/v1"
NOVEUM_API_KEY = os.environ["NOVEUM_API_KEY"]
HEADERS = {"Authorization": f"Bearer {NOVEUM_API_KEY}", "Content-Type": "application/json"}
PROJECT = "my-voice-agent"
TIMEOUT = 30  # seconds to wait for each HTTP request


def list_resources(kind):
    """Fetch all personas or scenarios for the project (kind: "personas" or "scenarios")."""
    response = requests.get(
        f"{BASE}/synthetic/{kind}",
        params={"project": PROJECT},
        headers=HEADERS,
        timeout=TIMEOUT,
    )
    response.raise_for_status()  # surface 4xx/5xx errors instead of a KeyError below
    return response.json()[kind]


personas = list_resources("personas")
scenarios = list_resources("scenarios")

# Start one run for every persona × scenario pair
runs = []
for persona, scenario in itertools.product(personas, scenarios):
    response = requests.post(
        f"{BASE}/synthetic/runs",
        json={
            "project": PROJECT,
            "persona_id": persona["id"],
            "scenario_id": scenario["id"],
            "agent_endpoint": {
                "type": "livekit",
                "livekit_url": "wss://yourapp.livekit.cloud",  # replace with your LiveKit server URL
                "room_name": "test-room-name",  # replace with your room name
            },
        },
        headers=HEADERS,
        timeout=TIMEOUT,
    )
    response.raise_for_status()
    run = response.json()
    runs.append(run)
    print(f"  {persona['name']:25s}  ×  {scenario['name']:30s}  →  {run['id']}")

print(f"\nStarted {len(runs)} runs  ({len(personas)} personas × {len(scenarios)} scenarios)")
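
Runs unfold in real time (the FAQ cites 30 to 120 seconds for a typical conversation), so it is worth estimating how long a full matrix will take before starting one. The sketch below is a rough back-of-the-envelope calculation: estimate_matrix_minutes is a hypothetical helper, the concurrency value is an assumption because limits depend on your plan, and the model simply treats runs as executing in equal-length waves.

```python
import math


def estimate_matrix_minutes(persona_count, scenario_count, concurrency=1, seconds_per_run=(30, 120)):
    """Return (total_runs, best_case_minutes, worst_case_minutes) for a full matrix.

    Simplification: runs execute in waves of `concurrency` runs, and every run
    in a wave takes the same time.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    total_runs = persona_count * scenario_count
    waves = math.ceil(total_runs / concurrency)
    fastest, slowest = seconds_per_run
    return total_runs, waves * fastest / 60, waves * slowest / 60


total, best, worst = estimate_matrix_minutes(5, 4, concurrency=4)
print(f"{total} runs: roughly {best} to {worst} minutes")
# 20 runs: roughly 2.5 to 10.0 minutes
```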

Best Practices

Start with 3–5 personas covering a spread: a patient happy-path user, an impatient expert, a confused first-timer, and one multilingual user if your agent serves mixed-language callers.
Create at least one edge-case scenario per core conversation flow. Happy paths usually pass; edge cases are where agents break.
Use patience_level: 0.1–0.3 to stress-test. Impatient callers expose slow response times, rambling answers, and goal-completion failures.
Re-run the exact same persona × scenario matrix after every model swap or prompt change. If the new configuration fails more scenarios than the previous one, it is not ready.
Keep record=True in setup_livekit_tracing. Audio recordings of synthetic sessions are invaluable for debugging — you can hear how the synthetic caller phrased a request and how your agent responded.
Tag batch-test datasets separately from production datasets. Evaluating a pure synthetic matrix is useful for regression testing; mixing a representative sample of real traces with synthetic ones gives a broader picture of overall agent health.
Use AI-generated personas and scenarios first to get broad coverage fast, then add manual entries to target specific failure modes you discover.
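
The regression rule above (a configuration that fails more scenarios than the previous one is not ready) can be sketched as a comparison of two result matrices. Everything here is illustrative: compare_matrices is a hypothetical helper, and how a run is judged passed or failed, for example by thresholding a NovaEval score, is an assumption left to you.

```python
def failure_count(results):
    """Count persona × scenario pairs that did not pass."""
    return sum(1 for passed in results.values() if not passed)


def compare_matrices(baseline, candidate):
    """Compare two {(persona, scenario): passed} maps from the same matrix."""
    newly_failing = sorted(
        pair for pair, passed in baseline.items()
        if passed and not candidate.get(pair, False)
    )
    return {
        "baseline_failures": failure_count(baseline),
        "candidate_failures": failure_count(candidate),
        "newly_failing": newly_failing,
        # Rule from the list above: more failures than before means not ready.
        "ready": failure_count(candidate) <= failure_count(baseline),
    }


# Made-up results for demonstration only.
baseline = {
    ("Alex Chen", "Quick single-item order"): True,
    ("Alex Chen", "Unavailable menu item"): False,
    ("Meena Patel", "Quick single-item order"): True,
    ("Meena Patel", "Unavailable menu item"): True,
}
candidate = {
    ("Alex Chen", "Quick single-item order"): True,
    ("Alex Chen", "Unavailable menu item"): False,
    ("Meena Patel", "Quick single-item order"): False,
    ("Meena Patel", "Unavailable menu item"): False,
}
report = compare_matrices(baseline, candidate)
print(report["ready"], report["newly_failing"])
```

Listing the newly failing pairs, not just the counts, points directly at the persona and scenario combinations to replay and listen to first.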

FAQ

Can I run synthetic testing without Noveum Trace integrated into my agent? No. NovaSynth joins your LiveKit room and conducts a real session, but without LiveKitSTTWrapper, LiveKitTTSWrapper, and setup_livekit_tracing active, the session produces no data. The trace is the entire output of a synthetic run — without it, the session simply disappears.

Do synthetic traces look different from real user traces? In structure, no. They use the same schema as real sessions. They carry source: synthetic, synthetic.persona, and synthetic.scenario attributes so you can filter them in the dashboard and keep them separate from production data when needed.

Can the synthetic caller handle interruptions? Yes. The synthetic caller delivers real audio into the LiveKit room, so any session behavior that depends on real-time audio — VAD, barge-in, end-of-utterance detection — works exactly as it would with a real user.

How long does a test run take? Typical voice agent conversations take 30–120 seconds. NovaSynth runs in real time — it joins a real LiveKit session and the conversation unfolds at the natural pace of speech.

How many runs can I trigger in parallel? Concurrent run limits depend on your plan. Contact support@noveum.ai for details.

Do I need an existing dataset before my first NovaSynth run? No. NovaSynth generates the conversations that become your dataset. Run your first synthetic tests, select the resulting traces in the Noveum dashboard, and create your dataset from there. You do not need to collect real user data before getting started.

How do I get access to NovaSynth synthetic testing? NovaSynth is currently in private beta and is enabled on request. Reach out to support@noveum.ai and we will enable it for your account.