Synthetic Voice Testing

Test your Pipecat voice agent at scale with NovaSynth — realistic AI-generated callers over your configured telephony or WebRTC transport, with automatic tracing, datasets, and NovaEval evaluations

NovaSynth connects to your Pipecat voice agent using your configured telephony or WebRTC transport and conducts a realistic voice conversation driven by an AI-generated persona and scenario. Your pipeline processes the interaction exactly as it would a real user. Your existing NoveumTraceObserver captures every STT, LLM, and TTS span — including audio recordings. The resulting traces are available in your Noveum dashboard for dataset creation and NovaEval evaluations.

Private Beta — NovaSynth synthetic voice testing is currently available to select customers. To enable it for your account, contact support@noveum.ai.

How It Works

┌──────────────────────────────────────────────────────────────┐
│  NovaSynth Synthetic Caller                                   │
│  Persona: goal, patience, tone, language, knowledge base      │
│  Scenario: conversation flow with fixed steps + branches      │
└────────────────────────┬─────────────────────────────────────┘
                         │  Real audio
              (via your registered telephony or WebRTC endpoint)
                         ↓
┌──────────────────────────────────────────────────────────────┐
│  Your Pipecat Voice Agent                                     │
│  STT → LLM → TTS pipeline runs as normal                     │
│  NoveumTraceObserver captures every span + audio             │
└────────────────────────┬─────────────────────────────────────┘
                         │  Traces
                         ↓
┌──────────────────────────────────────────────────────────────┐
│  Noveum Platform                                              │
│  Traces dashboard  →  per-turn audio, transcripts, latency   │
│  Datasets          →  curated trace collections              │
│  NovaEval          →  automated quality scoring + model      │
│                        comparison, regression detection       │
└──────────────────────────────────────────────────────────────┘

Why each layer matters:

Without NoveumTraceObserver: the call happens but produces no data.
Without datasets: traces exist but cannot be evaluated systematically.
Without NovaEval: no quality measurement, no regression detection, no model comparison.
With all three: every test run produces a scored, comparable, auditable result.

Before You Begin

Complete the first two steps before your first synthetic run. The last two walk you through turning that run into a dataset and a quality baseline.

Integrate Noveum Trace — follow the Pipecat Integration Overview and the Basic Pipecat Voice Pipeline guide. Confirm traces are appearing in your Noveum dashboard before proceeding.
Verify tracing works — place a few test calls to your agent and confirm the traces appear in your dashboard with the expected STT, LLM, and TTS spans.
Use NovaSynth to build your initial dataset — run synthetic tests to generate your first batch of traced conversations. Create a dataset from those traces in the Noveum dashboard (Traces → select traces → Create Dataset).
Run a NovaEval evaluation — go to Evaluations → New Evaluation, select your dataset, and start an evaluation job to establish a quality baseline.

NovaSynth is designed to generate the conversations that make up your evaluation datasets — you do not need an existing dataset before your first run.

Step 1: Register Your Agent Endpoint

How NovaSynth reaches your agent depends on the telephony or WebRTC transport your Pipecat pipeline uses. Register your endpoint under Project Settings → Agent Endpoints in the Noveum dashboard and select the provider that matches your setup. No changes to your agent code are required.

NovaSynth supports phone-based transports. Plivo and Twilio are supported today; other SIP providers need to be confirmed with support first:

# Your existing transport setup needs no changes.
# Pipecat handles Plivo and Twilio media streams through frame serializers
# attached to a WebSocket transport:
from pipecat.serializers.plivo import PlivoFrameSerializer
# or
from pipecat.serializers.twilio import TwilioFrameSerializer

If you use a different telephony or WebRTC provider, contact support@noveum.ai to confirm it is supported.

Step 2: Create Personas

Note: All code and curl examples below use placeholder values — $NOVEUM_API_KEY, "my-voice-agent", "persona_abc123", "+15550123456", and other sample strings. Replace them with your actual API key, project name, IDs, and phone number before running.

Personas can be created and managed from the Noveum dashboard under Synthetic Testing → Personas, or via the API examples below.

A persona is the synthetic caller's identity — who they are, how they speak, and what they want from the call. Realistic, diverse personas surface the failure modes that matter most.

Persona fields:

name, description — character identity
goal — what they want to accomplish on this call
patience_level — 0.0 (immediately frustrated) to 1.0 (very patient)
personality_traits — e.g. ["direct", "impatient", "tech-savvy"]
tone_preference — "casual", "formal", "friendly", "curt", "aggressive"
primary_language — supports multilingual callers, e.g. ["Hindi", "English"]
knowledge_base — what the caller already knows (menu items, pricing, account history)
Optional demographics: age, occupation, location
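
The field list above can be checked locally before a persona is sent to the API. The following is a minimal sketch, not part of any Noveum SDK: the `Persona` class and its `validate` and `to_payload` helpers are hypothetical, and treating the five listed tones as the complete set of allowed values is an assumption.

```python
from dataclasses import asdict, dataclass, field

# Assumption: the tones listed in the field reference are the only valid ones.
ALLOWED_TONES = {"casual", "formal", "friendly", "curt", "aggressive"}


@dataclass
class Persona:
    """Hypothetical local model of the persona fields described above."""
    project: str
    name: str
    description: str
    goal: str
    patience_level: float
    tone_preference: str
    primary_language: list
    personality_traits: list = field(default_factory=list)
    knowledge_base: list = field(default_factory=list)

    def validate(self) -> list:
        """Return a list of problems; an empty list means the persona looks OK."""
        problems = []
        if not 0.0 <= self.patience_level <= 1.0:
            problems.append("patience_level must be between 0.0 and 1.0")
        if self.tone_preference not in ALLOWED_TONES:
            problems.append(f"unknown tone_preference: {self.tone_preference!r}")
        if not self.primary_language:
            problems.append("primary_language needs at least one language")
        if not self.goal.strip():
            problems.append("goal must not be empty")
        return problems

    def to_payload(self) -> dict:
        """Build a JSON-serializable body for the persona creation request."""
        return asdict(self)
```

Calling `validate()` before posting catches out-of-range values early, so a rejected request does not interrupt a larger batch of persona uploads.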

AI-generated personas (recommended)

Paste your agent's system prompt and Noveum generates a diverse set covering a range of personalities and goals automatically:

curl -X POST https://api.noveum.ai/v1/synthetic/personas/generate \
  -H "Authorization: Bearer $NOVEUM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "project": "my-voice-agent",
    "system_prompt": "You are a friendly drive-thru order taker for BurgerPlace. Help customers place food and drink orders, answer questions about the menu, and confirm orders before finalizing.",
    "count": 5
  }'

Manual persona creation

Create specific personas to target known problem areas:

curl -X POST https://api.noveum.ai/v1/synthetic/personas \
  -H "Authorization: Bearer $NOVEUM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "project": "my-voice-agent",
    "name": "Alex Chen",
    "description": "Impatient professional on a short lunch break",
    "goal": "Order a burger and fries as fast as possible",
    "patience_level": 0.2,
    "personality_traits": ["direct", "impatient", "efficiency-focused"],
    "tone_preference": "curt",
    "primary_language": ["English"],
    "knowledge_base": ["knows the full menu", "ordered here many times before"]
  }'

A contrasting example, sent as the request body to the same endpoint: a patient first-timer who needs guidance.

{
  "project": "my-voice-agent",
  "name": "Meena Patel",
  "description": "Elderly first-time caller, unfamiliar with phone ordering",
  "goal": "Order food for a family of four",
  "patience_level": 0.9,
  "personality_traits": ["polite", "hesitant", "detail-oriented"],
  "tone_preference": "formal",
  "primary_language": ["English", "Gujarati"],
  "knowledge_base": ["has never ordered by phone before", "not sure what sizes are available"]
}

List all personas for a project:

curl "https://api.noveum.ai/v1/synthetic/personas?project=my-voice-agent" \
  -H "Authorization: Bearer $NOVEUM_API_KEY"

Step 3: Create Scenarios

Scenarios can be created and managed from the Noveum dashboard under Synthetic Testing → Scenarios, or via the API examples below.

A scenario is the conversation plan. It defines what the caller wants to accomplish, in what order, and how the conversation branches based on agent responses.

Scenario structure:

name, description — what this scenario tests
events — a tree of conversation steps:
- id — unique step identifier
- parent_id — which step this follows (null for the opening step)
- action — what the synthetic caller does at this step
- condition — optional: this step only fires if this condition is met in the conversation
- fixed: true — this step always happens regardless of what the agent says

Steps with fixed: true form the backbone of the conversation. Steps with a condition create branches — the synthetic caller responds adaptively, the same way a real person would.
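
Because events reference each other through `parent_id`, a hand-written scenario can contain dangling or circular references. The sketch below checks the structure locally. `validate_events` is a hypothetical helper, and the rule that a scenario has exactly one opening step is an assumption drawn from the field description, not a documented API constraint.

```python
def validate_events(events: list) -> list:
    """Return a list of structural problems found in a scenario's events."""
    problems = []

    # Every id must be unique.
    seen = set()
    for event in events:
        event_id = event.get("id")
        if event_id in seen:
            problems.append(f"duplicate id: {event_id}")
        seen.add(event_id)

    # Assumption: exactly one opening step (no parent_id).
    roots = [e for e in events if e.get("parent_id") is None]
    if len(roots) != 1:
        problems.append(f"expected exactly one opening step, found {len(roots)}")

    # Every parent_id must point at a known step, and every step needs an action.
    for event in events:
        parent = event.get("parent_id")
        if parent is not None and parent not in seen:
            problems.append(f"{event.get('id')}: unknown parent_id {parent}")
        if not event.get("action"):
            problems.append(f"{event.get('id')}: missing action")

    # Walk up from each step; a chain longer than the event count means a cycle.
    parent_of = {e.get("id"): e.get("parent_id") for e in events}
    for start in parent_of:
        node, hops = start, 0
        while node is not None and hops <= len(events):
            node = parent_of.get(node)
            hops += 1
        if hops > len(events):
            problems.append(f"{start}: parent chain contains a cycle")

    return problems
```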

AI-generated scenarios (recommended)

curl -X POST https://api.noveum.ai/v1/synthetic/scenarios/generate \
  -H "Authorization: Bearer $NOVEUM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "project": "my-voice-agent",
    "system_prompt": "You are a drive-thru order taker for BurgerPlace...",
    "count": 3,
    "focus": "include edge cases and failure modes"
  }'

Manual scenario — happy path

curl -X POST https://api.noveum.ai/v1/synthetic/scenarios \
  -H "Authorization: Bearer $NOVEUM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "project": "my-voice-agent",
    "name": "Quick single-item order",
    "description": "Customer places one item and confirms immediately",
    "events": [
      { "id": "e1", "action": "Greet the agent and say you want to place an order", "fixed": true },
      { "id": "e2", "parent_id": "e1", "action": "Order one burger", "fixed": true },
      { "id": "e3", "parent_id": "e2", "condition": "agent asks about sides or drinks", "action": "Order fries, decline a drink" },
      { "id": "e4", "parent_id": "e2", "condition": "agent confirms the order total", "action": "Confirm and end the call" }
    ]
  }'

Manual scenario — edge case (unavailable item)

{
  "project": "my-voice-agent",
  "name": "Unavailable menu item",
  "description": "Customer asks for an item not on the menu and handles the agent's response",
  "events": [
    { "id": "e1", "action": "Ask for the spicy chicken sandwich", "fixed": true },
    { "id": "e2", "parent_id": "e1", "condition": "agent says it is unavailable", "action": "Express disappointment, ask what chicken options are available" },
    { "id": "e3", "parent_id": "e2", "action": "Order the closest alternative the agent suggests" },
    { "id": "e4", "parent_id": "e1", "condition": "agent confirms the item without flagging it as unavailable", "action": "Place the order and confirm" }
  ]
}
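
To see which branches a scenario can exercise, it helps to list every route from the opening step to a leaf. The sketch below does that for an `events` list like the one above. `conversation_paths` is a hypothetical helper, and it assumes a call follows one branch at each fork, which the scenario format does not state explicitly.

```python
def conversation_paths(events: list) -> list:
    """List every root-to-leaf route through the events as lists of step ids."""
    # Group child ids under their parent id (None for the opening step).
    children = {}
    for event in events:
        children.setdefault(event.get("parent_id"), []).append(event["id"])

    paths = []

    def walk(event_id, prefix):
        route = prefix + [event_id]
        next_steps = children.get(event_id, [])
        if not next_steps:
            paths.append(route)  # leaf: one complete route
            return
        for child in next_steps:
            walk(child, route)

    for root in children.get(None, []):
        walk(root, [])
    return paths


unavailable_item = [
    {"id": "e1", "action": "Ask for the spicy chicken sandwich", "fixed": True},
    {"id": "e2", "parent_id": "e1", "condition": "agent says it is unavailable",
     "action": "Ask what chicken options are available"},
    {"id": "e3", "parent_id": "e2", "action": "Order the closest alternative"},
    {"id": "e4", "parent_id": "e1", "condition": "agent confirms the item",
     "action": "Place the order and confirm"},
]

for path in conversation_paths(unavailable_item):
    print(" -> ".join(path))
```

For this scenario the two routes are e1, e2, e3 (the agent correctly flags the item) and e1, e4 (the agent accepts an item it should have rejected), so both outcomes are covered.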

List all scenarios for a project:

curl "https://api.noveum.ai/v1/synthetic/scenarios?project=my-voice-agent" \
  -H "Authorization: Bearer $NOVEUM_API_KEY"

Step 4: Trigger a Synthetic Test Run

With a persona, a scenario, and a registered endpoint, trigger a run. Noveum's infrastructure handles everything from here — connecting to your agent, driving the conversation, and capturing the trace. Runs can also be started from the Noveum dashboard → Synthetic Testing → New Run.

Phone number (Plivo)

curl -X POST https://api.noveum.ai/v1/synthetic/runs \
  -H "Authorization: Bearer $NOVEUM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "project": "my-voice-agent",
    "persona_id": "persona_abc123",
    "scenario_id": "scenario_def456",
    "agent_endpoint": {
      "type": "phone",
      "provider": "plivo",
      "phone_number": "+15550123456"
    }
  }'

Phone number (Twilio)

curl -X POST https://api.noveum.ai/v1/synthetic/runs \
  -H "Authorization: Bearer $NOVEUM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "project": "my-voice-agent",
    "persona_id": "persona_abc123",
    "scenario_id": "scenario_def456",
    "agent_endpoint": {
      "type": "phone",
      "provider": "twilio",
      "phone_number": "+15550123456"
    }
  }'
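
The two requests above differ only in the `provider` value, so the body can be built by one function. This is a sketch under assumptions: `build_run_request` is a hypothetical helper, and requiring E.164 phone numbers (a leading `+` and at most 15 digits) is inferred from the sample number rather than documented.

```python
import re

# E.164: "+" followed by up to 15 digits, the first of which is not 0.
E164 = re.compile(r"^\+[1-9]\d{1,14}$")

# Providers named in this guide; others need confirmation from support.
SUPPORTED_PROVIDERS = {"plivo", "twilio"}


def build_run_request(project, persona_id, scenario_id, provider, phone_number):
    """Return the JSON body for POST /synthetic/runs, or raise ValueError."""
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")
    if not E164.match(phone_number):
        raise ValueError(f"phone number is not in E.164 format: {phone_number!r}")
    return {
        "project": project,
        "persona_id": persona_id,
        "scenario_id": scenario_id,
        "agent_endpoint": {
            "type": "phone",
            "provider": provider,
            "phone_number": phone_number,
        },
    }
```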

What happens after the request:

NovaSynth connects to your registered agent endpoint.
The synthetic caller speaks naturally using the persona's voice, tone, and language, following the scenario's event tree.
Your Pipecat pipeline handles the call exactly as it would a real user — STT, LLM, and TTS run as normal.
NoveumTraceObserver captures every span with full audio, transcripts, latency, and token data.
When the scenario completes or a natural goodbye occurs, the call ends.
The complete trace appears in your Noveum dashboard, tagged source: synthetic.

Step 5: Monitor a Run

Fetch a run by the id returned when it was triggered (run_xyz789 is a placeholder):

curl "https://api.noveum.ai/v1/synthetic/runs/run_xyz789" \
  -H "Authorization: Bearer $NOVEUM_API_KEY"

Response fields:

Field	Description
`id`	Run identifier
`status`	One of `pending`, `running`, `completed`, `failed`
`trace_id`	Noveum trace ID for this run
`trace_url`	Direct link to the trace in the dashboard
`duration_seconds`	Total call duration
`turn_count`	Number of conversational turns
`persona.name`	Name of the persona used
`scenario.name`	Name of the scenario used

Runs are also visible in Noveum dashboard → Synthetic Testing → Runs.
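
Since a run moves from `pending` through `running` to `completed` or `failed`, a script usually needs to poll until it reaches a terminal status. The sketch below shows one way to do that. `wait_for_run` is a hypothetical helper, and the poll interval and timeout defaults are arbitrary choices, not documented recommendations. The HTTP call is passed in as `fetch_run` so the loop can be exercised without network access.

```python
import time

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_run(fetch_run, run_id, poll_seconds=5.0, timeout_seconds=600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_run(run_id) until the run completes or fails.

    fetch_run must return the run as a dict with a "status" key.
    Raises TimeoutError if no terminal status is seen in time.
    """
    deadline = clock() + timeout_seconds
    while True:
        run = fetch_run(run_id)
        if run["status"] in TERMINAL_STATUSES:
            return run
        if clock() >= deadline:
            raise TimeoutError(f"run {run_id} still {run['status']} after timeout")
        sleep(poll_seconds)
```

In practice `fetch_run` would wrap the GET request shown above, for example a function that calls `requests.get` on the run URL with your authorization header and returns `response.json()`.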

Step 6: View Traces

Synthetic traces appear in the Traces section alongside real conversations. The schema is identical to a real call — synthetic traces can be filtered by their tags and mixed freely with real traces in datasets.

pipecat.conversation
│  source: synthetic
│  synthetic.persona: "Alex Chen"
│  synthetic.scenario: "Quick single-item order"
│  conversation.turn_count, conversation.total_cost
│  conversation.total_input_tokens, conversation.total_output_tokens
│
├── pipecat.turn × N
│   ├── turn.number, turn.user_input, turn.duration_seconds
│   ├── turn.user_bot_latency_seconds
│   │
│   ├── pipecat.stt   — synthetic caller's speech, audio recording,
│   │                   confidence, vad_to_final_ms, first_text_latency_ms
│   │
│   ├── pipecat.llm   — your agent's reasoning, full input/output,
│   │                   tokens, cost, function calls
│   │
│   └── pipecat.tts   — your agent's spoken response, audio recording,
│                       time_to_first_byte_ms, characters
│
└── pipecat.full_conversation  (when record_audio=True)
    └── stereo WAV: left channel = synthetic caller, right channel = your agent
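
Because the full-conversation recording keeps the caller and the agent on separate channels, it can be split into two mono files for listening or further analysis. The sketch below uses only the standard library. `split_stereo` is a hypothetical helper, and it assumes the recording is 16-bit PCM, which the trace description does not specify.

```python
import array
import wave


def split_stereo(src, caller_out, agent_out):
    """Split a 16-bit stereo WAV into two mono WAVs.

    Left channel (synthetic caller) goes to caller_out, right channel
    (your agent) to agent_out. Arguments may be paths or file objects.
    """
    with wave.open(src, "rb") as wav:
        if wav.getnchannels() != 2 or wav.getsampwidth() != 2:
            raise ValueError("expected a 16-bit stereo WAV")  # assumption
        rate = wav.getframerate()
        samples = array.array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))

    # Stereo frames are interleaved: L, R, L, R, ...
    channels = ((caller_out, samples[0::2]), (agent_out, samples[1::2]))
    for target, channel in channels:
        with wave.open(target, "wb") as out:
            out.setnchannels(1)
            out.setsampwidth(2)
            out.setframerate(rate)
            out.writeframes(channel.tobytes())
```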

To filter for synthetic traces in the dashboard: use the source: synthetic filter in the Traces view.

Step 7: Build Datasets and Run NovaEval Evaluations

Creating a dataset

Go to Traces in the Noveum dashboard.
Filter by source: synthetic and/or date range.
Select the traces to include.
Click Create Dataset.

You can mix synthetic and real traces in the same dataset. A dataset that combines both gives the most representative evaluation results.

Running a NovaEval evaluation

Go to Evaluations → New Evaluation.
Select your dataset.
Noveum's NovaEval engine recommends scorers based on your agent type. For voice agents, this includes:
- Conversational metrics: knowledge retention, conversation relevancy, role adherence
- Task completion: goal achievement, tool relevancy, task progression
- Quality: conversation completeness, response clarity
Optionally, select model variants to compare if you are evaluating a model swap.
Start the evaluation — Noveum runs it and presents results in the dashboard.

Results show per-scenario quality scores, aggregate metrics across the full dataset, and side-by-side model comparisons. Edge-case scenarios built with impatient or confused personas reveal exactly where your agent fails before real users do.

Batch Testing

Run every persona × scenario combination for full matrix coverage. The script below only starts the runs; once they have all completed, select the traces, create a single dataset, and run one NovaEval evaluation across the entire matrix.

import itertools
import os

import requests

BASE = "https://api.noveum.ai/v1"
PROJECT = "my-voice-agent"
TIMEOUT = 30  # seconds allowed for each HTTP request

# One session reuses the connection and carries the auth header on every call.
session = requests.Session()
session.headers.update({
    "Authorization": f"Bearer {os.environ['NOVEUM_API_KEY']}",
    "Content-Type": "application/json",
})


def list_items(kind):
    """Fetch all personas or scenarios for the project."""
    resp = session.get(f"{BASE}/synthetic/{kind}", params={"project": PROJECT}, timeout=TIMEOUT)
    resp.raise_for_status()  # stop early on auth or server errors
    return resp.json()[kind]


personas = list_items("personas")
scenarios = list_items("scenarios")

# Start one run per persona × scenario pair.
runs = []
for persona, scenario in itertools.product(personas, scenarios):
    resp = session.post(f"{BASE}/synthetic/runs", json={
        "project": PROJECT,
        "persona_id": persona["id"],
        "scenario_id": scenario["id"],
        "agent_endpoint": {
            "type": "phone",
            "provider": "plivo",
            "phone_number": "+15550123456",  # replace with your registered phone number
        },
    }, timeout=TIMEOUT)
    resp.raise_for_status()
    run = resp.json()
    runs.append(run)
    print(f"  {persona['name']:25s}  ×  {scenario['name']:30s}  →  {run['id']}")

print(f"\nStarted {len(runs)} runs  ({len(personas)} personas × {len(scenarios)} scenarios)")
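
A full matrix grows quickly, and every call runs in real time, so it is worth estimating the wall-clock cost before starting one. The sketch below is rough arithmetic only. `estimate_matrix` is a hypothetical helper, the 30 to 120 second call range is the typical figure quoted in the FAQ, and the model (runs execute in equal-sized waves limited by your plan's concurrency) is a simplifying assumption.

```python
import math


def estimate_matrix(n_personas, n_scenarios, concurrency,
                    min_call_s=30, max_call_s=120):
    """Estimate run count and wall-clock bounds for a persona × scenario matrix."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    runs = n_personas * n_scenarios
    waves = math.ceil(runs / concurrency)  # batches of parallel calls
    return {
        "runs": runs,
        "waves": waves,
        "min_minutes": waves * min_call_s / 60,
        "max_minutes": waves * max_call_s / 60,
    }


# 5 personas × 3 scenarios with 4 calls in parallel:
# 15 runs in 4 waves, roughly 2 to 8 minutes.
print(estimate_matrix(5, 3, concurrency=4))
```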

Best Practices

Start with 3–5 personas covering a spread: a patient happy-path user, an impatient expert, a confused first-timer, and one multilingual user if your agent serves mixed-language callers.
Create at least one edge-case scenario per core conversation flow. Happy paths rarely fail; edge cases are where agents break.
Use patience_level: 0.1–0.3 to stress-test. Impatient callers expose slow response times, rambling answers, and goal-completion failures.
Re-run the exact same persona × scenario matrix after every model swap or prompt change. If the new configuration fails more scenarios than the previous one, it is not ready.
Keep record_audio=True in NoveumTraceObserver. Audio recordings of synthetic calls are invaluable for debugging — you can hear how the synthetic caller phrased a request and how your agent responded.
Tag batch-test datasets separately from production datasets. Evaluating a pure synthetic matrix is useful for regression testing; mixing a representative sample of real traces with synthetic ones gives a broader picture of overall agent health.
Use AI-generated personas and scenarios first to get broad coverage fast, then add manual entries to target specific failure modes you discover.
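
The advice to re-run the same matrix after every change implies a cell-by-cell comparison between two evaluation results. The sketch below shows one way to flag regressions. `find_regressions` is a hypothetical helper, and it assumes you have exported one numeric score per persona and scenario pair where higher is better; the export format itself is not described in this guide.

```python
def find_regressions(baseline, candidate, tolerance=0.0):
    """Compare two {(persona, scenario): score} mappings.

    Returns (key, old_score, new_score) for every cell that is missing
    from the candidate or scores lower than baseline by more than tolerance.
    """
    regressions = []
    for key, old in baseline.items():
        new = candidate.get(key)
        if new is None:
            regressions.append((key, old, None))  # cell was not re-run
        elif new < old - tolerance:
            regressions.append((key, old, new))
    return regressions
```

A small tolerance keeps ordinary run-to-run variation from being reported as a regression, while a missing cell is always reported so that an incomplete matrix is never mistaken for a pass.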

FAQ

Can I run synthetic testing without Noveum Trace integrated into my agent? No. NovaSynth places a real call to your agent, but without NoveumTraceObserver active, the call produces no data. The trace is the entire output of a synthetic run — without it, the call simply disappears.

Do synthetic traces look different from real user traces? In structure, no. They use the same schema as real conversations. They carry source: synthetic, synthetic.persona, and synthetic.scenario attributes so you can filter them in the dashboard and keep them separate from production data when needed.

Can the synthetic caller handle interruptions? Yes. Pipecat's VAD and interruption logic run normally because the synthetic caller delivers real audio. Any pipeline behavior that depends on real-time audio — interruptions, end-of-utterance detection, barge-in — works exactly as it would with a real caller.

What phone providers does NovaSynth support? Plivo and Twilio are supported. For other SIP providers, contact support@noveum.ai.

How long does a test run take? Typical voice agent conversations take 30–120 seconds. NovaSynth runs in real time — it places a real call and the conversation unfolds at the natural pace of speech.

How many runs can I trigger in parallel? Concurrent run limits depend on your plan. Contact support@noveum.ai for details.

Do I need an existing dataset before my first NovaSynth run? No. NovaSynth generates the conversations that become your dataset. Run your first synthetic tests, select the resulting traces in the Noveum dashboard, and create your dataset from there. You do not need to collect real user data before getting started.

How do I get access to NovaSynth synthetic testing? NovaSynth is currently in private beta and is enabled on request. Reach out to support@noveum.ai and we will enable it for your account.