Synthetic Voice Testing
Test your LiveKit voice agent at scale with NovaSynth: realistic AI-generated callers that join LiveKit rooms, with automatic tracing, datasets, and NovaEval evaluations.
NovaSynth joins your LiveKit room as a participant and conducts a realistic voice conversation with your agent. The synthetic caller is driven by a persona and a scenario, and your agent handles the call as a normal LiveKit session. Your existing trace wrappers (LiveKitSTTWrapper, LiveKitTTSWrapper, setup_livekit_tracing) capture every STT, LLM, and TTS span automatically. The resulting traces are available in your Noveum dashboard for dataset creation and NovaEval evaluations.
Private Beta: NovaSynth synthetic voice testing is currently available to select customers. To enable it for your account, contact support@noveum.ai.
How It Works
Why each layer matters:
- Without the LiveKit trace wrappers: the session runs but produces no data.
- Without datasets: traces exist but cannot be evaluated systematically.
- Without NovaEval: no quality measurement, no regression detection, no model comparison.
- With all three: every test run produces a scored, comparable, auditable result.
Before You Begin
Complete these steps before running synthetic tests.
- Integrate Noveum Trace: follow the LiveKit Integration Overview and the Basic LiveKit Voice Agent guide. Confirm traces are appearing in your Noveum dashboard before proceeding.
- Verify tracing works: run a few test conversations with your agent and confirm the traces appear in your dashboard with the expected STT, LLM, and TTS spans.
- Use NovaSynth to build your initial dataset: run synthetic tests to generate your first batch of traced conversations. Create a dataset from those traces in the Noveum dashboard (Traces → select traces → Create Dataset).
- Run a NovaEval evaluation: go to Evaluations → New Evaluation, select your dataset, and start an evaluation job to establish a quality baseline.
NovaSynth is designed to generate the conversations that make up your evaluation datasets, so you do not need an existing dataset before your first run.
Step 1: Expose Your Agent's Audio Endpoint
NovaSynth joins your LiveKit room as a participant and speaks to your agent using AI-generated voice. Your agent processes it as a normal LiveKit job, so no changes to your agent code are required.
This snippet shows the imports required for Noveum tracing. Your LiveKit server URL, room credentials, and API key are configured in the Noveum dashboard, not in agent code.
Register your LiveKit server URL and API secret under Project Settings → Agent Endpoints in the Noveum dashboard. NovaSynth uses these credentials to generate room tokens and connect to your server.
Step 2: Create Personas
Note: All code and curl examples below use placeholder values: $NOVEUM_API_KEY, "my-voice-agent", "persona_abc123", "wss://yourapp.livekit.cloud", and other sample strings. Replace them with your actual API key, project name, IDs, and LiveKit server URL before running.
Personas can be created and managed from the Noveum dashboard under Synthetic Testing → Personas, or via the API examples below.
A persona is the synthetic caller's identity: who they are, how they speak, and what they want from the call. Realistic, diverse personas surface the failure modes that matter most.
Persona fields:
- name, description: character identity
- goal: what they want to accomplish in this session
- patience_level: 0.0 (immediately frustrated) to 1.0 (very patient)
- personality_traits: e.g. ["direct", "impatient", "tech-savvy"]
- tone_preference: "casual", "formal", "friendly", "curt", "aggressive"
- primary_language: supports multilingual callers, e.g. ["Hindi", "English"]
- knowledge_base: what the caller already knows (menu items, pricing, account history)
- Optional demographics: age, occupation, location
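The persona fields above can be modelled locally before anything is sent to Noveum. The following is a minimal sketch, not the Noveum SDK: the `Persona` class, the `to_payload` helper, the field types (for example, `knowledge_base` as a plain string), and the idea that `tone_preference` is limited to the five listed values are all assumptions made for illustration. Only the field names and the 0.0 to 1.0 range for `patience_level` come from the list above.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional

# Assumption: the five tones listed in the docs form a closed set.
ALLOWED_TONES = {"casual", "formal", "friendly", "curt", "aggressive"}


@dataclass
class Persona:
    """Hypothetical local model of a persona; not part of any Noveum SDK."""

    name: str
    description: str
    goal: str
    patience_level: float
    personality_traits: List[str] = field(default_factory=list)
    tone_preference: str = "casual"
    primary_language: List[str] = field(default_factory=lambda: ["English"])
    knowledge_base: str = ""
    age: Optional[int] = None
    occupation: Optional[str] = None
    location: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject values outside the documented 0.0 to 1.0 range.
        if not 0.0 <= self.patience_level <= 1.0:
            raise ValueError("patience_level must be between 0.0 and 1.0")
        if self.tone_preference not in ALLOWED_TONES:
            raise ValueError(f"unknown tone_preference: {self.tone_preference}")

    def to_payload(self) -> dict:
        # Drop optional demographics that were never set.
        return {k: v for k, v in asdict(self).items() if v is not None}


impatient_expert = Persona(
    name="Impatient Expert",
    description="Regular customer who knows the menu by heart",
    goal="Place a repeat order as quickly as possible",
    patience_level=0.2,
    personality_traits=["direct", "impatient", "tech-savvy"],
    tone_preference="curt",
)
payload = impatient_expert.to_payload()
```

Validating locally catches out-of-range values before a run is triggered; the actual request shape should be taken from the Noveum API reference.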
AI-generated personas (recommended)
Paste your agent's system prompt and Noveum generates a diverse set covering a range of personalities and goals automatically:
Manual persona creation
Create specific personas to target known problem areas:
A contrasting example, a patient first-timer who needs guidance:
List all personas for a project:
Step 3: Create Scenarios
Scenarios can be created and managed from the Noveum dashboard under Synthetic Testing → Scenarios, or via the API examples below.
A scenario is the conversation plan. It defines what the caller wants to accomplish, in what order, and how the conversation branches based on agent responses.
Scenario structure:
- name, description: what this scenario tests
- events: a tree of conversation steps:
  - id: unique step identifier
  - parent_id: which step this follows (null for the opening step)
  - action: what the synthetic caller does at this step
  - condition: optional; this step only fires if this condition is met in the conversation
  - fixed: true: this step always happens regardless of what the agent says
Steps with fixed: true form the backbone of the conversation. Steps with a condition create branches, so the synthetic caller responds adaptively, the same way a real person would.
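Because the events form a tree linked by `parent_id`, it can help to check the structure locally before creating a scenario. The sketch below is illustrative only: `validate_events` and `split_steps` are hypothetical helpers, the sample steps are invented, and the rule that a scenario has exactly one opening step is an assumption inferred from the field description above. The check does not detect cycles.

```python
from typing import List, Tuple


def validate_events(events: List[dict]) -> List[str]:
    """Return a list of structural problems; an empty list means none found."""
    problems = []
    seen = set()
    for event in events:
        if event["id"] in seen:
            problems.append(f"duplicate id: {event['id']}")
        seen.add(event["id"])

    # Assumption: exactly one step has parent_id = None (the opening step).
    roots = [e["id"] for e in events if e.get("parent_id") is None]
    if len(roots) != 1:
        problems.append(f"expected exactly one opening step, found {len(roots)}")

    for event in events:
        parent = event.get("parent_id")
        if parent is not None and parent not in seen:
            problems.append(f"step {event['id']} has unknown parent_id {parent}")
    return problems


def split_steps(events: List[dict]) -> Tuple[List[str], List[str]]:
    """Separate backbone steps (fixed) from branch steps (conditional)."""
    fixed = [e["id"] for e in events if e.get("fixed")]
    conditional = [e["id"] for e in events if e.get("condition")]
    return fixed, conditional


# Invented sample steps for an "unavailable item" edge case.
events = [
    {"id": "greet", "parent_id": None, "fixed": True,
     "action": "Greet the agent and ask to place an order"},
    {"id": "order", "parent_id": "greet", "fixed": True,
     "action": "Order an item from the menu"},
    {"id": "substitute", "parent_id": "order",
     "condition": "agent says the item is unavailable",
     "action": "Ask what the closest alternative is"},
]

problems = validate_events(events)
backbone, branches = split_steps(events)
```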
AI-generated scenarios (recommended)
Manual scenario: happy path
Manual scenario: edge case (unavailable item)
List all scenarios for a project:
Step 4: Trigger a Synthetic Test Run
With a persona, a scenario, and a registered LiveKit endpoint, trigger a run. Noveum's infrastructure handles everything from here: joining the room, driving the conversation, and capturing the trace. Runs can also be started from the Noveum dashboard → Synthetic Testing → New Run.
What happens after the request:
- NovaSynth connects to your LiveKit server and joins the specified room as a participant.
- The synthetic caller speaks naturally using the persona's voice, tone, and language, following the scenario's event tree.
- Your LiveKit agent handles the session as a normal job: STT, LLM, and TTS run as usual.
- LiveKitSTTWrapper, LiveKitTTSWrapper, and setup_livekit_tracing capture every span with full audio, transcripts, latency, and token data.
- When the scenario completes or a natural goodbye occurs, the session ends.
- The complete trace appears in your Noveum dashboard, tagged source: synthetic.
Step 5: Monitor a Run
Response fields:
| Field | Description |
|---|---|
| id | Run identifier |
| status | pending, running, completed, or failed |
| trace_id | Noveum trace ID for this run |
| trace_url | Direct link to the trace in the dashboard |
| duration_seconds | Total session duration |
| turn_count | Number of conversational turns |
| persona.name | Name of the persona used |
| scenario.name | Name of the scenario used |
Runs are also visible in Noveum dashboard → Synthetic Testing → Runs.
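Since a run moves through the status values listed above, monitoring usually means polling until it reaches `completed` or `failed`. The sketch below shows one way to structure that loop. It deliberately takes a caller-supplied `fetch_run` function rather than calling any Noveum endpoint, because the request details are not shown here; `wait_for_run`, its parameters, and the default intervals are all hypothetical.

```python
import time
from typing import Callable

# Status values from the response table that mean the run is finished.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_run(
    fetch_run: Callable[[], dict],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll fetch_run() until the run finishes or the timeout elapses.

    fetch_run is assumed to return a dict with at least a "status" key,
    for example the parsed JSON of a run-status response.
    """
    deadline = clock() + timeout_seconds
    while True:
        run = fetch_run()
        if run["status"] in TERMINAL_STATUSES:
            return run
        if clock() >= deadline:
            raise TimeoutError(f"run still {run['status']} after {timeout_seconds}s")
        sleep(poll_seconds)


# Usage with a stand-in for the real status request.
_responses = iter([
    {"id": "run_1", "status": "pending"},
    {"id": "run_1", "status": "running"},
    {"id": "run_1", "status": "completed", "turn_count": 8},
])
finished = wait_for_run(lambda: next(_responses), sleep=lambda _s: None)
```

Because NovaSynth sessions run at the pace of real speech, a poll interval of a few seconds is normally enough; polling faster only adds load.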
Step 6: View Traces
Synthetic traces appear in the Traces section alongside real conversations. The schema is identical to a real session, so synthetic traces can be filtered by their tags and mixed freely with real traces in datasets.
To filter for synthetic traces in the dashboard: use the source: synthetic filter in the Traces view.
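The same filter can be applied to exported trace data in code. This is a sketch under assumptions: it treats each trace as a dict with an `attributes` mapping, which is a guess at the export shape rather than a documented format. The attribute names `source` and `synthetic.persona` are the ones this page describes; `is_synthetic` and `runs_per_persona` are hypothetical helpers.

```python
from collections import Counter
from typing import Dict, List


def is_synthetic(trace: dict) -> bool:
    """True when the trace carries the source: synthetic attribute."""
    return trace.get("attributes", {}).get("source") == "synthetic"


def runs_per_persona(traces: List[dict]) -> Dict[str, int]:
    """Count synthetic traces per persona name."""
    counts = Counter(
        t["attributes"].get("synthetic.persona", "unknown")
        for t in traces
        if is_synthetic(t)
    )
    return dict(counts)


# Invented sample data mixing real and synthetic sessions.
traces = [
    {"id": "t1", "attributes": {"source": "synthetic",
                                "synthetic.persona": "Impatient Expert"}},
    {"id": "t2", "attributes": {}},
    {"id": "t3", "attributes": {"source": "synthetic",
                                "synthetic.persona": "Impatient Expert"}},
    {"id": "t4", "attributes": {"source": "synthetic",
                                "synthetic.persona": "Patient First-Timer"}},
]

synthetic_ids = [t["id"] for t in traces if is_synthetic(t)]
per_persona = runs_per_persona(traces)
```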
Step 7: Build Datasets and Run NovaEval Evaluations
Creating a dataset
- Go to Traces in the Noveum dashboard.
- Filter by source: synthetic and/or date range.
- Select the traces to include.
- Click Create Dataset.
You can mix synthetic and real traces in the same dataset. A dataset that combines both gives the most representative evaluation results.
Running a NovaEval evaluation
- Go to Evaluations → New Evaluation.
- Select your dataset.
- Noveum's NovaEval engine recommends scorers based on your agent type. For voice agents, this includes:
  - Conversational metrics: knowledge retention, conversation relevancy, role adherence
  - Task completion: goal achievement, tool relevancy, task progression
  - Quality: conversation completeness, response clarity
- Optionally, select model variants to compare if you are evaluating a model swap.
- Start the evaluation. Noveum runs it and presents results in the dashboard.
Results show per-scenario quality scores, aggregate metrics across the full dataset, and side-by-side model comparisons. Edge-case scenarios built with impatient or confused personas reveal exactly where your agent fails before real users do.
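To make the difference between per-scenario scores and aggregate metrics concrete, the sketch below averages scores from a list of results. The record shape (`scenario` and `score` keys), the sample values, and the `summarize` helper are assumptions for illustration; NovaEval computes and presents these figures in the dashboard itself.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def summarize(results: List[dict]) -> Tuple[Dict[str, float], float]:
    """Return (mean score per scenario, mean score over all results)."""
    by_scenario = defaultdict(list)
    for result in results:
        by_scenario[result["scenario"]].append(result["score"])
    per_scenario = {name: mean(scores) for name, scores in by_scenario.items()}
    overall = mean(result["score"] for result in results)
    return per_scenario, overall


# Invented scores on a 0.0 to 1.0 scale.
results = [
    {"scenario": "happy_path", "score": 1.0},
    {"scenario": "happy_path", "score": 0.5},
    {"scenario": "unavailable_item", "score": 0.0},
]
per_scenario, overall = summarize(results)
```

A low per-scenario mean next to a healthy overall mean is the typical signature of an edge case that the aggregate number hides.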
Batch Testing
Run every persona × scenario combination for full matrix coverage. After all runs complete, select the traces, create a single dataset, and run one NovaEval evaluation across the entire matrix.
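The persona × scenario matrix is a Cartesian product, which can be enumerated as follows. This is a sketch with hypothetical names: `build_matrix`, `estimate_wall_clock_seconds`, the `persona_id` and `scenario_id` keys, and the sample IDs are illustrative, and the concurrency figure is a placeholder because the real limit depends on your plan. The 120-second default reflects the upper end of the typical conversation length quoted in the FAQ.

```python
import math
from itertools import product
from typing import Dict, List


def build_matrix(persona_ids: List[str], scenario_ids: List[str]) -> List[Dict[str, str]]:
    """One entry per persona x scenario combination."""
    return [
        {"persona_id": p, "scenario_id": s}
        for p, s in product(persona_ids, scenario_ids)
    ]


def estimate_wall_clock_seconds(n_runs: int, concurrency: int,
                                seconds_per_run: int = 120) -> int:
    """Rough upper bound: runs execute in real time, in waves of `concurrency`."""
    return math.ceil(n_runs / concurrency) * seconds_per_run


persona_ids = ["persona_a", "persona_b", "persona_c", "persona_d"]
scenario_ids = ["scenario_happy", "scenario_unavailable", "scenario_cancel"]

matrix = build_matrix(persona_ids, scenario_ids)
estimate = estimate_wall_clock_seconds(len(matrix), concurrency=5)
```

With 4 personas and 3 scenarios the matrix has 12 runs; at an assumed concurrency of 5 that is three waves, or roughly six minutes if each conversation takes two minutes.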
Best Practices
- Start with 3 to 5 personas covering a spread: a patient happy-path user, an impatient expert, a confused first-timer, and one multilingual user if your agent serves mixed-language callers.
- Create at least one edge-case scenario per core conversation flow. Happy paths pass by definition; edge cases are where agents break.
- Use patience_level: 0.1 to 0.3 to stress-test. Impatient callers expose slow response times, rambling answers, and goal-completion failures.
- Re-run the exact same persona × scenario matrix after every model swap or prompt change. If the new configuration fails more scenarios than the previous one, it is not ready.
- Keep record=True in setup_livekit_tracing. Audio recordings of synthetic sessions are invaluable for debugging: you can hear how the synthetic caller phrased a request and how your agent responded.
- Tag batch-test datasets separately from production datasets. Evaluating a pure synthetic matrix is useful for regression testing; mixing a representative sample of real traces with synthetic ones gives a broader picture of overall agent health.
- Use AI-generated personas and scenarios first to get broad coverage fast, then add manual entries to target specific failure modes you discover.
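The re-run rule above (a new configuration that fails more scenarios than the previous one is not ready) can be expressed as a simple gate. The sketch below is one possible reading of it: the result shape, the 0.7 pass threshold, and the `failed_cells` and `compare_runs` helpers are assumptions, not NovaEval behavior.

```python
from typing import Dict, List, Set, Tuple

Cell = Tuple[str, str]  # (persona, scenario)


def failed_cells(results: List[dict], threshold: float = 0.7) -> Set[Cell]:
    """Matrix cells whose score falls below the (assumed) pass threshold."""
    return {
        (r["persona"], r["scenario"])
        for r in results
        if r["score"] < threshold
    }


def compare_runs(baseline: List[dict], candidate: List[dict],
                 threshold: float = 0.7) -> Dict[str, object]:
    """Compare two runs of the same persona x scenario matrix."""
    before = failed_cells(baseline, threshold)
    after = failed_cells(candidate, threshold)
    return {
        "newly_failing": sorted(after - before),
        "fixed": sorted(before - after),
        # Not ready if the candidate fails more cells than the baseline.
        "ready": len(after) <= len(before),
    }


# Invented scores for the same two-cell matrix before and after a prompt change.
baseline = [
    {"persona": "expert", "scenario": "happy_path", "score": 0.9},
    {"persona": "expert", "scenario": "unavailable_item", "score": 0.6},
]
candidate = [
    {"persona": "expert", "scenario": "happy_path", "score": 0.5},
    {"persona": "expert", "scenario": "unavailable_item", "score": 0.4},
]
report = compare_runs(baseline, candidate)
```

Listing the newly failing cells alongside the verdict points directly at the persona and scenario to replay, which is where the recorded audio becomes useful.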
FAQ
Can I run synthetic testing without Noveum Trace integrated into my agent?
No. NovaSynth joins your LiveKit room and conducts a real session, but without LiveKitSTTWrapper, LiveKitTTSWrapper, and setup_livekit_tracing active, the session produces no data. The trace is the entire output of a synthetic run; without it, the session simply disappears.
Do synthetic traces look different from real user traces?
In structure, no. They use the same schema as real sessions. They carry source: synthetic, synthetic.persona, and synthetic.scenario attributes so you can filter them in the dashboard and keep them separate from production data when needed.
Can the synthetic caller handle interruptions?
Yes. The synthetic caller delivers real audio into the LiveKit room, so any session behavior that depends on real-time audio (VAD, barge-in, end-of-utterance detection) works exactly as it would with a real user.
How long does a test run take?
Typical voice agent conversations take 30 to 120 seconds. NovaSynth runs in real time: it joins a real LiveKit session and the conversation unfolds at the natural pace of speech.
How many runs can I trigger in parallel?
Concurrent run limits depend on your plan. Contact support@noveum.ai for details.
Do I need an existing dataset before my first NovaSynth run?
No. NovaSynth generates the conversations that become your dataset. Run your first synthetic tests, select the resulting traces in the Noveum dashboard, and create your dataset from there. You do not need to collect real user data before getting started.
How do I get access to NovaSynth synthetic testing?
NovaSynth is currently in private beta and is enabled on request. Reach out to support@noveum.ai and we will enable it for your account.