Datasets in Noveum
Noveum allows you to generate and manage datasets directly from your real-world application logs captured by the AI Gateway. This helps you create highly relevant benchmarks for evaluating new models.
1. Creating a Dataset
- Enable Logging: Turn on logging in the AI Gateway. Once logs are flowing into Elasticsearch or the Noveum hosted solution, you can view them in the Noveum dashboard.
- Select Logs: In the Noveum dashboard, choose a subset of logs that represent typical interactions from your users.
- Label: Flag “success” or “error” to curate high-quality data. You can also add domain-specific tags like finance, customer-support, medical, etc.
- Save: Save the curated selection as a new dataset.
That’s it! You now have an initial Eval Dataset ready for repeated testing against multiple models.
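If you prefer to curate a batch of exported logs programmatically rather than in the dashboard, a minimal sketch is shown below. The log field names (prompt, response, status) and the output shape are assumptions for illustration, not the actual Noveum log schema.

```python
import json

# Hypothetical exported gateway logs; field names are assumptions, not the Noveum schema.
logs = [
    {"prompt": "What is my account balance?", "response": "Your balance is $120.", "status": "success"},
    {"prompt": "Transfer funds to Bob", "response": "Sorry, I can't do that.", "status": "error"},
]

# Keep only logs we can label, then attach a success/error label and a domain tag.
dataset = [
    {
        "input": log["prompt"],
        "output": log["response"],
        "label": log["status"],   # "success" or "error"
        "tags": ["finance"],      # domain-specific tag
    }
    for log in logs
    if log["status"] in ("success", "error")
]

# Save the curated selection; this file could then be uploaded as a new dataset.
with open("eval_dataset_v1.json", "w") as f:
    json.dump(dataset, f, indent=2)
```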
2. Dataset Best Practices
- Diversity: Include a wide range of queries to capture all relevant edge cases (especially if your app handles multiple languages or domains).
- Update Regularly: Keep collecting logs so your dataset grows over time. AI performance can shift as user behavior changes.
- Keep Sensitive Data Out: If needed, configure PII (personally identifiable information) redaction in the Gateway or in your pipeline.
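If your pipeline handles redaction itself rather than relying on the Gateway, one minimal approach is to scrub obvious PII patterns before the text is logged. The patterns below (emails and US-style phone numbers) are illustrative only, not an exhaustive redaction strategy.

```python
import re

# Illustrative patterns only; production redaction usually needs a dedicated PII library.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")

def redact(text: str) -> str:
    """Replace obvious PII with placeholder tokens before logging."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text

print(redact("Contact jane.doe@example.com or 555-123-4567"))
# -> "Contact [EMAIL] or [PHONE]"
```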
3. Importing External Benchmark Datasets
Noveum is also compatible with commonly used public benchmarks, such as MMLU (Massive Multitask Language Understanding) or toxicity detection sets. To import:
- Upload the benchmark dataset in CSV or JSON format through the Datasets section of the Noveum dashboard.
- Map the columns/fields (e.g., prompt, expected_answer).
- Save and let Noveum unify it with your existing logs.
Now you have a hybrid dataset that combines your real-world usage with standard benchmarks for more robust evaluations.
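As a concrete example of the column mapping above, the sketch below reshapes MMLU-style rows (question, answer choices, index of the correct choice) into a two-column CSV with prompt and expected_answer fields ready for upload. The input row format is an assumption about how your benchmark export might look.

```python
import csv

# Example MMLU-style rows; the exact export format of your benchmark is an assumption here.
rows = [
    {
        "question": "Which planet is known as the Red Planet?",
        "choices": ["Venus", "Mars", "Jupiter", "Mercury"],
        "answer": 1,
    },
]

with open("mmlu_subset.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["prompt", "expected_answer"])
    writer.writeheader()
    for row in rows:
        # Fold the choices into the prompt so the model sees a multiple-choice question.
        options = " ".join(f"({chr(65 + i)}) {c}" for i, c in enumerate(row["choices"]))
        writer.writerow({
            "prompt": f"{row['question']} {options}",
            "expected_answer": row["choices"][row["answer"]],
        })
```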
4. Versioning & History
Every dataset in Noveum is versioned. Each time you add or remove logs or tags, a new version is saved:
- v1.0: initial import
- v1.1: added error cases, removed duplicates
- v2.0: integrated new domain data
This versioning ensures reproducible experiments—if you re-run an Eval Job on v1.0 vs. v2.0, you’ll know exactly which data was tested.
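However you launch eval jobs, pinning the dataset version in the job specification is what makes two runs comparable. The sketch below is a hypothetical job spec in plain Python; the field names are illustrative and not Noveum's actual API.

```python
# Hypothetical eval job specs; field names are illustrative, not Noveum's actual API.
baseline_run = {
    "dataset": "support-logs",
    "dataset_version": "v1.0",  # pin the exact version so the run is reproducible
    "model": "candidate-model-a",
}

candidate_run = {
    "dataset": "support-logs",
    "dataset_version": "v2.0",  # same dataset, newer version with new domain data
    "model": "candidate-model-a",
}

# Comparing results is only meaningful when you know which version each run used.
for run in (baseline_run, candidate_run):
    print(f"{run['model']} evaluated on {run['dataset']}@{run['dataset_version']}")
```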
Next Steps
Get Early Access to Noveum.ai Platform
Be the first to get notified when we open the Noveum Platform to more users. All users get free access to the Observability suite; early users also get free eval jobs and premium support for the first year.