Datasets in Noveum
Noveum allows you to generate and manage datasets directly from your real-world application logs captured by the AI Gateway. This helps you create highly relevant benchmarks for evaluating new models.
1. Creating a Dataset
- Enable Logging: Turn on logging in the AI Gateway. Once logs are flowing into Elasticsearch or the Noveum hosted solution, you can view them in the Noveum dashboard.
- Select Logs: In the Noveum dashboard, choose a subset of logs that represent typical interactions from your users.
- Label: Flag “success” or “error” to curate high-quality data. You can also add domain-specific tags like finance, customer-support, medical, etc.
- Save: Save the curated selection as a new dataset.
That’s it! You now have an initial Eval Dataset ready for repeated testing against multiple models.
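If you prefer to curate a batch of exported logs programmatically rather than in the dashboard, a minimal sketch is shown below. The log field names (prompt, response, status) and the output shape are assumptions for illustration, not the actual Noveum log schema.

```python
import json

# Hypothetical exported gateway logs; field names are assumptions, not the Noveum schema.
logs = [
    {"prompt": "What is my account balance?", "response": "Your balance is $120.", "status": "success"},
    {"prompt": "Transfer funds to Bob", "response": "Sorry, I can't do that.", "status": "error"},
]

# Keep only logs we can label, then attach a success/error label and a domain tag.
dataset = [
    {
        "input": log["prompt"],
        "output": log["response"],
        "label": log["status"],   # "success" or "error"
        "tags": ["finance"],      # domain-specific tag
    }
    for log in logs
    if log["status"] in ("success", "error")
]

# Save the curated selection; this file could then be uploaded as a new dataset.
with open("eval_dataset_v1.json", "w") as f:
    json.dump(dataset, f, indent=2)
```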
2. Dataset Best Practices
- Diversity: Include a wide range of queries to capture all relevant edge cases (especially if your app handles multiple languages or domains).
- Update Regularly: Keep collecting logs so your dataset grows over time. AI performance can shift as user behavior changes.
- Keep Sensitive Data Out: If needed, configure PII (personally identifiable information) redaction in the Gateway or in your pipeline.
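If your pipeline handles redaction itself rather than relying on the Gateway, one minimal approach is to scrub obvious PII patterns before the text is logged. The patterns below (emails and US-style phone numbers) are illustrative only, not an exhaustive redaction strategy.

```python
import re

# Illustrative patterns only; production redaction usually needs a dedicated PII library.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")

def redact(text: str) -> str:
    """Replace obvious PII with placeholder tokens before logging."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text

print(redact("Contact jane.doe@example.com or 555-123-4567"))
# -> "Contact [EMAIL] or [PHONE]"
```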
3. Importing External Benchmark Datasets
Noveum is also compatible with commonly used public benchmarks, such as MMLU (Massive Multitask Language Understanding) or toxicity detection sets. To import:
- Upload the benchmark dataset in CSV or JSON format through the Datasets section of the Noveum dashboard.
- Map the columns/fields (e.g., prompt, expected_answer).
- Save and let Noveum unify it with your existing logs.
Now you have a hybrid dataset that combines your real-world usage with standard benchmarks for more robust evaluations.
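As a concrete example of the column mapping above, the sketch below reshapes MMLU-style rows (question, answer choices, index of the correct choice) into a two-column CSV with prompt and expected_answer fields ready for upload. The input row format is an assumption about how your benchmark export might look.

```python
import csv

# Example MMLU-style rows; the exact export format of your benchmark is an assumption here.
rows = [
    {
        "question": "Which planet is known as the Red Planet?",
        "choices": ["Venus", "Mars", "Jupiter", "Mercury"],
        "answer": 1,
    },
]

with open("mmlu_subset.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["prompt", "expected_answer"])
    writer.writeheader()
    for row in rows:
        # Fold the choices into the prompt so the model sees a multiple-choice question.
        options = " ".join(f"({chr(65 + i)}) {c}" for i, c in enumerate(row["choices"]))
        writer.writerow({
            "prompt": f"{row['question']} {options}",
            "expected_answer": row["choices"][row["answer"]],
        })
```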
4. Versioning & History
Every dataset in Noveum is versioned. Each time you add or remove logs or tags, a new version is saved:
- v1.0: initial import
- v1.1: added error cases, removed duplicates
- v2.0: integrated new domain data
This versioning ensures reproducible experiments—if you re-run an Eval Job on v1.0 vs. v2.0, you’ll know exactly which data was tested.
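However you launch eval jobs, pinning the dataset version in the job specification is what makes two runs comparable. The sketch below is a hypothetical job spec in plain Python; the field names are illustrative and not Noveum's actual API.

```python
# Hypothetical eval job specs; field names are illustrative, not Noveum's actual API.
baseline_run = {
    "dataset": "support-logs",
    "dataset_version": "v1.0",  # pin the exact version so the run is reproducible
    "model": "candidate-model-a",
}

candidate_run = {
    "dataset": "support-logs",
    "dataset_version": "v2.0",  # same dataset, newer version with new domain data
    "model": "candidate-model-a",
}

# Comparing results is only meaningful when you know which version each run used.
for run in (baseline_run, candidate_run):
    print(f"{run['model']} evaluated on {run['dataset']}@{run['dataset_version']}")
```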
Next Steps
Get Early Access to Noveum.ai Platform
Be the first to get notified when we open the Noveum Platform to more users. All users get free access to the Observability suite; early users also get free eval jobs and premium support for the first year.