AggregateRAGScorer
Combines multiple retrieval scorers with weighted averaging for comprehensive RAG evaluation. Allows configurable weights for different metrics to compute a single aggregate score.
Overview
AggregateRAGScorer runs a configurable set of retrieval scorers over the same RAG output and combines their results into a single weighted-average score. You assign each scorer a weight (weights must sum to 1.0), so you can tune how much each metric contributes to the aggregate.
Use Cases
- RAG-based question answering systems
How It Works
This scorer uses deterministic, rule-based aggregation: it runs each configured sub-scorer against the output, then combines the individual scores as a weighted average using the weights you supply. Because the aggregation step is arithmetic rather than an LLM judgment, the combined score is consistent and reproducible, with no additional LLM inference beyond whatever the sub-scorers themselves perform.
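The aggregation step described above can be sketched as follows. This is a minimal illustration, not Noveum.ai's actual implementation; the function and scorer names are assumptions.

```python
# Minimal sketch of weighted-average aggregation; names are
# illustrative, not Noveum.ai's actual API.
import math


def aggregate_score(individual_scores: dict, weights: dict) -> float:
    """Combine per-scorer results (0-10 scale) into one weighted score."""
    if set(individual_scores) != set(weights):
        raise ValueError("scorers and weights must use the same keys")
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1.0")
    return sum(individual_scores[name] * weights[name] for name in weights)


# Hypothetical sub-scorer results on the 0-10 scale:
scores = {"faithfulness": 8.0, "relevance": 6.0}
weights = {"faithfulness": 0.6, "relevance": 0.4}
print(aggregate_score(scores, weights))  # 0.6*8.0 + 0.4*6.0 = 7.2
```

Because the weights must sum to 1.0, the aggregate always stays on the same 0-10 scale as the individual scores.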
Input Schema
| Parameter | Type | Required | Description |
|---|---|---|---|
| scorers | dict | Yes | Dictionary of scorer instances |
| weights | dict | Yes | Weights per scorer (must sum to 1.0) |
| prediction | str | Yes | Generated answer |
| ground_truth | str | No | Expected answer |
| context | dict \| list | Yes | Retrieved context |
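A hypothetical input matching the schema above might look like this. The `StubScorer` class and field values are stand-ins for illustration, not real Noveum.ai classes; only the keys and the two invariants (matching scorer/weight keys, weights summing to 1.0) come from the table.

```python
import math


class StubScorer:
    """Placeholder for a real retrieval scorer instance (illustrative only)."""

    def score(self, prediction, context, ground_truth=None):
        return 10.0


payload = {
    "scorers": {"faithfulness": StubScorer(), "relevance": StubScorer()},
    "weights": {"faithfulness": 0.5, "relevance": 0.5},
    "prediction": "Paris is the capital of France.",
    "ground_truth": "Paris",                      # optional
    "context": ["France's capital is Paris."],    # dict or list
}

# The two invariants implied by the schema:
assert payload["scorers"].keys() == payload["weights"].keys()
assert math.isclose(sum(payload["weights"].values()), 1.0)
```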
Output Schema
| Field | Type | Description |
|---|---|---|
| aggregate | float | Weighted aggregate score (0-10) |
| individual_scores | dict | Score per scorer |
Score Interpretation
Default threshold: 7/10. Scores fall on a 0-10 scale; an aggregate score of 7 or higher is treated as passing.
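Interpreting the aggregate against the threshold is a simple comparison. This is a hedged sketch; the parameter name `threshold` is an assumption, though the default of 7/10 comes from the documentation above.

```python
# Pass/fail interpretation of an aggregate score on the 0-10 scale.
# The default threshold of 7.0 matches the documented default; the
# function name and signature are illustrative, not the library's API.
def passes(aggregate: float, threshold: float = 7.0) -> bool:
    return aggregate >= threshold


print(passes(7.2))        # True with the default threshold
print(passes(7.2, 8.0))   # False with a stricter threshold
```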
Related Scorers
Frequently Asked Questions
When should I use this scorer?
Use AggregateRAGScorer when you need a single score that reflects several aspects of RAG quality at once, combined by deterministic, rule-based aggregation. It's particularly useful for RAG-based question answering systems.
Why doesn't this scorer need expected output?
The ground_truth parameter is optional: the aggregate depends only on the sub-scorers you configure, and many retrieval metrics use the retrieved context as the implicit reference rather than comparing against an expected answer.
Can I customize the threshold?
Yes, the default threshold of 7 can be customized when configuring the scorer.
Quick Info
Ready to try AggregateRAGScorer?
Start evaluating your AI agents with Noveum.ai's comprehensive scorer library.
Explore More Scorers
Discover 68+ LLM-as-Judge scorers for comprehensive AI evaluation