ROUGEScorer
Computes ROUGE scores for text summarization evaluation. Calculates ROUGE-1 (unigram), ROUGE-2 (bigram), and ROUGE-L (longest common subsequence) metrics for comprehensive content overlap analysis.
Overview
ROUGE (Recall-Oriented Understudy for Gisting Evaluation) measures content overlap between a generated summary and a reference summary. ROUGE-1 counts shared unigrams, ROUGE-2 counts shared bigrams, and ROUGE-L rewards the longest common subsequence, which also captures sentence-level word order. Together, the three metrics give a balanced view of lexical coverage and fluency for summarization evaluation.
Use Cases
- Accuracy benchmarking and validation
How It Works
This scorer tokenizes the prediction and the reference, then computes n-gram overlap (ROUGE-1, ROUGE-2) and longest-common-subsequence overlap (ROUGE-L), each expressed as precision, recall, and F1. The computation is purely string-based and deterministic, so results are consistent and reproducible without requiring LLM inference.
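The overlap computations above can be sketched in plain Python. This is an illustrative implementation of the standard ROUGE-N and ROUGE-L F1 formulas, not Noveum's actual code; real implementations typically add stemming and more careful tokenization.

```python
from collections import Counter


def rouge_n_f1(prediction: str, reference: str, n: int = 1) -> float:
    """ROUGE-N F1: n-gram overlap between prediction and reference."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    pred = ngrams(prediction.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((pred & ref).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def rouge_l_f1(prediction: str, reference: str) -> float:
    """ROUGE-L F1 based on longest common subsequence (LCS) length."""
    p, r = prediction.lower().split(), reference.lower().split()
    # Dynamic-programming table for LCS length.
    dp = [[0] * (len(r) + 1) for _ in range(len(p) + 1)]
    for i, tok_p in enumerate(p, 1):
        for j, tok_r in enumerate(r, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if tok_p == tok_r \
                else max(dp[i - 1][j], dp[i][j - 1])
    lcs = dp[len(p)][len(r)]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(p), lcs / len(r)
    return 2 * precision * recall / (precision + recall)
```

Identical texts score 1.0 on every metric; texts with no shared vocabulary score 0.0.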
Input Schema
| Parameter | Type | Required | Description |
|---|---|---|---|
| prediction | str | Yes | Generated summary or text |
| ground_truth | str | Yes | Reference summary |
Output Schema
| Field | Type | Description |
|---|---|---|
| score | float | Combined ROUGE score (0-10) |
| passed | bool | True if above threshold |
| reasoning | str | Score explanation |
| metadata.rouge1 | float | ROUGE-1 unigram score |
| metadata.rouge2 | float | ROUGE-2 bigram score |
| metadata.rougeL | float | ROUGE-L score |
Score Interpretation
Default threshold: 7/10
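The schema above reports each metric on a 0-1 scale and a combined score on a 0-10 scale. The documentation does not specify the combination rule, so the sketch below assumes a simple mean of the three F1 scores scaled to 0-10; the scorer's actual weighting may differ.

```python
def combine_rouge(rouge1: float, rouge2: float, rougeL: float,
                  threshold: float = 7.0) -> dict:
    """Combine per-metric F1 scores (each 0-1) into a 0-10 result.

    Averaging the three metrics equally is an assumption for illustration;
    the scorer's actual combination rule is not documented here.
    """
    score = round((rouge1 + rouge2 + rougeL) / 3 * 10, 2)
    passed = score >= threshold  # boundary behavior (>= vs >) is an assumption
    return {
        "score": score,
        "passed": passed,
        "reasoning": f"Mean ROUGE F1 of {score / 10:.2f} maps to {score}/10",
        "metadata": {"rouge1": rouge1, "rouge2": rouge2, "rougeL": rougeL},
    }
```

For example, per-metric F1 scores of 0.9, 0.8, and 0.7 average to 0.8, giving a combined score of 8.0/10, which clears the default threshold of 7.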
Frequently Asked Questions
When should I use this scorer?
Use ROUGEScorer when you need reference-based NLP metrics to assess the accuracy of your AI outputs. It's particularly useful for accuracy benchmarking and for validating summarization quality against known-good references.
Why does this scorer need expected output?
This scorer compares the generated output against a known expected result to calculate accuracy metrics.
Can I customize the threshold?
Yes, the default threshold of 7 can be customized when configuring the scorer.
Ready to try ROUGEScorer?
Start evaluating your AI agents with Noveum.ai's comprehensive scorer library.
Explore More Scorers
Discover 68+ LLM-as-Judge scorers for comprehensive AI evaluation