ContextFaithfulnessScorerPP
Overview
Enhanced faithfulness detection with fine-grained claim-by-claim analysis. Performs two-stage evaluation: first extracting individual factual claims from the answer, then verifying each claim against context. Particularly effective for detecting partial hallucinations.
Use Cases
- RAG-based question answering systems
- Hallucination detection in generated content
How It Works
This scorer uses the LLM-as-Judge approach. It prompts a large language model with the evaluation criteria and the content to assess, then parses the model's judgment into a score and detailed reasoning.
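The two-stage flow (extract claims, then verify each one against the context) can be sketched roughly as follows. This is a minimal illustration, not the scorer's actual implementation: `extract_claims`, `verify_claims`, and `substring_judge` are hypothetical names, and the toy judge stands in for the LLM call that the real scorer would make at each stage.

```python
from typing import Callable, List


def extract_claims(answer: str) -> List[str]:
    """Stage 1 stand-in: split the answer into sentence-level claims.

    Assumption: the real scorer asks the judge LLM to extract claims;
    naive sentence splitting is used here only to keep the sketch runnable.
    """
    normalized = answer.replace("!", ".").replace("?", ".")
    return [part.strip() for part in normalized.split(".") if part.strip()]


def verify_claims(
    claims: List[str],
    context: str,
    judge: Callable[[str, str], bool],
) -> List[bool]:
    """Stage 2: ask the judge whether each claim is supported by the context."""
    return [judge(claim, context) for claim in claims]


def substring_judge(claim: str, context: str) -> bool:
    """Toy judge (hypothetical): a claim is 'supported' only if it appears
    verbatim in the context. A real judge would be an LLM prompt."""
    return claim.lower() in context.lower()


context = "The Eiffel Tower is in Paris. It was completed in 1889."
answer = "The Eiffel Tower is in Paris. It was completed in 1925."

claims = extract_claims(answer)
verdicts = verify_claims(claims, context, substring_judge)

for claim, supported in zip(claims, verdicts):
    print(f"{'supported' if supported else 'unsupported'}: {claim}")
```

Checking claims one at a time is what lets a partial hallucination surface: the first claim above is supported while the second is not, even though the answer as a whole looks plausible.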
Input Schema
| Parameter | Type | Required | Description |
|---|---|---|---|
| output_text | str | Yes | Answer containing claims to verify |
| input_text | str | No | Original question |
| context | list[str] \| str | Yes | Context for claim verification |
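Because `context` accepts either a single string or a list of strings, callers may find it convenient to normalize it before building a request. The helper below is a sketch under that assumption; `normalize_context` and `build_inputs` are hypothetical names and are not part of the scorer's documented API.

```python
from typing import Dict, List, Optional, Union


def normalize_context(context: Union[str, List[str]]) -> List[str]:
    """Return the context as a list of passages, whichever form was given."""
    if isinstance(context, str):
        return [context]
    return list(context)


def build_inputs(
    output_text: str,
    context: Union[str, List[str]],
    input_text: Optional[str] = None,
) -> Dict[str, object]:
    """Assemble the parameters from the input schema (hypothetical helper).

    output_text and context are required; input_text is optional.
    """
    if not output_text:
        raise ValueError("output_text is required")
    passages = normalize_context(context)
    if not passages:
        raise ValueError("context is required")
    inputs: Dict[str, object] = {"output_text": output_text, "context": passages}
    if input_text is not None:
        inputs["input_text"] = input_text
    return inputs


example = build_inputs(
    output_text="The Eiffel Tower was completed in 1889.",
    context="The Eiffel Tower is in Paris. It was completed in 1889.",
    input_text="When was the Eiffel Tower completed?",
)
print(example)
```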
Output Schema
| Field | Type | Description |
|---|---|---|
| score | float | Faithfulness score (0-10) |
| passed | bool | True if faithful |
| reasoning | str | Verification analysis |
| metadata.verified_claims | int | Number of verified claims |
| metadata.total_claims | int | Total claims extracted |
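As an illustration of how the output fields might be consumed, the sketch below models the result as a dataclass and derives two diagnostics from the claim counts. The `ScorerResult` class and both helper functions are hypothetical, and the example values are made up for illustration. The ratio computed here is a reading aid only; the source does not state how `score` itself is derived from the claim counts.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ScorerResult:
    """Hypothetical container mirroring the output schema above."""
    score: float
    passed: bool
    reasoning: str
    metadata: Dict[str, int] = field(default_factory=dict)


def support_ratio(result: ScorerResult) -> float:
    """Fraction of extracted claims that were verified (0.0 if none extracted)."""
    total = result.metadata.get("total_claims", 0)
    verified = result.metadata.get("verified_claims", 0)
    return verified / total if total else 0.0


def unverified_count(result: ScorerResult) -> int:
    """Number of extracted claims that were not verified against the context."""
    total = result.metadata.get("total_claims", 0)
    verified = result.metadata.get("verified_claims", 0)
    return total - verified


# Illustrative values only, not real scorer output.
result = ScorerResult(
    score=7.5,
    passed=True,
    reasoning="3 of 4 claims are supported by the context.",
    metadata={"verified_claims": 3, "total_claims": 4},
)
print(support_ratio(result), unverified_count(result))
```

A passing result with a nonzero unverified count is the typical signature of a partial hallucination, so the metadata is worth inspecting even when `passed` is true.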
Score Interpretation
Default threshold: 7/10
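A pass/fail check against the threshold could look like the sketch below. It assumes that a score equal to the threshold counts as passing; the source does not say whether the comparison is inclusive, so treat that detail as an assumption. `is_passing` is a hypothetical helper, not the scorer's API.

```python
DEFAULT_THRESHOLD = 7.0  # default threshold from the documentation (7/10)


def is_passing(score: float, threshold: float = DEFAULT_THRESHOLD) -> bool:
    """Return True if the score meets the threshold.

    Assumption: the comparison is inclusive (score >= threshold).
    """
    if not 0.0 <= score <= 10.0:
        raise ValueError("score must be in the range 0-10")
    return score >= threshold


print(is_passing(7.0))                 # default threshold
print(is_passing(8.0, threshold=9.0))  # stricter, custom threshold
```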
Frequently Asked Questions
When should I use this scorer?
Use ContextFaithfulnessScorerPP when you need to check whether an answer stays faithful to the context it was given, including answers that draw on multiple context passages. It is particularly useful for RAG-based question answering systems.
Why doesn't this scorer need expected output?
This scorer evaluates faithfulness, which does not require comparison against a reference answer. It uses the supplied context as the implicit ground truth for each claim.
Can I customize the threshold?
Yes, the default threshold of 7 can be customized when configuring the scorer.