LLM-as-Judge Scorers

68+ LLM-as-Judge Scorers for Comprehensive AI Evaluation

Evaluate your AI agents across every quality dimension with Noveum.ai's scorer library. From hallucination detection to bias assessment, it covers the evaluation metrics you need.

Why Use Noveum.ai Scorers/Evals?

Built for production AI evaluation with everything you need out of the box

No Manual Labeling Required

Evaluate agents automatically using system prompts as ground truth. No need to create expected outputs for every test case.

LLM-as-Judge Technology

An LLM acts as the judge, giving nuanced quality assessments that account for context and intent.

Comprehensive Coverage

68+ scorers covering every dimension of AI quality from hallucination detection to bias assessment.

Enterprise-Ready

Used by leading enterprises for production agent evaluation with battle-tested reliability and scale.

Fully Customizable

Create custom scorers for your specific business needs. Extend existing scorers or build from scratch.

Trace-Based Evaluation

Evaluate complete agent workflows from traces. Analyze tool calls, reasoning steps, and multi-turn conversations.
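
As an illustration of what evaluating from a trace can look like, the Python sketch below walks a recorded agent run and summarizes its tool calls and conversation turns. The `Step` and `Trace` types, the `kind` labels, and `summarize_trace` are hypothetical names invented for this example. They are not the Noveum.ai trace schema or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Step:
    """One recorded event in an agent run (hypothetical schema)."""
    kind: str        # assumed labels: "message", "reasoning", or "tool_call"
    content: str
    ok: bool = True  # for tool calls: whether the call succeeded


@dataclass
class Trace:
    """An ordered list of steps from a single agent run (hypothetical schema)."""
    steps: List[Step] = field(default_factory=list)


def summarize_trace(trace: Trace) -> Dict[str, float]:
    """Count conversation turns and tool calls, and compute the tool success rate."""
    tool_calls = [s for s in trace.steps if s.kind == "tool_call"]
    turns = [s for s in trace.steps if s.kind == "message"]
    succeeded = sum(1 for s in tool_calls if s.ok)
    return {
        "turns": float(len(turns)),
        "tool_calls": float(len(tool_calls)),
        # A trace with no tool calls is treated as having no failures.
        "tool_success_rate": succeeded / len(tool_calls) if tool_calls else 1.0,
    }


example = Trace(steps=[
    Step("message", "What is the weather in Paris?"),
    Step("reasoning", "I should call the weather tool."),
    Step("tool_call", "get_weather(city='Paris')", ok=True),
    Step("message", "It is sunny in Paris."),
])
print(summarize_trace(example))
```

A summary like this is only a structural check. In a fuller setup, each step's content could also be passed to an LLM judge for a quality score.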

Explore All Scorers

The library spans 68+ scorers across 13 categories.

Do You Have the Scorers You Need?

Find the right scorers for your specific use case

RAG System Evaluation

  • AnswerRelevancyScorer
  • FaithfulnessScorer
  • ContextualPrecisionScorer
  • ContextualRecallScorer
  • RAGASScorer

Safety & Compliance

  • ToxicityScorer
  • ContentSafetyViolationScorer
  • IsHarmfulAdviceScorer
  • ContentModerationScorer
  • AnswerRefusalScorer

Bias Detection

  • NoGenderBiasScorer
  • NoRacialBiasScorer
  • NoAgeBiasScorer
  • CulturalSensitivityScorer
  • BiasDetectionScorer

Frequently Asked Questions

Everything you need to know about Noveum.ai scorers

What are evaluation scorers?

Evaluation scorers are specialized metrics that assess different aspects of AI agent behavior. They use LLM-as-Judge technology to evaluate quality dimensions such as faithfulness, relevancy, and safety, without requiring manual labeling or expected outputs.
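
To make the idea concrete, here is a minimal Python sketch of an LLM-as-Judge scorer that uses the agent's system prompt as the reference instead of a labeled expected output. It is an illustration under stated assumptions, not the Noveum.ai implementation: `build_judge_prompt`, `judge_score`, `ScoreResult`, the prompt wording, and the 1 to 10 scale are all hypothetical, and the judge model is passed in as a plain callable so that any LLM client could be plugged in.

```python
import json
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class ScoreResult:
    score: float     # normalized to the range 0.0 to 1.0
    reasoning: str   # the judge's short explanation


def build_judge_prompt(system_prompt: str, user_input: str, agent_output: str) -> str:
    """Assemble the text sent to the judge model (hypothetical format)."""
    return (
        "You are grading an AI agent's reply.\n"
        "Treat the agent's system prompt as the ground truth for expected behavior.\n\n"
        f"SYSTEM PROMPT:\n{system_prompt}\n\n"
        f"USER INPUT:\n{user_input}\n\n"
        f"AGENT OUTPUT:\n{agent_output}\n\n"
        'Respond with JSON only: {"score": <integer 1-10>, "reasoning": "<one sentence>"}'
    )


def judge_score(
    judge: Callable[[str], str],
    system_prompt: str,
    user_input: str,
    agent_output: str,
) -> ScoreResult:
    """Ask the judge for a verdict and normalize its 1-10 score to 0.0-1.0."""
    raw = judge(build_judge_prompt(system_prompt, user_input, agent_output))
    # Judges sometimes wrap the JSON in extra text, so extract the object first.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        raise ValueError("judge reply contained no JSON object")
    data = json.loads(match.group(0))
    # Clamp out-of-range scores before normalizing.
    score = min(max(float(data["score"]), 1.0), 10.0)
    return ScoreResult(score=(score - 1.0) / 9.0, reasoning=str(data.get("reasoning", "")))


def stub_judge(prompt: str) -> str:
    """Stand-in for a real LLM call, used here so the sketch runs offline."""
    return '{"score": 8, "reasoning": "The reply follows the system prompt."}'


result = judge_score(
    stub_judge,
    system_prompt="You are a polite support agent. Never share account passwords.",
    user_input="Can you tell me my password?",
    agent_output="I'm sorry, I can't share passwords, but I can help you reset it.",
)
print(result.score, result.reasoning)
```

In practice `stub_judge` would be replaced by a function that sends the prompt to an LLM and returns its text reply. Everything else stays the same, which is why no expected output needs to be written for each test case.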

Ready to evaluate your AI agents?

Start using 68+ scorers for free. No credit card required.