Accuracy Scorer | Rule-Based

F1Scorer

Computes token-level F1 score between prediction and ground truth. Balances precision and recall, useful for extractive tasks and named entity recognition.

Overview

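F1Scorer measures token overlap between the prediction and the ground truth. Precision is the fraction of predicted tokens that also appear in the ground truth, recall is the fraction of ground-truth tokens that appear in the prediction, and F1 is the harmonic mean of the two. Because it balances precision and recall, it suits extractive tasks and named entity recognition.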

Tags: accuracy, nlp-metrics, rule-based, benchmark, f1, precision, recall

Use Cases

Accuracy benchmarking and validation

How It Works

This scorer uses deterministic rule-based evaluation to validate outputs against specific criteria. It applies predefined rules and patterns to assess the response, providing consistent and reproducible results without requiring LLM inference.
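As an illustration of the kind of deterministic computation involved, here is a minimal sketch of token-level F1. The function name `token_f1` and the tokenization (lowercasing plus whitespace splitting) are assumptions made for this example; the scorer's actual tokenizer and normalization rules are not documented here and may differ.

```python
from collections import Counter


def token_f1(prediction: str, ground_truth: str) -> dict:
    """Token-level precision, recall, and F1 (illustrative sketch only)."""
    # Assumption: lowercase and split on whitespace. The real scorer may
    # normalize punctuation or tokenize differently.
    pred_tokens = prediction.lower().split()
    truth_tokens = ground_truth.lower().split()

    # Multiset intersection: a shared token counts at most as many times
    # as it occurs in both sequences.
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}

    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


result = token_f1("the cat sat", "the cat sat on the mat")
print(result)
```

In this example every predicted token appears in the ground truth (precision 1.0) but only half of the ground-truth tokens are covered (recall 0.5), so the F1 is about 0.67.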

Input Schema

Parameter	Type	Required	Description
prediction	str	Yes	Generated output
ground_truth	str	Yes	Expected output

Output Schema

Field	Type	Description
score	float	F1 score scaled to 0-10
passed	bool	True if above threshold
reasoning	str	F1 analysis
metadata.precision	float	Precision value
metadata.recall	float	Recall value
metadata.f1	float	Raw F1 score
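The sketch below shows how a result with these fields could be assembled from precision and recall. The field names come from the table above; the helper name `build_result`, the multiplication by 10 to reach the 0-10 scale, the wording of `reasoning`, and the inclusive `>=` threshold comparison are assumptions for illustration, not the library's confirmed behavior.

```python
def build_result(precision: float, recall: float, threshold: float = 7.0) -> dict:
    """Assemble an output dict shaped like the schema above (hypothetical helper)."""
    denom = precision + recall
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0

    # Assumption: the 0-10 score is the raw F1 multiplied by 10.
    score = f1 * 10
    return {
        "score": score,
        # Assumption: a score equal to the threshold counts as passing.
        "passed": score >= threshold,
        "reasoning": f"F1={f1:.3f} (precision={precision:.3f}, recall={recall:.3f})",
        "metadata": {"precision": precision, "recall": recall, "f1": f1},
    }


example = build_result(precision=1.0, recall=0.5)
print(example["score"], example["passed"])
```

With precision 1.0 and recall 0.5, the raw F1 is about 0.67, which scales to roughly 6.7 and falls below the default threshold of 7.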

Score Interpretation

Default threshold: 7/10

Score	Label	Description
10	Perfect Match	Output exactly matches expected format/value
0	No Match	Output does not match expected format/value

Related Scorers

Accuracy

ExactMatchScorer

Evaluates whether the prediction exactly matches the ground truth. Strictest form of accuracy measur...

NLP Metrics

ROUGEScorer

Computes ROUGE scores for text summarization evaluation. Calculates ROUGE-1 (unigram), ROUGE-2 (bigr...

Accuracy

AccuracyScorer

Calculates accuracy based on substring or token-level matching between prediction and ground truth. ...

Accuracy

MultiPatternAccuracyScorer

Evaluates prediction accuracy against multiple acceptable patterns or answers. Ideal for tasks where...

Frequently Asked Questions

When should I use this scorer?

Use F1Scorer when you need a token-overlap measure of how closely a generated output matches a reference answer, for example in extractive tasks and named entity recognition. It is particularly useful for accuracy benchmarking and validation.

Why does this scorer need expected output?

This scorer compares the generated output against a known expected result. Precision, recall, and F1 are all defined relative to that reference, so they cannot be computed without it.

Can I customize the threshold?

Yes, the default threshold of 7 can be customized when configuring the scorer.

Quick Info

Category: Accuracy
Evaluation Type: Rule-Based
Requires Expected Output: Yes
Default Threshold: 7/10
