Use accuracy as default metric for text classification Evaluator #128
Conversation
```diff
-from datasets import Dataset, load_dataset, load_metric
-from transformers import AutoTokenizer, BertForSequenceClassification, pipeline
+from datasets import load_dataset
+from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
```
I took the liberty of cleaning this up a bit to follow the conventions in `transformers` (use auto classes over model-specific ones where possible).
```diff
 class TestEvaluator(TestCase):
     def setUp(self):
-        self.data = Dataset.from_dict(load_dataset("imdb")["test"][:2])
+        self.data = load_dataset("imdb", split="test[:2]")
```
I took the liberty of using something more `datasets`-like here.
Makes sense! Looks good to me, I'd wait for @lvwerra to weigh in as he was suggesting F1 as the default metric here
LGTM! 🚀
This PR replaces the current default metric in the text classification `Evaluator` from `f1` to `accuracy`.

The reason for doing so is that the `Evaluator` fails when the dataset is multiclass, because the default average in `f1` is `binary`. Although there are well-known limitations with using accuracy as the only metric, I think it's better to have defaults that "just work".

Remark: it probably would make sense to expand the unit tests to test the evaluator on binary / multiclass / multilabel datasets. I'm happy to implement that if you think that's useful.