
Use accuracy as default metric for text classification Evaluator #128

Merged
merged 2 commits into main from lewtun-patch-1 on Jun 8, 2022

Conversation

@lewtun (Member) commented Jun 8, 2022

This PR changes the default metric in the text classification Evaluator from f1 to accuracy.

The reason for doing so is that the Evaluator fails when the dataset is multiclass, because the default average in f1 is binary. Although there are well-known limitations to using accuracy as the only metric, I think it's better to have defaults that "just work".
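For context, a minimal sketch of the failure mode, using toy labels and the evaluate metric-loading API (illustrative only, not code from this PR):

import evaluate

# Toy multiclass predictions/references with three classes (illustrative values).
predictions = [0, 1, 2, 2]
references = [0, 2, 2, 2]

# f1 defaults to average="binary", so this raises for more than two classes:
# evaluate.load("f1").compute(predictions=predictions, references=references)

# It works once an averaging strategy is chosen explicitly:
f1 = evaluate.load("f1")
print(f1.compute(predictions=predictions, references=references, average="macro"))

# accuracy needs no extra arguments, so it works as a default out of the box:
accuracy = evaluate.load("accuracy")
print(accuracy.compute(predictions=predictions, references=references))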

Remark: it probably would make sense to expand the unit tests to test the evaluator on binary / multiclass / multilabel datasets. I'm happy to implement that if you think that's useful.

@HuggingFaceDocBuilderDev commented Jun 8, 2022

The documentation is not available anymore as the PR was closed or merged.

- from datasets import Dataset, load_dataset, load_metric
- from transformers import AutoTokenizer, BertForSequenceClassification, pipeline
+ from datasets import load_dataset
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
@lewtun (Member Author) commented on this change:
I took the liberty of cleaning this up a bit to follow the conventions in transformers (use autoclasses over model-specific ones where possible).
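For reference, a small sketch of the autoclass convention mentioned here (the checkpoint name is only an illustrative placeholder, not part of this PR):

from transformers import AutoModelForSequenceClassification, BertForSequenceClassification

# Model-specific class: tied to a single architecture.
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")

# Autoclass: resolves the right architecture from the checkpoint's config,
# so the same call works for BERT, RoBERTa, DistilBERT, and others.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")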



  class TestEvaluator(TestCase):
      def setUp(self):
-         self.data = Dataset.from_dict(load_dataset("imdb")["test"][:2])
+         self.data = load_dataset("imdb", split="test[:2]")
@lewtun (Member Author) commented on this change:
I took the liberty of using something more datasets-like here.
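Concretely, a short sketch of the two approaches being swapped (same imdb dataset as in the test above):

from datasets import Dataset, load_dataset

# Before: load the whole split, slice it into a column dict, rebuild a Dataset.
data = Dataset.from_dict(load_dataset("imdb")["test"][:2])

# After: let the split expression do the slicing directly.
data = load_dataset("imdb", split="test[:2]")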

@ola13 requested review from lvwerra and ola13 on June 8, 2022 at 09:31
@ola13 (Contributor) left a comment:

Makes sense! Looks good to me; I'd wait for @lvwerra to weigh in, as he was the one suggesting F1 as the default metric here.

@lvwerra (Member) commented Jun 8, 2022

LGTM! 🚀

@lvwerra merged commit 3959ec8 into main on Jun 8, 2022
@lewtun deleted the lewtun-patch-1 branch on June 8, 2022 at 12:45