Add NIST metric #250
Conversation
This looks great already! My main comment is around input format: I'd prefer to input untokenized strings, similar to BLEU. We can either use the same tokenizer as is used there or a whitespace tokenizer. The motivation is that this way we use the same format for both metrics (as well as many other NLP metrics), which will allow us to easily combine them. What do you think?
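For reference, this is the untokenized-string input format that evaluate's BLEU metric uses (example sentences are made up):

```python
import evaluate

# BLEU in evaluate takes raw strings and tokenizes them internally:
bleu = evaluate.load("bleu")
results = bleu.compute(
    predictions=["hello there general kenobi"],
    references=[["hello there general kenobi", "hello there !"]],
)
print(results["bleu"])
```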
Thanks for having a look! I'd rather stick as close to the original paper as possible and not start mix-and-matching tokenizers. There are probably only minor differences, but relying on a standard implementation for each metric makes things easier to keep track of, imo. But you are right that having the same input format would be useful! What about using the nltk NIST tokenizer? The paper mentions (p. 138) that all text is lower-cased before calculating the metric, so lowercase should be False. In the future, I think it'd be nice to also support token-level input, in addition to sentence-level. Note that these tokenizers have their own options, both in init and when calling their tokenization methods (see the sketch below).
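For illustration, NLTK's NISTTokenizer takes its options at call time rather than in `__init__` (a minimal sketch; the tokenizer may first require `nltk.download("perluniprops")`):

```python
from nltk.tokenize.nist import NISTTokenizer

tokenizer = NISTTokenizer()
# Options go to the tokenization method, not to the constructor:
tokens = tokenizer.tokenize(
    "This is an example sentence, tokenized the NIST way.",
    lowercase=False,     # whether to lower-case before tokenizing
    western_lang=True,   # apply the western-language punctuation regexes
)
print(tokens)
```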
Sounds good. Like in BLEU we can use it as the default but let the user experiment with other tokenizers should it be appropriate (e.g. for langs that require special tokenization).
Hey @lvwerra. I had some time to work on this. I can't seem to figure out why the tests are giving errors though. My input is a string for predictions and a list for the references. This is also visible in my features. But pytest still throws errors. Can you see what I'm missing here?
@lvwerra Tests run fine locally for NIST specifically, so I guess I have incorporated the test suite incorrectly.
Hi @BramVanroy, thanks for working on this and sorry for getting back so late. I think the issue is still the `tokenizers_kwargs`; passing explicit keyword arguments instead should work better.
Hi @lvwerra, sorry for taking so long to get back to this. I've swapped out the `tokenizers_kwargs` for explicit kwargs. Hope that helps.
The Windows errors seem unrelated to this PR.
Also only minor comments :)
metrics/nist_mt/nist_mt.py (Outdated)

```python
@evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
class Nist_mt(evaluate.Metric):
```
I am a bit uneasy with that class name :) Can we make it
```diff
-class Nist_mt(evaluate.Metric):
+class NistMt(evaluate.Metric):
```
or something similar? Class names are not supposed to have underscores.
Sure! I don't like it either, not sure how that slipped in. My bad. I think you mentioned before that we should not just call it NIST because there are other NIST metrics out there, so I guess NistMt is the best option?
metrics/nist_mt/nist_mt.py (Outdated)

```python
def _compute(self, predictions, references, n: int = 5, lowercase=False, western_lang=True):
    tokenizer = NISTTokenizer()
    if isinstance(predictions, str) and isinstance(references[0], str):  # sentence nist_mt
```
I think predictions are always a list so this should be
```diff
-if isinstance(predictions, str) and isinstance(references[0], str):  # sentence nist_mt
+if isinstance(predictions[0], str) and isinstance(references[0], str):  # sentence nist_mt
```
no?
No, I do not think so, because for NIST you can have multiple references, so references has one dimension more than predictions.
I wanted to account for both single sentences and batches of sentences. But I think you are right: `_compute` always works on batches, right? So I guess I can just remove these if-clauses. And I think I should then also be able to remove the first feature in `self.features`. Those features only indicate the type of a batch, right? Not single instances?
I think the if/else is fine, but it should be `predictions[0]` in both cases, whereas for refs it is `references[0]` and `references[0][0]` for one or more references. `self.features` shows the type of one element of the batch.
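To make the indexing concrete, here is a small illustration of the batch shapes under discussion (example sentences are made up):

```python
# One prediction string and a list of reference strings per sample:
predictions = ["the cat sat on the mat"]
references = [
    ["a cat sat on the mat", "the cat was sitting on the mat"],
]

assert isinstance(predictions[0], str)    # one element of the predictions batch
assert isinstance(references[0], list)    # all references for one sample
assert isinstance(references[0][0], str)  # a single reference string
```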
I've updated the metric, assuming that a given sample is always a prediction (`str`) and reference (`Sequence[str]`). That should take away some confusion, I believe.
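Under that format, calling the metric would look roughly like this (a sketch; it assumes the metric loads under the name `nist_mt`, inferred from this PR's `metrics/nist_mt/` directory):

```python
import evaluate

# Hypothetical load name, inferred from the metric directory in this PR:
nist = evaluate.load("nist_mt")
predictions = ["the cat sat on the mat"]
references = [["a cat sat on the mat", "the cat was sitting on the mat"]]
print(nist.compute(predictions=predictions, references=references))
```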
```diff
@@ -105,6 +105,7 @@
 TESTS_REQUIRE = [
     # test dependencies
     "absl-py",
+    "nltk",  # for NIST and probably others
```
Something must have already installed it since the tests worked until now, but ok to have it explicitly :)
NIST is a somewhat older but well-known metric for MT that is similar to BLEU. I'd like to add it to the base arsenal of `evaluate`. Core work is done. Still need to write README, examples, and test cases.
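For context, a minimal sketch of the NLTK primitives such a metric would wrap (`nltk.translate.nist_score` is NLTK's NIST implementation; example sentences are made up):

```python
from nltk.tokenize.nist import NISTTokenizer
from nltk.translate.nist_score import corpus_nist, sentence_nist

tokenizer = NISTTokenizer()

hypothesis = tokenizer.tokenize("the cat sat on the mat")
references = [tokenizer.tokenize("a cat was sitting on the mat")]

# Sentence-level NIST, using up to 5-grams (nltk's default):
print(sentence_nist(references=references, hypothesis=hypothesis, n=5))

# Corpus-level NIST: one list of references per hypothesis.
print(corpus_nist([references], [hypothesis], n=5))
```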