
fix add method #424

Merged (2 commits) on Mar 14, 2023

Conversation

@hazrulakmal (Contributor) commented on Feb 18, 2023

Fixes #355

  1. Corrects a typo in Rouge's README.md
  2. Changes self.info.features to self.selected_feature_format in the add method

This is my first attempt at debugging a public codebase, so I'd definitely appreciate a few pointers for guidance.

I've investigated the two methods you suggested as possible sources of the error, @lvwerra, namely _infer_feature_from_example and _enforce_nested_string_type.

After some debugging, I found that the former appears to work as intended: it correctly picks the right schema format for a given prediction and reference input. For example, when I ran the code below,

import evaluate

rouge = evaluate.load("rouge")
rouge.add(predictions="sentence 1", references="sentence 1")
print(rouge.compute())

there are two schemas available for rouge (since it supports multiple reference inputs for a single prediction), stored as a list:

[{'predictions': Value(dtype='string', id='sequence'), 'references': Sequence(feature=Value(dtype='string', id='sequence'), length=-1, id=None)},
 {'predictions': Value(dtype='string', id='sequence'), 'references': Value(dtype='string', id='sequence')}]

The method then saves the matching format to self.selected_feature_format.
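To illustrate, here is a minimal, self-contained sketch of how picking the matching schema from such a list could work. This is my own simplified stand-in, not the actual _infer_feature_from_example implementation, and pick_matching_schema is a hypothetical helper name:

from datasets import Features, Sequence, Value

# Two candidate schemas, mirroring the list above: references as a list of
# strings vs. references as a single string.
candidate_features = [
    Features({"predictions": Value("string"), "references": Sequence(Value("string"))}),
    Features({"predictions": Value("string"), "references": Value("string")}),
]

def pick_matching_schema(example, candidates):
    # Return the first candidate whose "references" nesting matches the example.
    for features in candidates:
        expects_list = isinstance(features["references"], Sequence)
        if expects_list == isinstance(example["references"], list):
            return features
    raise ValueError("no candidate schema matches the example")

example = {"predictions": "sentence 1", "references": "sentence 1"}
print(pick_matching_schema(example, candidate_features))
# prints the second schema, with 'references' as a single string Value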

I suspect the problem arises when the code calls _enforce_nested_string_type on self.info.features, which stores both schemas, rather than on the single schema chosen by _infer_feature_from_example. This discrepancy leads _enforce_nested_string_type to enforce the wrong schema: in this case the sequence-of-references schema, when it should have enforced the single-string reference schema.

So I changed self.info.features to self.selected_feature_format and it works. I tried a few other tests that I created in debug_test.py and they all passed. This probably isn't the right way to write code tests, but I'm new, so yeah 😅.
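To make the failure mode and the fix concrete, here is a self-contained sketch of why enforcing string types against the wrong schema fails for a single-string reference. enforce_string_type below is my own simplified stand-in, not the library's _enforce_nested_string_type:

from datasets import Features, Sequence, Value

def enforce_string_type(schema, obj):
    # Simplified stand-in: require obj's nesting to match the schema.
    if isinstance(schema, Sequence):
        if not isinstance(obj, list):
            raise TypeError(f"expected a list of strings, got {type(obj).__name__}")
        for item in obj:
            enforce_string_type(schema.feature, item)
    elif isinstance(schema, Value) and schema.dtype == "string":
        if not isinstance(obj, str):
            raise TypeError(f"expected a string, got {type(obj).__name__}")

sequence_schema = Features({"references": Sequence(Value("string"))})
value_schema = Features({"references": Value("string")})
reference = "sentence 1"

# Enforcing against the un-selected sequence schema raises, matching the reported bug.
try:
    enforce_string_type(sequence_schema["references"], reference)
except TypeError as err:
    print("sequence schema:", err)

# Enforcing against the schema selected for this example succeeds.
enforce_string_type(value_schema["references"], reference)
print("value schema: ok")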

@HuggingFaceDocBuilderDev commented on Feb 18, 2023

The documentation is not available anymore as the PR was closed or merged.

@hazrulakmal (Contributor, Author) commented on Feb 27, 2023

I've tested the code using $ python -m pytest ./tests/ and it passes all existing tests except for these two:

================================================================================= short test summary info ==================================================================================
FAILED tests/test_evaluator.py::TestTextClassificationEvaluator::test_bootstrap - AssertionError: 0.3355737770310757 != 0.33333 within 5 places (0.0022437770310757 difference)
FAILED tests/test_evaluator.py::TestTextClassificationEvaluator::test_bootstrap_and_perf - AssertionError: 0.3355737770310757 != 0.333333 within 5 places (0.0022407770310757247 difference)
=========================================================== 2 failed, 149 passed, 82 skipped, 9 warnings in 396.56s (0:06:36) ==============================================================

I suspect these failures are due to bootstrap randomness rather than the underlying code; the code without the fix also fails these two tests.

On top of that, I also ran a few other checks myself, like the ones below:

import evaluate

accuracy = evaluate.load("accuracy")
accuracy.add(predictions=1, references=1)
print("Accuracy metrics")
print(accuracy.compute())

rouge = evaluate.load("rouge")
print(rouge.features)
predictions = ["hello there", "general kenobi"]
references = [["hello", "there"], ["general kenobi", "general yoda"]]
for sample_pred, sample_ref in zip(predictions, references):
    rouge.add(predictions=sample_pred, references=sample_ref)
print("ROUGE metrics")
print(rouge.compute())

The output I receive is exactly what we would expect to see, as shown in the rouge docs.

Also, when I run

rouge = evaluate.load("rouge")
rouge.add(predictions="sentence 1", references="sentence 2")
print(rouge.compute())

like the person who reported this bug did, I receive the output {'rouge1': 0.5, 'rouge2': 0.0, 'rougeL': 0.5, 'rougeLsum': 0.5}.

@hazrulakmal hazrulakmal marked this pull request as ready for review February 27, 2023 13:12
@lvwerra (Member) left a comment


Awesome, thanks for fixing!

@lvwerra lvwerra merged commit 926d1fe into huggingface:main Mar 14, 2023
unna97 pushed a commit to unna97/evaluate that referenced this pull request Mar 20, 2023
* fix self.info.features

* remove debug_test.py

Successfully merging this pull request may close these issues:

Problem with add method and Rouge (#355)