Add Wilcoxon's signed rank test #237

douwekiela · 2022-08-09T06:27:36Z

Figured it'd be good to add a few more comparisons

douwekiela · 2022-08-09T06:29:40Z

comparisons/wilcoxon/README.md

+
+```python
+wilcoxon = evaluate.load("wilcoxon")
+results = wilcoxon.compute(predictions1=[-7, 123, 43, 4, 5], predictions2=[1337, -9, 1, 2, 3])


Hmm, I should probably make some of these floats to make it clear that those are allowed too.
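The README example pairs the two prediction lists element-wise. It then tests whether the differences are distributed symmetrically around zero. Below is a minimal pure-Python sketch of a two-sided exact signed-rank test, run on float variants of the example inputs. This is not the module's code. The names `wilcoxon_signed_rank` and `_average_ranks`, and the `stat`/`p` keys, are hypothetical. The sketch assumes small samples, because it enumerates all 2**n sign assignments. Its results may differ from SciPy's `scipy.stats.wilcoxon` defaults when ties or zero differences occur.

```python
from itertools import product
from typing import List, Sequence


def _average_ranks(values: Sequence[float]) -> List[float]:
    """Hypothetical helper: 1-based ascending ranks, tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(predictions1, predictions2):
    """Hypothetical sketch: two-sided exact Wilcoxon signed-rank test on paired values.

    Zero differences are dropped. The null distribution is enumerated exactly
    over all 2**n sign flips of the observed (mid-)ranks, so keep n small.
    """
    if len(predictions1) != len(predictions2):
        raise ValueError("predictions1 and predictions2 must have the same length")
    diffs = [a - b for a, b in zip(predictions1, predictions2) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all differences are zero")
    ranks = _average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    stat = min(w_plus, w_minus)
    # Under H0 each rank carries a + or - sign with probability 1/2.
    hits = sum(
        1
        for signs in product((0, 1), repeat=n)
        if sum(r for r, s in zip(ranks, signs) if s) <= stat
    )
    p = min(1.0, 2 * hits / 2 ** n)
    return {"stat": stat, "p": p}


print(wilcoxon_signed_rank([-7, 123.45, 43, 4.91, 5], [1337.12, -9.74, 1, 2, 3.21]))
# {'stat': 5.0, 'p': 0.625}
```

With the original integer inputs, two differences tie at 2. Both receive the mid-rank 1.5, and the enumerated p-value changes accordingly.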

HuggingFaceDocBuilderDev · 2022-08-09T06:30:51Z

The documentation is not available anymore as the PR was closed or merged.

lvwerra

Hi @douwekiela, thanks for the super clean PR. Left a comment regarding the feature types related to your own comment.

lvwerra · 2022-08-11T10:12:09Z

comparisons/wilcoxon/wilcoxon.py

+            features=datasets.Features(
+                {
+                    "predictions1": datasets.Value("int64"),
+                    "predictions2": datasets.Value("int64"),
+                }
+            ),


If you want it to work for both int and float, you can pass a list of datasets.Features and it will then automatically detect which one works. You can have a look at BLEU for an example. Alternatively, I think floats would probably work in both cases anyway, no?
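The suggestion that floats "would probably work in both cases" can be checked directly. The sketch below assumes the values end up as 64-bit floats (Python's `float`). Under that assumption, integer predictions are stored exactly up to 2**53 in magnitude. Beyond that, int64 values can be silently rounded. A 32-bit float type would be exact only up to 2**24. The helper name `survives_float64` is hypothetical.

```python
def survives_float64(x: int) -> bool:
    """Hypothetical helper: True if int x round-trips exactly through a Python float (IEEE 754 double)."""
    return int(float(x)) == x


# The integer predictions from the README example are unaffected by a float cast.
print(all(survives_float64(x) for x in [-7, 123, 43, 4, 5, 1337, -9, 1, 2, 3]))  # True
# Every integer up to 2**53 in magnitude round-trips exactly...
print(survives_float64(2**53), survives_float64(-(2**53)))  # True True
# ...but 2**53 + 1 rounds to 2**53, so very large int64 values lose precision.
print(survives_float64(2**53 + 1))  # False
```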

Add Wilcoxon's signed rank test for comparing model predictions, e.g. for testing whether the difference in BLEU score between two models is significant.

Add Wilcoxon's signed rank test

c9a4c57

douwekiela requested a review from lvwerra August 9, 2022 06:27

douwekiela commented Aug 9, 2022


lvwerra approved these changes Aug 11, 2022


douwekiela added 2 commits August 11, 2022 19:37

Use float as prediction data type

c5bce72

Fix error in example

89640d7

douwekiela merged commit 3cd38e2 into main Aug 11, 2022

douwekiela deleted the wilcoxon branch August 11, 2022 13:01

mathemakitten pushed a commit that referenced this pull request Aug 15, 2022

Add Wilcoxon's signed rank test (#237)

bcdd1ff

Add Wilcoxon's signed rank test for comparing model predictions, e.g. for testing whether the difference in BLEU score between two models is significant.

mathemakitten pushed a commit that referenced this pull request Sep 23, 2022

Add Wilcoxon's signed rank test (#237)

916f4ba

Add Wilcoxon's signed rank test for comparing model predictions, e.g. for testing whether the difference in BLEU score between two models is significant.
