
Test all metrics against sklearn (with many input trials) #3230

Closed
awaelchli opened this issue Aug 27, 2020 · 18 comments · Fixed by #3517
Labels: ci (Continuous Integration), feature (Is an improvement or enhancement), good first issue (Good for newcomers), help wanted (Open to be worked on)

Comments

@awaelchli (Member) commented Aug 27, 2020

Hand-chosen values are not enough; we need to test with a large batch of inputs where possible.
Something in this style, maybe with a fixed seed:

import torch
import sklearn.metrics
from pytorch_lightning.metrics.functional import auroc  # PL functional metric

def test_auroc_versus_sklearn():
    for _ in range(100):
        # note: roc_auc_score raises if target happens to contain only one class
        target = torch.randint(0, 2, size=(10,))
        pred = torch.randint(0, 2, size=(10,))
        score_sk = sklearn.metrics.roc_auc_score(target.numpy(), pred.numpy())  # sklearn
        score_pl = auroc(pred, target)  # PyTorch Lightning
        assert torch.allclose(torch.tensor(score_pl).float(), torch.tensor(score_sk).float())
awaelchli added the feature, help wanted, and ci labels on Aug 27, 2020
awaelchli self-assigned this on Aug 27, 2020
awaelchli changed the title from "Test all metrics agains sklearn (with many inputs)" to "Test all metrics against sklearn (with many inputs)" on Aug 27, 2020
awaelchli changed the title from "Test all metrics against sklearn (with many inputs)" to "Test all metrics against sklearn (with many input trials)" on Aug 27, 2020
@rohitgr7 (Contributor)

maybe with a fixed seed

I don't think a seed is required here.

@Borda (Member) commented Aug 28, 2020

maybe with a fixed seed

I don't think a seed is required here.

I would agree here that all metrics should be deterministic, so even using different seeds increases the coverage of cases, and as long as you pass the same values to the two functions it should always give the same result, right?

Borda added the Metrics label on Aug 28, 2020
@rohitgr7 (Contributor)

I would agree here that all metrics should be deterministic, so even using different seeds increases the coverage of cases, and as long as you pass the same values to the two functions it should always give the same result, right?

Agreed, this is a good point.

@SkafteNicki (Member)

@CamiVasz commented Sep 25, 2020

Hi! I am not a contributor, but I am a user of a library called hypothesis, which may be a good fit for this specific case. The library lets you write parameterized tests and then chooses the cases that are most likely to make the program fail; in other words, it is deliberately thorough about edge cases and can really help find the ones that are problematic for the implementation.

@Borda (Member) commented Sep 27, 2020

Hi! I am not a contributor, but I am a user of a library called hypothesis, which may be a good fit for this specific case. The library lets you write parameterized tests and then chooses the cases that are most likely to make the program fail; in other words, it is deliberately thorough about edge cases and can really help find the ones that are problematic for the implementation.

Hi, thank you for the recommendation, I'm just not sure I follow it; mind writing a bit more about how you would use https://hypothesis.works in PL...

@CamiVasz

Hi! I am not a contributor, but I am a user of a library called hypothesis, which may be a good fit for this specific case. The library lets you write parameterized tests and then chooses the cases that are most likely to make the program fail; in other words, it is deliberately thorough about edge cases and can really help find the ones that are problematic for the implementation.

Hi, thank you for the recommendation, I'm just not sure I follow it; mind writing a bit more about how you would use https://hypothesis.works in PL...

The idea would be to write a hypothesis test that checks the sklearn metrics against the PL metrics and let the library exercise the corner cases (as well as the "common" ones), thereby asserting that both implementations agree, without the need to design the cases by hand or search for complicated patterns.

@Borda (Member) commented Sep 28, 2020

@CamiVasz that sounds cool, mind drafting a small example of how to use it, and eventually we can extend it to more PL cases... 🐰

@CamiVasz

@CamiVasz that sounds cool, mind drafting a small example of how to use it, and eventually we can extend it to more PL cases...

https://colab.research.google.com/drive/1Dprqr1nbtgCFwsyUyb6UbXe9FE7X73Q5?usp=sharing
Here is a small example featuring MSE.
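
For reference, a minimal sketch of what such an MSE comparison could look like (this is not the contents of the linked notebook; the pytorch_lightning.metrics.functional.mse import path and its (pred, target) signature are assumptions based on the 1.0-era functional API):

import numpy as np
import torch
from hypothesis import given, strategies as st
from hypothesis.extra.numpy import arrays
from sklearn.metrics import mean_squared_error
from pytorch_lightning.metrics.functional import mse  # assumed import path

# bounded, finite floats keep the two implementations numerically comparable
finite = st.floats(min_value=-1e3, max_value=1e3,
                   allow_nan=False, allow_infinity=False)

@given(
    pred=arrays(np.float64, shape=(10,), elements=finite),
    target=arrays(np.float64, shape=(10,), elements=finite),
)
def test_mse_versus_sklearn(pred, target):
    score_sk = mean_squared_error(target, pred)               # sklearn
    score_pl = mse(torch.tensor(pred), torch.tensor(target))  # PyTorch Lightning
    assert torch.allclose(score_pl.double(), torch.tensor(score_sk).double(), atol=1e-6)

Hypothesis runs the test body many times with generated inputs and shrinks any failing example to a minimal counterexample, which makes failures much easier to reproduce than with a plain random loop.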

@SkafteNicki (Member)

@CamiVasz looks cool, but could you tell me the difference between hypothesis and just creating two random tensors myself (using torch.randn for example)?

@CamiVasz commented Sep 29, 2020

@CamiVasz looks cool, but could you tell me the difference between hypothesis and just creating two random tensors myself (using torch.randn for example)?

Hypothesis generation is biased towards edge cases, maximizing the probability of failure. When you generate random numbers, the edge cases you want to find have the same probability of appearing as the easy cases.
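
As a hypothetical illustration of that bias, applied to the AUROC comparison from the issue description: hypothesis reaches degenerate targets such as all zeros within its first few examples, which is exactly where sklearn.metrics.roc_auc_score raises a ValueError (only one class present in y_true), whereas a uniform torch.randint loop only hits that case by chance:

import numpy as np
import sklearn.metrics
from hypothesis import given, strategies as st

# fixed-length lists of 0/1 labels; hypothesis quickly tries all-zeros and all-ones
binary_labels = st.lists(st.integers(0, 1), min_size=10, max_size=10)

@given(target=binary_labels, pred=binary_labels)
def test_roc_auc_needs_two_classes(target, pred):
    if len(set(target)) < 2:
        # degenerate target found by hypothesis; a real test would pin down
        # how the PL metric is expected to behave in this case
        return
    sklearn.metrics.roc_auc_score(np.asarray(target), np.asarray(pred))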

@awaelchli (Member, Author)

Just found that PyTorch is also using hypothesis:
https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md#unit-testing

@Borda (Member) commented Oct 2, 2020

Just found that PyTorch is also using hypothesis:
https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md#unit-testing

shall we open a new issue just for hypothesis testing? :]

@awaelchli (Member, Author)

Yeah, maybe @CamiVasz can do that and I think it would also be great to see an example of it using one of our tests, to show the motivation.

@Borda (Member) commented Oct 2, 2020

Yeah, maybe @CamiVasz can do that and I think it would also be great to see an example of it using one of our tests, to show the motivation.

it would be great to have it as a Hacktoberfest issue :]

Borda modified the milestones: 1.1, 1.0 on Oct 6, 2020
@CamiVasz commented Oct 7, 2020

It would be great to work on that! Is this still on board?

@awaelchli (Member, Author) commented Oct 8, 2020

@justusschock @SkafteNicki @ananyahjha93 @teddykoker Do you guys need help with testing the new metrics? @CamiVasz wants to help.

@justusschock (Member) commented Oct 8, 2020

Yeah sure. I think the whole functional API would be a good place to start.

And we could then later extend it to the revamped class interface.
