Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix/_has_len #2307

Merged
merged 28 commits into from
Jun 26, 2020
Merged

Bugfix/_has_len #2307

merged 28 commits into from
Jun 26, 2020

Conversation

thschaaf
Copy link
Contributor

@thschaaf thschaaf commented Jun 20, 2020

What does this PR do?

Enable tests for dataloader where len is defined but can raise NotImplementedError (e.g. in some torchtext cases).

Continuation of PR #2293

Fixes #2277 #2293

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you create a separate PR for every change.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@pep8speaks
Copy link

pep8speaks commented Jun 20, 2020

Hello @thschaaf! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-06-23 14:56:23 UTC

@mergify mergify bot requested a review from a team June 20, 2020 22:56
@thschaaf thschaaf mentioned this pull request Jun 21, 2020
7 tasks
@codecov
Copy link

codecov bot commented Jun 21, 2020

Codecov Report

Merging #2307 into master will increase coverage by 0%.
The diff coverage is n/a.

@@          Coverage Diff           @@
##           master   #2307   +/-   ##
======================================
  Coverage      88%     88%           
======================================
  Files          70      70           
  Lines        5502    5502           
======================================
+ Hits         4835    4844    +9     
+ Misses        667     658    -9     

@Borda Borda added the bug Something isn't working label Jun 21, 2020
Copy link
Member

@Borda Borda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove all

@pytest.mark.skip('TODO: speed up this test')

@mergify mergify bot requested a review from a team June 21, 2020 12:28
@thschaaf thschaaf changed the title [wip] Bugfix/_has_len Bugfix/_has_len Jun 22, 2020
@thschaaf
Copy link
Contributor Author

@Borda I commented out all
@pytest.mark.skip('TODO: speed up this test')

The local test ran successful on my laptop.

@thschaaf
Copy link
Contributor Author

@Borda For your information the added test passed on all machines that showed "canceled" (e.g. https://github.com/PyTorchLightning/pytorch-lightning/runs/792966810?check_suite_focus=true). However I assume that the overall runtime was too long.

Ideally the runtime for the test should be increased.

What do you recommend to get the PR merged?

CHANGELOG.md Outdated Show resolved Hide resolved
@@ -374,7 +374,7 @@ def test_inf_train_dataloader(tmpdir, check_interval):


@pytest.mark.parametrize('check_interval', [50, 1.0])
@pytest.mark.skip('TODO: speed up this test')
# @pytest.mark.skip('TODO: speed up this test')
def test_not_implemented_error_train_dataloader(tmpdir, check_interval):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this and the following test in which your removed the skip are taking too long because the model runs for the whole epoch, right? But the test suggests that there should be a test for a raise.
I think that's the problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can also set just nb steps to 3 or so...

@pytest.mark.skip('TODO: speed up this test')
def test_not_implemented_error_dataloader(tmpdir, check_interval):
# @pytest.mark.skip('TODO: speed up this test')
def test_not_implemented_error_val_dataloader(tmpdir, check_interval):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, no test for "not implemented error"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of the test refers CustomNotImplementedErrorDataloader which raises a NotImplementedError when len is called. In this configuration training should take place. I am open for name suggestions.

@mergify mergify bot requested a review from a team June 22, 2020 15:46
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
@thschaaf thschaaf changed the title [wip] Bugfix/_has_len Bugfix/_has_len Jun 23, 2020
Copy link
Member

@Borda Borda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this ready to review? :]

CHANGELOG.md Outdated Show resolved Hide resolved
@mergify mergify bot requested a review from a team June 23, 2020 05:54
Copy link
Member

@Borda Borda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trying to get some more descriptive names

tests/trainer/test_dataloaders.py Outdated Show resolved Hide resolved
tests/trainer/test_dataloaders.py Outdated Show resolved Hide resolved
tests/trainer/test_dataloaders.py Outdated Show resolved Hide resolved
tests/trainer/test_dataloaders.py Outdated Show resolved Hide resolved
tests/trainer/test_dataloaders.py Outdated Show resolved Hide resolved
@mergify mergify bot requested a review from a team June 23, 2020 06:09
@mergify
Copy link
Contributor

mergify bot commented Jun 23, 2020

This pull request is now in conflict... :(

@mergify
Copy link
Contributor

mergify bot commented Jun 23, 2020

This pull request is now in conflict... :(

@thschaaf
Copy link
Contributor Author

thschaaf commented Jun 23, 2020

@Borda Did you have time to look at my previous questions. (inlined below). In particular the one to create a new issue.

@Borda
Copy link
Member

Borda commented Jun 23, 2020

@Borda Did you have time to look at my previous questions. (inlined below). In particular the one to create a new issue.

I hope I 'll get to this today, maybe we can contact on slack?

@Borda
Copy link
Member

Borda commented Jun 23, 2020

This implies the issue has nothing todo with my bug fix from #2293 and indicates probably a more serious issue in Pytorch-lightning or the testing framework or the interaction.

Mind describes your suspicion in a new issue so we can split the PR fix from others

It could be still that there is a problem with the test that I copied, but it is not obvious why parallel data loading would make the test system so unstable just with the new test. Especially since the test got hanging in very different tests.

@williamFalcon have you experienced something similar?

This seems not like a good solution, and I am worried that so far nobody seemed to have run into this issue, except maybe for the author of the original inf_dataloader tests. Since these test were skipped it makes me suspect that this problem could exist undetected for some time.

but they were skipped by your PR, right?

I am not happy with the solution and hope that you or someone else can explain what is going on.

maybe I am missing something, what are you not sure about this tests repair?

How do you feel about merging this and creating a new issue?

yes, that I would do...

cc: @awaelchli

@thschaaf
Copy link
Contributor Author

@Borda Did you have time to look at my previous questions. (inlined below). In particular the one to create a new issue.

I hope I 'll get to this today, maybe we can contact on slack?

Sure. I have slack installed for work. How do I contact you or join the right slack channel or workspace?

@thschaaf
Copy link
Contributor Author

but they were skipped by your PR, right?

Yes. I just locally enabled them and the inf_dataloader test don't take long at all.

@Borda Borda added the ready PRs ready to be merged label Jun 24, 2020
@awaelchli
Copy link
Member

Not so sure about the num_workers = 0 in the test. Sooner or later someone is going to change that and then has to debug these tests. I would rather not set it if we can't explain why it does not work. Maybe it has to do with raising the StopIteration?

@williamFalcon williamFalcon merged commit 7c0a3f4 into Lightning-AI:master Jun 26, 2020
@awaelchli
Copy link
Member

awaelchli commented Jun 26, 2020

@thschaaf This raises an NotImplementedError now on the master branch, could you have a look?

@Borda Borda mentioned this pull request Jun 26, 2020
7 tasks
@thschaaf
Copy link
Contributor Author

@thschaaf This raises an NotImplementedError now on the master branch, could you have a look?

@awaelchli will do.

@Borda
Copy link
Member

Borda commented Jun 26, 2020

addressing the issue in #2375, @thschaaf pls check it there

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working ci Continuous Integration ready PRs ready to be merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

_has_len does not handle NotImplementedError (raised by torchtext)
5 participants