
Bugfix/_has_len #2293

Merged
merged 5 commits into from
Jun 20, 2020

Conversation

thschaaf
Contributor

@thschaaf thschaaf commented Jun 19, 2020

Ready for review

What does this PR do?

The function _has_len in trainer/data_loading.py now catches 'NotImplementedError' and returns False.

When using torchtext.data.Iterator with a batch_size_fn parameter, calling len() on the iterator raises a NotImplementedError, which was not caught by the _has_len function in pytorch-lightning.
The fix is simple: return False if a NotImplementedError is raised.

This change does not introduce dependencies.

Fixes #2277

Before submitting

  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you to create a separate PR for every change.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?

One test failed a "doctest" on my machine, but it was already failing on the master branch before my changes: pytorch_lightning/loggers/trains.py::pytorch_lightning.loggers.trains.TrainsLogger

  • If you made a notable change (that affects users), did you update the CHANGELOG?

Not sure if this is a notable fix; it certainly improves working with torchtext and therefore affects users.

A suggestion for the CHANGELOG:
Fixed how _has_len handles 'NotImplementedError' e.g. raised by torchtext.data.Iterator (#2277)

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@mergify mergify bot requested a review from a team June 19, 2020 23:58
@williamFalcon williamFalcon merged commit 554fb47 into Lightning-AI:master Jun 20, 2020
@thschaaf
Contributor Author

Great, thanks for merging!

@williamFalcon not sure if that fix warrants a line in the change log. What are your thoughts on this and the suggested line in the PR?

@@ -295,6 +295,18 @@ def test_train_inf_dataloader_error(tmpdir):
trainer.fit(model)


@pytest.mark.skip('TODO: speed up this test')
Member

This has to be resolved before merge

Contributor Author

Since "CustomNotImplementedErrorDataloader" is a version of 'CustomInfDataloader' with a minor change, the behavior should be very similar.

The tests were derived by copying the corresponding tests for 'CustomInfDataloader'. The decorators were copied under the assumption that they are correct or useful. Since I am not sure about them, it makes sense to remove them. @Borda, please advise if that is a reasonable solution.

On a different note, CustomNotImplementedErrorDataloader could alternatively be derived from CustomInfDataloader with the __len__ method added. This would reduce duplicated code. What do you think?
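The suggested subclassing could look roughly like this (a self-contained sketch; the real test helpers in Lightning's test suite wrap actual DataLoaders):

```python
class CustomInfDataloader:
    """Simplified stand-in for the test helper: loops over a dataloader forever."""

    def __init__(self, dataloader):
        self.dataloader = dataloader

    def __iter__(self):
        # restart the underlying dataloader indefinitely
        while True:
            yield from self.dataloader


class CustomNotImplementedErrorDataloader(CustomInfDataloader):
    """Same infinite iteration, but __len__ raises NotImplementedError,
    mimicking torchtext.data.Iterator with batch_size_fn set."""

    def __len__(self):
        raise NotImplementedError
```

Deriving the second class from the first keeps only the one behavioral difference (the raising __len__) in the subclass, avoiding the copy-and-paste duplication discussed above.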

Contributor Author

Thanks @Borda. After removing the decorators, pytest failed. This is because I had also changed the match string from "infinite DataLoader" to "not_implemented_error DataLoader" when I copied the test, which does not match the MisconfigurationException message raised from pytorch_lightning/trainer/data_loading.py. I am changing the match strings back to "infinite DataLoader" and commenting the test skips out.
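The match-string failure works as follows: pytest.raises(match=...) does a regex search against the string of the raised exception, so a copied test with a different match string fails even when the right exception type is raised. A self-contained sketch (the exception class and message here are stand-ins, not Lightning's real ones):

```python
import pytest


class MisconfigurationException(Exception):
    """Stand-in for Lightning's MisconfigurationException (assumed message below)."""


def check_dataloader():
    # mimics the error path in trainer/data_loading.py that raises with
    # an "infinite DataLoader" message
    raise MisconfigurationException('infinite DataLoader not supported')


# pytest.raises(match=...) regex-searches str(exception); a match string like
# 'not_implemented_error DataLoader' would fail this search and the test errors.
with pytest.raises(MisconfigurationException, match='infinite DataLoader'):
    check_dataloader()
```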

Contributor Author

@Borda I created a new PR #2307 with the changes to make the tests work. You will see a lot of commits there. It seems difficult for the tests to pass on all machines: the CI machines appear to cancel a run if it takes more than 15 minutes total. When I enabled all the tests, they sometimes finished in time on a few machines, but never on all of them. On my laptop I have observed, when running the tests locally, that they sometimes just hang (a single test >15 minutes). Maybe this is a problem with pytest in my environment (macOS), or a more general issue. This merge certainly fixes the issue I observed, and the tests in the new PR are technically working. I suggest continuing with the commits of the new PR.

@mergify mergify bot requested a review from a team June 20, 2020 14:39
@thschaaf thschaaf mentioned this pull request Jun 20, 2020
7 tasks
@Borda Borda added the bug Something isn't working label Jun 23, 2020
@mergify mergify bot requested a review from a team June 23, 2020 15:03
@Borda Borda mentioned this pull request Jun 26, 2020
7 tasks
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

_has_len does not handle NotImplementedError (raised by torchtext)
3 participants