Bugfix/torchtext include lengths #2689

Merged

Conversation

@thschaaf (Contributor) commented Jul 24, 2020

What does this PR do?

It fixes a bug when using torchtext and torchtext.data.Field with include_lengths=True that arises when transferring data to the GPU.
It adds tests to check that batches created by torchtext with include_lengths=True and include_lengths=False are processed correctly by Trainer.fit().

The fix checks whether the data is a Tensor, tuple, or list before sending it to the device. If it is a tuple or list, it iterates over the elements and sends each one to the device. (The implementation has since changed; it now applies move_data_to_device recursively.)
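
A minimal sketch of the recursive transfer logic described above (illustrative only, not the verbatim PR diff; the actual helper in pytorch_lightning/utilities/apply_func.py differs in detail):

```python
import torch


def move_data_to_device(batch, device):
    """Recursively move tensors in (possibly nested) tuples/lists to ``device``."""
    # Plain tensors are transferred directly.
    if isinstance(batch, torch.Tensor):
        return batch.to(device)
    # Tuples and lists -- e.g. the (data, lengths) pair that
    # torchtext.data.Field(include_lengths=True) stores on a Batch --
    # are traversed element by element, preserving the container type.
    if isinstance(batch, (tuple, list)):
        return type(batch)(move_data_to_device(x, device) for x in batch)
    # Anything else (ints, strings, ...) is returned unchanged.
    return batch
```

Called on the (data, lengths) tuple that include_lengths=True produces, this returns a tuple with both tensors on the target device.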

Fixes #2688

Before submitting

  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you to create a separate PR for every change.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

mergify bot requested a review from a team July 24, 2020 19:38
@pep8speaks commented Jul 24, 2020

Hello @thschaaf! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-07-30 01:53:43 UTC

@awaelchli (Member) commented

@thschaaf Thanks for the PR. How much value do you see in this support? I was told by a torchtext maintainer that they will drop the Batch class from torchtext moving forward.
pytorch/text#861
We recently added support for the Batch class in PL because of a request, but at that time I was not aware of its legacy state.
What do you think about this?

@thschaaf (Contributor, Author) commented

> @thschaaf Thanks for the PR. How much value do you see in this support? I was told by a torchtext maintainer that they will drop the Batch class from torchtext moving forward.
> pytorch/text#861
> We recently added support for the Batch class in PL because of a request, but at that time I was not aware of its legacy state.
> What do you think about this?

@awaelchli Hopefully, when torchtext removes the Batch class, they do it without breaking too much of people's code in a substantial way. There is enough value in supporting this until torchtext actually changes its implementation. In my case the torchtext Batch object is used behind the scenes, and it caused my Skip-Thought model training to fail on GPUs. I am sure others will run into similar issues, and I was quite happy that the change to PyTorch Lightning is quite compact. With this change, RNN training works on a GPU.

What do you think about the added tests? They are agnostic to the underlying Batch class, only making sure that the torchtext.data.Field parameter include_lengths=True is exercised (see the sketch below). They might be useful in the future, even after the code dealing with the torchtext Batch class is removed.
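
A hypothetical sketch of such a check, written against the legacy torchtext.data API that was current at the time (the test name and toy dataset are illustrative, not the PR's actual test code):

```python
import torch
import torchtext


def test_batch_with_include_lengths():
    # Build a tiny dataset through the legacy torchtext.data API.
    text_field = torchtext.data.Field(sequential=True, include_lengths=True)
    fields = [("text", text_field)]
    examples = [
        torchtext.data.Example.fromlist(["a b c"], fields),
        torchtext.data.Example.fromlist(["d e"], fields),
    ]
    dataset = torchtext.data.Dataset(examples, fields)
    text_field.build_vocab(dataset)

    batch = next(iter(torchtext.data.Iterator(dataset, batch_size=2)))
    # With include_lengths=True, batch.text is a (data, lengths) tuple --
    # exactly the nested structure the device-transfer fix must handle.
    data, lengths = batch.text
    assert isinstance(data, torch.Tensor)
    assert isinstance(lengths, torch.Tensor)
```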

The change torchtext is planning seems sensible, but many people may use PyTorch Lightning together with torchtext before it lands. It is not intuitive when a model trains on the CPU but throws such an exception on the GPU.

Of course having the PR changes become part of PL would make my Skip-Thought model train faster and my life easier.

@Borda added the bug (Something isn't working) label Jul 25, 2020
Outdated review threads on CHANGELOG.md and tests/utilities/test_apply_func_torchtext.py were resolved.
@Borda requested review from justusschock, SkafteNicki and a team July 25, 2020 09:00
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
@thschaaf requested review from Borda and removed request for a team July 26, 2020 16:02
mergify bot requested a review from a team July 26, 2020 16:02
Outdated review threads on pytorch_lightning/utilities/apply_func.py and tests/utilities/test_apply_func_torchtext.py were resolved.
mergify bot requested a review from a team July 26, 2020 17:47
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
@thschaaf requested review from awaelchli and removed request for a team July 26, 2020 19:29
mergify bot requested a review from a team July 26, 2020 19:30
@thschaaf (Contributor, Author) commented Jul 29, 2020

ci/circleci: TPU-tests is constantly failing with ERROR: (gcloud.auth.activate-service-account) Could not read json file /home/circleci/gcloud-service-key.json: No JSON object could be decoded.
This seems unrelated to the code changes of this PR. How can that be fixed?

> the issue is that somehow it is running in your CircleCI env, which is missing the GKE credentials...

@Borda Thanks! For some reason, which I don't remember exactly, I was following the pytorch-lightning project. Simply unfollowing the project did the trick.

@thschaaf changed the title Bugfix/torchtext include lengths [wip] Bugfix/torchtext include lengths Jul 29, 2020
@thschaaf changed the title [wip] Bugfix/torchtext include lengths Bugfix/torchtext include lengths Jul 29, 2020
@thschaaf changed the title Bugfix/torchtext include lengths [wip] Bugfix/torchtext include lengths Jul 29, 2020
@thschaaf changed the title [wip] Bugfix/torchtext include lengths Bugfix/torchtext include lengths Jul 29, 2020
@awaelchli (Member) left a comment

looks good, thanks for the fix @thschaaf

tests/utilities/test_apply_func_torchtext.py (review thread resolved)
mergify bot requested a review from a team July 29, 2020 19:17
mergify bot requested a review from a team July 29, 2020 19:31
tests/utilities/test_apply_func_torchtext.py (outdated review thread, resolved)
mergify bot requested a review from a team July 29, 2020 21:49
@Borda added the ready (PRs ready to be merged) label Jul 29, 2020
mergify bot commented Jul 29, 2020

This pull request is now in conflict... :(

@williamFalcon merged commit a6719f0 into Lightning-AI:master Jul 31, 2020
Labels: bug (Something isn't working), ready (PRs ready to be merged)

Successfully merging this pull request may close these issues:

Training on GPU failed with Torchtext when using include_lengths=True in torchtext.data.Field (#2688)