
[blocked by #1756] Add decorator to auto-move data for inference #1526

Closed
wants to merge 18 commits

Conversation

@HenryJia (Contributor) commented Apr 19, 2020

Before submitting

  • Was this discussed/approved via a GitHub issue? (not needed for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

What does this PR do?

An attempt to implement this; fixes #1412.
Currently it only works for GPUs, not TPUs.

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@mergify mergify bot requested a review from a team April 19, 2020 03:20
@pep8speaks commented Apr 19, 2020

Hello @HenryJia! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-05-17 12:52:06 UTC

@HenryJia HenryJia changed the title from "[WIP] Auto move data for inference" to "Auto move data for inference" Apr 19, 2020
@Borda (Member) left a comment

LGTM 🚀

tests/models/test_gpu.py (outdated, resolved)
pytorch_lightning/core/lightning.py (outdated, resolved)
pytorch_lightning/core/lightning.py (outdated, resolved)
tests/models/test_gpu.py (outdated, resolved)
@mergify mergify bot requested a review from a team April 19, 2020 09:38
@awaelchli (Member) left a comment

Hi, thanks for looking into this.
My main concern is that the LightningModule no longer behaves like nn.Module, since you overwrite __call__. I think the idea is that a LightningModule can be used just like an nn.Module outside of PL, and this PR kind of breaks that. At a minimum, I would show a warning that this is done automatically, because the user may be wondering why data transfers are slow (CPU to GPU, for example).
Also, the assertion that the parameters must be on the same device is not correct.

pytorch_lightning/core/lightning.py (outdated, resolved)
@HenryJia (Contributor, Author) commented Apr 19, 2020

> My main concern is that the LightningModule no longer behaves like nn.Module, since you overwrite __call__. I think the idea is that a LightningModule can be used just like an nn.Module outside of PL, and this PR kind of breaks that. At a minimum, I would show a warning that this is done automatically, because the user may be wondering why data transfers are slow (CPU to GPU, for example).

I'm not completely overwriting it; I'm using super() to call nn.Module's __call__, which does almost all of the work. That being said, what you're saying does make sense: there should be some sort of warning for it.

> Also, the assertion that the parameters must be on the same device is not correct.

On second thought, yes, that makes sense; I'll get rid of it. I'll have to rethink how I detect the device the model is on, then.
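For context, here is a minimal sketch of the approach being discussed: override __call__, find the module's device, move tensor inputs there, and then delegate to nn.Module.__call__ via super(), which still does all the real work. The class name and the exact logic are illustrative, not the PR's actual code.

```python
import torch
from torch import nn


class AutoMoveModule(nn.Module):
    """Illustrative only: moves tensor inputs to the module's device before forward."""

    def __call__(self, *args, **kwargs):
        # assumes all parameters live on a single device (see the later discussion)
        device = next(self.parameters()).device
        args = tuple(a.to(device) if torch.is_tensor(a) else a for a in args)
        kwargs = {k: v.to(device) if torch.is_tensor(v) else v for k, v in kwargs.items()}
        # nn.Module.__call__ still runs the hooks and dispatches to forward()
        return super().__call__(*args, **kwargs)
```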

@HenryJia (Contributor, Author) commented Apr 20, 2020

@awaelchli I've now moved the data transfer code to pytorch_lightning.utilities to remove the code duplication, and added warnings.

@Borda Borda added the feature label Apr 20, 2020
@Borda (Member) left a comment

Also missing a changelog entry.

pytorch_lightning/core/lightning.py (outdated, resolved)
pytorch_lightning/core/lightning.py (outdated, resolved)
pytorch_lightning/core/lightning.py (outdated, resolved)
@mergify mergify bot requested review from a team April 20, 2020 05:46
@Borda Borda added this to the 0.7.4 milestone Apr 20, 2020
@HenryJia (Contributor, Author)
@Borda done!

@mergify mergify bot commented Apr 20, 2020

This pull request is now in conflict... :(

@codecov codecov bot commented Apr 21, 2020

Codecov Report

Merging #1526 into master will increase coverage by 1%.
The diff coverage is 85%.

@@           Coverage Diff           @@
##           master   #1526    +/-   ##
=======================================
+ Coverage      88%     89%    +1%     
=======================================
  Files          69      70     +1     
  Lines        4316    3833   -483     
=======================================
- Hits         3805    3415   -390     
+ Misses        511     418    -93     

@mergify mergify bot commented Apr 22, 2020

This pull request is now in conflict... :(

@HenryJia (Contributor, Author)
@Borda I think I've dealt with all the issues you pointed out; would you mind reviewing again?


if callable(getattr(batch, 'to', None)):
    if warn_on_transfer:
        rank_zero_warn('Auto transferred data to device {}'.format(device))
@awaelchli (Member) commented Apr 22, 2020

I would move the warning out to the __call__ , because 1. this utility function is more general (it is used in other parts) and 2. this function is recursive, so if a dict of tensors is passed in, the warning would be shown multiple times.

@HenryJia (Contributor, Author) commented Apr 22, 2020

@awaelchli I thought about this; the problem is that I'd effectively run into the same code-duplication issue by trying to detect the device in __call__, since I'd need to recurse into whatever format the data is in again, in almost exactly the same way.
Also, I believe rank_zero_warn will only warn once anyway, so that is not an issue.
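For reference, a simplified sketch of the kind of recursive transfer helper being discussed; the real utility in pytorch_lightning.utilities takes the device type and index separately (as in the snippet above), so the single-device signature and the body here are illustrative.

```python
import warnings


def transfer_data_to_device(batch, device, warn_on_transfer=False):
    """Recursively move anything with a .to() method (e.g. tensors) to `device`."""
    if callable(getattr(batch, 'to', None)):
        if warn_on_transfer:
            # warnings.warn deduplicates identical messages from the same call site,
            # so the recursion does not repeat the warning for every tensor
            warnings.warn('Auto transferred data to device {}'.format(device))
        return batch.to(device)
    if isinstance(batch, dict):
        return {k: transfer_data_to_device(v, device, warn_on_transfer) for k, v in batch.items()}
    if isinstance(batch, (list, tuple)):
        return type(batch)(transfer_data_to_device(v, device, warn_on_transfer) for v in batch)
    return batch
```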

output = model(x) # Lightning will automove data here and warn you of it

"""
devices = [p.device for p in self.parameters()]
Member

I'm not sure if always looping over all the params is a good idea. Can we maybe cache the devices somehow?

Contributor

@mcarilli are we missing a simpler/cleaner way of doing this?

x = x.cpu()
model.cuda()

# this works in lightning
out = model(x)

# out is cuda tensor now

@mcarilli commented Apr 24, 2020

Maybe I don't understand what you're trying to do here, but it looks like you're only using devices[0], so why collect them all? Also, if the model params do reside on multiple devices, it's hard to predict which device the user actually wants the input data to reside on.


@HenryJia (Contributor, Author)

So, I only apply automatic data transfer in the simple case where the model resides on one device; trying to auto-transfer data when the model is spread across multiple devices is very non-trivial and heavily dependent on the model structure.
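A sketch of that guard under the single-device assumption described above; the helper name is made up for illustration.

```python
from torch import nn


def single_device_of(module: nn.Module):
    """Return the module's device if all of its parameters share one, otherwise None."""
    devices = {p.device for p in module.parameters()}
    if len(devices) == 1:
        return devices.pop()
    # the model is sharded across devices (or has no parameters): skip auto-transfer
    return None
```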

device = devices[0]
data = transfer_data_to_device(data, device.type, device.index, warn_on_transfer=True)
kwargs = transfer_data_to_device(kwargs, device.type, device.index, warn_on_transfer=True)
return super(LightningModule, self).__call__(*data, **kwargs)
Member

@Borda and I discussed this and we both agree that we shouldn't do this in the Module (at least not by default). In our opinion, we should always be able to use a LightningModule as a plain nn.Module.

What I propose is the following:

We change this part to a decorator that can be added to forward and is applied automatically by the Trainer. (Sorry for this coming so late.)

In that case you could make the decorator a class that eventually also caches the devices :)

Member

Love the idea of adding the decorator dynamically there!
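Roughly, the proposed decorator might look like the sketch below; the name auto_move_data and the function form are illustrative (a class-based variant could cache the devices, as suggested above).

```python
import functools

import torch


def auto_move_data(fn):
    """Illustrative decorator: move tensor arguments to the module's device before calling fn."""

    @functools.wraps(fn)
    def wrapper(self, *args, **kwargs):
        device = next(self.parameters()).device
        args = tuple(a.to(device) if torch.is_tensor(a) else a for a in args)
        kwargs = {k: v.to(device) if torch.is_tensor(v) else v for k, v in kwargs.items()}
        return fn(self, *args, **kwargs)

    return wrapper
```

Applied to forward (either by the user or injected by the Trainer), this leaves __call__ untouched, so a LightningModule keeps behaving exactly like a plain nn.Module.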

@mergify mergify bot requested review from a team April 24, 2020 07:27
@Borda Borda modified the milestones: 0.7.4, 0.7.5 Apr 24, 2020
@mergify mergify bot commented Apr 24, 2020

This pull request is now in conflict... :(

@williamFalcon (Contributor)

@HenryJia maybe this should be a broader effort to do distributed inference?

But we probably need a LightningModule method?

Here's a brainstorm on how we might be able to solve distributed inference:

model = LightningModule.load_from_checkpoint(...)
model.init_distributed(backend='', gpus=2, etc...)

model(x)

@HenryJia (Contributor, Author)

Makes sense. I'll look at this again with fresh ideas at a later point; I'm a bit busy with other things right now.

@mergify mergify bot commented May 12, 2020

This pull request is now in conflict... :(

@Borda Borda modified the milestones: 0.7.6, 0.8.0, 0.7.7 May 12, 2020
@williamFalcon (Contributor) left a comment

@HenryJia this is great! Let's go with the suggestion by @justusschock and @Borda to build a decorator instead!

@mergify mergify bot requested a review from a team May 17, 2020 12:53
@williamFalcon williamFalcon changed the title from "Auto move data for inference" to "[WIP] Add decorator to auto-move data for inference" May 17, 2020
@HenryJia (Contributor, Author)

> @HenryJia this is great! Let's go with the suggestion by @justusschock and @Borda to build a decorator instead!

Sounds good; I'll get back to this in a couple of weeks, when all my university exams are over.

@awaelchli (Member)

Let us know if some of us should pick up your work and continue :)

@Borda (Member) commented May 18, 2020

I am afraid that letting it sleep and finishing it in a couple of weeks would be a bit difficult given the continuous development...
@awaelchli, mind taking it over?

@awaelchli (Member)

@Borda sure, I could look at it after #1756 is merged, which brings some simplifications that could be useful here too.

@Borda Borda changed the title from "[WIP] Add decorator to auto-move data for inference" to "[blocked by #1412] Add decorator to auto-move data for inference" May 22, 2020
@Borda Borda changed the title from "[blocked by #1412] Add decorator to auto-move data for inference" to "[blocked by #1756] Add decorator to auto-move data for inference" May 22, 2020
@williamFalcon (Contributor)

  1. Where are the docs on using this?
  2. In the docs we need an example.
  3. I don't see the decorator.
  4. We shouldn't warn the user... this will get annoying very quickly.

@awaelchli awaelchli marked this pull request as draft May 25, 2020 11:55
@awaelchli (Member) commented May 25, 2020

@williamFalcon This PR is continued over at #1905.
I was specifically asked to take over the work.

@Borda Borda modified the milestones: 0.7.7, 0.8.0 May 26, 2020
@HenryJia (Contributor, Author)

Sorry for the late reply.
Yeah, feel free to take over. I'll have a lot more time after Saturday and can get back to this if you want. Since there's a new PR, should I close this one then?

@awaelchli (Member) commented May 27, 2020

We can close it, or you can merge the changes from my branch into yours if you'd like to make adjustments. Either one is fine with me. Note that my PR is rebased onto #1756, not master, but it also contains your commits.
I really hope #1756 gets merged soon.

I think one thing we need to figure out is how best to apply the decorator.
Should there be a toggle on the LightningModule to turn it on and off? Should only the Trainer add the decorator, or should the user apply it manually?
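For illustration, assuming a decorator along the lines of the earlier sketch, the two manual options might look roughly like this:

```python
import pytorch_lightning as pl
from torch import nn

# assumes the illustrative auto_move_data sketch from above is in scope


class MyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(4, 2)

    # Option 1: the user opts in explicitly by decorating forward
    @auto_move_data
    def forward(self, x):
        return self.layer(x)


# Option 2: the Trainer (or the user) wraps the class's forward after the fact
# MyModel.forward = auto_move_data(MyModel.forward)
```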

@Borda (Member) commented May 27, 2020

Btw, to change the destination branch there is no need to close the PR; we can just change the destination branch =)

@HenryJia (Contributor, Author) commented Jun 1, 2020

I read through your branch. It all looks good to me, and it's better than where I left this, so I'll close this PR for now (unless anyone has any objections).

@HenryJia HenryJia closed this Jun 1, 2020
@Borda Borda added the duplicate label Jun 1, 2020
@moi90 moi90 mentioned this pull request Jun 9, 2020
Labels
duplicate (This issue or pull request already exists), feature (Is an improvement or enhancement), waiting on author (Waiting on user action, correction, or update)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Auto move input to proper device for inference
7 participants