
Device #1790

Merged
merged 2 commits into master from device on May 12, 2020

Conversation

williamFalcon (Contributor)

Add a `self.device` pointer to the LightningModule.

codecov bot commented May 12, 2020

Codecov Report

Merging #1790 into master will decrease coverage by 0%.
The diff coverage is 62%.

@@          Coverage Diff           @@
##           master   #1790   +/-   ##
======================================
- Coverage      88%     88%   -0%     
======================================
  Files          69      69           
  Lines        4304    4312    +8     
======================================
+ Hits         3796    3801    +5     
- Misses        508     511    +3     

@williamFalcon williamFalcon merged commit 4b30ef6 into master May 12, 2020
awaelchli (Contributor)

If the LightningModule gets used in a context outside of Lightning (simply as an nn.Module), then moving the module with `.to(...)` could break the user's code if it internally uses `self.device`, as in the GAN example. We should probably override the `.to` and `.cuda` methods so that `self.device` gets updated; then the Trainer wouldn't have to update it.
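A minimal sketch of that idea (the class name is hypothetical, and inferring the device from the parameters after the move is an assumption here, not the merged implementation):

```python
import torch
from torch import nn


class DeviceTrackingModule(nn.Module):
    """Keeps a self.device pointer in sync with .to()/.cuda()/.cpu() moves."""

    def __init__(self):
        super().__init__()
        self._device = torch.device("cpu")

    @property
    def device(self) -> torch.device:
        return self._device

    def _refresh_device(self) -> None:
        # Infer the device from the first parameter, if there is one.
        param = next(self.parameters(), None)
        if param is not None:
            self._device = param.device

    def to(self, *args, **kwargs):
        module = super().to(*args, **kwargs)
        module._refresh_device()
        return module

    def cuda(self, device=None):
        module = super().cuda(device)
        module._refresh_device()
        return module

    def cpu(self):
        module = super().cpu()
        module._refresh_device()
        return module
```

With something like this, `model.to("cuda:0")` would keep `model.device` accurate without the Trainer ever having to set it.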

Comment on lines 503 to +504
model.to(xm.xla_device())
self.device = xm.xla_device()
Contributor

For example, here we could simply override `.to` in the LightningModule, and no extra code in the Trainer would be necessary.

Contributor

The code would be in one place and therefore easier to maintain.

Contributor

Also, `.to` recurses into submodules, so this approach would automatically take care of nested LightningModules!
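A quick demonstration of that recursion, using a dtype move so it runs without a GPU:

```python
import torch
from torch import nn

# .to() recurses: moving the parent also converts nested children's parameters.
parent = nn.Sequential(nn.Linear(2, 2), nn.Sequential(nn.Linear(2, 2)))
parent.to(torch.float64)
print(next(parent[1].parameters()).dtype)  # torch.float64
```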

Contributor (Author)

Oh wait... does nn.Module have a `.device`?
If so, I don't think we should override it, no? I thought the weights had it, not the module.

Contributor (Author)

I need to sleep, lol... I'll think about it tomorrow, but it sounds interesting :)

Contributor

No, it doesn't have it; see pytorch/pytorch#7460.
I guess if we do it, we would run into issues when the user starts to move their submodules to different devices by hand, but that would be a problem anyway :)
Yes, let's do it tomorrow :)
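For context, the usual way to query a plain nn.Module's device is via its parameters, since the module itself stores no device (a well-known idiom, not part of this PR):

```python
import torch
from torch import nn

model = nn.Linear(4, 2)

# nn.Module has no .device attribute; its parameters and buffers do.
device = next(model.parameters()).device
print(device)  # -> cpu

# Caveat from the discussion above: this breaks down once submodules
# are moved to different devices by hand.
```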

Member

I think we could do this as a read-only property, similar to what I did for metrics. But I also agree: it should be read-only and not be used for device transfers.
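A read-only property along those lines could look like this (a sketch; inferring from the parameters is an assumption, and this is not the actual metrics code being referenced):

```python
import torch
from torch import nn


class MyModule(nn.Module):
    @property
    def device(self) -> torch.device:
        # Read-only: reports where the parameters live. There is no setter,
        # so `module.device = ...` raises AttributeError, and the property
        # cannot be misused for device transfers.
        param = next(self.parameters(), None)
        return param.device if param is not None else torch.device("cpu")
```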

Contributor

Exactly like you did in metrics; that's what I meant 👍 Nice

@mergify mergify bot requested a review from a team May 12, 2020 04:23
@Borda Borda deleted the device branch May 12, 2020 06:03
@Borda Borda added the feature Is an improvement or enhancement label May 12, 2020
@Borda Borda added this to the 0.7.6 milestone May 12, 2020
@Borda Borda mentioned this pull request May 12, 2020
ananthsub added a commit that referenced this pull request Feb 5, 2022
Use trainer.strategy.root_device in favor of LightningModule.device in DeviceStatsMonitor

Minor refactor to use the strategy's own `root_device` instead of the LightningModule's device property.

Attempts at manual model parallelization by extending this plugin will face difficulties with the assumption that the LightningModule has all of its parameters on the same device. 

For those use cases, it is critical to remove the assumption that the module has a device property (a device attribute in general goes against the design principles of PyTorch modules):
- pytorch/pytorch#7460
- #1790 (comment)
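For illustration, a callback can read the device from the strategy rather than the module (a sketch assuming the `trainer.strategy.root_device` attribute named in the commit message; the callback itself is hypothetical):

```python
import pytorch_lightning as pl


class LogRootDevice(pl.Callback):
    """Toy callback: asks the strategy, not the LightningModule, for the device."""

    def on_train_start(self, trainer, pl_module) -> None:
        # Works even when the module's parameters are spread across
        # devices and pl_module.device would be ambiguous.
        print(f"root device: {trainer.strategy.root_device}")
```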
Borda pushed a commit that referenced this pull request Feb 5, 2022
* Use trainer.strategy.root_device in favor of LightningModule.device in DeviceStatsMonitor