Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Breaking compatibility with custom datatypes implementing .to #2314

Closed
digitalillusions opened this issue Jun 22, 2020 · 6 comments · Fixed by #2335
Closed

Breaking compatibility with custom datatypes implementing .to #2314

digitalillusions opened this issue Jun 22, 2020 · 6 comments · Fixed by #2335
Assignees
Labels
bug Something isn't working

Comments

@digitalillusions
Copy link

🚀 Feature

Bring back compatibility for custom datatypes in collections implementing .to for transferring data.

Motivation

I am using Pytorch Lightning together with Pytorch Geometric. Pytorch Geometric implements several custom datatypes and dataloaders which is really useful for geometric deep learning. Everything worked well with pytorch lightning 0.7.6, as the custom datatypes implement a .to method for transferring the data to different devices.
However, with the recent 0.8.1 update, this is no longer possible and I had to scour the documentation to be able to implement a fix using transfer_batch_to_device(batch, device). This is in my opinion not very pretty, as my batch looks like this

{"data": pytorch geometric batch object, "id": tensor, ...}

i.e. it is just a dictionary of types that all implement the .to method.

Pitch

  • Make it possible for classes implementing the .to method to be transferred automatically
  • If part of the batch could not be transferred automatically output a warning letting the user know, that a custom transfer function for the batch might be required, or to implement the .to method for custom datatypes in the batch
  • Add a note to the introduction guide about custom datatypes and handling for custom datatypes

Alternatives

If this change was intentional and the behavior of trying to call the .to method is not desired, I think there should definitely be some more documentation about this, in a more obvious place.

Additional context

@digitalillusions digitalillusions added feature Is an improvement or enhancement help wanted Open to be worked on labels Jun 22, 2020
@github-actions
Copy link
Contributor

Hi! thanks for your contribution!, great first issue!

@awaelchli
Copy link
Member

This should definitely work. It was not my intention to break it and I thought I had even a test for it. I will look into this and also converting the issue to a bug report.

@awaelchli awaelchli self-assigned this Jun 23, 2020
@awaelchli awaelchli added bug Something isn't working and removed feature Is an improvement or enhancement help wanted Open to be worked on labels Jun 23, 2020
@awaelchli
Copy link
Member

awaelchli commented Jun 23, 2020

Have identified the problem and fix is in the works... Silly oversight on my part.

Regarding the warning, not sure. Is it not enough to point out in the docs that anything not a tensor (or .to() movable) will be left untouched unless the hook is implemented?

Otherwise, such a check would have to be done on every batch, because in theory the batch can (structurally) change every iteration, and that would cause a spam of warning messages.

@digitalillusions
Copy link
Author

Have identified the problem and fix is in the works... Silly oversight on my part.

Awesome, thank you!

Regarding the warning, not sure. Is it not enough to point out in the docs that anything not a tensor (or .to() movable) will be left untouched unless the hook is implemented?

Otherwise, such a check would have to be done on every batch, because in theory the batch can (structurally) change every iteration, and that would cause a spam of warning messages.

That's a good point, in that case I think the docs should suffice. Though maybe the compatible data types can be mentioned in the Introduction Guide? Reading through the Introduction Guide I found no mention of this anywhere, and I think it may fit well into the Data Section. Possibly after the Models defined by data section, there could be a section which talks briefly about custom data loaders, custom batches and what kind of custom data lightning can handle automatically. Maybe also just refer to the appropriate place in the documentation.

@awaelchli
Copy link
Member

awaelchli commented Jun 23, 2020

@digitalillusions would you like to review/test the fix? #2335
Also, since the docs were your idea, would you like to make a separate PR with your docs suggestion to the intro guide?

@digitalillusions
Copy link
Author

Sure thing, I think I'll have time to review and to test it tomorrow.
I can also try my hand at a pull request for the docs, though I'm still fairly new to lightning.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants