0.8.2 calls backward on '_GeneratorContextManager' #2411
Comments
Did you override optimizer step?
Can confirm this happens on 0.8.3.
OK. Can you post a Colab example that replicates this?
@williamFalcon My optimizer step was untouched. I can't run more testing at the moment, but I'll get to it as soon as I can.
@williamFalcon Hi, I also encountered this, with a normal Adam optimizer. I don't have a Colab to replicate this at the moment, but from what I saw earlier, it can be replicated in any setting as long as the Trainer is set to precision=16 when using Apex. Under this condition, the relevant lines from training_loop.py and hooks.py run and cause closure_loss to be a _GeneratorContextManager object, which then has no backward() method. It seems that under the current design, PyTorch Lightning's scale_loss function can only be used as a context manager?
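A minimal sketch of the behavior being described, assuming NVIDIA Apex is installed (the function names below are illustrative, not Lightning internals):

```python
from apex import amp  # NVIDIA Apex; assumed to be installed


def broken_backward(loss, optimizer):
    # Returning scale_loss without entering it yields a _GeneratorContextManager,
    # which has no backward() method -- this is the crash reported here.
    closure_loss = amp.scale_loss(loss, optimizer)
    closure_loss.backward()  # AttributeError


def working_backward(loss, optimizer):
    # scale_loss is a context manager: entering it yields the scaled loss tensor,
    # which does support backward().
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
```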
@williamFalcon Here's a Colab example (my first time using Colab, so let me know if you have issues seeing it): https://colab.research.google.com/drive/1G08jVDpx-T-5HE2c89RLJdq4u67mM2-o?usp=sharing I suspect the issue lies with Apex AMP, as suggested above by @aeryen.
@aeryen mind sharing a minimal example to reproduce?
Hi, sorry for the delay: https://colab.research.google.com/drive/1rjaRRwgBTm4CKPfe9po_WSxnKqY4jDRv?usp=sharing
@williamFalcon Yes, the master version works for me now. Thanks!
@williamFalcon Can confirm as well! And sorry I couldn't be more helpful earlier.
Hi @williamFalcon, thanks for the quick fix. I just upgraded but am now seeing a different error:
I'm not manually assigning tensors to a device (i.e. PL should be assigning all tensors as CUDA tensors), and I am not using sparse tensors (at least not that I am aware of). EDIT: I found the issue. It seems metrics need to be CUDA tensors now. Thanks again :)
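For anyone hitting the same thing, a hedged sketch of one way to satisfy that: build the logged metric on the same device as the loss (compute_loss is a hypothetical helper used only for illustration):

```python
import torch


def training_step(self, batch, batch_idx):
    loss = self.compute_loss(batch)  # hypothetical helper returning a CUDA tensor
    # Create the logged metric on the same device as the loss instead of the CPU.
    learn_rate = torch.tensor(self.lr, device=loss.device)
    return {'loss': loss, 'log': {'learn_rate': learn_rate}}
```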
@Anjum48 mind opening a new issue?
🐛 Bug
0.8.2 calls backward on '_GeneratorContextManager' and crashes training.
0.8.1 works correctly. My training_step returns {'loss': loss, 'log': {'learn_rate': self.lr}}.
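A minimal reproducer sketch under the conditions described in the thread (0.8.x-era Trainer arguments, one CUDA GPU, and Apex installed; the model, data, and sizes are illustrative):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class MinimalModel(pl.LightningModule):
    def __init__(self, lr=1e-3):
        super().__init__()
        self.lr = lr
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.cross_entropy(self(x), y)
        # Return dict in the shape described above.
        return {'loss': loss, 'log': {'learn_rate': self.lr}}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.lr)

    def train_dataloader(self):
        data = TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,)))
        return DataLoader(data, batch_size=8)


# precision=16 with Apex installed is the condition under which the crash appears.
trainer = pl.Trainer(gpus=1, precision=16, max_epochs=1)
trainer.fit(MinimalModel())
```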
Expected behavior
Backward is called on the loss, and training runs correctly.