loss nan #6
Comments
Can you post your full logs here? Maybe we can help you.
Thank you for your help.
Can you try using Python 2.7? Our code is only tested with Python 2.7.
Oh... there might be some difference between 3.6 (which I use) and 2.7. I will try with 2.7 and post my result later.
I tried with Python 2.7, but the loss is NaN again...
How many times have you tried? If you always encounter NaN, we suggest clipping the gradients with https://pytorch.org/docs/stable/_modules/torch/nn/utils/clip_grad.html. You can first try setting max_norm to 100. If the loss becomes NaN again, please reduce it. Thank you.
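The clipping suggested above can be sketched as follows. This is a minimal, self-contained example of `torch.nn.utils.clip_grad_norm_` with `max_norm=100`; the tiny linear model and the artificially inflated loss are stand-ins, not the repository's actual training code.

```python
# Sketch: clip gradients before the optimizer step so their total norm
# cannot exceed max_norm. The model and loss here are illustrative only.
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in for the detection model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(8, 4)
# Deliberately large loss to produce exploding gradients.
loss = model(x).pow(2).mean() * 1e6

optimizer.zero_grad()
loss.backward()
# Rescale all gradients in place so their combined norm is at most 100.
nn.utils.clip_grad_norm_(model.parameters(), max_norm=100)
optimizer.step()
```

If NaN still appears, lowering `max_norm` (e.g. to 10 or 1) tightens the clip further, at the cost of slower effective learning.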
@tianzhi0549 The Python version should not be the cause, since I tested the code with Python 3.7.
@YanShuo1992 OK, thank you for pointing that out. Maybe some dependencies differ and cause the NaN. Just try clipping the gradients; I think it can prevent the loss from exploding.
Hey my friend, thanks for your help!
@bei-startdt Happy to know that you have solved it :-).
@tianzhi0549 @bei-startdt I have the same problem. How exactly should I deal with the NaN loss? I don't understand how to clip the gradients. I encounter the same issue when training on the COCO 2014 dataset.
@gittigxuy Did the loss become NaN many times?
@gittigxuy It should be because your training batch size is only 1, which is too small. We recommend using a batch size >= 8.
Thanks, I have solved the problem. You are right; just configure batch_size >= 8.
Why does a batch size that is too small cause NaN?
It's because of how sigmoid_focal_loss is calculated. You can modify the following line with
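One plausible mechanism for the batch-size connection, shown as a hedged sketch rather than the repository's actual code: losses of this style are often normalized by the number of positive samples, and with a very small batch an image can contain zero positives, making the divisor 0 and the result NaN. The function names `normalized_loss` and `safe_normalized_loss` are illustrative assumptions.

```python
# Sketch of the failure mode: dividing a summed loss by the number of
# positive samples. With batch size 1, an all-background image gives
# num_pos == 0 and the division produces NaN (0/0) or inf (x/0).
import torch

def normalized_loss(per_sample_loss, labels):
    num_pos = (labels > 0).sum().float()
    return per_sample_loss.sum() / num_pos  # NaN/inf risk when num_pos == 0

def safe_normalized_loss(per_sample_loss, labels):
    # Common guard: clamp the divisor to at least 1.
    num_pos = (labels > 0).sum().float().clamp(min=1.0)
    return per_sample_loss.sum() / num_pos

losses = torch.zeros(4)  # all-background image: every loss term is 0
labels = torch.zeros(4)  # no positive samples
print(normalized_loss(losses, labels))       # 0/0 -> nan
print(safe_normalized_loss(losses, labels))  # 0/1 -> 0.0
```

With a batch of 8+ images, the chance that the whole batch contains zero positive samples drops sharply, which is consistent with the batch-size fix above.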
I tried to train on COCO, but the loss is NaN.
This is my training script:
This is my result:
I have tried 3 times; the loss is always NaN.
What am I doing wrong?