About Loss NaN #15

Closed
liangleikun opened this issue Oct 4, 2020 · 10 comments

Comments

@liangleikun

Hello, thanks for your nice work! I have a problem: I trained on my own dataset, and after some epochs the loss becomes NaN. I added a minimum value in the focal loss, but wh_loss and off_loss still become NaN or inf. Could you give me some advice? Thanks.

@yijingru
Owner

yijingru commented Oct 4, 2020

Hi, I met this problem before. I suspect that the unnormalized feature distribution in the head layers leads to unstable training. Can you add a batch normalization after the head convolutional layers (e.g. Conv+BN+ReLU+Conv) to see if it helps?
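
For reference, a minimal sketch of such a Conv+BN+ReLU+Conv head in PyTorch (channel sizes, head names, and the number of classes are illustrative assumptions, not this repository's actual settings):

```python
import torch.nn as nn

# A Conv+BN+ReLU+Conv head: the BatchNorm after the first convolution
# normalizes the head features, which can stabilize training.
def make_head(in_channels, head_channels, out_channels):
    return nn.Sequential(
        nn.Conv2d(in_channels, head_channels, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(head_channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(head_channels, out_channels, kernel_size=1),
    )

num_classes = 15                             # assumed number of categories
hm_head = make_head(256, 256, num_classes)   # heatmap head
wh_head = make_head(256, 256, 2)             # width/height head
off_head = make_head(256, 256, 2)            # offset head
```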

@liangleikun
Author

Thank you very much. I have added a batch normalization to the head convolutional layers, and it helps.

@yijingru
Owner

yijingru commented Oct 4, 2020

Thanks for letting me know. I will add this information to the new version.

@huangmanba1

Hi, after I added a batch normalization to the head convolutional layers, the loss still becomes NaN. What should I do?

@yijingru
Owner

Hi, after I added a batch normalization to the head convolutional layers, the loss still becomes NaN. What should I do?

How large is the batch size? The other solutions I can think of are (1) increasing the batch size or (2) decreasing the learning rate. Empirically I stop at about 40 epochs using the default learning rate.
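
As a rough illustration, those two changes could look like this (the model, dataset, and learning-rate values below are placeholder assumptions, not this repository's training script):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-ins so the sketch runs; replace with the real model and dataset.
model = nn.Conv2d(3, 1, kernel_size=3, padding=1)
train_dataset = TensorDataset(torch.randn(64, 3, 32, 32))

# (1) a larger batch size, e.g. 16 instead of 8
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)

# (2) a smaller learning rate, e.g. 5e-5
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)
```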

@18804601171

@yijingru My batch size is 8 and I use 2 GPUs. In addition, I added batch normalization, but the loss still becomes NaN. What should I do?

@rush9838465

I noticed that the input to the log function in the focal loss, pred, is sometimes 0 (it is a sigmoid output).
So I changed the following code, and it seems to work:
pos_loss = torch.log(pred+0.0000001) * torch.pow(1 - pred, 2) * pos_inds
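
For context, here is a minimal sketch of a CenterNet-style focal loss with that epsilon guard applied to both the positive and negative terms (the actual loss in this repository may differ in details):

```python
import torch

def focal_loss(pred, gt, eps=1e-7):
    # pred: sigmoid heatmap predictions in [0, 1]
    # gt:   Gaussian-splatted ground-truth heatmap
    pos_inds = gt.eq(1).float()
    neg_inds = gt.lt(1).float()
    neg_weights = torch.pow(1 - gt, 4)

    # eps keeps log() away from log(0) = -inf on both branches
    pos_loss = torch.log(pred + eps) * torch.pow(1 - pred, 2) * pos_inds
    neg_loss = torch.log(1 - pred + eps) * torch.pow(pred, 2) * neg_weights * neg_inds

    num_pos = pos_inds.sum()
    pos_loss = pos_loss.sum()
    neg_loss = neg_loss.sum()

    # With no positive targets in the batch, only the negative term contributes.
    if num_pos == 0:
        return -neg_loss
    return -(pos_loss + neg_loss) / num_pos
```

An alternative guard is to clamp the sigmoid output away from 0 and 1, e.g. `torch.clamp(pred, min=1e-4, max=1 - 1e-4)`, before the loss is computed.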

@Fly-dream12

I have tried this, but the hm loss is still NaN. @yijingru

@navidasj96

The main reason for this problem is images without any objects in them. After splitting the main images at scales 0.5 and 1, deleting the images that contain no objects solved the problem for me; decreasing the learning rate and adding the batch normalization layer did not help until I removed those images.
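
A minimal sketch of that filtering step, assuming the annotations are available as a mapping from image filename to a list of ground-truth boxes (the file layout and names here are illustrative, not this repository's data loader):

```python
import os

def filter_empty_images(image_dir, annotations):
    # annotations: dict mapping image filename -> list of ground-truth boxes
    # (an assumed format; adapt to however your dataset stores labels).
    kept = []
    for name in sorted(os.listdir(image_dir)):
        if len(annotations.get(name, [])) > 0:  # keep only images with objects
            kept.append(name)
    return kept
```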

@yijingru
Owner

The main reason for this problem is images without any objects in them. After splitting the main images at scales 0.5 and 1, deleting the images that contain no objects solved the problem for me; decreasing the learning rate and adding the batch normalization layer did not help until I removed those images.

Thanks a lot for the effort to clear up this issue! It's good to know the cause of the NaN loss. I will also share your comment in README.md. Thanks!
