About Loss NaN #15

Closed
liangleikun opened this issue Oct 4, 2020 · 10 comments

Comments

@liangleikun

Hello, thanks for your nice work! I have a problem: I trained on my own dataset, and after some epochs the loss becomes NaN. I added a minimum value in the focal loss, but wh_loss and off_loss still become NaN or inf. Could you give me some advice? Thanks.

@yijingru
Owner

yijingru commented Oct 4, 2020

Hi, I met this problem before. I suspect that the unnormalized feature distribution in the head layers leads to unstable training. Can you add a batch normalization after the head convolutional layers (e.g. Conv+BN+ReLU+Conv) to see if it helps?
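
For reference, a minimal sketch of such a Conv+BN+ReLU+Conv head in PyTorch (channel sizes, head names, and the number of classes are illustrative assumptions, not this repository's actual settings):

```python
import torch.nn as nn

# A Conv+BN+ReLU+Conv head: the BatchNorm after the first convolution
# normalizes the head features, which can stabilize training.
def make_head(in_channels, head_channels, out_channels):
    return nn.Sequential(
        nn.Conv2d(in_channels, head_channels, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(head_channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(head_channels, out_channels, kernel_size=1),
    )

num_classes = 15                             # assumed number of categories
hm_head = make_head(256, 256, num_classes)   # heatmap head
wh_head = make_head(256, 256, 2)             # width/height head
off_head = make_head(256, 256, 2)            # offset head
```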

@liangleikun
Author

Thank you very much. I have added a batch normalization to the head convolutional layers, and it helps.

@yijingru
Owner

yijingru commented Oct 4, 2020

Thanks for letting me know. I will add this information to the new version.

@huangmanba1

Hi, after I added a batch normalization to the head convolutional layers, the loss still becomes NaN. What should I do?

@yijingru
Owner

Hi, after I added a batch normalization to the head convolutional layers, the loss still becomes NaN. What should I do?

How large is the batch size? The other solutions I can think of are (1) increasing the batch size or (2) decreasing the learning rate. Empirically I stop at about 40 epochs using the default learning rate.
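
As a rough illustration, those two changes could look like this (the model, dataset, and learning-rate values below are placeholder assumptions, not this repository's training script):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-ins so the sketch runs; replace with the real model and dataset.
model = nn.Conv2d(3, 1, kernel_size=3, padding=1)
train_dataset = TensorDataset(torch.randn(64, 3, 32, 32))

# (1) a larger batch size, e.g. 16 instead of 8
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)

# (2) a smaller learning rate, e.g. 5e-5
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)
```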

@18804601171

@yijingru My batch size is 8 and I use 2 GPUs. In addition, I added batch normalization, but the loss still becomes NaN. What should I do?

@rush9838465

I noticed that the input to the log function in the focal loss, pred, is sometimes 0 (it is a sigmoid output).
So I changed the following code, and it seems to work:
pos_loss = torch.log(pred+0.0000001) * torch.pow(1 - pred, 2) * pos_inds
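
For context, here is a minimal sketch of a CenterNet-style focal loss with that epsilon guard applied to both the positive and negative terms (the actual loss in this repository may differ in details):

```python
import torch

def focal_loss(pred, gt, eps=1e-7):
    # pred: sigmoid heatmap predictions in [0, 1]
    # gt:   Gaussian-splatted ground-truth heatmap
    pos_inds = gt.eq(1).float()
    neg_inds = gt.lt(1).float()
    neg_weights = torch.pow(1 - gt, 4)

    # eps keeps log() away from log(0) = -inf on both branches
    pos_loss = torch.log(pred + eps) * torch.pow(1 - pred, 2) * pos_inds
    neg_loss = torch.log(1 - pred + eps) * torch.pow(pred, 2) * neg_weights * neg_inds

    num_pos = pos_inds.sum()
    pos_loss = pos_loss.sum()
    neg_loss = neg_loss.sum()

    # With no positive targets in the batch, only the negative term contributes.
    if num_pos == 0:
        return -neg_loss
    return -(pos_loss + neg_loss) / num_pos
```

An alternative guard is to clamp the sigmoid output away from 0 and 1, e.g. `torch.clamp(pred, min=1e-4, max=1 - 1e-4)`, before the loss is computed.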

@Fly-dream12

I have tried this, but the hm loss is still NaN. @yijingru

@navidasj96

The main reason for this problem is images without any objects in them. After splitting the main images at scales 0.5 and 1, deleting the images that contain no objects solved the problem for me; decreasing the learning rate and adding the batch normalization layer did not help until I removed those images.
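
A minimal sketch of that filtering step, assuming the annotations are available as a mapping from image filename to a list of ground-truth boxes (the file layout and names here are illustrative, not this repository's data loader):

```python
import os

def filter_empty_images(image_dir, annotations):
    # annotations: dict mapping image filename -> list of ground-truth boxes
    # (an assumed format; adapt to however your dataset stores labels).
    kept = []
    for name in sorted(os.listdir(image_dir)):
        if len(annotations.get(name, [])) > 0:  # keep only images with objects
            kept.append(name)
    return kept
```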

@yijingru
Owner

The main reason for this problem is images without any objects in them. After splitting the main images at scales 0.5 and 1, deleting the images that contain no objects solved the problem for me; decreasing the learning rate and adding the batch normalization layer did not help until I removed those images.

Thanks a lot for the effort to clear up this issue! It's good to know the cause of the NaN loss. I will also share your comment in README.md. Thanks!
