The model cannot converge when training #11

Open
dodgaga opened this issue Apr 3, 2018 · 4 comments

Comments


dodgaga commented Apr 3, 2018

Hi,

I followed the instructions to train the SSD model, but the loss does not decrease.

At first, with base_lr = 0.001, the loss was nan.
Then I lowered base_lr to 0.0001; the loss dropped from 40+ to ~10 and then stopped changing.
Next, I killed the training, set base_lr = 0.001, and resumed; the loss was nan again.
So maybe 0.001 is too large for the model. I lowered the learning rate to base_lr = 0.0004, but the loss always stays at ~8.

What value should the loss of the SSD model finally reach? And can you give me some advice on training with this data?

Owner

yuantailing commented Apr 3, 2018

I found the logs and plotted the loss. Here are

  1. the loss for iterations 0 - 120,000, and
    [image: loss]

  2. the loss for iterations 1,000 - 120,000.
    [image: loss_after_1000]
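
For anyone who wants to make the same two plots from their own training log, here is a minimal sketch of the parsing step. It assumes log lines of the form `Iteration <n>, loss = <value>`, which is an assumption and may not match your log exactly; the function names are hypothetical. It also reports the first iteration where the loss became nan or inf.

```python
import math
import re

# Assumed log line shape, e.g. "... Iteration 1000, loss = 8.1234".
# This pattern is an assumption; adjust it to match your actual log.
LOSS_RE = re.compile(r"Iteration (\d+)\b.*?\bloss = (\S+)")


def parse_losses(lines):
    """Return (iteration, loss) pairs found in the given log lines."""
    points = []
    for line in lines:
        match = LOSS_RE.search(line)
        if match is None:
            continue  # not a loss line
        try:
            loss = float(match.group(2))  # float() also accepts "nan"
        except ValueError:
            continue  # token after "loss = " was not a number
        points.append((int(match.group(1)), loss))
    return points


def first_nonfinite_iteration(points):
    """Return the first iteration whose loss is nan or inf, else None."""
    for iteration, loss in points:
        if not math.isfinite(loss):
            return iteration
    return None


def from_iteration(points, start):
    """Keep only points at or after `start` (e.g. 1000 for the second plot)."""
    return [(it, loss) for it, loss in points if it >= start]
```

Passing `from_iteration(points, 1000)` to any plotting tool gives the second view, where the large initial loss values no longer dominate the y-axis.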

Owner

yuantailing commented Apr 3, 2018

Since there are ~320,000 subimages, 1 epoch is about 320,000 / 14 ≈ 23,000 iterations. Don't expect the loss to fall much before 1 epoch is finished.
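
The epoch arithmetic can be checked with a few lines of Python. This is only a sketch; the function names are made up for illustration, and the subimage count is the approximate figure quoted above.

```python
def iterations_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of iterations needed to see every sample once (rounded up)."""
    return (num_samples + batch_size - 1) // batch_size


def epochs_seen(iterations: int, num_samples: int, batch_size: int) -> float:
    """How many passes over the data a given iteration count corresponds to."""
    return iterations * batch_size / num_samples


# ~320,000 subimages with batch_size = 14
print(iterations_per_epoch(320_000, 14))   # 22858, i.e. about 23,000
print(epochs_seen(40_000, 320_000, 14))    # 1.75
print(epochs_seen(120_000, 320_000, 14))   # 5.25
```

So a run stopped after a few thousand iterations has seen only a fraction of the data once.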

base_lr = 0.001 is OK for batch_size = 14. If it does not converge, I think you can set base_lr to a value between 0.0004 and 0.0008, and there is no need to modify it again.

Author

dodgaga commented Apr 4, 2018

Thanks! You are right. After 40,000 iterations, the loss tends toward 4.
The final loss in your log is ~3. Don't you think that is a bit too high?

dodgaga changed the title from "The model cannot convergence when training" to "The model cannot converge when training" on Apr 4, 2018
@yuantailing
Owner

Maybe it is too high, or maybe the loss values are not comparable. The final AP should be close to that of YOLOv2.
