SGD Learning Rate 'Burn In' #15

Closed
bobo0810 opened this issue Sep 19, 2018 · 5 comments

@bobo0810

Hi, shouldn't the learning rate be updated during the training phase?

@glenn-jocher
Member

@bobo0810 yes, I think you are talking about the SGD learning rate 'burn in', which is supposed to be much smaller for the first 1000 batches of training. This was brought up by @xyutao in issue #2.

I'm going to switch the training from Adam to SGD with burn in in a new commit soon.

glenn-jocher changed the title from "learning_rate issue?" to "SGD Learning Rate 'Burn In'" on Sep 19, 2018
@glenn-jocher
Member

@bobo0810 do you have an exact definition of the learning rate over the course of training? I tried switching to SGD and implementing a burn-in phase but was unsuccessful: the losses diverged before the burn-in completed.

From darknet I think the correct burn-in formula is the following, which slowly ramps the LR up to 1e-3 over the first 1000 iterations and leaves it there:

# SGD burn-in: ramp the LR up over the first 1000 batches of epoch 0
if (epoch == 0) and (i <= 1000):
    power = ??  # unknown exponent -- this is the open question
    lr = 1e-3 * (i / 1000) ** power
    for g in optimizer.param_groups:  # apply the new LR to every param group
        g['lr'] = lr

I can't find the correct value of power though. I tried power=2 and training diverged around 200 iterations; increasing to power=5, training diverged after 400 iterations, and power=10 also diverged.
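
A quick way to see how the exponent shapes the ramp is to evaluate the formula at a few checkpoints. This is purely illustrative: 2, 5 and 10 are the values tried above, and 4 is included only for contrast, not as a claim about darknet's default.

# Illustrative only: how different burn-in exponents shape the LR ramp
base_lr, burn_in = 1e-3, 1000
checkpoints = (100, 250, 500, 1000)
for power in (2, 4, 5, 10):
    lrs = {i: base_lr * (i / burn_in) ** power for i in checkpoints}
    print(f"power={power}: " + ", ".join(f"iter {i} -> {lr:.1e}" for i, lr in lrs.items()))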

I see that the divergence is in the width and height losses; the other terms appear fine. I think one problem may be that the width and height terms are bounded at zero below but unbounded above, so it's possible the network is predicting impossibly large widths and heights, causing those losses to diverge. I may need to bound these or redefine the width and height terms and try again. I used a variant of the width and height terms in a different project that had no divergence problems with SGD.
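
As a rough sketch of what bounding could look like (not the change that actually went into the repo; the function name and max_scale below are made up for illustration), the raw width/height outputs could be squashed through a sigmoid instead of the unbounded exp():

import torch

def bounded_wh(tw, th, anchor_w, anchor_h, max_scale=4.0):
    # Instead of w = anchor_w * exp(tw), which can grow without bound,
    # cap the box at max_scale times the anchor size via a sigmoid.
    w = anchor_w * max_scale * torch.sigmoid(tw)
    h = anchor_h * max_scale * torch.sigmoid(th)
    return w, h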

@glenn-jocher
Member

@bobo0810 I've switched from Adam to SGD with burn-in (which exponentially ramps up the learning rate from 0 to 0.001 over the first 1000 iterations) in commit a722601.
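
For reference, a burn-in of this shape can also be expressed with PyTorch's LambdaLR scheduler. The sketch below is illustrative only: the exponent 4 and the placeholder model are assumptions, not necessarily what commit a722601 contains.

import torch
import torch.nn as nn

model = nn.Linear(10, 1)  # placeholder model for the sketch
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

burn_in, power = 1000, 4  # power chosen for illustration only

# Multiplier on the base LR: ramp during burn-in, then hold at 1.0
ramp = lambda it: (it / burn_in) ** power if it < burn_in else 1.0
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=ramp)

for i in range(2000):    # stand-in for the per-batch training loop
    optimizer.step()     # would follow loss.backward() in real training
    scheduler.step()     # advances the iteration count used by `ramp`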

@bobo0810
Author

Thank you very much.

@glenn-jocher
Member

@bobo0810 you're welcome, but the change opened up different issues: mainly, the height and width terms diverged during training, so I had to bound them using new height and width calculations. See issue #2 for a full explanation.
