
Learning rate finder auto-suggest-LR algorithm is slightly too naive #1767

Closed
mstewart141 opened this issue May 9, 2020 · 6 comments · Fixed by #1801
Labels: help wanted (Open to be worked on), let's do it! (approved to implement), question (Further information is requested)

Comments

mstewart141 commented May 9, 2020

🐛 Bug

The learning rate finder's auto-suggest-LR algorithm picks the point of steepest loss descent, but it can be tricked by spikes early in the sweep. A short burn-in period at the beginning would resolve the issue.
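The burn-in fix could look roughly like this sketch. It assumes the finder records parallel `lrs`/`losses` arrays from the sweep; the `skip_begin`/`skip_end` parameter names are illustrative, not Lightning's actual API:

```python
import numpy as np

def suggest_lr(lrs, losses, skip_begin=10, skip_end=1):
    """Pick the LR at the point of steepest loss descent, ignoring a
    short burn-in window at the start of the sweep (where early spikes
    live) and the divergent tail at the end.

    Illustrative sketch only -- parameter names are hypothetical.
    """
    lrs = np.asarray(lrs, dtype=float)
    losses = np.asarray(losses, dtype=float)
    grads = np.gradient(losses)  # steepest descent = most negative gradient
    region = grads[skip_begin:len(grads) - skip_end]
    best = skip_begin + int(np.argmin(region))
    return lrs[best]
```

With `skip_begin=0` the function reproduces the naive behavior and latches onto an early spike; with the default burn-in it finds the genuine descent region.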

To Reproduce

[screenshot omitted: LR-finder loss curve with an early spike that fools the suggestion]

@mstewart141 mstewart141 added bug Something isn't working help wanted Open to be worked on labels May 9, 2020
Borda (Member) commented May 11, 2020

@SkafteNicki mind having a look? ^^

@Borda Borda added question Further information is requested and removed bug Something isn't working labels May 11, 2020
williamFalcon (Contributor) commented:

This makes sense... @mstewart141 mind submitting a PR? :)

@williamFalcon williamFalcon added the let's do it! approved to implement label May 12, 2020
@williamFalcon williamFalcon added this to the 0.8.0 milestone May 12, 2020
williamFalcon (Contributor) commented:

but wouldn't it also be tricked by spikes anywhere? we're talking about local mins here...

justusschock (Member) commented:

Maybe something like a patience would be a good idea... If it doesn't decrease further after 5 additional LR changes or something, use the LR with the minimum so far...
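The patience idea above could be sketched like this (pure illustration; the function and its `patience` argument are hypothetical, not Lightning's implementation): track the running-minimum loss, and stop scanning once the loss has failed to improve for `patience` consecutive steps.

```python
def suggest_lr_with_patience(lrs, losses, patience=5):
    """Return the LR at the lowest loss seen so far, stopping the scan
    once the loss fails to improve for `patience` consecutive steps.

    Hypothetical sketch of the patience heuristic suggested in this
    thread; not the library's actual API.
    """
    best_idx, best_loss, stale = 0, float("inf"), 0
    for i, loss in enumerate(losses):
        if loss < best_loss:
            best_idx, best_loss, stale = i, loss, 0
        else:
            stale += 1
            if stale >= patience:
                break  # loss has plateaued/diverged; stop scanning
    return lrs[best_idx]
```

A side effect of the early stop is that a spurious late minimum (e.g. after the loss has already diverged) is never reached, which also addresses the "spikes anywhere" concern.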

mstewart141 (Author) commented May 12, 2020

i'm happy to try and help out.

one simple fix would be to add a "minimum_lr_threshold" kwarg to the plot function referenced above, with a default value of, say, 1e-5. few models in practice want a max/initial lr below that figure (of course the default could be even lower as well). then, when plotting, draw the full curve as done now, but select the best suggestion only from the range above the minimum threshold.

the same fix could be applied to the suggestions that feed directly into the Trainer. the options would be to either pick a reasonable default and stick with the current api, or to accept Union[bool, float] for auto_lr_find, say, and interpret the float as the min threshold beyond which to consider suggestions.
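as a rough sketch of the threshold idea (the standalone function and `min_lr` name are hypothetical, not the actual Trainer API), sub-threshold LRs could simply be masked out before taking the steepest-descent point:

```python
import numpy as np

def suggest_lr_above(lrs, losses, min_lr=1e-5):
    """Steepest-descent LR suggestion, considering only LRs >= min_lr.

    Hypothetical sketch of the "minimum_lr_threshold" idea above.
    """
    lrs = np.asarray(lrs, dtype=float)
    losses = np.asarray(losses, dtype=float)
    grads = np.gradient(losses)
    grads[lrs < min_lr] = np.inf  # exclude the sub-threshold region from argmin
    return lrs[int(np.argmin(grads))]
```

the whole curve can still be plotted unchanged; only the argmin is restricted, so the early spike region stays visible but can no longer win.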

mstewart141 (Author) commented:

> but wouldn't it also be tricked by spikes anywhere? we're talking about local mins here...

i think that in practice the extreme spikes are a symptom of going from "totally random/untuned" to "ever so slightly tuned" and occur primarily right at the very very beginning.

of course, some models may behave more pathologically, but making a good suggestion for such models is probably out of scope for a simple LR suggester
