
warm step #13

Open
ShaniGam opened this issue Dec 5, 2019 · 3 comments

ShaniGam commented Dec 5, 2019

Can you please explain the intuition for using warm_step=200 for only 1 epoch? It doesn't seem like enough for meaningful training without distillation. What happens if I use the distillation loss from scratch?

twangnh (Owner) commented Dec 18, 2019

Can you rephrase your question?

ShaniGam (Author)

The warm step is not mentioned in the paper. Does it improve the result?

twangnh (Owner) commented Dec 18, 2019

No, the warm-up is not related to distillation; it is used to stabilize training.
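
For context, a warm-up of this kind typically ramps the learning rate linearly from near zero to the base value over the first `warm_step` iterations, then trains normally. Below is a minimal sketch in PyTorch, assuming `warm_step=200` and an illustrative `base_lr`; the repo's actual schedule and values may differ:

```python
import torch

# Illustrative values only; not necessarily the repo's exact configuration.
model = torch.nn.Linear(10, 1)   # placeholder model
base_lr = 0.01
warm_step = 200
optimizer = torch.optim.SGD(model.parameters(), lr=base_lr)

def adjust_lr(step):
    # Linearly ramp the learning rate from ~0 to base_lr over the first
    # warm_step iterations, then hold it at base_lr. This stabilizes early
    # training and is independent of the distillation loss.
    scale = min(1.0, (step + 1) / warm_step)
    for group in optimizer.param_groups:
        group["lr"] = base_lr * scale

for step in range(1000):
    adjust_lr(step)
    # ... forward pass, detection (and distillation) losses, backward, optimizer.step()
```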
