
Scheduler implementation of Continual Pre-Training of Large Language Models: How to (re)warm your model? #1273

Merged
4 commits merged into axolotl-ai-cloud:main on Feb 13, 2024

Conversation

@jinwonkim93 (Contributor) commented Feb 8, 2024

Scheduler implementation of "Continual Pre-Training of Large Language Models: How to (re)warm your model?" (https://arxiv.org/pdf/2308.04014.pdf)

Description

Almost identical to the cosine min LR scheduler, but it adds a constant ratio that freezes the LR at a given percentage of the training steps.
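
For illustration, a minimal sketch of the schedule shape this describes: linear warmup to the peak LR, cosine decay down to min_lr_ratio of the peak, and a constant phase at that minimum starting at constant_lr_ratio of the total training steps. This is not the PR's actual implementation; the function and argument names are illustrative, and the warmup handling is an assumption.

import math

def cosine_constant_lr_lambda(
    step: int,
    *,
    num_warmup_steps: int,
    num_training_steps: int,
    min_lr_ratio: float = 0.1,       # corresponds to cosine_min_lr_ratio
    constant_lr_ratio: float = 0.8,  # corresponds to cosine_constant_lr_ratio
) -> float:
    # Returns a multiplier on the peak learning rate.
    # Linear warmup from 0 to the peak LR.
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    # Step at which the cosine decay ends and the LR is held constant.
    constant_start = int(num_training_steps * constant_lr_ratio)
    if step >= constant_start:
        return min_lr_ratio
    # Cosine decay from the peak LR down to min_lr_ratio of the peak.
    progress = (step - num_warmup_steps) / max(1, constant_start - num_warmup_steps)
    return min_lr_ratio + (1.0 - min_lr_ratio) * 0.5 * (1.0 + math.cos(math.pi * progress))

A multiplier like this could be plugged into torch.optim.lr_scheduler.LambdaLR, e.g. LambdaLR(optimizer, lambda s: cosine_constant_lr_lambda(s, num_warmup_steps=100, num_training_steps=1000)).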

Motivation and Context

This scheduler has been shown to be effective for continual pretraining.

How has this been tested?

Added test code.


@winglian (Collaborator) left a comment:

Thanks! LGTM

README.md Outdated
@@ -797,6 +797,7 @@ early_stopping_patience: 3
lr_scheduler: # 'one_cycle' | 'log_sweep' | empty for cosine
lr_scheduler_kwargs:
cosine_min_lr_ratio: # decay lr to some percentage of the peak lr, e.g. cosine_min_lr_ratio=0.1 for 10% of peak lr
cosine_constant_lr_ratio: # freeze lr at some percentage of the step, e.g. cosine_constant_lr_ratio=0.8 means start cosine_min_lr at 80% of training step
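
To make the two options concrete, a small worked example (illustrative numbers only, not taken from the PR):

learning_rate = 2e-5            # peak LR from the optimizer config
cosine_min_lr_ratio = 0.1
cosine_constant_lr_ratio = 0.8
num_training_steps = 1000

min_lr = learning_rate * cosine_min_lr_ratio                        # 2e-6, the floor of the cosine decay
constant_from = int(num_training_steps * cosine_constant_lr_ratio)  # step 800, LR held at min_lr from here on
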
Collaborator commented:

It would be nice to have a link to the arXiv paper referenced here too.

@jinwonkim93 (Contributor, author) replied:

I added the paper link.

@winglian (Collaborator) left a comment:

There is a test failure with:

E TypeError: AxolotlTrainingArguments.__init__() got an unexpected keyword argument 'cosine_constant_lr_ratio'

@jinwonkim93 (Contributor, author) replied:

Thank you, I added it to the arguments.
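
For context, a minimal sketch of the kind of change that fix implies, assuming AxolotlTrainingArguments is a dataclass extending transformers.TrainingArguments (the class and keyword names come from the error above; defaults and help text are illustrative, not the PR's actual diff):

from dataclasses import dataclass, field
from typing import Optional

from transformers import TrainingArguments

@dataclass
class AxolotlTrainingArguments(TrainingArguments):
    # New fields so the trainer accepts the scheduler's keyword arguments.
    cosine_min_lr_ratio: Optional[float] = field(
        default=None,
        metadata={"help": "decay LR to this fraction of the peak LR"},
    )
    cosine_constant_lr_ratio: Optional[float] = field(
        default=None,
        metadata={"help": "hold LR at the minimum from this fraction of training steps onward"},
    )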

@winglian merged commit 8430db2 into axolotl-ai-cloud:main on Feb 13, 2024
7 checks passed