Scheduler implementation of Continual Pre-Training of Large Language Models: How to (re)warm your model? #1273
Conversation
Thanks! LGTM
README.md (outdated diff)
```diff
@@ -797,6 +797,7 @@ early_stopping_patience: 3
 lr_scheduler: # 'one_cycle' | 'log_sweep' | empty for cosine
 lr_scheduler_kwargs:
 cosine_min_lr_ratio: # decay lr to some percentage of the peak lr, e.g. cosine_min_lr_ratio=0.1 for 10% of peak lr
+cosine_constant_lr_ratio: # freeze lr at some percentage of the step, e.g. cosine_constant_lr_ratio=0.8 means start cosine_min_lr at 80% of training step
```
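For illustration, the two options might be combined in a config like this, mirroring the README snippet above (the values are placeholders, not recommendations):

```yaml
lr_scheduler:                  # empty for cosine
cosine_min_lr_ratio: 0.1       # decay to 10% of the peak LR
cosine_constant_lr_ratio: 0.8  # freeze the LR at cosine_min_lr after 80% of the training steps
```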
It would be nice to have a link to the arXiv paper referenced here, too.
I put the paper link in.
There is a test failure with:

```
E   TypeError: AxolotlTrainingArguments.__init__() got an unexpected keyword argument 'cosine_constant_lr_ratio'
```
Thank you, I added it to the arguments.
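For context, that fix amounts to declaring the new option on the training-arguments dataclass so the keyword is accepted. A minimal sketch, assuming the field name from the diff (the real class subclasses transformers.TrainingArguments and carries many more fields):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class AxolotlTrainingArguments:
    # Sketch only: declaring the field lets the generated __init__
    # accept cosine_constant_lr_ratio as a keyword argument.
    cosine_constant_lr_ratio: Optional[float] = field(
        default=None,
        metadata={"help": "fraction of total steps after which the LR is frozen at the min LR"},
    )
```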
Scheduler implementation of Continual Pre-Training of Large Language Models: How to (re)warm your model? (https://arxiv.org/pdf/2308.04014.pdf)
Description
Almost identical to cosine min lr, but it adds a constant ratio that freezes the LR once a given percentage of the training steps is reached; a sketch of the resulting schedule follows.
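A minimal sketch of the schedule shape, assuming linear warmup and a cosine decay that bottoms out early (the function name and exact formula here are illustrative, not the PR's code):

```python
import math

def cosine_with_constant_tail(step: int, total_steps: int, warmup_steps: int,
                              min_lr_ratio: float = 0.1,
                              constant_lr_ratio: float = 0.8) -> float:
    """Return a multiplier on the peak LR, usable with e.g. torch.optim.lr_scheduler.LambdaLR."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)  # linear warmup up to the peak LR
    constant_start = int(total_steps * constant_lr_ratio)
    if step >= constant_start:
        return min_lr_ratio  # LR frozen at the minimum for the rest of training
    # cosine decay from 1.0 down to min_lr_ratio over [warmup_steps, constant_start)
    progress = (step - warmup_steps) / max(1, constant_start - warmup_steps)
    return min_lr_ratio + 0.5 * (1.0 - min_lr_ratio) * (1.0 + math.cos(math.pi * progress))
```

With constant_lr_ratio=0.8, the cosine decay finishes at 80% of the steps and the final 20% of training runs at min_lr_ratio times the peak LR.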
Motivation and Context
This schedule follows the approach the referenced paper shows to work well for continual pretraining.
How has this been tested?
Added test code.
Screenshots (if appropriate)
Types of changes
Social Handles (Optional)