
schedulefree optimizers #30079

Open

wants to merge 7 commits into base: main
Conversation

winglian
Contributor

@winglian winglian commented Apr 6, 2024

What does this PR do?

Integrates Meta's schedule_free (https://github.com/facebookresearch/schedule_free) optimizers for AdamW & SGD.

https://twitter.com/aaron_defazio/status/1776320004465582331
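For context, a minimal sketch of the underlying schedulefree API this PR wraps, based on the usage described in the facebookresearch/schedule_free README (this is not the Trainer integration itself). The key point for the discussion below is that the optimizer exposes train()/eval() methods that have to be toggled alongside the model:

import torch
import schedulefree

model = torch.nn.Linear(10, 2)
optimizer = schedulefree.AdamWScheduleFree(model.parameters(), lr=2.5e-3)

# Schedule-free optimizers keep an averaged and a non-averaged parameter sequence,
# so the optimizer itself must be switched between train and eval modes.
model.train()
optimizer.train()
for _ in range(10):
    optimizer.zero_grad()
    loss = model(torch.randn(4, 10)).sum()
    loss.backward()
    optimizer.step()

# Before evaluation or checkpointing, switch the optimizer to eval mode as well.
model.eval()
optimizer.eval()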

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@muellerzr @younesbelkada @pacman100

@muellerzr
Contributor

muellerzr commented Apr 6, 2024

FYI this will need huggingface/accelerate#2631 as we need to upstream accelerate's ability to call train/eval on a wrapped optimizer
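Roughly, the issue is that Accelerate wraps the optimizer, so the Trainer can no longer reach train()/eval() directly. Below is a simplified, hypothetical illustration of the forwarding the linked accelerate PR adds (a sketch only, not Accelerate's actual AcceleratedOptimizer implementation):

class OptimizerWrapper:
    """Sketch of an optimizer wrapper that forwards schedule-free mode switches."""

    def __init__(self, optimizer):
        self.optimizer = optimizer

    def step(self, *args, **kwargs):
        return self.optimizer.step(*args, **kwargs)

    def zero_grad(self, *args, **kwargs):
        return self.optimizer.zero_grad(*args, **kwargs)

    def train(self):
        # Only schedule-free optimizers define train()/eval(), so guard the call.
        if hasattr(self.optimizer, "train"):
            self.optimizer.train()

    def eval(self):
        if hasattr(self.optimizer, "eval"):
            self.optimizer.eval()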

@danielhanchen
Contributor

Some thoughts:

  • I was trying to ask Aaron et al on Twitter if they did any transformer experiments, but to no avail. They said a paper will come in 1 or 2 months.
  • Aaron et al.'s past work on D-Adaptation won a best ICML paper, with their follow-up work being Prodigy - but on transformers both did similarly to or worse than AdamW. https://twitter.com/danielhanchen/status/1775547139248341125
  • Superconvergence + the LR range finder + fast.ai's Ranger21 optimizer was the go-to combination for CNNs, and worked fabulously well, but on transformers the learning rate range finder said 1e-3 was the best, whilst 1e-5 was better. However, the 1-cycle learning rate schedule stuck. Learning rate finder for the trainer  #16013
  • A huge issue is that this needs tuning?! But how about a well-tuned AdamW? E.g. see https://twitter.com/kellerjordan0/status/1776716388037529843, which outperformed it using a tuned SGD.

I'm just a little bit reserved for now since the authors themselves aren't providing any transformer benchmarks, nor have they compared their CNN baselines to superconvergence, which is the go-to standard for fast CNN training. Likewise, https://parameterfree.com/2023/08/30/yet-another-icml-award-fiasco/ wasn't pleasant.

@PhilipMay
Contributor

Should be very easy to test this on Phi-2 or TinyLlama when the implementation works?

Contributor

@younesbelkada younesbelkada left a comment

Great work @winglian ! 🤩 I left one minor comment, wdyt?

@@ -3117,6 +3145,9 @@ def training_step(self, model: nn.Module, inputs: Dict[str, Union[torch.Tensor,
`torch.Tensor`: The tensor with training loss on this batch.
"""
model.train()
if "ScheduleFree" in self.optimizer.__class__.__name__:
Contributor

maybe instead of checking the class name here we could inject an attribute _hf_schedule_free_optim to make sure we can support other schedule-free optimizers in the future, what do you think?

Contributor Author

that would be on the Trainer class, right?

Contributor Author

so the place that makes the most sense to set that would be in get_optimizer_cls_and_kwargs, but that is a @staticmethod so it has no access to the trainer object. We could do something along the lines of

setattr(self.optimizer, "_hf_schedule_free_optim", True)

after we instantiate the optimizer_cls, but we would still have to do some sort of class-name detection.

Alternatively, we could pass another value in the return tuple specific to schedule_free optimizers (but that feels worse)
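A hypothetical sketch of the setattr alternative described above (names and call pattern are illustrative only, not what the PR ships):

# Illustrative only: flag the instantiated optimizer so later code can check an
# attribute instead of the class name. Class-name detection is still needed at
# this single point.
import schedulefree
import torch

model = torch.nn.Linear(4, 4)
optimizer_cls, optimizer_kwargs = schedulefree.AdamWScheduleFree, {"lr": 1e-3}
optimizer = optimizer_cls(model.parameters(), **optimizer_kwargs)
if "ScheduleFree" in optimizer_cls.__name__:
    setattr(optimizer, "_hf_schedule_free_optim", True)

assert getattr(optimizer, "_hf_schedule_free_optim", False)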

Contributor

ahh good point yeah, in that case this is probably already fine I would say, thanks for investigating @winglian !

Collaborator

Rather than have it as a stateful attribute, could we instead move this logic out to a module-level function e.g.:

def _is_schedule_free_optimizer(optimizer):
    return "ScheduleFree" in optimizer.__class__.__name__

?

This way:

  • The check is a bit more explicit within the code logic
  • We can easily adapt the check in one place, rather than throughout the code, if we end up introducing e.g. an _is_schedule_free attribute or there are schedule-free optimizers with slightly different names

@PhilipMay
Contributor

This PR should maybe also add a few lines to the README about "how to use this".
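For example, something along these lines could go in the docs (a sketch; the optim value names used here, schedule_free_adamw and schedule_free_sgd, are assumed from the PR and should be checked against the final OptimizerNames values):

from transformers import Trainer, TrainingArguments

# model and train_dataset are assumed to be defined as usual.
args = TrainingArguments(
    output_dir="out",
    optim="schedule_free_adamw",   # assumed name; "schedule_free_sgd" for the SGD variant
    lr_scheduler_type="constant",  # schedule-free optimizers replace the LR schedule
    learning_rate=2.5e-3,
)
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()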

@muellerzr
Contributor

We've merged the accelerate portion in, so if anyone is trying this out in a distributed fashion, you can do pip install git+https://github.com/huggingface/accelerate :)

src/transformers/trainer.py (outdated review comment, resolved)
@bratao

bratao commented Apr 14, 2024

Is there any chance of this making it into the main branch? I and others have confirmed that the results are real. Thank you @winglian

Contributor

@pacman100 pacman100 left a comment

Super useful addition of schedule-free optimizers @winglian! It would be great to document the usage along with a minimal example.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@CoffeeVampir3

Is there any remaining work I could contribute towards getting this PR merged?

Cheers

@winglian
Contributor Author

@pacman100 @muellerzr @younesbelkada Can we get a new review to get this merged? Since the last check, I rebased, added some fixes and docs.

Contributor

@muellerzr muellerzr left a comment

Thanks! Overall LGTM; let's pin schedulefree as a >= requirement, however.

Can you also run the quality checks? Afterwards, at least from my end, this looks good to merge.

setup.py (outdated review comment, resolved)
@winglian
Contributor Author

winglian commented Jun 1, 2024

@muellerzr I ran make quality/lint and also added a smoke test to the test suite for schedule-free AdamW
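For reference, a smoke test along these lines (purely a sketch; the actual test added in the PR lives in the Trainer test suite and may look different):

import pytest
import torch


def test_schedule_free_adamw_smoke():
    # Skip cleanly when the optional dependency is not installed.
    schedulefree = pytest.importorskip("schedulefree")
    model = torch.nn.Linear(4, 1)
    optimizer = schedulefree.AdamWScheduleFree(model.parameters(), lr=1e-2)
    optimizer.train()
    x = torch.randn(16, 4)
    y = x.sum(dim=1, keepdim=True)
    for _ in range(20):
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()
    optimizer.eval()
    # Smoke check only: training ran to completion and produced a finite loss.
    assert torch.isfinite(loss)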

Contributor

@muellerzr muellerzr left a comment

Thanks a bunch! cc @LysandreJik for final review

Contributor

@younesbelkada younesbelkada left a comment

Thanks a lot !

Collaborator

@amyeroberts amyeroberts left a comment

Thanks for adding!

Main comment is about the getattr logic in get_optimizer_cls_and_kwargs


additional_optim_kwargs["warmup_steps"] = args.warmup_steps
additional_optim_kwargs.update(
{
"weight_lr_power": float(getattr(torch, optim_args.get("weight_lr_power", 2.0))),
Collaborator

This doesn't seem right:

  • If we get "weight_lr_power" from optim_args I'm presuming it's a float as string e.g. "2.0"? I don't think torch.2.0 exists?
  • If optim_args doesn't have "weight_lr_power", then the second argument to getattr is a float, which isn't compatible

additional_optim_kwargs.update(
{
"weight_lr_power": float(getattr(torch, optim_args.get("weight_lr_power", 2.0))),
"r": float(getattr(torch, optim_args.get("r", 0.0))),
Collaborator

Same here
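A sketch of what a fix for the two comments above could look like (the names are taken from the quoted diff; this is one possible correction, not necessarily what ended up in the PR). Since optim_args values arrive as strings, they can be cast directly instead of being looked up as attributes on torch:

# Illustrative example: optim_args comes from parsing --optim_args, so values are strings.
optim_args = {"weight_lr_power": "2.0"}
additional_optim_kwargs = {}

additional_optim_kwargs.update(
    {
        "weight_lr_power": float(optim_args.get("weight_lr_power", 2.0)),
        "r": float(optim_args.get("r", 0.0)),
    }
)

assert additional_optim_kwargs == {"weight_lr_power": 2.0, "r": 0.0}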


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@winglian
Contributor Author

Will get back to this soon. Not stale 😅


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot closed this Jul 31, 2024
@bratao

bratao commented Jul 31, 2024

@winglian please don't let it die

@amyeroberts amyeroberts reopened this Jul 31, 2024