
Learning Rate finder #1347

Merged
merged 37 commits into master from lr_finder on Apr 10, 2020

Conversation

SkafteNicki
Member

@SkafteNicki SkafteNicki commented Apr 2, 2020

Before submitting

  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

What does this PR do?

Fixes #624
This PR adds a new method to the Trainer class, lr_finder = trainer.find_lr(model), similar to the learning rate finder in fast.ai. It does a small fit of the model, where the lr is increased after each batch and the corresponding loss is logged. The returned lr_finder object can then be used to investigate the connection between the choice of lr and the loss of the model. It reduces the guesswork in choosing a good lr, and it can also be used to pick good bounds for the CyclicLRScheduler.

The interface is simple from a user standpoint:

model = MyModelClass(hparams)
trainer = pl.Trainer()
lr_finder = trainer.find_lr(model)
# Plot results
lr_finder.plot(suggest=True)
# Choose based on plot, or get a suggestion
model.hparams.lr = lr_finder.suggestion()
# Fit
trainer.fit(model)

Running the above code for the pl_examples/basic_model/cpu_templatlightning_module_template.py model produces the following plot (the red point corresponds to the suggested lr to use):

[lr_finder plot: loss vs. learning rate, suggested lr marked in red]
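
For context, the usual heuristic behind suggestion() in finders like this is to pick the lr where the loss curve falls fastest. A minimal sketch of that idea, assuming plain lists of the logged values (the actual implementation in this PR may differ in smoothing and edge handling):

import numpy as np

def suggest_lr(lrs, losses, skip_begin=10, skip_end=1):
    # skip the noisy ends of the sweep, then take the steepest negative slope
    loss = np.array(losses[skip_begin:-skip_end])
    lr = np.array(lrs[skip_begin:-skip_end])
    return lr[np.argmin(np.gradient(loss))]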

The feature seemed to gain a lot of traction when it was proposed; however, lightning was at that time missing a step-wise scheduling feature. That was implemented in PR #941, so this feature could now be implemented using more or less standard lightning features (callbacks etc.).

This PR is currently missing a lot (documentation, tests etc.), but I wanted a bit of feedback on whether this is still a wanted feature in lightning or whether it should instead be part of lightning-bolts (when that is up and running).

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@mergify mergify bot requested a review from a team April 2, 2020 15:54
@Borda Borda added the feature Is an improvement or enhancement label Apr 2, 2020
Member

@Borda Borda left a comment

Very excited about this feature! 🤖

pytorch_lightning/trainer/lr_finder.py (review comments, resolved)
self.num_iter = num_iter
super(_LinearLR, self).__init__(optimizer, last_epoch)

def get_lr(self):
Member

having it as a property lr?

pytorch_lightning/trainer/lr_finder.py (review comments, resolved)
Comment on lines 78 to 105
# Max step set to number of iterations
max_steps = self.max_steps
self.max_steps = num_iters

# Disable standard progress bar for fit
show_progress_bar = self.show_progress_bar
self.show_progress_bar = False

# Accumulation of gradients
accumulate_grad_batches = self.accumulate_grad_batches
self.accumulate_grad_batches = num_accumulation_steps

# Configure optimizer and scheduler
optimizer, _, _ = self.init_optimizers(model.configure_optimizers())
assert len(optimizer) == 1, 'cannot find lr for more than 1 optimizer'
configure_optimizers = model.configure_optimizers
model.configure_optimizers = lr_finder._get_new_optimizer(optimizer[0])

# Fit; lr & loss are logged in the callback
self.fit(model)

# Print a message if we stopped early
if self.global_step != num_iters:
    print('LR finder stopped early due to diverging loss.')

# Transfer results from callback to lr finder object
lr_finder.results.update({'lr': self.callbacks[0].lrs,
                          'loss': self.callbacks[0].losses})
Member

regarding the logger suppression, it would be nice to have this as a function/method with a wrapper...

Member Author

Can you explain a bit more?

Member

there is a single block of code which you want to execute "silently",
so make it a separate function and write a wrapper which disables the logger and later restores it
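
Something like this sketch, perhaps (attribute names mirror the snippet above; the real refactor may look different):

from contextlib import contextmanager

@contextmanager
def run_silently(trainer):
    # stash the logger and progress bar, disable both, restore them on exit
    logger, show_bar = trainer.logger, trainer.show_progress_bar
    trainer.logger, trainer.show_progress_bar = None, False
    try:
        yield trainer
    finally:
        trainer.logger, trainer.show_progress_bar = logger, show_bar

# usage inside the trainer:
#     with run_silently(self):
#         self.fit(model)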

@Borda Borda changed the title Lr finder Learning Rate finder Apr 2, 2020
@Borda Borda added this to the 0.7.2 milestone Apr 2, 2020
Contributor

@williamFalcon williamFalcon left a comment

This is awesome! I'd prefer to put this into an argument in the trainer... not have an additional method the user has to call.

Trainer(auto_find_lr=True)

def auto_find_lr(self, model):
    lr_finder = self.find_lr(model)
    # Plot results

    # Choose based on plot, or get a suggestion
    print(f'suggested lr {lr_finder.suggestion()}')

    # automatically update the optimizers
@justusschock @ethanwharris thoughts?

Some caveats:

  1. This won't work with dp/ddp. We should modify for that case.
  2. what about multiple optimizers?

I recommend not merging until we hash out the API.
Ok to merge a v1 of this that doesn't support multiple optimizers or dp/ddp but we need to work those out eventually. This should also be clearly written in the docs

@SkafteNicki
Member Author

  1. @williamFalcon can you explain why this won't work with dp or ddp? I thought that if the method internally calls Trainer.fit() to do the actual work, then it should work out of the box. I have no experience with dp or ddp so I would need help to get this feature to support them.

  2. I don't think other frameworks support this feature for more than 1 optimizer. However, I guess it could be done with a simple grid search. The search would take num_optimizers * num_iters steps, so in most cases it would take <1000 steps.

@williamFalcon
Contributor

@SkafteNicki if you call .fit internally then it should be fine!

Overall though, take a look at my comments about collapsing this into a flag

@SkafteNicki
Member Author

The idea of having this as a separate method is taken directly from fastai. I am fine with collapsing this into a flag, but I think it removes the possibility for the user to interact with the results produced by the learning rate finder before fitting the model.

@justusschock
Member

justusschock commented Apr 3, 2020

@williamFalcon I wouldn't migrate it to a trainer arg.

What I'd really like is something like:

# init your Trainer:
trainer = Trainer(...)
with trainer.find_best_lr():
    trainer.fit()

This is not much boilerplate, but I think we should not make too many implicit choices.

And if you passed the optimiser there, you could also do:

with trainer.find_best_lr(optim1):
    with trainer.find_best_lr(optim2):
        trainer.fit()

Which would be equivalent to:

with trainer.find_best_lr(optim1, optim2):
    trainer.fit()

Not sure how realistic this is, but that's the API I'd like the most...

@Borda
Member

Borda commented Apr 3, 2020

I recommend not merging until we hash out the API.
Ok to merge a v1 of this that doesn't support multiple optimizers or dp/ddp but we need to work those out eventually. This should also be clearly written in the docs

I would then rather target the next release and have it in v0.8.0 with metrics

@mergify
Contributor

mergify bot commented Apr 3, 2020

This pull request is now in conflict... :(

@williamFalcon williamFalcon modified the milestones: 0.7.2, 0.7.3 Apr 3, 2020
Contributor

@jeremyjordan jeremyjordan left a comment

great contribution! excited about this feature, just have a few comments to address.

docs/source/lr_finder.rst, pytorch_lightning/trainer/lr_finder.py (review comments, resolved)
@jeremyjordan
Contributor

yeah i agree with @justusschock, i wouldn't use a Trainer arg here. i personally like the recommended usage as it stands.

(copied from @SkafteNicki's post above)

model = MyModelClass(hparams)
trainer = pl.Trainer()
lr_finder = trainer.find_lr(model)

# Plot results
lr_finder.plot(suggest=True)

# Choose based on plot, or get a suggestion
model.hparams.lr = lr_finder.suggestion()

# Fit
trainer.fit(model)

users can name their learning rate hparam however they please (lr, learning_rate, etc.), so it would be difficult to automatically set this value. sure, we can update the actual lr for the optimizer, but that wouldn't be reflected in the hparams that are logged.
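
to make that concrete, any automatic write-back would need the user to tell lightning which attribute to update, along the lines of this hypothetical helper (the attr_name parameter is an illustration, not part of this PR):

def set_hparam_lr(model, new_lr, attr_name='lr'):
    # write the found lr back into model.hparams so the logged hparams stay in sync;
    # attr_name must match whatever name the user chose ('lr', 'learning_rate', ...)
    if not hasattr(model.hparams, attr_name):
        raise AttributeError(f'model.hparams has no attribute {attr_name!r}')
    setattr(model.hparams, attr_name, new_lr)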

Contributor

@awaelchli awaelchli left a comment

This is an amazing feature!
I have some suggestions for the docs :)

docs/source/lr_finder.rst, pytorch_lightning/trainer/lr_finder.py (review comments, resolved)
@lkhphuc

lkhphuc commented Apr 4, 2020

Great PR, I'm also looking for this feature. Thanks everyone.

yeah i agree with @justusschock i wouldn't use a Trainer arg here. i personally like the recommended usage as it stands.

I think the current usage is great for manipulating the lr finder programmatically, but it would also be nice to have a flag for interactive training. Something like this:

$ python train.py --find_lr=True
[INFO] ....
[INFO] (Plot learning rate finder, inplace for notebook, pop up for terminal)
The suggested learning rate is 3e-4, press enter to accept or input a different value:  >>ENTER<<
[INFO] Learning rate is 3e-4.
[INFO] ....

or

$ python train.py --find_lr=True
[INFO] ....
[INFO] (Plot learning rate finder, inplace for notebook, pop up for terminal)
The suggested learning rate is 3e-4, press enter to accept or input a different value:  3e-5 >>ENTER<<
[INFO] Learning rate is 3e-5.
[INFO] ....

This flag would take precedence over all other lr-related flags, and the default would of course be False.
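
In code, the prompt itself could be as simple as this sketch (hypothetical; reuses lr_finder and model from the examples above):

suggested = lr_finder.suggestion()
raw = input(f'The suggested learning rate is {suggested}, '
            'press enter to accept or input a different value: ')
model.hparams.lr = float(raw) if raw.strip() else suggested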

@williamFalcon
Contributor

williamFalcon commented Apr 4, 2020

good ideas. i think we don’t want to set a trend for breaking out of the current API patterns. the trainer flags are there so that users don’t have to think about all the tiny nuances of doing something.

the approach suggested by @justusschock means the user now has to learn more API and has to remember to do a bunch of things... ie: they need to go read docs and start pulling out optimizers. this completely breaks the lightning organization and abstraction, and doesn't follow the principle of not making users think about things they don't need to think about.

the user just wants the best LR, they shouldn’t have to remember to add a with, or pull out optimizers, etc... they should set a flag and get the LR.

with the trainer flag the user doesn’t have to think about it. in fact, the library can just automatically set the best LR using what it finds... ie: it just works. we can still support the graph approach in this case by showing the plot in the logger. then the user can decide to fix the LR once they feel confident.

the approach i suggest with a flag works as follows:

  1. set the flag.
  2. the LR is found automatically.
  3. we print a nice message with the LR.
  4. the LR is set automatically and training continues
  5. the curve is also logged to the logger (if there is one).
  6. if the user wants to inspect the log and pick a LR manually, they can do that.
  7. at this point the user would just likely set the LR manually going forward from what printed or what is shown in the plot.

this approach has the advantages that:

  1. the user does not carry the overhead of remembering how to do this... this is a CORE value of lightning. if we lose sight of this we end up with another framework where you build up a lot of cognitive overhead to remember how to do things. this is an engineering decision which should be automated.
  2. we still get the plot which the user can interact with.
  3. the user doesn’t have to comment or delete code which they would have to do with the other approaches suggested... this will clutter code really quickly.

So, i’m going to strongly suggest we use a trainer flag instead.
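
from the user's side, that flow reduces to something like this sketch (the flag name auto_find_lr is taken from the earlier suggestion and is not final):

trainer = pl.Trainer(auto_find_lr=True)
trainer.fit(model)  # lr sweep runs first, the best lr is printed and set, then training continues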

@justusschock
Member

Okay, but can we pass an object where the best lr should be stored (e.g. hparams.lr)? If you have multiple runs you probably don't want to do the LR search every time

@williamFalcon
Contributor

williamFalcon commented Apr 4, 2020

I guess i assume the flow would be:

  1. enable flag
  2. it prints the best lr
  3. on follow-on runs, disable the flag and manually fix the LR going forward

what would that object do?

@jeremyjordan
Contributor

the library can just automatically set the best LR using what it finds... ie: it just works

let me elaborate on why i think this is challenging

suppose user A has a model like:

import os

import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import MNIST
import pytorch_lightning as pl

class LitModel(pl.LightningModule):

    def __init__(self, hparams):
        super().__init__()
        self.hparams = hparams
        self.l1 = torch.nn.Linear(28 * 28, 10)

    def forward(self, x):
        return torch.relu(self.l1(x.view(x.size(0), -1)))

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.forward(x)
        return {'loss': F.cross_entropy(y_hat, y)}

    def train_dataloader(self):
        return DataLoader(MNIST(os.getcwd(), train=True, download=True,
                          transform=transforms.ToTensor()), batch_size=32)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams.lr)

and user B has a model like:

class LitModel(pl.LightningModule):
    # ... identical to user A's model above, except for the optimizer line:

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams.learning_rate)

how do we automatically set the learning rate?

sure, we can update the pytorch optimizer's lr, but i don't see how we could reliably update the model.hparams object to include the new value automatically. this is important because we're logging the hparams so that users can reproduce their results from previous experiments.

i would agree with the general flow:

  • enable learning rate feature
  • report the best learning rate
  • user sets this hyperparameter and does a full training run

i'm still not sure a flag is the best design for this. by keeping this functionality as a method, we're still only asking the user to call essentially one line of code trainer.find_lr(model). this is as simple as calling trainer.fit(model) for actual training. thus, the cognitive burden on the user is equivalent (remembering a flag name vs method name) and typing completion in editors like VS code will help in both cases.

minimal example

# find learning rate
model = MyModelClass(hparams)
trainer = pl.Trainer()
lr_finder = trainer.find_lr(model)

# do training run
hparams = {**hparams, 'lr': lr_finder.suggestion()}
model = MyModelClass(hparams)
trainer.fit(model)

or the user can reach into the existing model and update an hparam since we call configure_optimizers at the beginning of a trainer.fit()

# find learning rate
model = MyModelClass(hparams)
trainer = pl.Trainer()
lr_finder = trainer.find_lr(model)

# do training run
model.hparams.lr = lr_finder.suggestion()
trainer.fit(model)

power user

# find learning rate
model = MyModelClass(hparams)
trainer = pl.Trainer()
lr_finder = trainer.find_lr(model)       # user saves the returned results
lr_finder.plot(suggest=True)             # and can further inspect if they desire

# do training run
model.hparams.lr = lr_finder.suggestion()
trainer.fit(model)

@Borda Borda added the discussion In a discussion stage label Apr 4, 2020
@mergify mergify bot requested a review from a team April 9, 2020 16:44
Contributor

@jeremyjordan jeremyjordan left a comment

great work on this! ⚡

@mergify mergify bot requested a review from a team April 10, 2020 02:22
@mergify mergify bot requested a review from a team April 10, 2020 15:59
@williamFalcon
Contributor

@SkafteNicki this is an awesome feature!

@mergify
Contributor

mergify bot commented Apr 10, 2020

This pull request is now in conflict... :(


lr_max: lr to stop search

num_training: number of steps to take between lr_min and lr_max
Member

rather num_train_steps

self.num_iter = num_iter
super(_ExponentialLR, self).__init__(optimizer, last_epoch)

def get_lr(self):
Member

it should be described what the difference is between this get_lr and the lr just below, because intuitively (by the name) they should return the same thing

Member Author

get_lr() is the method called inside lr_scheduler.step() and is not meant to be called elsewhere. Since pytorch 1.4, the property self._last_lr was introduced to extract the last computed lr. However, since pytorch-lightning needs to be backwards compatible, I created the self.lr property that achieves the same. They therefore have slightly different purposes.
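
As a rough illustration of the pattern (a sketch only; the PR's actual scheduler differs in details):

from torch.optim.lr_scheduler import _LRScheduler

class _ExponentialLR(_LRScheduler):
    """Exponentially increases the lr from each base_lr towards end_lr over num_iter steps."""

    def __init__(self, optimizer, end_lr, num_iter, last_epoch=-1):
        self.end_lr = end_lr
        self.num_iter = num_iter
        super().__init__(optimizer, last_epoch)

    def get_lr(self):
        # called internally by lr_scheduler.step(); not meant to be called by users
        r = self.last_epoch / self.num_iter
        return [base_lr * (self.end_lr / base_lr) ** r for base_lr in self.base_lrs]

    @property
    def lr(self):
        # backward-compatible accessor: _last_lr only exists on pytorch >= 1.4,
        # so fall back to reading the optimizer's param groups on older versions
        return getattr(self, '_last_lr', None) or [g['lr'] for g in self.optimizer.param_groups]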

@williamFalcon williamFalcon merged commit 3f09b32 into Lightning-AI:master Apr 10, 2020
@SkafteNicki SkafteNicki deleted the lr_finder branch April 21, 2020 13:51
tullie pushed a commit to tullie/pytorch-lightning that referenced this pull request Jun 7, 2020
* initial structure

* rebase

* incorporate suggestions

* update CHANGELOG.md

* initial docs

* fixes based on reviews

* added trainer arg

* update docs

* added saving/restore of model state

* initial tests

* fix styling

* added more tests

* fix docs, backward compatility and progressbar

* fix styling

* docs update

* updates based on review

* changed saving to standard functions

* consistent naming

* fix formatting

* improve docs, added support for nested fields, improve codecov

* update CHANGELOG.md

* Update lr_finder.rst

* Update pytorch_lightning/trainer/trainer.py

* Update trainer.py

* Update CHANGELOG.md

* Update path

* restoring

* test

* attribs

* docs

* doc typo

Co-authored-by: Nicki Skafte <nugginea@gmail.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: J. Borovec <jirka.borovec@seznam.cz>
@Borda Borda modified the milestones: 0.7.4, v0.7.x Apr 18, 2021
Labels: discussion (In a discussion stage), feature (Is an improvement or enhancement)

Successfully merging this pull request may close these issues:
Cyclic learning rate finder as a part of Trainer

8 participants