
Learning rate finder crashes if accumulate_grad_batches is not set to 1 #1726

Closed
RafailFridman opened this issue May 4, 2020 · 8 comments · Fixed by #1801
Labels
question Further information is requested

Comments

@RafailFridman

I'm not sure if it is expected behavior or a bug, but when I'm trying to find a learning rate like this:

trainer = pl.Trainer(gpus=[1], accumulate_grad_batches=8)
lr_finder = trainer2.lr_find(model, min_lr=1e-8, max_lr=1e-1, num_training=300)

It throws AttributeError: 'NoneType' object has no attribute 'item', which happens on line 335 of lr_finder.py: current_loss = trainer.running_loss.last().item()

When I remove accumulate_grad_batches=8, everything works as expected.
If this is expected behavior, I suggest adding a more informative error message.
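
In case it helps, the traceback boils down to calling .item() on None: trainer.running_loss.last() apparently returns None when no loss has been recorded yet, which is what seems to happen while gradients are still being accumulated. A minimal, hypothetical sketch of the failure mode (not Lightning's actual running-loss class):

class RunningLossSketch:
    def __init__(self):
        self._losses = []

    def last(self):
        # None until at least one loss has been recorded,
        # e.g. while gradients are still being accumulated
        return self._losses[-1] if self._losses else None

running_loss = RunningLossSketch()
current_loss = running_loss.last().item()  # AttributeError: 'NoneType' object has no attribute 'item'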

@RafailFridman RafailFridman added the question Further information is requested label May 4, 2020
@SkafteNicki
Member

Just to be sure: is it a typo that the trainer that gets initialized is called trainer while the one used with the learning rate finder is called trainer2, or are these two different trainers?

@RafailFridman
Author

@SkafteNicki yeah, sorry, I just tried different trainers and copied the wrong one.
Can you please check whether this error occurs on your side?

@SkafteNicki
Member

This is very strange, because the accumulate_grad_batches variable is overridden by the learning rate finder's own num_accumulation_steps argument while it is running. I will look into what is causing this error.

Just to be sure, do you want to accumulate gradients during the learning rate finder run, or only during the later fitting?
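
For reference, the override behaviour I mean looks roughly like this (just a sketch of the pattern, not the actual lr_find code; the names simply follow this discussion):

def run_lr_find_sketch(trainer, model, num_accumulation_steps=1):
    # remember the user's setting, override it for the range test, restore it afterwards
    saved_accumulation = trainer.accumulate_grad_batches
    trainer.accumulate_grad_batches = num_accumulation_steps
    try:
        pass  # run the short LR range test here
    finally:
        trainer.accumulate_grad_batches = saved_accumulation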

@RafailFridman
Author

I want to accumulate gradient batches during training, so I suppose I should set the accumulate_grad_batches parameter just as I do for the training phase. Am I misunderstanding something?

@SkafteNicki
Member

No, nothing wrong with your understanding of the code. I have found a solution to the problem and will create a PR soon.

@florisdf

I'm having the same error. Any solutions ready to be pulled in?

@jopo666

jopo666 commented May 11, 2020

For now, just use the num_accumulation_steps option of the learning rate finder itself:

trainer = pl.Trainer(gpus=1, accumulate_grad_batches=1)
lr_finder = trainer.lr_find(model, num_accumulation_steps=8)

[Edit: this workaround does not actually work; see the comment below.]

@alexstoken

@jopo666 @florisdf I don't think that will solve the problem if the goal is to accumulate gradients during the lr_find experiment. The trainer's global_step, which should only increment when the learning rate is updated, increments on every batch during the lr_find experiment, regardless of num_accumulation_steps. The counter resets itself after the finder is done running, but adding a print statement at line 434 or line 471 of training_loop.py shows that the learning rate (and the gradients) are updated on every batch.

Tested on a nightly from last week.
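
If you want to check this from user code without editing training_loop.py, something like the callback below should work (a sketch based on the current 0.7.x-style Callback API; hook names may differ between versions). Pass callbacks=[LRStepMonitor()] to the Trainer and run lr_find as usual:

import pytorch_lightning as pl

class LRStepMonitor(pl.Callback):
    # print the learning rate and global_step after every batch, to see whether
    # they change on every batch or only every accumulate_grad_batches batches
    def on_batch_end(self, trainer, pl_module):
        lr = trainer.optimizers[0].param_groups[0]["lr"]
        print(f"global_step={trainer.global_step} lr={lr:.3e}")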
