Tuner class #1998

Closed
wants to merge 42 commits into from

Conversation

SkafteNicki
Member

@SkafteNicki SkafteNicki commented May 29, 2020

Before submitting

  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

What does this PR do?

Fixes issues #1919, #1983, #1975, #1827 and #2101
Based on the discussion in #1975
Builds on top of #1834 (which should be closed)
Will make #1973 (now #2223) obsolete, since this introduces a general solution.

This PR is a refactoring of the learning rate finder and batch size scaler algorithms into a new Tuner class.

Usage

The interface is kept as simple as possible:

from pytorch_lightning import Trainer, HyperTuner
model = ModelClass(...)
trainer = Trainer(...)
tuner = HyperTuner(trainer, auto_scale_batch_size=True, auto_lr_find=True)
tuner.tune(model) # will automatically find lr and scale batch size
trainer.fit(model)

The user can still invoke the individual methods to get more control over the algorithms. This has been standardized so that each method returns an object that can be interacted with (similar to how the lr finder already does it):

tuner = HyperTuner(trainer)
result = tuner.scale_batch_size(model, ...)  # or: result = tuner.lr_find(model, ...)
result.results                    # the stored results of the search
fig = result.plot()               # generate a simple plot of the results
new_val = result.suggestion()     # get a suggested value
model.batch_size = new_val        # or: model.learning_rate = new_val
trainer.fit(model)

Improvements:

  • dp, ddp and amp support!

  • Full backwards compatibility with hparams and full compatibility with the new argument interface

  • The learning rate finder can now monitor the validation loss, as requested in Lr_finder based on the validation loss rather than training loss #1975. This gives better learning rate estimates, but at the high cost of running the complete validation set after each step.

  • The batch scaler now returns an object, similar to the learning rate finder, that holds the stored results of the search. The results are a dict with the tried batch sizes (key batch_size), whether each run fit in memory (key fits_in_memory) and the time it took (key time). Times are standardized against the largest batch size that fits in memory, so they are comparable. (See the sketch after this list.)

  • The .plot function of the batch scaler plots batch size vs. time and marks the maximum batch size that can run.

  • The .suggestion function of the batch scaler returns, by default, the largest batch size that fits in memory, but accepts a condition argument that can be set to speed, which picks the batch size that was fastest (often this is the largest batch size that fits in memory, but it can sometimes be faster to run with a size slightly below the maximum).

  • Added an ExperimentalWarning class that we can use in the future to mark features as experimental. It warns the user that the feature may be under-tested and therefore may not work in some specific cases, and that its interface may change drastically within a short amount of time.

  • probably something I cannot remember right now...
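
A rough sketch of how the batch scaler result object described above could be used (hypothetical usage following the keys and arguments listed in this description; exact signatures may differ, and model/trainer are set up as in the earlier snippet):

tuner = HyperTuner(trainer)
scale_result = tuner.scale_batch_size(model)

# results is a dict as described above
scale_result.results["batch_size"]       # batch sizes that were tried
scale_result.results["fits_in_memory"]   # whether each run fit in memory
scale_result.results["time"]             # standardized time per run

fig = scale_result.plot()                             # batch size vs. time, max runnable size marked
largest = scale_result.suggestion()                   # default: largest size that fits in memory
fastest = scale_result.suggestion(condition="speed")  # fastest batch size instead

model.batch_size = largest
trainer.fit(model)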

Note: this change is of course not backward compatible, because it removes support for the trainer arguments auto_scale_batch_size and auto_lr_find.

Tagging @williamFalcon, @Borda and @tullie

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@pep8speaks

pep8speaks commented May 29, 2020

Hello @SkafteNicki! Thanks for updating this PR.

Line 30:1: E302 expected 2 blank lines, found 1

Comment last updated at 2020-07-02 08:40:10 UTC

@mergify
Contributor

mergify bot commented May 29, 2020

This pull request is now in conflict... :(

@mergify mergify bot requested a review from a team May 29, 2020 14:44
@Borda Borda marked this pull request as draft May 29, 2020 14:55
@tullie
Contributor

tullie commented May 29, 2020

This is great and definitely a clear improvement over having it all in one class. A few high-level things:

  1. I think we should dive into the Tuner dependency on the trainer a bit. In Refactor fit/train/run_pretrain_routine/evaluate/test #1195 we discussed some different options, e.g. just having a config for each class. Does anyone else from @PyTorchLightning/core-contributors or the community have opinions on which API we should aim for?
  2. Can we rename the tuner class to HyperTuner or similar? There was some confusion in Slack that this was related to fine-tuning.
  3. Do we need to use mixins for the tuner class? I think it'd be easier to maintain if the LRMixin and BatchFinderMixin don't have shared global state (see the sketch after this list).
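
A minimal sketch of the composition alternative hinted at in point 3, where each algorithm owns its own state and the tuner only delegates (class and method names here are hypothetical, not the PR's actual implementation):

class _LRFinder:
    """Owns all state for the learning rate search (hypothetical helper)."""
    def __init__(self, trainer):
        self.trainer = trainer
        self.results = {}

    def run(self, model, **kwargs):
        # ... run the LR range test and fill self.results ...
        return self


class _BatchScaler:
    """Owns all state for the batch size search (hypothetical helper)."""
    def __init__(self, trainer):
        self.trainer = trainer
        self.results = {}

    def run(self, model, **kwargs):
        # ... run the batch size search and fill self.results ...
        return self


class HyperTuner:
    """Delegates to the helpers instead of mixing their state into one class."""
    def __init__(self, trainer):
        self.trainer = trainer
        self._lr_finder = _LRFinder(trainer)
        self._batch_scaler = _BatchScaler(trainer)

    def lr_find(self, model, **kwargs):
        return self._lr_finder.run(model, **kwargs)

    def scale_batch_size(self, model, **kwargs):
        return self._batch_scaler.run(model, **kwargs)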

@Borda Borda added the Important label Jun 1, 2020
@Borda Borda added this to the 0.8.0 milestone Jun 1, 2020
@Borda Borda removed this from the 0.8.0 milestone Jun 8, 2020
@codecov

codecov bot commented Jun 30, 2020

Codecov Report

Merging #1998 into master will decrease coverage by 0%.
The diff coverage is 81%.

@@           Coverage Diff           @@
##           master   #1998    +/-   ##
=======================================
- Coverage      88%     88%    -0%     
=======================================
  Files          69      73     +4     
  Lines        5530    5643   +113     
=======================================
+ Hits         4891    4968    +77     
- Misses        639     675    +36     

@mergify
Contributor

mergify bot commented Jun 30, 2020

This pull request is now in conflict... :(

@mergify
Contributor

mergify bot commented Jul 7, 2020

This pull request is now in conflict... :(

@awaelchli awaelchli modified the milestones: 0.9.0, 0.9.x Aug 8, 2020
@awaelchli
Contributor

awaelchli commented Aug 19, 2020

@tullie @SkafteNicki Have you thought of or discussed this option:

tuner = HyperTuner(trainer, model)
tuner.auto_scale_batch_size(...)  # args for this algorithm
tuner.auto_lr_find(...)           # args for this algorithm
...

With the approach where you pass the arguments, you cannot control the order in which the tunings are applied. I think more tuning algorithms will be added over time, and the user may want to test different combinations whose performance may depend on the order in which they are applied.

@SkafteNicki
Member Author

@awaelchli good point. I had already thought that people should have two options:

  1. Easiest approach: set flags and call tune. This will call the individual methods in the "right" order and use best common practice for setting the hyperparameters that are being tuned:

tuner = HyperTuner(trainer, model, auto_lr_find=True, auto_scale_batch_size=True, auto_other_...)
tuner.tune()

  2. Advanced approach: initialize the HyperTuner and call the individual methods however you like (maybe throw a warning that some hyperparameters should be tuned before others):

tuner = HyperTuner(trainer, model)
tuner.auto_scale_batch_size()
tuner.auto_scale_lr()
tuner.auto_scale_something_other()

I probably need to start over with this refactoring, because:

  1. we have implemented the DataModule since this PR was started, and we also need to support that (see the sketch after this list).
  2. some of the problems have been fixed on master, like the hparams incompatibility.
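
A rough sketch of what DataModule support could look like here (hypothetical; assuming the tuner's tune accepts a datamodule the same way Trainer.fit does, and MyDataModule is a user-defined LightningDataModule):

dm = MyDataModule(batch_size=32)    # hypothetical LightningDataModule subclass
tuner = HyperTuner(trainer, model)
tuner.tune(datamodule=dm)           # tuning runs draw their batches from the datamodule
trainer.fit(model, datamodule=dm)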

@awaelchli is it better to divide this into multiple PRs, one that first sets up the base structure of the HyperTuner class and then a PR for each feature copied over? Large PRs seem to be much harder to land.

@awaelchli
Contributor

@awaelchli is it better to divide this into multiple PRs, one that first sets up the base structure of the HyperTuner class and then a PR for each feature copied over? Large PRs seem to be much harder to land.

Yes, I totally agree. I think the Tuner class can be developed without touching the tuning functions of the Trainer. We may want to deprecate them first before fully removing them anyway?

@SkafteNicki
Member Author

@awaelchli yes, you are probably right. I think both the learning rate finder and the batch scaler were pretty new features at that point, so if this PR had landed much earlier it would only have affected early adopters. However, it seems a lot of people are using them now (even with the restrictions they currently have), so we probably need to go with a slow deprecation.

Okay, I am closing this for now, and will start over with smaller PRs. @awaelchli do you want to be more involved in this process (can I at least ping you when I have a new PR?)

@awaelchli
Contributor

@SkafteNicki cool! yes please feel free to ping me on whatever. Very excited for this Tuner!

@Borda Borda modified the milestones: 0.9.x, 0.9.0 Aug 20, 2020
@Borda
Member

Borda commented Aug 21, 2020

@SkafteNicki cool! yes please feel free to ping me on whatever. Very excited for this Tuner!

me too :]

@justusschock
Member

and me as well :D

@SkafteNicki SkafteNicki mentioned this pull request Aug 25, 2020