Tuner class #1998

Closed
wants to merge 42 commits into from

Conversation

SkafteNicki
Member

@SkafteNicki SkafteNicki commented May 29, 2020

Before submitting

  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

What does this PR do?

Fixes issues #1919, #1983, #1975, #1827 and #2101
Based on the discussion in #1975
Builds on top of #1834 (which should be closed)
Will make #1973 (now #2223) obsolete, since this introduces a general solution.

This PR is a refactoring of the learning rate finder and batch size scaler algorithms into a new Tuner class.

Usage

The interface is kept as simple as possible:

from pytorch_lightning import Trainer, HyperTuner
model = ModelClass(...)
trainer = Trainer(...)
tuner = HyperTuner(trainer, auto_scale_batch_size=True, auto_lr_find=True)
tuner.tune(model) # will automatically find lr and scale batch size
trainer.fit(model)

The user can still invoke the individual methods to get more control over the algorithms. This has been standardized so that each method returns an object that can be interacted with (similar to how the lr finder already does it):

tuner = HyperTuner(trainer)
result = tuner.scale_batch_size(model, ...)  # or: result = tuner.lr_find(model, ...)
result.results                    # the stored results of the search
fig = result.plot()               # generate a simple plot of the results
new_val = result.suggestion()     # get a suggested value
model.batch_size = new_val        # or: model.learning_rate = new_val
trainer.fit(model)

Improvements:

  • dp, ddp and amp support!

  • Full backwards compatibility with hparams and full compatibility with the new argument interface

  • The learning rate finder can now monitor the validation loss, as requested in Lr_finder based on the validation loss rather than training loss #1975. This gives better learning rate estimates, but at the high cost of running the complete validation set after each step.

  • The batch scaler now returns an object, similar to the learning rate finder, that holds the stored results of the search. The results are a dict with the tried batch sizes (key batch_size), whether each run fit in memory (key fits_in_memory) and the time it took (key time). Times are standardized against the largest batch size that fits in memory, so they are comparable. (See the sketch after this list.)

  • The .plot function of the batch scaler plots batch size vs. time and marks the maximum batch size that can run.

  • The .suggestion function of the batch scaler returns, by default, the largest batch size that fits in memory, but accepts a condition argument that can be set to speed, which picks the batch size that was fastest (often this is the largest batch size that fits in memory, but it can sometimes be faster to run with a size slightly below the maximum).

  • Added an ExperimentalWarning class that we can use in the future to mark features as experimental. It warns the user that the feature may be under-tested and therefore may not work in some specific cases, and that its interface may change drastically within a short amount of time.

  • probably something I cannot remember right now...
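
A rough sketch of how the batch scaler result object described above could be used (hypothetical usage following the keys and arguments listed in this description; exact signatures may differ, and model/trainer are set up as in the earlier snippet):

tuner = HyperTuner(trainer)
scale_result = tuner.scale_batch_size(model)

# results is a dict as described above
scale_result.results["batch_size"]       # batch sizes that were tried
scale_result.results["fits_in_memory"]   # whether each run fit in memory
scale_result.results["time"]             # standardized time per run

fig = scale_result.plot()                             # batch size vs. time, max runnable size marked
largest = scale_result.suggestion()                   # default: largest size that fits in memory
fastest = scale_result.suggestion(condition="speed")  # fastest batch size instead

model.batch_size = largest
trainer.fit(model)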

Note: this change is of course not backward compatible, because it removes support for the trainer arguments auto_scale_batch_size and auto_lr_find.

Tagging @williamFalcon, @Borda and @tullie

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@pep8speaks

pep8speaks commented May 29, 2020

Hello @SkafteNicki! Thanks for updating this PR.

Line 30:1: E302 expected 2 blank lines, found 1

Comment last updated at 2020-07-02 08:40:10 UTC

@mergify
Contributor

mergify bot commented May 29, 2020

This pull request is now in conflict... :(

@mergify mergify bot requested a review from a team May 29, 2020 14:44
@Borda Borda marked this pull request as draft May 29, 2020 14:55
@tullie
Contributor

tullie commented May 29, 2020

This is great and definitely a clear improvement over having it all in one class. A few high-level things:

  1. I think we should dive into the Tuner dependency on the trainer a bit. In Refactor fit/train/run_pretrain_routine/evaluate/test #1195 we discussed some different options, e.g. just having a config for each class. Does anyone else from @PyTorchLightning/core-contributors or the community have opinions on which API we should aim for?
  2. Can we rename the tuner class to HyperTuner or similar? There was some confusion in Slack that this was related to fine-tuning.
  3. Do we need to use mixins for the tuner class? I think it'd be easier to maintain if the LRMixin and BatchFinderMixin don't have shared global state (see the sketch after this list).
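
A minimal sketch of the composition alternative hinted at in point 3, where each algorithm owns its own state and the tuner only delegates (class and method names here are hypothetical, not the PR's actual implementation):

class _LRFinder:
    """Owns all state for the learning rate search (hypothetical helper)."""
    def __init__(self, trainer):
        self.trainer = trainer
        self.results = {}

    def run(self, model, **kwargs):
        # ... run the LR range test and fill self.results ...
        return self


class _BatchScaler:
    """Owns all state for the batch size search (hypothetical helper)."""
    def __init__(self, trainer):
        self.trainer = trainer
        self.results = {}

    def run(self, model, **kwargs):
        # ... run the batch size search and fill self.results ...
        return self


class HyperTuner:
    """Delegates to the helpers instead of mixing their state into one class."""
    def __init__(self, trainer):
        self.trainer = trainer
        self._lr_finder = _LRFinder(trainer)
        self._batch_scaler = _BatchScaler(trainer)

    def lr_find(self, model, **kwargs):
        return self._lr_finder.run(model, **kwargs)

    def scale_batch_size(self, model, **kwargs):
        return self._batch_scaler.run(model, **kwargs)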

@Borda Borda added the Important label Jun 1, 2020
@Borda Borda added this to the 0.8.0 milestone Jun 1, 2020
@Borda Borda removed this from the 0.8.0 milestone Jun 8, 2020
@codecov

codecov bot commented Jun 30, 2020

Codecov Report

Merging #1998 into master will decrease coverage by 0%.
The diff coverage is 81%.

@@           Coverage Diff           @@
##           master   #1998    +/-   ##
=======================================
- Coverage      88%     88%    -0%     
=======================================
  Files          69      73     +4     
  Lines        5530    5643   +113     
=======================================
+ Hits         4891    4968    +77     
- Misses        639     675    +36     

@mergify
Contributor

mergify bot commented Jun 30, 2020

This pull request is now in conflict... :(

@mergify
Contributor

mergify bot commented Jul 7, 2020

This pull request is now in conflict... :(

@awaelchli awaelchli modified the milestones: 0.9.0, 0.9.x Aug 8, 2020
@awaelchli
Contributor

awaelchli commented Aug 19, 2020

@tullie @SkafteNicki Have you thought of or discussed this option:

tuner = HyperTuner(trainer, model)
tuner.auto_scale_batch_size(...)  # args for this algorithm
tuner.auto_lr_find(...)           # args for this algorithm
...

With the approach where you pass the arguments, you cannot control the order in which the tunings are applied. I think more tuning algorithms will be added over time, and the user may want to test different combinations whose performance may depend on the order in which they are applied.

@SkafteNicki
Member Author

@awaelchli good point. I had already thought that people should have two options:

  1. Easiest approach: set flags and call tune. This will call the individual methods in the "right" order and use best common practice for setting the hyperparameters that are being tuned:

tuner = HyperTuner(trainer, model, auto_lr_find=True, auto_scale_batch_size=True, auto_other_...)
tuner.tune()

  2. Advanced approach: initialize the HyperTuner and call the individual methods however you like (maybe throw a warning that some hyperparameters should be tuned before others):

tuner = HyperTuner(trainer, model)
tuner.auto_scale_batch_size()
tuner.auto_scale_lr()
tuner.auto_scale_something_other()

I probably need to start over with this refactoring, because:

  1. we have implemented the DataModule since this PR was started, and we also need to support that (see the sketch after this list).
  2. some of the problems have been fixed on master, like the hparams incompatibility.
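
A rough sketch of what DataModule support could look like here (hypothetical; assuming the tuner's tune accepts a datamodule the same way Trainer.fit does, and MyDataModule is a user-defined LightningDataModule):

dm = MyDataModule(batch_size=32)    # hypothetical LightningDataModule subclass
tuner = HyperTuner(trainer, model)
tuner.tune(datamodule=dm)           # tuning runs draw their batches from the datamodule
trainer.fit(model, datamodule=dm)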

@awaelchli is it better to divide this into multiple PRs, one that first sets up the base structure of the HyperTuner class and then a PR for each feature copied over? Large PRs seem to be much harder to land.

@awaelchli
Contributor

@awaelchli is it better to divide this into multiple PRs, one that first sets up the base structure of the HyperTuner class and then a PR for each feature copied over? Large PRs seem to be much harder to land.

Yes, I totally agree. I think the Tuner class can be developed without touching the tuning functions of the Trainer. We may want to deprecate them first before fully removing them anyway?

@SkafteNicki
Member Author

@awaelchli yes, you are probably right. I think both the learning rate finder and the batch scaler were pretty new features at that point, so if this PR had landed much earlier it would only have affected early adopters. However, it seems a lot of people are using them now (even with the restrictions they currently have), so we probably need to go with a slow deprecation.

Okay, I am closing this for now, and will start over with smaller PRs. @awaelchli do you want to be more involved in this process (can I at least ping you when I have a new PR?)

@awaelchli
Contributor

@SkafteNicki cool! yes please feel free to ping me on whatever. Very excited for this Tuner!

@Borda Borda modified the milestones: 0.9.x, 0.9.0 Aug 20, 2020
@Borda
Member

Borda commented Aug 21, 2020

@SkafteNicki cool! yes please feel free to ping me on whatever. Very excited for this Tuner!

me too :]

@justusschock
Member

and me as well :D

@SkafteNicki SkafteNicki mentioned this pull request Aug 25, 2020