
ENH: Hyperband implementation #221

Merged 171 commits into dask:master on Jun 14, 2019
Conversation

@stsievert (Member) commented Jun 21, 2018

What does this implement?
This implements a state-of-the-art model selection algorithm, Hyperband. I give a concise summary in dask/dask-searchcv#72 (comment). Briefly, this algorithm only requires a computational budget as input. Hyperband will find close to the best possible parameters with the given computational budget.* It performs the search via an adaptive scheme and "kills off" poorly performing models. The intuition is simple: it makes no sense to waste computation on poorly performing models if the goal is to find the best-performing model with as little computation as possible. More detail can be found in the paper: https://arxiv.org/abs/1603.06560

This will only work with models that implement partial_fit. I do not see that as blocking: most optimization is iterative, and I believe education and outreach will help. I aim to have this work for all classes that implement score, partial_fit, and {get, set}_params.

* "will find" with high probability, "close" is within a log factor of the lower bound, "best" is highest expected cross-validation score.

Reference Issues/PRs
This addresses #161, and is an item on our beta roadmap (which has an issue: #210).

This is a refactor of dask/dask-searchcv#72 and depends on dask.distributed. The dependence on dask.distributed allows use of as_completed, which greatly simplifies the code and easily allows for an extension that makes this algorithm better suited for parallel architectures.
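
For illustration, a rough sketch of the as_completed pattern (the helper and the metadata dicts are hypothetical stand-ins, not the code in this PR): results are handled as soon as any model finishes a block of training, and the adaptive decision is simply whether to resubmit.

    from distributed import Client, as_completed

    def _partial_fit_and_score(meta):
        # Stand-in for one block of partial_fit calls followed by a score.
        return dict(meta, calls=meta["calls"] + 1)

    client = Client()  # a local dask.distributed cluster
    models = [{"id": i, "calls": 0} for i in range(4)]
    futures = [client.submit(_partial_fit_and_score, m) for m in models]
    seq = as_completed(futures)
    for future in seq:
        meta = future.result()
        if meta["calls"] < 3:  # the adaptive decision: poor models just stop here
            seq.add(client.submit(_partial_fit_and_score, meta))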

TODO

Future work

  • implement async variant
  • implement Hyperband for black box functions

Related work

@mrocklin (Member) left a comment

No major comments from me yet from a dask/computing perspective. I'm curious to see how dealing with blocks eventually gets folded in.

@TomAugspurger (Member) left a comment

A few minor comments. Going through the dask / distributed stuff now.

Overall, it's neat how clean this ended up being.

@stsievert force-pushed the hyperband branch 2 times, most recently from 86854b4 to 10b9404 (June 22, 2018)
@stsievert (Member Author)

I've got it working for both {sklearn, dask-ml} models and {dask, numpy} arrays, and it's tested for all possible combos. However, the tests are non-deterministic and fail about 5% of the time. There are two good tracebacks on the py27 CircleCI tests if you're interested.

I'd also have a bunch of complicated logic to avoid communicating trained models. I'd like to avoid it by defining a _partial_fit_and_train function that returns a dask.delayed(model) object to avoid unnecessary communication with model_future.result(). This will work, correct?

@mrocklin (Member)

I'd also have a bunch of complicated logic to avoid communicating trained models.

Is this already implemented? I didn't see anything crazy complicated.

I'd like to avoid it by defining a _partial_fit_and_train function that returns a dask.delayed(model) object to avoid unnecessary communication with model_future.result(). This will work, correct?

I recommend that for simplicity's sake we not include any dask code within any function that we call from a task. So I recommend not doing this if possible.

Can you explain more about where you're calling model_future.result()? Ideally you would never have to call this except when you want to pull down the final result at the end. Ideally you're going to just keep submitting _partial_fit functions to run on model_meta futures and Dask will think about how best to avoid communication costs.
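
For concreteness, a sketch of that pattern (the model and chunking here are my own toy example, not this PR's code): each submit chains off the previous model future, so the model stays on the workers and only the final score travels back.

    from copy import deepcopy
    import numpy as np
    from distributed import Client
    from sklearn.linear_model import SGDClassifier

    def _partial_fit(model, X, y):
        model = deepcopy(model)  # never mutate a future's value in place
        model.partial_fit(X, y, classes=[0, 1])
        return model

    client = Client()
    X = np.random.rand(100, 5)
    y = np.random.randint(2, size=100)
    model_future = client.submit(SGDClassifier, tol=1e-3)  # built on a worker
    for i in range(0, 100, 20):  # one _partial_fit task per training chunk
        model_future = client.submit(_partial_fit, model_future,
                                     X[i:i + 20], y[i:i + 20])
    score = client.submit(lambda m: m.score(X, y), model_future).result()
    print(score)  # only this small float comes back to the client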

@mrocklin (Member)

I'm taking a look at this now; it seems like we're still calling dask from within dask. My path so far has been the following:

  • Run the tests, verify that they fail
  • Convert the test to something I can easily run while watching the dashboard; notice that many more threads are being generated than the four on my laptop (there are many horizontal bars in the task stream), a common sign that tasks are starting other tasks
  • I now want to run pdb in various functions to see what is going on. To do this I need async-style tests.
  • Convert the _hyperband function to run asynchronously, run the tests again
  • The two cases of calling compute within tasks are
    1. Calling .compute() on the delayed objects that are passed in. These should be numpy arrays. I'll push a fix for this shortly
    2. Calling partial_fit on a dask_ml.Incremental object; presumably we should just be calling the normal partial_fit method of the sklearn model contained within
  • Hopefully this will clear things up. I'll also try to track down what was failing within the workers in the more complex case for others.

@mrocklin (Member)

Anyway, I'd like to take the lock on this PR for a moment. I'll try to push something up soonish.

@mrocklin (Member)

What is the right way to handle testing data? It looks like it can be a dask array. Should we pass the entire thing to every score call or should we be iterating over it in chunks like with the training data?

@mrocklin (Member)

OK, I've pushed a fix. Things are flattened down now so that no task generates other tasks. I made a few possibly-wrong decisions in this work, so applying a careful eye to those changes would be welcome. I tried to add a comment in places where I wasn't sure.

@stsievert (Member Author) commented Jun 23, 2018

not include any dask code within any function

Ah – I'm starting to understand. We don't want to launch tasks from tasks, and calling dask code would do that (and of course Incremental.partial_fit uses dask code, though I didn't consider it). I think #224 is to blame for the testing non-determinism, and think it's resolved with #225.

Correct me if I'm wrong: we want to avoid launching tasks within tasks because it's complicated. At the least, it's hard to debug and produces messy tracebacks. What other downsides are there?

What is the right way to handle testing data? ... should we be iterating over it in chunks like with the training data?

Can I have some clarification on "iterating over it in chunks"? I can take this two ways. This algorithm is adaptive and needs some intermediate feedback. We can base every adaptive decision on

  1. the score produced by the entire test dataset (perform a sum/average after looping over the entire dataset)
  2. the score produced by one chunk in the dask array

Looking at your implementation, both seem easy to implement, so I don't think it matters much. I'm okay with 1 (sketched below) but cautious about 2, though I think 2 is okay in the iid case (update: Hyperband assumes 1. In the iid case I think Hyperband would work with some probability, but there's no guarantee).
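
A small sketch of option 1, assuming the metric is a mean over samples (like accuracy), so a sample-weighted average of per-chunk scores equals the score on the full test set:

    import numpy as np

    def score_in_chunks(model, test_chunks):
        """Score each (X, y) test chunk, then combine with a weighted average."""
        scores, weights = [], []
        for X_chunk, y_chunk in test_chunks:
            scores.append(model.score(X_chunk, y_chunk))
            weights.append(len(y_chunk))
        return np.average(scores, weights=weights)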

I now want to run pdb in various function to see what is going on. To do this I need async-style tests.

Can you expand upon this?

Can you explain more about where you're calling model_future.result()?

I only call score_future.result() – it only returns a dictionary that includes the model's score and ID (and a few other integers/strings). I only pass around and update a model future, and have a bit of logic for that (although apparently it's not complicated).

Ideally you're going to just keep submitting _partial_fit functions to run on model_meta futures and Dask will think about how best to avoid communication costs.

This is what I'm doing. I was unsure if I could define a _partial_fit_and_score(model) -> (score, model_future) function, which would make the code one level simpler. I don't think we can if we want to leave dask code out of _partial_fit_and_score.

so applying a careful eye to those changes would be welcome. I tried to add a comment in places where I wasn't sure.

Thanks, and I'll give it a careful look over on Monday.

@mrocklin (Member)

Correct me if I'm wrong: we want to avoid launching tasks within tasks because it's complicated. At the least, it's hard to debug and produces messy tracebacks. What other downsides are there?

Yup, that's mostly it. I anticipate that it will be much more expensive to write and maintain. I also wouldn't be surprised if it had negative effects when other people use it in novel and surprising ways. Users will always add their own crazy schemes to whatever we do, so the fewer layers of complexity we add the better for everyone, especially when they come asking questions.

Can I have some clarification on "iterating over it in chunks"?

Given the rest of your explanation here it seems like you have a fair understanding of my question. It seems like you frequently score each model with some testing data. I didn't know if that should be the same testing data every time or if we had some different testing data for each epoch.

I now want to run pdb in various function to see what is going on. To do this I need async-style tests.

Can you expand upon this?

When writing code I find it very useful to be able to pause execution and look at the state of any function that's running. Typically I do this by adding the following line wherever in the code I'm interested, and then running a simple test.

import pdb; pdb.set_trace()

When the test sets up separate processes this doesn't work (you can't use pdb across processes), but if the client, scheduler, and workers are all running in the same thread then it does. This is one of the major advantages of using async-style programming in tests. If you want to try this, add import pdb; pdb.set_trace() to the body of the _partial_fit function and then run py.test tests/model_selection/test_hyperband.py::test_sklearn_async

This is what I'm doing. I was unsure if I could define a _partial_fit_and_score(model) -> (score, model_future) function, which would make the code one level simpler. I don't think we can if we want to leave dask code out of _partial_fit_and_score.

I see both approaches as appropriate. I don't see why dask code would need to be present in either case. In both cases we have scikit-learn models and numpy arrays, and we want to call various scikit-learn-style functions on them, like partial_fit or score. I don't think that Dask needs to be involved in any of this work.
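
In that spirit, the task-side helpers can be pure scikit-learn and NumPy, something like the following (illustrative names, not necessarily the helpers in this PR):

    def _partial_fit(model_and_meta, X_chunk, y_chunk):
        # Pure scikit-learn: safe inside a Dask task because it never calls
        # back into dask itself.
        model, meta = model_and_meta
        model.partial_fit(X_chunk, y_chunk)
        meta = dict(meta, partial_fit_calls=meta.get("partial_fit_calls", 0) + 1)
        return model, meta

    def _score(model_and_meta, X_test, y_test):
        # Returns only small metadata; the model itself stays on the worker.
        model, meta = model_and_meta
        return dict(meta, score=model.score(X_test, y_test))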

Thanks, and I'll give it a careful look over on Monday.

Yeah, no rush.

@stsievert (Member Author) commented Jun 24, 2018

To do this I need async-style tests.

Can you expand upon this?

Sorry, I should have been clearer. From the docs, I now see the tests need to be async so they don't block – but how do you define async functions while still supporting Python 2.7?

I'm a huge fan of debuggers and a frequent user of ipdb. And thank you for the tips on running with pytest – it makes complete sense that the debugger is launched, but I (naively) assumed pdb didn't work with pytest.

@mrocklin (Member)

but how do you work around defining async functions but still support Python 2.7?

Normally we use Tornado @gen.coroutine functions and the yield statement. Search for @gen_cluster tests in https://github.com/dask/distributed/blob/master/distributed/tests/test_client.py and take a look at http://distributed.readthedocs.io/en/latest/develop.html#writing-tests.

In this case I used Python 3 style async def functions because I wanted the async for loop for as_completed. We can get around this to support Python 2 with something like dask/distributed#2071
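
For reference, such a Tornado-style test looks roughly like this (a minimal sketch following the conventions in the writing-tests doc above; the asserted behavior is a toy example):

    from distributed.utils_test import gen_cluster

    @gen_cluster(client=True)
    def test_simple(c, s, a, b):
        # c is a Client; s, a, b are the scheduler and two in-process workers,
        # all in one process, so pdb works inside tasks.
        future = c.submit(lambda x: x + 1, 41)
        result = yield future  # awaits the future without async/await syntax
        assert result == 42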

@mrocklin (Member)

Ah, I see you've already seen the writing tests doc. My apologies for the duplication.

You may also want to take a look at http://distributed.readthedocs.io/en/latest/asynchronous.html (though I would ignore the asyncio sections and just focus on tornado).

@TomAugspurger mentioned this pull request on Jun 26, 2018
@mrocklin (Member)

I'd like to give this a shot on something slightly less trivial than the example in the test. @stsievert can you recommend something? Do you have a notebook or other example that you're testing on locally that isn't present here?

@stsievert force-pushed the hyperband branch 2 times, most recently from f05c223 to 4565213 (June 26, 2018)
@stsievert (Member Author) commented Jun 7, 2019

Takeaways from my review #221 (review) (GitHub's UI doesn't make this easy to flag). I'll keep this comment updated.


Potential concerns:

  • (Wording concern) In metadata/metadata_, there are keys n_models and partial_fit_calls. I think there's a mild clash here over the n_ prefix – why not models/partial_fit_calls or n_models/n_partial_fit_calls?
    • The key partial_fit_calls is in history_/cv_results_/etc
    • "call" is a verb. "model" is a noun (in this context). I'm okay with the n_ prefix for nouns but not verbs.
    • (came up in r287881511)

API changes:

  • Removed some decisions from metadata because they weren't actually decision points.
  • n_initial_iter is optional now and defaults to None (and this behavior is tested now). When it's None, n_initial_iter is chosen so that one model remains at the end. It can be overridden (which is useful for Hyperband).
  • Added metadata_ and metadata to SuccessiveHalvingSearchCV. This allows the user to see what max_iter will be and the total number of partial_fit calls (a sketch of inspecting metadata follows this comment).
    • This replaces "removing the requirement for n_initial_iter". I have a note in the n_initial_iter documentation saying to look at metadata.
    • Docs for metadata and metadata_
  • Made max_iter optional for SuccessiveHalvingSearchCV. By default, SuccessiveHalvingSearchCV runs until completion (i.e., until only one model is left).

Test changes:

  • Reorganize tests to test IncrementalSearchCV for history_
  • Reword HyperbandSearchCV warning on patience if too low
  • Add comments to tests and require specification of n_initial_iter
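
A hedged sketch of inspecting the planned computation via the new metadata attribute (the constructor arguments and metadata keys here are my guesses and may differ from the merged API):

    from dask_ml.model_selection import SuccessiveHalvingSearchCV
    from sklearn.linear_model import SGDClassifier

    params = {"alpha": [1e-4, 1e-3, 1e-2, 1e-1]}
    search = SuccessiveHalvingSearchCV(SGDClassifier(tol=1e-3), params,
                                       n_initial_parameters=9)
    # Before calling fit, metadata reports the planned computation, e.g. the
    # number of models and the total number of partial_fit calls.
    print(search.metadata)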
@stsievert (Member Author) commented Jun 8, 2019

I think this is ready for merge now. The CI failures are unrelated (most are tracked in #517, and the Travis failures are on some unrelated documentation). I think I've resolved most of my concerns.

I'd like a review of my changes. A summary of my changes is in #221 (comment), which includes some API changes. A diff of the recent changes is in https://github.com/stsievert/dask-ml/compare/d17bdb3..070784d

Also, when this is merged I'd appreciate a squash and merge. My commit history is far from clean.

@TomAugspurger (Member)

Sorry I missed your last comment @stsievert. I gave a quick glance through the recent diff, and things look nice. I'll give it another look tonight and merge if things all look fine.

And, I think I'll do a release shortly after merging, so that we can use this in e.g. dask-examples. Sound good? If we find issues I can easily do another release.

@mrocklin (Member) commented Jun 13, 2019 via email

@TomAugspurger (Member) left a comment

Gave another run through, and things look quite nice @stsievert. Nice work!

@TomAugspurger merged commit 7bce4b6 into dask:master on Jun 14, 2019
@stsievert (Member Author) commented Jun 14, 2019

I think I'll do a release shortly after merging, so that we can use this in e.g. dask-examples. Sound good? If we find issues I can easily do another release.

All of that sounds good. And I think inclusion in dask-examples is a good idea. It was also suggested by a reviewer in scipy-conference/scipy_proceedings#464 (comment).
