
ENH: Hyperband implementation #221

Merged 171 commits into dask:master on Jun 14, 2019
Conversation

@stsievert (Member) commented Jun 21, 2018

What does this implement?
This implements a state-of-the-art model selection algorithm, Hyperband. I give a concise summary in dask/dask-searchcv#72 (comment). Briefly, this algorithm only requires a computational budget as input. Hyperband will find close to the best possible parameters with the given computational budget.* It performs the search via an adaptive scheme and "kills off" poorly performing models. The intuition is simple: it makes no sense to waste computation on poorly performing models if the goal is to find the best-performing model with as little computation as possible. More detail can be found in the paper: https://arxiv.org/abs/1603.06560

This will only work with models that implement partial_fit. I do not see that as blocking: most optimization is iterative, and I believe education and outreach will help. I aim to have this work for all classes that implement score, partial_fit, and {get, set}_params.

* "will find" with high probability, "close" is within a log factor of the lower bound, "best" is highest expected cross-validation score.

Reference Issues/PRs
This addresses #161, and is an item on our beta roadmap (which has an issue: #210).

This is a refactor of dask/dask-searchcv#72 and depends on dask.distributed. The dependence on dask.distributed allows use of as_completed, which greatly simplifies the code and easily allows for an extension that makes this algorithm better suited for parallel architectures.
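
For illustration, a rough sketch of the as_completed pattern (the helper and the metadata dicts are hypothetical stand-ins, not the code in this PR): results are handled as soon as any model finishes a block of training, and the adaptive decision is simply whether to resubmit.

    from distributed import Client, as_completed

    def _partial_fit_and_score(meta):
        # Stand-in for one block of partial_fit calls followed by a score.
        return dict(meta, calls=meta["calls"] + 1)

    client = Client()  # a local dask.distributed cluster
    models = [{"id": i, "calls": 0} for i in range(4)]
    futures = [client.submit(_partial_fit_and_score, m) for m in models]
    seq = as_completed(futures)
    for future in seq:
        meta = future.result()
        if meta["calls"] < 3:  # the adaptive decision: poor models just stop here
            seq.add(client.submit(_partial_fit_and_score, meta))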

TODO

Future work

  • implement async variant
  • implement Hyperband for black box functions

Related work

@mrocklin (Member) left a comment

No major comments from me yet from a dask/computing perspective. I'm curious to see how dealing with blocks eventually gets folded in.

@TomAugspurger (Member) left a comment

A few minor comments. Going through the dask / distributed stuff now.

Overall, it's neat how clean this ended up being.

@stsievert force-pushed the hyperband branch 2 times, most recently from 86854b4 to 10b9404 (June 22, 2018)
@stsievert (Member Author)

I've got it working for both {sklearn, dask-ml} models and {dask, numpy} arrays, and it's tested for all possible combos. However, the tests are non-deterministic and fail about 5% of the time. There are two good tracebacks on the py27 CircleCI tests if you're interested.

I'd also have a bunch of complicated logic to avoid communicating trained models. I'd like to avoid it by defining a _partial_fit_and_train function that returns a dask.delayed(model) object to avoid unnecessary communication with model_future.result(). This will work, correct?

@mrocklin (Member)

I'd also have a bunch of complicated logic to avoid communicating trained models.

Is this already implemented? I didn't see anything crazy complicated.

I'd like to avoid it by defining a _partial_fit_and_train function that returns a dask.delayed(model) object to avoid unnecessary communication with model_future.result(). This will work, correct?

I recommend that for simplicity's sake we not include any dask code within any function that we call from a task. So I recommend not doing this if possible.

Can you explain more about where you're calling model_future.result()? Ideally you would never have to call this except when you want to pull down the final result at the end. Ideally you're going to just keep submitting _partial_fit functions to run on model_meta futures and Dask will think about how best to avoid communication costs.
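
For concreteness, a sketch of that pattern (the model and chunking here are my own toy example, not this PR's code): each submit chains off the previous model future, so the model stays on the workers and only the final score travels back.

    from copy import deepcopy
    import numpy as np
    from distributed import Client
    from sklearn.linear_model import SGDClassifier

    def _partial_fit(model, X, y):
        model = deepcopy(model)  # never mutate a future's value in place
        model.partial_fit(X, y, classes=[0, 1])
        return model

    client = Client()
    X = np.random.rand(100, 5)
    y = np.random.randint(2, size=100)
    model_future = client.submit(SGDClassifier, tol=1e-3)  # built on a worker
    for i in range(0, 100, 20):  # one _partial_fit task per training chunk
        model_future = client.submit(_partial_fit, model_future,
                                     X[i:i + 20], y[i:i + 20])
    score = client.submit(lambda m: m.score(X, y), model_future).result()
    print(score)  # only this small float comes back to the client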

@mrocklin (Member)

I'm taking a look at this now; it seems like we're still calling dask from within dask. My path so far has been the following:

  • Run the tests, verify that they fail
  • Convert the test to something I can easily run while watching the dashboard; notice that many more threads are being generated than the four on my laptop (there are many horizontal bars in the task stream), a common sign that tasks are starting other tasks
  • I now want to run pdb in various functions to see what is going on. To do this I need async-style tests.
  • Convert the _hyperband function to run asynchronously, run the tests again
  • The two cases of calling compute within tasks are
    1. Calling .compute() on the delayed objects that are passed in. These should be numpy arrays. I'll push a fix for this shortly
    2. Calling partial_fit on a dask_ml.Incremental object; presumably we should just be calling the normal partial_fit method of the sklearn model contained within
  • Hopefully this will clear things up. I'll also try to track down what was failing within the workers in the more complex case for others.

@mrocklin (Member)

Anyway, I'd like to take the lock on this PR for a moment. I'll try to push something up soonish.

@mrocklin (Member)

What is the right way to handle testing data? It looks like it can be a dask array. Should we pass the entire thing to every score call or should we be iterating over it in chunks like with the training data?

@mrocklin (Member)

OK, I've pushed a fix. Things are flattened down now so that no task generates other tasks. I made a few possibly-wrong decisions in this work, so applying a careful eye to those changes would be welcome. I tried to add a comment in places where I wasn't sure.

@stsievert (Member Author) commented Jun 23, 2018

not include any dask code within any function

Ah – I'm starting to understand. We don't want to launch tasks from tasks, and calling dask code would do that (and of course Incremental.partial_fit uses dask code, though I didn't consider it). I think #224 is to blame for the testing non-determinism, and think it's resolved with #225.

Correct me if I'm wrong: we want to avoid launching tasks within tasks because it's complicated. At the least, it's hard to debug and produces messy tracebacks. What other downsides are there?

What is the right way to handle testing data? ... should we be iterating over it in chunks like with the training data?

Can I have some clarification on "iterating over it in chunks"? I can take this two ways. This algorithm is adaptive and needs some intermediate feedback. We can base every adaptive decision on

  1. the score produced by the entire test dataset (perform a sum/average after looping over the entire dataset)
  2. the score produced by one chunk in the dask array

Looking at your implementation, both seem easy to implement, so I don't think it matters much. I'm okay with 1 (sketched below) but cautious about 2, though I think 2 is okay in the iid case (update: Hyperband assumes 1. In the iid case I think Hyperband would work with some probability, but there's no guarantee).
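
A small sketch of option 1, assuming the metric is a mean over samples (like accuracy), so a sample-weighted average of per-chunk scores equals the score on the full test set:

    import numpy as np

    def score_in_chunks(model, test_chunks):
        """Score each (X, y) test chunk, then combine with a weighted average."""
        scores, weights = [], []
        for X_chunk, y_chunk in test_chunks:
            scores.append(model.score(X_chunk, y_chunk))
            weights.append(len(y_chunk))
        return np.average(scores, weights=weights)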

I now want to run pdb in various function to see what is going on. To do this I need async-style tests.

Can you expand upon this?

Can you explain more about where you're calling model_future.result()?

I only call score_future.result() – it only returns a dictionary that includes the model's score and ID (and a few other integers/strings). I only pass around and update a model future, and have a bit of logic for that (although apparently it's not complicated).

Ideally you're going to just keep submitting _partial_fit functions to run on model_meta futures and Dask will think about how best to avoid communication costs.

This is what I'm doing. I was unsure if I could define a _partial_fit_and_score(model) -> (score, model_future) function, which would make the code one level simpler. I don't think we can if we want to leave dask code out of _partial_fit_and_score.

so applying a careful eye to those changes would be welcome. I tried to add a comment in places where I wasn't sure.

Thanks, and I'll give it a careful look over on Monday.

@mrocklin (Member)

Correct me if I'm wrong: we want to avoid launching tasks within tasks because it's complicated. At the least, it's hard to debug and produces messy tracebacks. What other downsides are there?

Yup, that's mostly it. I anticipate that it will be much more expensive to write and maintain. I also wouldn't be surprised if it had negative effects when other people use it in novel and surprising ways. Users will always add their own crazy schemes to whatever we do, so the fewer layers of complexity we add the better for everyone, especially when they come asking questions.

Can I have some clarification on "iterating over it in chunks"?

Given the rest of your explanation here it seems like you have a fair understanding of my question. It seems like you frequently score each model with some testing data. I didn't know if that should be the same testing data every time or if we had some different testing data for each epoch.

I now want to run pdb in various function to see what is going on. To do this I need async-style tests.

Can you expand upon this?

When writing code I find it very useful to be able to pause execution and look at the state of any function that's running. Typically I do this by adding the following line wherever in the code I'm interested, and then running a simple test.

import pdb; pdb.set_trace()

When the test sets up separate processes this doesn't work (you can't use pdb across processes), but if the client, scheduler, and workers are all running in the same thread then it does. This is one of the major advantages of using async-style programming in tests. If you want to try this, add import pdb; pdb.set_trace() to the body of the _partial_fit function and then run py.test tests/model_selection/test_hyperband.py::test_sklearn_async

This is what I'm doing. I was unsure if I could define a _partial_fit_and_score(model) -> (score, model_future) function, which would make the code one level simpler. I don't think we can if we want to leave dask code out of _partial_fit_and_score.

I see both approaches as appropriate. I don't see why dask code would need to be present in either case. In both cases we have scikit-learn models and numpy arrays, and we want to call various scikit-learn-style functions on them, like partial_fit or score. I don't think that Dask needs to be involved in any of this work.
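
In that spirit, the task-side helpers can be pure scikit-learn and NumPy, something like the following (illustrative names, not necessarily the helpers in this PR):

    def _partial_fit(model_and_meta, X_chunk, y_chunk):
        # Pure scikit-learn: safe inside a Dask task because it never calls
        # back into dask itself.
        model, meta = model_and_meta
        model.partial_fit(X_chunk, y_chunk)
        meta = dict(meta, partial_fit_calls=meta.get("partial_fit_calls", 0) + 1)
        return model, meta

    def _score(model_and_meta, X_test, y_test):
        # Returns only small metadata; the model itself stays on the worker.
        model, meta = model_and_meta
        return dict(meta, score=model.score(X_test, y_test))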

Thanks, and I'll give it a careful look over on Monday.

Yeah, no rush.

@stsievert (Member Author) commented Jun 24, 2018

To do this I need async-style tests.

Can you expand upon this?

Sorry, I should have been clearer. From the docs, I now see the tests need to be async so they don't block – but how do you define async functions while still supporting Python 2.7?

I'm a huge fan of debuggers and a frequent user of ipdb. And thank you for the tips on running with pytest – it makes complete sense that the debugger is launched, but I (naively) assumed pdb didn't work with pytest.

@mrocklin (Member)

but how do you work around defining async functions but still support Python 2.7?

Normally we use Tornado @gen.coroutine functions and the yield statement. Search for @gen_cluster tests in https://github.com/dask/distributed/blob/master/distributed/tests/test_client.py and take a look at http://distributed.readthedocs.io/en/latest/develop.html#writing-tests.

In this case I used Python 3 style async def functions because I wanted the async for loop for as_completed. We can get around this to support Python 2 with something like dask/distributed#2071
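
For reference, such a Tornado-style test looks roughly like this (a minimal sketch following the conventions in the writing-tests doc above; the asserted behavior is a toy example):

    from distributed.utils_test import gen_cluster

    @gen_cluster(client=True)
    def test_simple(c, s, a, b):
        # c is a Client; s, a, b are the scheduler and two in-process workers,
        # all in one process, so pdb works inside tasks.
        future = c.submit(lambda x: x + 1, 41)
        result = yield future  # awaits the future without async/await syntax
        assert result == 42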

@mrocklin (Member)

Ah, I see you've already seen the writing tests doc. My apologies for the duplication.

You may also want to take a look at http://distributed.readthedocs.io/en/latest/asynchronous.html (though I would ignore the asyncio sections and just focus on tornado).

@TomAugspurger mentioned this pull request on Jun 26, 2018
@mrocklin (Member)

I'd like to give this a shot on something slightly less trivial than the example in the test. @stsievert can you recommend something? Do you have a notebook or other example that you're testing on locally that isn't present here?

@stsievert force-pushed the hyperband branch 2 times, most recently from f05c223 to 4565213 (June 26, 2018)
@stsievert (Member Author) commented Jun 7, 2019

Takeaways from my review #221 (review) (GitHub's UI doesn't make this easy to flag). I'll keep this comment updated.


Potential concerns:

  • (Wording concern) In metadata/metadata_, there are keys n_models and partial_fit_calls. I think there's a mild clash here over the n_ prefix – why not models/partial_fit_calls or n_models/n_partial_fit_calls?
    • The key partial_fit_calls is in history_/cv_results_/etc
    • "call" is a verb. "model" is a noun (in this context). I'm okay with the n_ prefix for nouns but not verbs.
    • (came up in r287881511)

API changes:

  • Removed some decisions from metadata because they weren't actually decision points.
  • n_initial_iter is optional now and defaults to None (and this behavior is tested now). When it's None, n_initial_iter is chosen so that one model remains at the end. It can be overridden (which is useful for Hyperband).
  • Added metadata_ and metadata to SuccessiveHalvingSearchCV. This allows the user to see what max_iter will be and the total number of partial_fit calls (a sketch of inspecting metadata follows this comment).
    • This replaces "removing the requirement for n_initial_iter". I have a note in the n_initial_iter documentation saying to look at metadata.
    • Docs for metadata and metadata_
  • Made max_iter optional for SuccessiveHalvingSearchCV. By default, SuccessiveHalvingSearchCV runs until completion (i.e., until only one model is left).

Test changes:

  • Reorganize tests to test IncrementalSearchCV for history_
  • Reword HyperbandSearchCV warning on patience if too low
  • Add comments to tests and require specification of n_initial_iter
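
A hedged sketch of inspecting the planned computation via the new metadata attribute (the constructor arguments and metadata keys here are my guesses and may differ from the merged API):

    from dask_ml.model_selection import SuccessiveHalvingSearchCV
    from sklearn.linear_model import SGDClassifier

    params = {"alpha": [1e-4, 1e-3, 1e-2, 1e-1]}
    search = SuccessiveHalvingSearchCV(SGDClassifier(tol=1e-3), params,
                                       n_initial_parameters=9)
    # Before calling fit, metadata reports the planned computation, e.g. the
    # number of models and the total number of partial_fit calls.
    print(search.metadata)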
@stsievert (Member Author) commented Jun 8, 2019

I think this is ready for merge now. The CI failures are unrelated (most are tracked in #517, and the Travis failures are on some unrelated documentation). I think I've resolved most of my concerns.

I'd like a review of my changes. A summary of my changes is in #221 (comment), which includes some API changes. A diff of the recent changes is in https://github.com/stsievert/dask-ml/compare/d17bdb3..070784d

Also, when this is merged I'd appreciate a squash and merge. My commit history is far from clean.

@TomAugspurger (Member)

Sorry I missed your last comment @stsievert. I gave a quick glance through the recent diff, and things look nice. I'll give it another look tonight and merge if things all look fine.

And, I think I'll do a release shortly after merging, so that we can use this in e.g. dask-examples. Sound good? If we find issues I can easily do another release.

@mrocklin (Member) commented Jun 13, 2019 via email

@TomAugspurger (Member) left a comment

Gave another run through, and things look quite nice @stsievert. Nice work!

@TomAugspurger merged commit 7bce4b6 into dask:master on Jun 14, 2019
@stsievert (Member Author) commented Jun 14, 2019

I think I'll do a release shortly after merging, so that we can use this in e.g. dask-examples. Sound good? If we find issues I can easily do another release.

All of that sounds good. And I think inclusion in dask-examples is a good idea. It was also suggested by a reviewer in scipy-conference/scipy_proceedings#464 (comment).
