Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gather CV Results as Completed #433

Merged
merged 88 commits into from
May 20, 2019
Merged

Gather CV Results as Completed #433

merged 88 commits into from
May 20, 2019

Conversation

vecchp
Copy link
Contributor

@vecchp vecchp commented Dec 1, 2018

In progress PR to address #415.

@TomAugspurger
Copy link
Member

The CI failures can probably be ignored. I'll try to get them fixed next week.

@vecchp can you think of a way to test this? The best I've come up with is a custom estimator class, that when given a specific hyper parameter combination does something like

get_worker inside the task and closes the worker, to simulate it going away. https://distributed.dask.org/en/latest/api.html#distributed.get_worker. Haven't tested that out yet.

@jcrist will you have time next week to give feedback on this approach?

@vecchp
Copy link
Contributor Author

vecchp commented Dec 2, 2018

@TomAugspurger I'll think a little about the tests, though your current suggestion seems reasonable.

dask_ml/model_selection/_search.py Outdated Show resolved Hide resolved
dask_ml/model_selection/_search.py Outdated Show resolved Hide resolved
dask_ml/model_selection/_search.py Outdated Show resolved Hide resolved
dask_ml/model_selection/_search.py Outdated Show resolved Hide resolved
dask_ml/model_selection/_search.py Outdated Show resolved Hide resolved
@vecchp
Copy link
Contributor Author

vecchp commented Dec 5, 2018

@jcrist @TomAugspurger made some changes w.r.t the PR comments. On my end all tests pass except for test_cache_cv. I believe the failure is due to the cache being hit more than the previous expected values given the current changes.

build_graph is now broken into build_cv_graph & build_result_graph, I had to return main_token, X_name, y_name, weights, fit_params and then pass it to build_result_graph which seems a bit messy, but I haven't thought of a better way. What are your thoughts on these changes now?

I also haven't got around to confirming if the futures no longer need to be canceled or writing robust tests for the new as_completed feature.

@vecchp
Copy link
Contributor Author

vecchp commented Apr 18, 2019

@TomAugspurger checking in again to see what else should be done. All tests pass.

@TomAugspurger
Copy link
Member

TomAugspurger commented Apr 18, 2019 via email

@TomAugspurger
Copy link
Member

All tests pass.

It seems like coverage is decreasing. Does that make sense, or is this a false positive? https://codecov.io/gh/dask/dask-ml/compare/cfa57e5789fc29bdcee5d846b5116c07f5fbe292...9da5b8f0cfa5960a7ee70f216a684bff96a450b2/changes

@vecchp
Copy link
Contributor Author

vecchp commented May 4, 2019

All tests pass.

It seems like coverage is decreasing. Does that make sense, or is this a false positive? https://codecov.io/gh/dask/dask-ml/compare/cfa57e5789fc29bdcee5d846b5116c07f5fbe292...9da5b8f0cfa5960a7ee70f216a684bff96a450b2/changes

Makes sense, looks like it's dead code given the as_completed implementation no longer uses it. I'll update.

@vecchp
Copy link
Contributor Author

vecchp commented May 4, 2019

@TomAugspurger looks like all stages pass. I think codecov was removed but that should be back to normal as well given the dead code is now gone.

try:
from cytoolz import get, pluck
except ImportError: # pragma: no cover
from toolz import get, pluck

try:
from dask.distributed import as_completed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this try/except needed? distributed is a dependency now (#466).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with removing the optional check. Previous guidance was to make it option as it wasn't a dependency at the time.

@vecchp
Copy link
Contributor Author

vecchp commented May 7, 2019

@stsievert all optional distributed checks in this PR should be removed now.

@TomAugspurger any guidance left to get this merged?

Copy link
Member

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @vecchp, and apologies once again for the delay on reviewing.

This looks quite good. My only request is a small compatibility layer for build_graph.

return fit_params


def build_cv_graph(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we provide a compatibility shim with the old name? I believe that TPOT is using build_graph https://github.com/EpistasisLab/tpot/blob/b626271e6b5896a73fb9d7d29bebc7aa9100772e/tpot/gp_deap.py#L429

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just add back build_graph with a comment that it's going to be removed (not even a deprecation warning). This is a private module, but I was lazy when adding dask support to TPOT. Once this is in, I can cut a release and make a PR to tpot updating their call.

@TomAugspurger
Copy link
Member

I'm looking into the compatibility for tpot now. Will push an update.

@vecchp
Copy link
Contributor Author

vecchp commented May 20, 2019

Sounds good!

@TomAugspurger TomAugspurger merged commit 954d41f into dask:master May 20, 2019
@TomAugspurger
Copy link
Member

Thanks @vecchp!

@vecchp
Copy link
Contributor Author

vecchp commented May 21, 2019

@TomAugspurger thanks so much for getting this in!

@vecchp vecchp deleted the gather_results_as_completed branch May 21, 2019 02:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants