Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update ens builder #1434

Merged
merged 119 commits into from
May 13, 2022
Merged

Update ens builder #1434

merged 119 commits into from
May 13, 2022

Conversation

eddiebergman
Copy link
Contributor

Cleans up ensemble builder to make it easier to change and test.

TODO

@codecov
Copy link

codecov bot commented Mar 28, 2022

Codecov Report

Merging #1434 (92f59c2) into development (35d4d22) will decrease coverage by 0.27%.
The diff coverage is 78.33%.

@@               Coverage Diff               @@
##           development    #1434      +/-   ##
===============================================
- Coverage        84.32%   84.05%   -0.28%     
===============================================
  Files              147      151       +4     
  Lines            11397    11448      +51     
  Branches          1986     1988       +2     
===============================================
+ Hits              9611     9623      +12     
- Misses            1261     1298      +37     
- Partials           525      527       +2     

Impacted file tree graph

Copy link
Contributor

@mfeurer mfeurer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review.

autosklearn/util/functional.py Outdated Show resolved Hide resolved
autosklearn/ensemble_building/run.py Outdated Show resolved Hide resolved
autosklearn/ensemble_building/run.py Outdated Show resolved Hide resolved
autosklearn/ensemble_building/run.py Outdated Show resolved Hide resolved
autosklearn/ensemble_building/manager.py Outdated Show resolved Hide resolved
autosklearn/ensemble_building/builder.py Show resolved Hide resolved
autosklearn/ensemble_building/builder.py Show resolved Hide resolved
autosklearn/ensemble_building/builder.py Outdated Show resolved Hide resolved
autosklearn/ensemble_building/builder.py Outdated Show resolved Hide resolved
autosklearn/ensemble_building/builder.py Show resolved Hide resolved
Copy link
Contributor

@mfeurer mfeurer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more feedback. I'll check the tests later.

autosklearn/ensemble_building/builder.py Outdated Show resolved Hide resolved
autosklearn/ensemble_building/builder.py Outdated Show resolved Hide resolved
autosklearn/ensemble_building/builder.py Outdated Show resolved Hide resolved
autosklearn/ensemble_building/builder.py Show resolved Hide resolved
autosklearn/ensemble_building/builder.py Outdated Show resolved Hide resolved
autosklearn/ensemble_building/builder.py Show resolved Hide resolved
autosklearn/ensemble_building/builder.py Show resolved Hide resolved
autosklearn/ensemble_building/manager.py Outdated Show resolved Hide resolved
autosklearn/ensemble_building/manager.py Outdated Show resolved Hide resolved
autosklearn/ensemble_building/builder.py Show resolved Hide resolved
Copy link
Contributor

@mfeurer mfeurer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Final part of the review.

test/test_estimators/test_estimators.py Show resolved Hide resolved
logger_port=automl._logger_port,
random_state=DEFAULT_SEED,
)
return manager
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just out of curiosity, where is this manager used later? I don't see right now where the test for this is located.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No where, I should have some test that use it. Sorry, slipped the todos

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checking: did you add such a check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll do so today, there's not too much functionality to check as most of the heavy lifting is done inside EnsembleBuilder. The real runs didn't fail so I assume it's okay, meaning I could start the benchmarking and write the tests after, almost nothing has changed. Hopefully no bugs appear in the testing

autosklearn/ensemble_building/builder.py Show resolved Hide resolved
Copy link
Contributor

@mfeurer mfeurer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Final part of the review.

@eddiebergman eddiebergman merged commit 0b5fa19 into development May 13, 2022
@mfeurer mfeurer deleted the update_ens_builder branch May 13, 2022 16:31
github-actions bot pushed a commit that referenced this pull request May 13, 2022
github-actions bot pushed a commit to automl-private/auto-sklearn that referenced this pull request May 16, 2022
eddiebergman added a commit that referenced this pull request Aug 18, 2022
* Move ensemble_bulder test data to named folder

* Update backend to take a temlate to copy from

* Update tests to use new cases system

* Update tests to be documented and cleaned up

* Switch to using cached automl backends

* Readd missing file which failed test for `case_3_models`

* Seperate out tests that rely on old toy data and those that don't

* Setup test framework for ensemble builder on real situations

* Formatting

* Remove `unit_test` arg

* Remove SAVE2DISC

* Split builder and manager into seperate files

* Tidy up init of EnsembleBuilder

* Moved to cached properties

* Change List to list

* Move to solely using cached properties

* Add disk util file with `sizeof`

* Update tests to use cached mechanism

* Switch `sizeof` for disk consumption

* Remove disk consumption

* Remove unneeded function

* Add type hints and documenation

* Simplyify _read_np_fn

* Update get_valid_test_preds to use Pathlib

* Add intersection to functional

* Make functional take *args

* Further simplifications

* Add a dataclass to represent run information for builder

* Rename to Run

* Change to Run objects

* Formatting

* Reduce side effects of `compute_loss_per_model`

To make testing easier and changes easier, the targets are now passed to
the method. This also reduces it's complexity by removing the checking
from the method as we can assume the parameters coming in are correct.

* Change Tuple to tuple

* Forcibly add data files for tests

* Fix: Can now load pickled numpy arrays w/ test

* Add test for checking ensemble builder output

* Fix bug with using list instead of set

* Making deubgging message a little clearer

* Fix typing and case name

* Rename test file to reflect what it tests

* Make pynisher context optional

* Fix loaded models test

* Updates to Run dataclass

* Add method to `Run` to allow recording of last modified

* Change Run mtimes to dictionary

* Change `compute_loss_per_model` to use new Run dataclass

* Factor out run loss into main loop

* Simplyify get_nbest and compute_losses

* Major rewrite of ensemble builder main loop

* Change to simpler hashing

* Start value split

* Add `value_split`

* Reworked Builder

* Add some docstring

* Formatting

* Fix type signature

* Fix typing for `loss`

* Removed Literal

* Mypy fixes for ensemble builder

* Mypy fixes

* Tests for `Runs`

* Move `make_run` to fixtures

* Fix run deletion

* Test candidates

* Made delete it's own function

* Further simplifications

* Fixup test with simplification

* Test: `max_models` for `requires_deletion`

* Test: `memory_limit` for `requires_deletion`

* Test: Loss of runs

* Test: Delete runs

* Test: `fit_ensemble` of ensemble builder

* Add test for run time parameter

* Remove parameter `return_predictions`

* Add note about pickled arrays should not be supported

* Make cached automl instances copy backend

* Add valid static method to run

* Remove old test data

* Add filter for bad run dirs

* Made `main` args optional

* Fix check for updated runs

* Make `main` raise errors

* Fix default value for ensemble builder `main`

* Test valid ensemble with real runs

* Rename parameter for manager

* Add defaults and reorder parameters for EnsembleBuilderManager

* Fixup parameters in `fit_and_return_ensemble`

* Typing fixes

* Make `fit_and_return_ensemble` a staticmethod

* Add: `make_ensemble_builder_manager`

* Add: Test files for manager

* Add atomic rmtree

* Add: atomic rmtree now accepts where mv should go

* Make builder use atomic rmtree

* Fix import bugs, remove valid preds in builder

* Remove `np.inf` as valid arg for `read_at_most`

* Possible reproducible num_run, no predictions error

* Make automl caching robust to `pytest-xdist`

* Test fixes

* Extend interval for test on run caching

* Use pickle for reseting cache

* Fix test for caching mechanism to not rely on `stat`

* Move run deletion to the end of the builder `main`

* Remove `getattr` version of tae.client

* Remove `normalize`

* Extend not for `Run`

* Fix `__init__` of `Run`

* Parameter and comment fixes from feedback

* Change to `min(...)` instead of `sorted(...)[0]`

* Make default time `np.inf`

* Add test for safe deletion in builder

* Update docstring of `loss` for a run

* Remove stray print

* Minor feedback fixes

* Fix `_metric` to `_metrics`

* Fix `make_ensemble_builder`

* One more fix for multiple metrics
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants