Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Quantile Forest Support #384

Merged
merged 12 commits into from
Aug 3, 2023
Merged

ENH: Quantile Forest Support #384

merged 12 commits into from
Aug 3, 2023

Conversation

reidjohnson
Copy link
Contributor

Fixes #383.

The quantile-forest package extends sklearn random forest regressors to implement quantile regression forests. Without these changes, the Cython-based model object is unable to be loaded by skops due to its use of the __reduce__ method.

I understand adding support for another package may be outside the current scope of this project. As the primary maintainer of the quantile-forest package, I'd also be interested in better understanding if there's a recommended alternative to the use of __reduce__ that would enable the model to be compatible with this project's existing secure persistence feature.

Copy link
Collaborator

@BenjaminBossan BenjaminBossan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for providing this fix. I haven't done a proper review yet. Would you mind adding a test for quantile forest estimators? It could be similar to the tests here. If you need help with that, let us know.

skops/_min_dependencies.py Outdated Show resolved Hide resolved
reidjohnson and others added 2 commits July 23, 2023 02:25
-Adds base unit tests
-Patches constructor visualization error
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
@@ -232,6 +232,9 @@ def walk_tree(
"here: https://github.com/skops-dev/skops/issues"
)

if node_name == "constructor":
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested resolution for #385.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would include that in a separate PR.

@reidjohnson
Copy link
Contributor Author

@BenjaminBossan: Sure, I took a pass at adding a test for the quantile forest estimators. This also uncovered a bug in the visualization of sklearn DecisionTreeRegressors reported as #385, with a suggested resolution included in these changes.

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the nice PR. Other than the small nits, I think this looks good.

Might be time for us to start working on a better way to handle C extension types.

Comment on lines 35 to 38
self.trusted = self._get_trusted(
trusted,
[get_module(QuantileForest) + ".QuantileForest"],
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be happier if we don't trust them by default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@@ -232,6 +232,9 @@ def walk_tree(
"here: https://github.com/skops-dev/skops/issues"
)

if node_name == "constructor":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would include that in a separate PR.

Comment on lines 412 to 415
tree_methods = [
quantile_forest.RandomForestQuantileRegressor,
quantile_forest.ExtraTreesQuantileRegressor,
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to @pytest.mark.parametrize?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@reidjohnson
Copy link
Contributor Author

Thanks for the review. Updated based on suggestions. Removed fix for #385 in favor of PR #386.

@adrinjalali
Copy link
Member

@reidjohnson tests failing

@reidjohnson
Copy link
Contributor Author

Failing due to #385.

Comment on lines 1 to 49
from __future__ import annotations

from typing import Any, Sequence

from ._protocol import PROTOCOL
from ._sklearn import ReduceNode, reduce_get_state
from ._utils import LoadContext, SaveContext


def _lazy_import():
from quantile_forest._quantile_forest_fast import QuantileForest

def quantile_forest_get_state(
obj: Any,
save_context: SaveContext,
) -> dict[str, Any]:
state = reduce_get_state(obj, save_context)
state["__loader__"] = "QuantileForestNode"
return state

class QuantileForestNode(ReduceNode):
def __init__(
self,
state: dict[str, Any],
load_context: LoadContext,
trusted: bool | Sequence[str] = False,
) -> None:
super().__init__(
state,
load_context,
constructor=QuantileForest,
trusted=trusted,
)
self.trusted = self._get_trusted(trusted, [])

return QuantileForest, QuantileForestNode, quantile_forest_get_state


try:
QuantileForest, QuantileForestNode, quantile_forest_get_state = _lazy_import()

# tuples of type and function that gets the state of that type
GET_STATE_DISPATCH_FUNCTIONS = [(QuantileForest, quantile_forest_get_state)]

# tuples of type and function that creates the instance of that type
NODE_TYPE_MAPPING = {("QuantileForestNode", PROTOCOL): QuantileForestNode}
except ModuleNotFoundError:
GET_STATE_DISPATCH_FUNCTIONS = []
NODE_TYPE_MAPPING = {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of this instead? (slightly different)

from __future__ import annotations

from typing import Any, Sequence

from ._protocol import PROTOCOL
from ._sklearn import ReduceNode, reduce_get_state
from ._utils import LoadContext, SaveContext

try:
    from quantile_forest._quantile_forest_fast import QuantileForest
except ImportError:
    QuantileForest = None


def quantile_forest_get_state(
    obj: Any,
    save_context: SaveContext,
) -> dict[str, Any]:
    state = reduce_get_state(obj, save_context)
    state["__loader__"] = "QuantileForestNode"
    return state


class QuantileForestNode(ReduceNode):
    def __init__(
        self,
        state: dict[str, Any],
        load_context: LoadContext,
        trusted: bool | Sequence[str] = False,
    ) -> None:
        if QuantileForest is None:
            raise ImportError(
                "`quantile_forest` is missing and needs to be installed in order to"
                " load this model."
            )

        super().__init__(
            state,
            load_context,
            constructor=QuantileForest,
            trusted=trusted,
        )
        self.trusted = self._get_trusted(trusted, [])


# tuples of type and function that gets the state of that type
if QuantileForest is not None:
    GET_STATE_DISPATCH_FUNCTIONS = [(QuantileForest, quantile_forest_get_state)]

# tuples of type and function that creates the instance of that type
NODE_TYPE_MAPPING = {("QuantileForestNode", PROTOCOL): QuantileForestNode}

@reidjohnson reidjohnson marked this pull request as ready for review August 1, 2023 16:36


class TestQuantileForest:
"""Tests for RandomForestQuantileRegressor and ExtraTreesQuantileRegressor"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this not missing QuantileForest?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the user-facing classes which are explicitly tested, QuantileForest is the underlying Cython object shared by the user-facing classes and isn't explicitly tested here (but is serialized by the loading/dumping).

Copy link
Collaborator

@BenjaminBossan BenjaminBossan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks a lot for adding the support. I have an extremely minor comment, but this is good to be merged from my point of view.

@adrinjalali
Copy link
Member

I just noticed this is missing a changelog entry.

@reidjohnson
Copy link
Contributor Author

Updated, thanks for flagging.

@adrinjalali adrinjalali merged commit 61aa534 into skops-dev:main Aug 3, 2023
13 of 15 checks passed
@brentonmallen1
Copy link

awesome work everyone, y'all are amazing! I really appreciate it!

@reidjohnson reidjohnson deleted the quantile-forest-support branch August 4, 2023 01:13
@brentonmallen1
Copy link

I finally got back to this effort and it seems like, even with the changes, I'm still seeing the following error when attempting to load a quantile forest model:

TypeError: __cinit__() takes at least 2 positional arguments (0 given)```

Did this come up during testing? Am I missing something?

@brentonmallen1
Copy link

more specifically: File "quantile_forest/_quantile_forest_fast.pyx", line 568, in quantile_forest._quantile_forest_fast.QuantileForest.__cinit__

@reidjohnson
Copy link
Contributor Author

@brentonmallen1 Just checking, but are you using the development build? I don't believe there's yet been a public release since these changes were merged.

@brentonmallen1
Copy link

I was using the main release for 0.8.0 because the changelog had mentioned QuantileForest support, but I could have misread. I can try using the dev release and give it another go and report back. thank you

@brentonmallen1
Copy link

well, I don't see the quantile changes in my site-packages so it does seem like I haven't installed what I was expecting. I also don't see a dev branch/build, how would I go about installing that directly?

@reidjohnson
Copy link
Contributor Author

@brentonmallen1 I'll defer to the maintainers, but a viable route is to follow the dev setup described in the contributing doc.

@adrinjalali
Copy link
Member

We just released 0.9.0, which includes this change.

@brentonmallen1
Copy link

y'all are wonderful! thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Can't Load Quantile Forest
4 participants