FIX: Decision Tree Visualization #386

reidjohnson · 2023-07-24T17:36:16Z

Fixes issue #385.

The root cause of the bug appears to be that the base sklearn tree constructor object is not wrapped as a valid skops Node type, but is still included in the visualized object tree.

This PR fixes the issue by appropriately processing this node.

adrinjalali

Thanks for the PR.

adrinjalali · 2023-07-25T19:29:22Z

skops/io/tests/test_visualize.py

-            ("clf", LogisticRegression(random_state=0, solver="liblinear")),
+            ("clf", RandomForestRegressor(random_state=0)),


we could test both.

Yeah, good point. One challenge here is that I don't think it'll be straightforward without a minor refactor of the pipeline fixture. The simplest approaches seem to be to parameterize the fixture or to use a GridSearchCV. Any preferences or suggestions?

You could also simply have a separate test only for RandomForestRegressor and RandomForestClassifier.

Added separate tests for DecisionTreeClassifier and DecisionTreeRegressor.

adrinjalali · 2023-07-25T19:30:50Z

skops/io/_visualize.py

+    if node_name == "constructor":
+        return


This seems a bit shady; it might skip things that the user should see. Can you show here an example of what's inside this node in our case?

One would also need to update the node_name docstring if we're skipping here.

Very sympathetic to the concerns. Suspect you all know better here, so would defer to your recommendations.

As far as I am aware, sklearn tree-based models are the only ones that produce a "constructor" object of this type (presumably as an artifact of how skops handles C extension types).

Taking the example:

from sklearn.tree import DecisionTreeClassifier
from skops.io import dumps, loads, visualize

dumped = dumps(DecisionTreeClassifier().fit([[0, 1], [2, 3], [4, 5]], [0, 1, 2]))
loaded = loads(dumped)
visualize(dumped)

The skipped node is <class 'sklearn.tree._tree.Tree'>. With processing of this object skipped, the code outputs:

root: sklearn.tree._classes.DecisionTreeClassifier
└── attrs: builtins.dict
    ├── criterion: json-type("gini")
    ├── splitter: json-type("best")
    ├── max_depth: json-type(null)
    ├── min_samples_split: json-type(2)
    ├── min_samples_leaf: json-type(1)
    ├── min_weight_fraction_leaf: json-type(0.0)
    ├── max_features: json-type(null)
    ├── max_leaf_nodes: json-type(null)
    ├── random_state: json-type(null)
    ├── min_impurity_decrease: json-type(0.0)
    ├── class_weight: json-type(null)
    ├── ccp_alpha: json-type(0.0)
    ├── n_features_in_: json-type(2)
    ├── n_outputs_: json-type(1)
    ├── classes_: numpy.ndarray
    ├── n_classes_: numpy.int64
    ├── max_features_: json-type(2)
    ├── tree_: sklearn.tree._tree.Tree
    │   ├── attrs: builtins.dict
    │   │   ├── max_depth: json-type(2)
    │   │   ├── node_count: json-type(5)
    │   │   ├── nodes: numpy.ndarray
    │   │   └── values: numpy.ndarray
    │   ├── args: builtins.tuple
    │   │   ├── content: json-type(2)
    │   │   ├── content: numpy.ndarray
    │   │   └── content: json-type(1)
    └── _sklearn_version: json-type("1.3.0")

I see. I think the best solution here would be to actually handle type objects, i.e. print what the type is in the visualized tree rather than skipping it. I'm sure there would be other cases where we'd encounter types as well.

Gave this a first pass: if isinstance(node, type), we now print information about the type rather than skipping it.
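A minimal sketch of the "print the type instead of skipping it" idea, using plain dicts as stand-ins for skops nodes. The helper format_children and the type(...) label format are hypothetical illustrations, not the PR's code or the real visualize() output: any child value that is a class (such as a constructor) gets its fully qualified name printed.

```python
from collections import OrderedDict


def format_children(children, prefix=""):
    """Render a nested mapping as tree lines, labelling bare classes.

    Hypothetical sketch: dicts stand in for skops nodes, and a value that
    is a class (e.g. a constructor) is shown as a type, not skipped.
    """
    lines = []
    items = list(children.items())
    for index, (key, value) in enumerate(items):
        is_last = index == len(items) - 1
        connector = "└── " if is_last else "├── "
        if isinstance(value, type):
            # A bare class is not a node; report its qualified name.
            label = f"{key}: type({value.__module__}.{value.__qualname__})"
            lines.append(prefix + connector + label)
        elif isinstance(value, dict):
            lines.append(prefix + connector + f"{key}: builtins.dict")
            # Keep the vertical bar only if more siblings follow.
            extension = "    " if is_last else "│   "
            lines.extend(format_children(value, prefix + extension))
        else:
            lines.append(prefix + connector + f"{key}: {value!r}")
    return lines


tree = {"attrs": {"max_depth": 2, "node_count": 5}, "constructor": OrderedDict}
print("\n".join(format_children(tree)))
```

Here OrderedDict merely plays the role of a constructor class so the sketch runs without scikit-learn.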

adrinjalali

This also needs a changelog.

adrinjalali · 2023-07-27T06:39:34Z

skops/io/_visualize.py

+            is_self_safe=False,
+            is_safe=False,


Here we should defer to the information in the Node instead of always assuming it is unsafe. The Tree constructor, for instance, is safe.

Right, I agree. However the challenge I'm running into is that it's not a valid Node object, so it does not have the necessary attributes:

AttributeError: type object 'sklearn.tree._tree.Tree' has no attribute 'is_self_safe'

If I understand correctly, it's because the constructor class is directly set as a child node here.
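To make this failure mode concrete, here is a self-contained toy. ToyNode, FakeTree and collect_trust are hypothetical stand-ins, not skops classes: a walker that reads is_self_safe from every child breaks as soon as one child is a bare class instead of a node.

```python
class ToyNode:
    """Hypothetical stand-in for a skops Node that carries its own trust flag."""

    def __init__(self, children=None, is_self_safe=True):
        self.children = children or {}
        self.is_self_safe = is_self_safe


class FakeTree:
    """Stand-in for a C extension type such as sklearn.tree._tree.Tree."""


def collect_trust(node, out=None):
    """Record each child's is_self_safe flag, assuming children are ToyNodes."""
    out = [] if out is None else out
    for key, child in node.children.items():
        out.append((key, child.is_self_safe))  # fails for a bare class
        collect_trust(child, out)
    return out


reduce_node = ToyNode(
    children={
        "attrs": ToyNode(),
        "args": ToyNode(),
        "constructor": FakeTree,  # raw class stored directly as a child
    }
)

error_message = None
try:
    collect_trust(reduce_node)
except AttributeError as exc:
    error_message = str(exc)

print(error_message)  # type object 'FakeTree' has no attribute 'is_self_safe'
```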

So probably the parent should pass this info to the child when doing the walk_tree.

adrinjalali · 2023-07-27T16:34:42Z

skops/io/_visualize.py

+    if "is_self_safe" in kwargs:
+        is_self_safe = kwargs["is_self_safe"]
+    if "is_safe" in kwargs:
+        is_safe = kwargs["is_safe"]


I think it'd make sense to pass these as explicit arguments rather than via **kwargs, and they'd need to be added to the docstring.
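With explicit keyword-only parameters, the trust overrides show up in the signature and docstring, and a misspelled name raises TypeError instead of being silently absorbed by **kwargs. resolve_trust below is a hypothetical helper, not the PR's code; it assumes None means "not provided by the parent", in which case the node's own attribute is used, falling back to False when the attribute is missing.

```python
def resolve_trust(node, *, is_self_safe=None, is_safe=None):
    """Return the (is_self_safe, is_safe) pair to display for ``node``.

    Parameters
    ----------
    node : object
        The node being visited. It may be a bare class with no trust
        attributes of its own.
    is_self_safe : bool or None, default=None
        Value passed down by the parent. If None, ``node.is_self_safe`` is
        used, falling back to False when the attribute is missing.
    is_safe : bool or None, default=None
        Value passed down by the parent. If None, ``node.is_safe`` is used,
        falling back to False when the attribute is missing.
    """
    if is_self_safe is None:
        is_self_safe = getattr(node, "is_self_safe", False)
    if is_safe is None:
        is_safe = getattr(node, "is_safe", False)
    return is_self_safe, is_safe


print(resolve_trust(int))  # (False, False)
print(resolve_trust(int, is_self_safe=True, is_safe=True))  # (True, True)
```

The conservative False default means an unknown bare class is never reported as safe unless the parent explicitly vouches for it.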

skops/io/_visualize.py

adrinjalali

This looks quite nice, thank you @reidjohnson

BenjaminBossan

Hmm, the extension of the walk_tree API to include is_self_safe and is_safe made me skeptical, as it looks like we have a leaky abstraction here. I tried to understand why that is necessary and I think I found a deeper underlying issue.

I think the correct solution right now is that we should not visualize the constructor. It is not an attribute of the object, just a utility class we add to construct it, so its presence is not really needed. Therefore, I think we should actually just skip the node info on the constructor.

This should be perfectly safe with the current code, since we only have two examples where we should have type(node) is type, namely TreeNode and SGDNode, where we ensure that the constructor is indeed safe. However, this may of course change in the future. Therefore, we could do a check that the type is either a Tree or in ALLOWED_SGD_LOSSES.

If we make this change, then we can roll back the changes in the signature of walk_tree and avoid the leaky abstraction.
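A sketch of this interim proposal: skip a constructor child only when it is a class on an explicit allow-list. The string allow-list and should_skip are assumptions for illustration; the comment suggests checking against Tree or ALLOWED_SGD_LOSSES directly. A stand-in class with the same qualified name as sklearn's Tree keeps the example runnable without scikit-learn.

```python
from collections import OrderedDict

# Hypothetical allow-list; a real check could compare against the Tree
# class itself and skops' ALLOWED_SGD_LOSSES instead of strings.
SAFE_CONSTRUCTORS = frozenset({"sklearn.tree._tree.Tree"})


def qualified_name(cls):
    return f"{cls.__module__}.{cls.__qualname__}"


def should_skip(node_name, node, allowed=SAFE_CONSTRUCTORS):
    """Skip a 'constructor' child only if it is a known-safe class."""
    if node_name != "constructor" or not isinstance(node, type):
        return False
    return qualified_name(node) in allowed


# Stand-in class whose qualified name matches sklearn.tree._tree.Tree.
Tree = type("Tree", (), {"__module__": "sklearn.tree._tree"})

print(should_skip("constructor", Tree))         # True
print(should_skip("constructor", OrderedDict))  # False
print(should_skip("attrs", Tree))               # False
```

Anything not on the list is still visualized, which addresses the concern that skipping might hide something the user should see.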

As a more long term solution, I think we might want to consider something like this: the children of a ReduceNode should not directly contain the constructor or any type. Instead, we should wrap the constructor into a TypeNode, which is capable of determining if the constructor is safe or not.

If we make that change, the walk_tree function would, if it encounters a ReduceNode, just rely on ReduceNode.is_safe, which under the hood calls is_safe on the TypeNode.

WDYT @adrinjalali?
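A rough sketch of the longer-term TypeNode design. The TypeNode and ReduceNode classes below are hypothetical and heavily simplified (the real skops nodes carry more state and checks); the point is that the reduce node only ever holds nodes, so it can delegate safety to the wrapped constructor and the walker never special-cases a bare class.

```python
from collections import OrderedDict


class TypeNode:
    """Hypothetical node wrapping a class used as a constructor."""

    def __init__(self, cls, trusted=frozenset()):
        self.cls = cls
        self.trusted = frozenset(trusted)
        self.children = {}  # a type has nothing further to walk

    def qualified_name(self):
        return f"{self.cls.__module__}.{self.cls.__qualname__}"

    def is_safe(self):
        return self.qualified_name() in self.trusted


class ReduceNode:
    """Hypothetical reduce node whose constructor child is always a TypeNode."""

    def __init__(self, constructor, trusted=frozenset()):
        self.children = {"constructor": TypeNode(constructor, trusted)}

    def is_safe(self):
        # Every child is a node, so safety is delegated uniformly.
        return all(child.is_safe() for child in self.children.values())


# Stand-in class with the same qualified name as sklearn's Tree.
Tree = type("Tree", (), {"__module__": "sklearn.tree._tree"})
trusted = {"sklearn.tree._tree.Tree"}

print(ReduceNode(Tree, trusted).is_safe())         # True
print(ReduceNode(OrderedDict, trusted).is_safe())  # False
```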

reidjohnson · 2023-07-28T19:17:04Z

@BenjaminBossan Thanks for the comments. Updated with wrapping constructor into a TypeNode; reverted changes to walk_tree.

adrinjalali

This is much nicer, but we should handle this change in protocol the same way it's done via the files in the io.old folder.
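One way to picture protocol-versioned loading, assuming a registry keyed by loader name and protocol number: files written before the change keep resolving to the old node class, while new files use the TypeNode-based one. The registry, class names and protocol numbers are hypothetical; this is not how the io.old files are actually wired up.

```python
NODE_REGISTRY = {}


def register(loader_name, protocol):
    """Record the decorated class as the loader for (loader_name, protocol)."""

    def decorator(cls):
        NODE_REGISTRY[(loader_name, protocol)] = cls
        return cls

    return decorator


@register("ReduceNode", protocol=0)
class ReduceNodeV0:
    """Old layout: the constructor is stored as a bare class."""


@register("ReduceNode", protocol=1)
class ReduceNodeV1:
    """New layout: the constructor is wrapped in a TypeNode."""


def node_class_for(loader_name, protocol):
    """Pick the loader matching the protocol recorded in the file."""
    try:
        return NODE_REGISTRY[(loader_name, protocol)]
    except KeyError:
        raise ValueError(
            f"no loader for {loader_name!r} with protocol {protocol}"
        ) from None


print(node_class_for("ReduceNode", 0).__name__)  # ReduceNodeV0
```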

adrinjalali · 2023-07-30T09:35:20Z

skops/io/_sklearn.py

+            "constructor": TypeNode(
+                {
+                    "__class__": constructor.__name__,
+                    "__module__": constructor.__module__,


__module__ doesn't always exist; we should use get_module instead.
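The reviewer refers to skops' own get_module helper, whose implementation isn't shown in this thread. As a rough illustration of the kind of fallback such a helper needs (similar in spirit to the module lookup pickle performs), here is a hypothetical get_module_name: prefer __module__, otherwise search the imported modules for this exact object, and finally fall back to "__main__".

```python
import sys
from collections import OrderedDict


def get_module_name(obj):
    """Best-effort module name for a class or function (hypothetical helper)."""
    module = getattr(obj, "__module__", None)
    if module is not None:
        return module
    name = getattr(obj, "__name__", None)
    if name is not None:
        # Copy the items: attribute access may import further modules.
        for mod_name, mod in list(sys.modules.items()):
            if mod is None or mod_name == "__main__":
                continue
            try:
                if getattr(mod, name, None) is obj:
                    return mod_name
            except Exception:
                # Some lazily loaded modules raise on attribute access.
                continue
    return "__main__"


print(get_module_name(OrderedDict))  # collections


def orphan():
    return None


orphan.__name__ = "orphan_not_in_any_module"
orphan.__module__ = None  # simulate an object without usable __module__
print(get_module_name(orphan))  # __main__
```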

BenjaminBossan · 2023-07-31T14:38:41Z

The failing test seems to be caused by a new parameter that was added in the latest sklearn version:

├── monotonic_cst: json-type(null)
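Since the extra monotonic_cst line only appears with newer sklearn releases (the follow-up commit targets sklearn 1.4), the expected output can be made version-dependent. This is a stdlib-only sketch: expected_param_lines and the shortened line list are hypothetical, and a real test would pass sklearn.__version__ (possibly parsed with packaging.version). Comparing (major, minor) tuples matters because, as strings, "1.10" sorts before "1.4".

```python
import re

# Hypothetical, shortened subset of the expected visualize() lines.
BASE_PARAM_LINES = [
    'criterion: json-type("gini")',
    'splitter: json-type("best")',
    "max_depth: json-type(null)",
    "ccp_alpha: json-type(0.0)",
]


def major_minor(version):
    """Parse '1.4.0', '1.4.dev0' or '1.10.1' into a comparable (major, minor)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def expected_param_lines(sklearn_version):
    lines = list(BASE_PARAM_LINES)
    if major_minor(sklearn_version) >= (1, 4):
        # Appended for simplicity; the real output may place it elsewhere.
        lines.append("monotonic_cst: json-type(null)")
    return lines


print(expected_param_lines("1.3.0") == BASE_PARAM_LINES)  # True
```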

BenjaminBossan

Thanks a lot, this LGTM.

I think the codecov complaints are false positives, so if @adrinjalali approves of this final version, we can merge.

adrinjalali

Thank you so much for the PR @reidjohnson, and thank you for being patient with our reviews.

reidjohnson · 2023-08-01T16:25:05Z

Thank you for the reviews, have been happy to contribute.

Fix DecisionTree Visualization

4b87c84

Return empty if node is a constructor.

reidjohnson mentioned this pull request Jul 24, 2023

ENH: Quantile Forest Support #384

Merged

Update visualization tests

1b0a86a

Swap LogisticRegression for RandomForestRegressor

adrinjalali reviewed Jul 25, 2023

reidjohnson added 3 commits July 26, 2023 12:36

Visualize type object

52e5efa

Add visualize tests for decision trees

1aec5e9

Revert changes

3cbf6e5

adrinjalali reviewed Jul 27, 2023

Pass trust info to child

a16bc7c

adrinjalali reviewed Jul 27, 2023

Pass as explicit kwargs and update docstring

369c206

adrinjalali reviewed Jul 27, 2023

skops/io/_visualize.py

Add changelog

797bf1a

reidjohnson changed the title ~~Fix Decision Tree Visualization~~ FIX: Decision Tree Visualization Jul 28, 2023

reidjohnson marked this pull request as ready for review July 28, 2023 01:28

adrinjalali approved these changes Jul 28, 2023

adrinjalali requested a review from BenjaminBossan July 28, 2023 12:43

BenjaminBossan requested changes Jul 28, 2023

Wrap constructor in TypeNode

316da5c

Update DT unit test names

328cd52

adrinjalali reviewed Jul 30, 2023

Use get_module

5d55e61

Fix DT unit test for sklearn 1.4

aa52598

BenjaminBossan approved these changes Aug 1, 2023

adrinjalali approved these changes Aug 1, 2023

adrinjalali merged commit e8523d2 into skops-dev:main Aug 1, 2023
14 of 15 checks passed

reidjohnson deleted the fix-decisiontree-visualize branch August 1, 2023 16:24
