
SHAP and orderedfeature minimizers #57

Open

Basharza wants to merge 20 commits into main
Conversation

Basharza

An implementation of a SHAP minimizer and the more general class OrderedFeatureMinimizer, which greedily prunes per-feature decision trees based on feature importance values in order to achieve generalizations.
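
A rough sketch of the greedy pruning loop described above. The helper names (order_features, prune_one_level, accuracy_after_pruning) and the target_accuracy stopping rule are illustrative assumptions, not the PR's actual internals:

def greedy_prune(feature_dts, X, y, order_features, prune_one_level,
                 accuracy_after_pruning, target_accuracy):
    """Prune per-feature decision trees, least important feature first,
    while overall accuracy stays at or above target_accuracy."""
    # e.g. ascending feature importance (global SHAP values for ShapMinimizer)
    for feature in order_features(feature_dts, X, y):
        while True:
            pruned = prune_one_level(feature_dts[feature])
            if pruned is None:  # nothing left to prune for this feature
                break
            if accuracy_after_pruning(feature_dts, feature, pruned, X, y) < target_accuracy:
                break  # further pruning costs too much accuracy
            feature_dts[feature] = pruned  # accept the coarser generalization
    return feature_dts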

@@ -13,4 +13,7 @@
It is also possible to export the generalizations as feature ranges.

"""
from apt.minimization.minimizer import GeneralizeToRepresentative
from .minimizer import GeneralizeToRepresentative
Member

Please use absolute imports instead of relative ones.

def importance_ordered_features(self):
return self._importance_ordered_features

def _get_ordered_features(self, estimator, encoder, X_train, y_train, numerical_features, categorical_features,
Member

It looks like the order is only computed when calling an internal method (name begins with _). Please change this to an external method (with documentation), and if it receives X and y data it would be better to call it fit.

Basharza

This method is a hook that is called by the fit method in the superclass (OrderedFeatureMinimizer). It is a protected method that should be overridden to define a custom pruning order used during fit.

As implemented, the order is accessible through a property only after fit() is called; otherwise the property returns None. The property has a different name in each class (here it is importance_ordered_features). I could add it to OrderedFeatureMinimizer as a generic property called features_order if you believe that is more suitable.

Member

Ok, I understand. I suggest using a consistent name for the property. Also, at the moment the fit code does not set self._ordered_features, which I think it should (instead of setting a local variable).
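
For reference, a minimal sketch of the suggested pattern (method bodies are illustrative, not the PR's exact code):

class OrderedFeatureMinimizer:
    def __init__(self):
        self._ordered_features = None

    @property
    def ordered_features(self):
        """Pruning order computed by fit(); None before fit() is called."""
        return self._ordered_features

    def fit(self, X, y):
        # store the order on self rather than in a local variable
        self._ordered_features = self._get_ordered_features(X, y)
        # ... pruning then proceeds over self._ordered_features ...
        return self

    def _get_ordered_features(self, X, y):
        """Hook: subclasses override this to define a custom pruning order."""
        raise NotImplementedError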

@@ -66,7 +66,7 @@ def __init__(self, estimator: Union[BaseEstimator, Model] = None, target_accurac
encoder: Optional[Union[OrdinalEncoder, OneHotEncoder]] = None,
features_to_minimize: Optional[Union[np.ndarray, list]] = None,
train_only_features_to_minimize: Optional[bool] = True,
is_regression: Optional[bool] = False):
is_regression: Optional[bool] = False, accuracy_score: Callable = None):
Member

Please add this new parameter to the class docstring, stating clearly what is expected from this callable (inputs and outputs).
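
For example, the docstring entry might read as follows; the (y_true, y_pred) -> float contract is an assumption based on scikit-learn's metric conventions, not taken from the PR:

:param accuracy_score: Callable used to score the model during pruning.
    Expected to accept two array-like arguments, (y_true, y_pred), and to
    return a single float where higher means better. If None, a default
    metric such as sklearn.metrics.accuracy_score could be used.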



# TODO: use mixins correctly
class OrderedFeatureMinimizer: # (BaseEstimator, MetaEstimatorMixin, TransformerMixin):
Member

Please add a docstring for the class and for all external (public) methods.

X[:, indices] = cls._transform_categorical_feature(dts[feature].tree_, X[:, indices],
generalizations_arrays[feature], depths[feature], 0)

@classmethod
Member

I think we agreed there is no need for class methods. These can be changed to static methods or moved out of the class into a utils file.

return 1 + max(cls._calculate_tree_depth(dt, dt.tree_.children_left[node_id]),
cls._calculate_tree_depth(dt, dt.tree_.children_right[node_id]))
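
For example, _calculate_tree_depth only needs the fitted tree, so it could become a plain module-level helper in a utils file. A sketch mirroring the recursion in the excerpt above; the -1 leaf check follows sklearn's TREE_LEAF convention:

def calculate_tree_depth(dt, node_id=0):
    """Depth of the subtree rooted at node_id in a fitted sklearn tree."""
    tree = dt.tree_
    if tree.children_left[node_id] == -1:  # leaf node (sklearn's TREE_LEAF)
        return 0
    return 1 + max(calculate_tree_depth(dt, tree.children_left[node_id]),
                   calculate_tree_depth(dt, tree.children_right[node_id]))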

def fit(self, X: pd.DataFrame, y=None):
Member

Please add a docstring, and state the assumptions about the data (order of features, etc.). Also, is it necessary that the input be a DataFrame? Could we use the generic data wrappers (e.g., ArrayDataset) instead?

Member

Please add a type annotation for y.
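
A sketch of what the documented signature could look like. Accepting ArrayDataset is the suggestion above, not something the PR currently does, and the import path apt.utils.datasets is an assumption:

from typing import Optional, Union

import pandas as pd

from apt.utils.datasets import ArrayDataset  # assumed import path

def fit(self, X: Union[pd.DataFrame, ArrayDataset],
        y: Optional[pd.Series] = None):
    """Fit the minimizer on training data.

    :param X: Training data. Feature columns are assumed to appear in the
        same order as when the wrapped estimator was trained.
    :param y: True labels for X, aligned with its rows; may be omitted if
        the estimator's own predictions are used as targets.
    :return: self
    """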

from sklearn.model_selection import train_test_split


class ShapMinimizer(OrderedFeatureMinimizer):
Member

Docstrings are missing here as well.

import pandas as pd
import numpy as np
from typing import List, Dict, Union
from . import OrderedFeatureMinimizer
Member

Please use an absolute import here as well.

background_size, feature_indices: Dict[str, List[int]], random_state, nsamples):
"""
Calculates global shap per feature.
:param nsamples:
Member

Missing param descriptions
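
Something along these lines could work; the meanings below are inferred from the parameter names and SHAP's KernelExplainer conventions, so please correct them where they do not match the implementation:

:param background_size: Number of background samples (or k-means centroids)
    used to initialize the explainer. (assumed meaning)
:param feature_indices: Maps each feature name to the list of encoded column
    indices it occupies after encoding.
:param random_state: Seed used when sampling/summarizing the background data.
:param nsamples: Number of model re-evaluations per explained sample,
    forwarded to the SHAP explainer. (assumed meaning)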

@@ -0,0 +1,105 @@
import numpy as np
Member

Please remove this file (which includes the Benchmark class). Instead, add some unit tests to the tests directory with appropriate asserts (see the examples in test_minimizer.py).
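
A minimal example of the kind of test that could replace the benchmark; the ShapMinimizer import path, constructor arguments, and the 0.9 threshold are placeholders to be adapted to the fixtures in test_minimizer.py:

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

from apt.minimization import ShapMinimizer  # assumed import path

def test_shap_minimizer_keeps_target_accuracy():
    data = load_iris()
    X = pd.DataFrame(data.data, columns=data.feature_names)
    X_train, X_test, y_train, y_test = train_test_split(
        X, data.target, random_state=42)
    model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

    minimizer = ShapMinimizer(estimator=model, target_accuracy=0.9)
    minimizer.fit(X_train, y_train)
    X_generalized = minimizer.transform(X_test)

    # the generalized data must still meet the accuracy target
    acc = accuracy_score(y_test, model.predict(X_generalized))
    assert acc >= 0.9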

from sklearn.tree._tree import Tree


# TODO: use mixins correctly
Member

Please remove TODOs and unused params (unless you still plan to add this).

Basharza and others added 19 commits January 2, 2023 13:40
…ed assumptions.

* Added general important documentation (not everything is documented).

* Explainer is commented out. Instead there is a placeholder so we could continue implementation.
* Added implementations
* Added splits in fit method
…ethodology, and a little bug in categorical feature transformation.
Implemented generalizations partially. (some helpers). - not tested
Models now depend on random state to create reproducible results.
* Cleaned pruning code and moved to a class method.
* Pruning now only calls transform once per iteration. It saves data from previous iteration to reverse a generalization.
* transform() now depends on generalizations.
* Run experiment on np.arange(0, .8, 0.1). The original minimizer seems not to work past 0.8 target accuracy (or takes too long).
* Fixed shapminimizer to only take true label values into account
* Added MLP to comparison.ipynb
* level can reach 0 in pruning
* discard category is out of the stopping condition
* discard category is given by split not by majority
* Implemented DT Importance minimizer
* Minimizers now support different accuracy metrics (supplied at init)
* SHAP minimizer now makes use of kmeans instead of random samples.