DiCE with option "kd-tree" always generates the same counterfactuals for categorical data #303

lange-martin · 2022-06-01T06:32:42Z

When the kd-tree option is used to generate counterfactuals for a dataset with purely categorical columns, DiCE always returns the same set of counterfactuals, regardless of the query instance. The screenshot shows an excerpt of the DiCE_model_agnostic_CFs notebook. You can see that the indices of the counterfactuals are identical even though the two query instances are quite different. The only change I made to the notebook was removing the two numerical columns, age and hours_per_week, from the dataset.

I believe the issue originates from these lines:

DiCE/dice_ml/explainer_interfaces/dice_KD.py

Lines 225 to 228 in 6b5a521

    
    query_instance_df_dummies = pd.get_dummies(query_instance_orig)
    for col in pd.get_dummies(data_df_copy[self.data_interface.feature_names]).columns:
        if col not in query_instance_df_dummies.columns:
            query_instance_df_dummies[col] = 0

This generates a one-hot-encoded version of the query instance. However, the order of the columns does not match the order of the columns for data in the KD-tree. Sklearn treats the data for the KD-tree as an array, not as a dataframe. Therefore, the switched order goes unnoticed when entering the query instance into the KD-tree here:

DiCE/dice_ml/explainer_interfaces/dice_KD.py

Line 164 in 6b5a521

KD_tree_output = self.KD_tree.query(KD_query_instance, num_queries)

Because the one-hot-encoded columns holding a 1 always come first in KD_query_instance and the zero-filled columns are appended after them, the array that sklearn receives is the same for every purely categorical query instance, so the KD-tree returns the same neighbors every time. I will probably open a pull request for this issue soon that should fix the problem.
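The effect can be reproduced outside DiCE with a small sketch. Everything here is an assumption made for illustration: the toy dataset, the helper names `encode_query` and `nearest`, and the reordering step are hypothetical and are not taken from the DiCE code base or from the eventual fix. The sketch pads the query's dummy columns the same way as the lines quoted above, then compares the KD-tree result with and without reordering the columns to match the training data.

```python
import pandas as pd
from sklearn.neighbors import KDTree

# Toy, purely categorical training data (hypothetical, not the notebook's dataset).
train = pd.DataFrame({
    "workclass": ["Private", "Government", "Private", "Self-Employed"],
    "education": ["Bachelors", "HS-grad", "HS-grad", "Masters"],
})
train_dummies = pd.get_dummies(train).astype(float)
tree = KDTree(train_dummies.to_numpy())  # sklearn only sees a plain array


def encode_query(query, reorder):
    """One-hot encode a single-row query and pad absent categories with 0."""
    q = pd.get_dummies(query).astype(float)
    for col in train_dummies.columns:
        if col not in q.columns:
            q[col] = 0.0  # appended at the end, so the columns with a 1 stay first
    if reorder:
        # Assumed remedy: put the columns in the same order as the tree's data.
        q = q[train_dummies.columns]
    return q.to_numpy()


def nearest(arr):
    """Return the index of the single nearest training row."""
    _, idx = tree.query(arr, k=1)
    return int(idx[0, 0])


queries = [train.iloc[[i]] for i in range(len(train))]
unordered = [encode_query(q, reorder=False) for q in queries]
ordered = [encode_query(q, reorder=True) for q in queries]

unordered_hits = [nearest(u) for u in unordered]
ordered_hits = [nearest(o) for o in ordered]

print("without reordering:", unordered_hits)  # the same index for every query
print("with reordering:   ", ordered_hits)    # each query finds its own row
```

Without reordering, every query collapses to the same array, so the tree cannot tell the instances apart. Selecting the columns in the training order before converting to an array is one way to restore the correspondence; `DataFrame.reindex(columns=...)` would be an equivalent choice.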


Signed-off-by: Martin Lange <ml_ks@web.de>

lange-martin added a commit to lange-martin/DiCE that referenced this issue Jun 1, 2022

fix order of columns in query instance (fix interpretml#303)

48a050f

lange-martin mentioned this issue Jun 1, 2022

Fix #303 - Reorder columns in query instance #304

Merged

lange-martin added a commit to lange-martin/DiCE that referenced this issue Jun 3, 2022

fix order of columns in query instance (fix interpretml#303)

5b6f5f2

Signed-off-by: Martin Lange <ml_ks@web.de>

amit-sharma closed this as completed in #304 Jun 27, 2022

amit-sharma pushed a commit that referenced this issue Jun 27, 2022

fix order of columns in query instance for kdtree (fix #303) (#304)

908f3e8

Signed-off-by: Martin Lange <ml_ks@web.de>
