If one uses the option `kd-tree` and tries to generate counterfactuals for a dataset with purely categorical columns, DiCE will always show the same set of counterfactuals. The screenshot shows an excerpt of the DiCE_model_agnostic_CFs notebook. You can see that the indices of the counterfactuals are the same even though the two query instances are quite different. The only thing I changed about the notebook was removing the two numerical columns `age` and `hours_per_week` from the dataset.

I believe the issue originates from these lines:
DiCE/dice_ml/explainer_interfaces/dice_KD.py
Lines 225 to 228 in 6b5a521
This generates a one-hot-encoded version of the query instance. However, the order of the columns does not match the order of the columns for data in the KD-tree. Sklearn treats the data for the KD-tree as an array, not as a dataframe. Therefore, the switched order goes unnoticed when entering the query instance into the KD-tree here:
DiCE/dice_ml/explainer_interfaces/dice_KD.py
Line 164 in 6b5a521
But sklearn only sees the array form of the dataframe, which is always the same, since the one-hot-encoded columns with a 1 are set first in the `KD_query_instance`. I will probably add a pull request to this issue soon that should fix the problem.
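
For anyone who wants to reproduce the effect outside of DiCE, here is a minimal sketch under stated assumptions. The toy dataset and the helper names `encode_query_buggy` and `encode_query_fixed` are hypothetical and are not taken from `dice_KD.py`. The buggy helper mimics the behaviour described above (encode the query, then append the missing dummy columns as zeros), and the `reindex` call is only one possible way to restore the column order, not necessarily what the eventual fix will look like.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KDTree

# Toy, purely categorical dataset (hypothetical, not the notebook's data).
data = pd.DataFrame({
    "workclass": ["Private", "Government", "Private", "Self-Employed"],
    "education": ["Bachelors", "HS-grad", "HS-grad", "Masters"],
})
train = pd.get_dummies(data).astype(float)
tree = KDTree(train.to_numpy())  # the tree only ever sees a bare array


def encode_query_buggy(query: pd.DataFrame) -> pd.DataFrame:
    """Encode the query, then append the missing dummy columns as zeros.

    The columns holding a 1 end up first, regardless of the training order.
    """
    encoded = pd.get_dummies(query).astype(float)
    for col in train.columns:
        if col not in encoded.columns:
            encoded[col] = 0.0
    return encoded


def encode_query_fixed(query: pd.DataFrame) -> pd.DataFrame:
    """Same encoding, but columns are realigned to the training order."""
    return encode_query_buggy(query).reindex(columns=train.columns)


def nearest(encoded: pd.DataFrame) -> int:
    """Index of the nearest training row for an encoded query."""
    ind = tree.query(encoded.to_numpy(), k=1, return_distance=False)
    return int(ind[0, 0])


q1 = data.iloc[[0]]  # Private, Bachelors
q2 = data.iloc[[3]]  # Self-Employed, Masters

# Two different queries collapse to the same array when the order is wrong.
same_array = np.array_equal(
    encode_query_buggy(q1).to_numpy(), encode_query_buggy(q2).to_numpy()
)
idx_buggy1, idx_buggy2 = nearest(encode_query_buggy(q1)), nearest(encode_query_buggy(q2))
idx_fixed1, idx_fixed2 = nearest(encode_query_fixed(q1)), nearest(encode_query_fixed(q2))

print("buggy arrays identical:", same_array)
print("buggy neighbours:", idx_buggy1, idx_buggy2)
print("fixed neighbours:", idx_fixed1, idx_fixed2)
```

With the realigned columns, each query finds its own row in the tree, whereas the misordered encoding hands the tree the same array for both queries and therefore returns the same neighbour.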