Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

646 Implement LIME for tabular data #661

Merged
merged 15 commits into from
Dec 7, 2023
Merged

646 Implement LIME for tabular data #661

merged 15 commits into from
Dec 7, 2023

Conversation

geek-yang
Copy link
Member

@geek-yang geek-yang commented Nov 14, 2023

Add LIME tabular to the methods and provide example notebooks for both regression and classification tasks:

  • Add LIME tabular to methods
  • Adapt LIME to the dianna API
  • Example notebook for classification
  • Example notebook for regression
  • Add tests

Also fixes the inconsistent input variable names in init (https://github.com/dianna-ai/dianna/blob/add_lime_tabular/dianna/__init__.py).

Note that the method returns an array of importance scores for all input features (which means num_features is disabled). For more information about this decision, check this issue #665.

@geek-yang geek-yang linked an issue Nov 14, 2023 that may be closed by this pull request
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@geek-yang
Copy link
Member Author

geek-yang commented Nov 21, 2023

We need a visualizer for the results. I created an issue for it #662.

@geek-yang geek-yang marked this pull request as ready for review November 23, 2023 15:38
Copy link
Contributor

@cwmeijer cwmeijer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great Yang! I still have some requests like additional tests, and also something to discuss maybe: the necessity to pass the train data to the method.
Great work, also on the notebooks!

)
return explainer.explain(
model_or_function=model_or_function,
input_tabular=input_tabular,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haha input_tabular! I was already guessing what this new variable's name would be after the changes you made above

Comment on lines +9 to +36
class LIMEOnTabular(TestCase):
"""Suite of LIME tests for the tabular case."""

def test_lime_tabular_classification_correct_output_shape(self):
"""Test the output of explainer."""
training_data = np.random.random((10, 2))
input_data = np.random.random(2)
feature_names = ["feature_1", "feature_2"]
explainer = LIMETabular(training_data,
mode ='classification',
feature_names=feature_names,
class_names = ["class_1", "class_2"])
exp = explainer.explain(
run_model,
input_data,
)
assert len(exp[0]) == len(feature_names)

def test_lime_tabular_regression_correct_output_shape(self):
"""Test the output of explainer."""
training_data = np.random.random((10, 2))
input_data = np.random.random(2)
feature_names = ["feature_1", "feature_2"]
exp = dianna.explain_tabular(run_model, input_tabular=input_data, method='lime',
mode ='regression', training_data = training_data,
feature_names=feature_names, class_names=['class_1'])

assert len(exp) == len(feature_names)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now you're only testing for shapes. It would be nice to add a test with synthetic data+model with some stupidly simple pattern to check if the method can detect it. Just like we have for timeseries RISE.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of adding a "naive test" for it. I thought about it. But then I was thinking that we added a test for timeseries mainly because we implement those methods ourselves (referring to some papers and methods like lime segmentation, for instance, but purely implementing those methods mostly from scratch).

This LIME tabular is entirely from LIME. We simply put a wrapper around it. It is somehow tested in the original implementation of LIME:

https://github.com/marcotcr/lime/blob/master/lime/tests/test_lime_tabular.py

But I do agree that it is nice to add a test for it. Maybe let's create an issue and design a simple test for it?? This PR already gets too big...I would prefer to do it in a separate PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue #669 created! 😄

dianna/methods/lime_tabular.py Outdated Show resolved Hide resolved
self.top_labels = len(class_names)

self.explainer = LimeTabularExplainer(
training_data,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to call LIME without training_data?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comments above.

Comment on lines 80 to 86
def explain(
self,
model_or_function,
input_tabular,
labels=(1, ),
num_samples=5000,
**kwargs,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add Typing to all public methods? Also for the return types.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, well spot! Thanks Chris! Just miss those stuff. I will add them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The data_instance loading is done in an unexpected section. It would be better to do this in the data loading section above.

A bit lower, you're using the variable name 'exp' which is short but ambiguous. It would be better to call it 'explanation'.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestions! Done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idem to my comments on the penguin notebook.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

@cwmeijer cwmeijer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🏅 🧸 🏆 🦾 🎉

@elboyran
Copy link
Contributor

elboyran commented Dec 7, 2023 via email

@geek-yang geek-yang merged commit 53188c3 into main Dec 7, 2023
15 of 18 checks passed
SS Sprint 3 - improve code quality, adding tabular funcitonality automation moved this from In progress to Done Dec 7, 2023
@geek-yang geek-yang deleted the add_lime_tabular branch December 7, 2023 15:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

Implement tabular version of LIME (wrap existing)
3 participants