Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixed number of segments bug #632

Merged
merged 4 commits into from
Aug 10, 2023
Merged

fixed number of segments bug #632

merged 4 commits into from
Aug 10, 2023

Conversation

WillemSpek
Copy link
Collaborator

This PR fixes #631. In some cases, kernelshap would over- or (worse) underallocate memory for the shapeley values leading to issues. This PR should fix this bug.

@WillemSpek WillemSpek added the bug Something isn't working label Aug 7, 2023
@WillemSpek WillemSpek self-assigned this Aug 7, 2023
@WillemSpek
Copy link
Collaborator Author

I see that the build is failing due to the unit test also asserting that the number of shapeley values should equal that of n_segments.:

>>> assert shap_values[0].shape == np.zeros((1, n_segments)).shape

I believe the unit tests should be changed in similar style to the solution.

Copy link
Contributor

@cwmeijer cwmeijer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yang, can you have a look at this?

@@ -116,6 +116,7 @@ def explain(
self.image_segments = self._segment_image(self.input_data, n_segments,
compactness, sigma,
**slic_kwargs)
n_segments = np.unique(self.image_segments).size
Copy link
Contributor

@cwmeijer cwmeijer Aug 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slic returns an integer mask, so I agree with @WillemSpek that we probably should find the set of those to get n_segments using unique.
https://scikit-image.org/docs/stable/api/skimage.segmentation.html#skimage.segmentation.slic

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think I made a mistake and it is a bug here. The actual n_segments should be updated as the actual number of slices can be different from the given n_segments. Thanks for spotting it @WillemSpek ! 👍

@@ -53,7 +53,7 @@ def test_shap_explain_image(self):
n_segments = 50
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

n_segments is explicitly set to 50, and only used to test against. It doesn't make a lot of sense to overwrite this value, as proposed in this PR, before it is tested against. The proposed changes would test whether the output of the method (shap_values and segments) are consistent with each other, which is a good test by itself. However, I think that the idea of this test was to check the outcome of the method against some knowledge that we have prior to running the method, like some ground truth or regression test.
It's a shame it is not explained why this value of 50 is what is expected as an outcome exactly. @geek-yang, git blame points at you ;-) (b8dd755) Could you explain this value and/or review this PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😅 To be honest Chris I forgot why I set it to be exactly 50. But I think it is a bug given that the number of labels can be slightly different, as skimage.segmentation.slic already says n_segments is actually "The (approximate) number of labels".

So it should be corrected as what @WillemSpek suggests. Thanks for spotting this @cwmeijer and @WillemSpek . Shame on me 😂

PS: Oh yes, the commit was made at the first week of 2022, right after new year. You guys were all on holiday and I'm the only one working at that moment. What a piece of painful memory 😭. But anyway...my bad 😂 .

Copy link
Member

@geek-yang geek-yang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for finding and fixing this bug @WillemSpek ! It helps a lot!

@@ -53,7 +53,7 @@ def test_shap_explain_image(self):
n_segments = 50
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😅 To be honest Chris I forgot why I set it to be exactly 50. But I think it is a bug given that the number of labels can be slightly different, as skimage.segmentation.slic already says n_segments is actually "The (approximate) number of labels".

So it should be corrected as what @WillemSpek suggests. Thanks for spotting this @cwmeijer and @WillemSpek . Shame on me 😂

PS: Oh yes, the commit was made at the first week of 2022, right after new year. You guys were all on holiday and I'm the only one working at that moment. What a piece of painful memory 😭. But anyway...my bad 😂 .

@@ -116,6 +116,7 @@ def explain(
self.image_segments = self._segment_image(self.input_data, n_segments,
compactness, sigma,
**slic_kwargs)
n_segments = np.unique(self.image_segments).size
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think I made a mistake and it is a bug here. The actual n_segments should be updated as the actual number of slices can be different from the given n_segments. Thanks for spotting it @WillemSpek ! 👍

@yang
Copy link

yang commented Aug 9, 2023

@yang, can you have a look at this?

Looking great team! Keep cranking!

@WillemSpek WillemSpek merged commit 3284afe into main Aug 10, 2023
14 of 18 checks passed
@geek-yang geek-yang deleted the 631-fix_segments branch August 10, 2023 08:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

KernelSHAP doesn't yield the correct amount of Shapeley values
4 participants