-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fixed number of segments bug #632
Conversation
I see that the build is failing due to the unit test also asserting that the number of shapeley values should equal that of n_segments.:
I believe the unit tests should be changed in similar style to the solution. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yang, can you have a look at this?
@@ -116,6 +116,7 @@ def explain( | |||
self.image_segments = self._segment_image(self.input_data, n_segments, | |||
compactness, sigma, | |||
**slic_kwargs) | |||
n_segments = np.unique(self.image_segments).size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
slic returns an integer mask, so I agree with @WillemSpek that we probably should find the set of those to get n_segments using unique.
https://scikit-image.org/docs/stable/api/skimage.segmentation.html#skimage.segmentation.slic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think I made a mistake and it is a bug here. The actual n_segments
should be updated as the actual number of slices can be different from the given n_segments
. Thanks for spotting it @WillemSpek ! 👍
@@ -53,7 +53,7 @@ def test_shap_explain_image(self): | |||
n_segments = 50 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
n_segments is explicitly set to 50, and only used to test against. It doesn't make a lot of sense to overwrite this value, as proposed in this PR, before it is tested against. The proposed changes would test whether the output of the method (shap_values and segments) are consistent with each other, which is a good test by itself. However, I think that the idea of this test was to check the outcome of the method against some knowledge that we have prior to running the method, like some ground truth or regression test.
It's a shame it is not explained why this value of 50 is what is expected as an outcome exactly. @geek-yang, git blame points at you ;-) (b8dd755) Could you explain this value and/or review this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😅 To be honest Chris I forgot why I set it to be exactly 50. But I think it is a bug given that the number of labels can be slightly different, as skimage.segmentation.slic
already says n_segments
is actually "The (approximate) number of labels".
So it should be corrected as what @WillemSpek suggests. Thanks for spotting this @cwmeijer and @WillemSpek . Shame on me 😂
PS: Oh yes, the commit was made at the first week of 2022, right after new year. You guys were all on holiday and I'm the only one working at that moment. What a piece of painful memory 😭. But anyway...my bad 😂 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for finding and fixing this bug @WillemSpek ! It helps a lot!
@@ -53,7 +53,7 @@ def test_shap_explain_image(self): | |||
n_segments = 50 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😅 To be honest Chris I forgot why I set it to be exactly 50. But I think it is a bug given that the number of labels can be slightly different, as skimage.segmentation.slic
already says n_segments
is actually "The (approximate) number of labels".
So it should be corrected as what @WillemSpek suggests. Thanks for spotting this @cwmeijer and @WillemSpek . Shame on me 😂
PS: Oh yes, the commit was made at the first week of 2022, right after new year. You guys were all on holiday and I'm the only one working at that moment. What a piece of painful memory 😭. But anyway...my bad 😂 .
@@ -116,6 +116,7 @@ def explain( | |||
self.image_segments = self._segment_image(self.input_data, n_segments, | |||
compactness, sigma, | |||
**slic_kwargs) | |||
n_segments = np.unique(self.image_segments).size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think I made a mistake and it is a bug here. The actual n_segments
should be updated as the actual number of slices can be different from the given n_segments
. Thanks for spotting it @WillemSpek ! 👍
Looking great team! Keep cranking! |
This PR fixes #631. In some cases, kernelshap would over- or (worse) underallocate memory for the shapeley values leading to issues. This PR should fix this bug.