Assert grad_input and inputs in DeepLift and fix the layer attribution issue for MaxPool #390
Conversation
Thanks for addressing this issue @NarineK :) ! Sorry for the long delay in taking a look at this! I was looking into the fix, and it's possible this problem may actually affect all non-linear modules, not just MaxPool modules. Here's some sample code that demonstrates the issue for a ReLU module:
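(The snippet itself isn't preserved in this thread; the following is a minimal sketch of the kind of comparison being described, with the model, shapes, and baselines assumed rather than taken from the original comment.)

```python
import torch
import torch.nn as nn
from captum.attr import LayerDeepLift


class SmallModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(3, 3)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(3, 1)

    def forward(self, x):
        return self.linear2(self.relu(self.linear1(x)))


model = SmallModel()
inputs = torch.randn(2, 3)
baselines = torch.zeros(2, 3)

# Attribution with respect to the ReLU output ...
attr_relu_out = LayerDeepLift(model, model.relu).attribute(inputs, baselines)

# ... should match attribution with respect to linear2's input, since they are
# the same tensor in the forward pass.
attr_linear2_in = LayerDeepLift(model, model.linear2).attribute(
    inputs, baselines, attribute_to_layer_input=True
)

print(attr_relu_out)
print(attr_linear2_in)
```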
We would expect the attributions for linear2's input and the ReLU output to be the same as well, but they aren't. I think the underlying issue is that, because of the intermediate clone within the module, we are taking gradients with respect to a tensor that is an intermediate value "within" the module rather than the final module output. This causes hooks on this module to still execute, even though we are not backpropagating through the entire module (technically only through the clone) and don't actually want this hook to execute. For MaxPool, this causes an explicit error, since the input and output sizes are generally different, but in other non-linear modules it just results in an incorrect attribution output, essentially using multipliers corresponding to the layer input rather than the output.

I think a general fix for the issue in LayerDeepLift might be to avoid putting backward hooks on the last (/ target-layer) module when attributing to layer output, so this hook doesn't potentially get executed, while still cloning as necessary to handle in-place modules. Alternatively, we can generally avoid cloning in LayerDeepLift for all modules (not just MaxPool) and then explicitly warn that this wouldn't work with in-place modules. What do you think?
Hi @vivekmig, that's a good point! In this PR I was focusing on fixing the issue with MaxPool. The updated change will cover all non-linearities that we hook. Let me know what you think.
Force-pushed from 4a3c6e7 to e536cbd.
Sounds good, yeah this fixes the issue for ReLU and other non-linear modules! It might cause incorrect results with in-place modules, particularly when an in-place module directly follows the target layer, so it might be good to add some notes in documentation explaining the new limitations? Alternatively, not putting a hook on self.module in `_can_register_hook` when attributing to layer output might not add limitations regarding in-place modules, but it requires more modifications to the current fix. Whichever you prefer is fine, thanks for fixing this :)
@@ -155,6 +155,9 @@ def _forward_layer_distributed_eval(
    target_ind: TargetType = None,
    additional_forward_args: Any = None,
    attribute_to_layer_input: bool = False,
    forward_hook_with_return_excl_modules: Union[
        None, List[typing.Type[Module]]
nit: importing Type from typing above might make these expressions a little more concise
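For illustration, with `Type` imported directly the annotation in the diff above could be written roughly as follows (a sketch of the nit, not the exact diff):

```python
from typing import List, Type, Union

from torch.nn import Module

forward_hook_with_return_excl_modules: Union[None, List[Type[Module]]] = None
```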
@NarineK has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Force-pushed from 2be73d6 to 50225da.
@NarineK has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
…n issue for MaxPool (pytorch#390)

Summary:
Related to the issue: pytorch#382

Asserting `grad_inputs` and `inputs` to have the same shape. More description about the workaround and why the issue happens can be found in the description of the assert. The error occurs when we attribute to the outputs of the layer because of the input or output tensor returned in the forward hook.

Added `forward_hook_with_return_excl_modules`, which contains the list of modules for which we don't want to have a return in the forward_hook. This is only used in DeepLift and can be used for any algorithm that attributes to MaxPool and at the same time has a backward hook set on it.

Added test cases for layer and neuron use cases.

Pull Request resolved: pytorch#390
Reviewed By: edward-io
Differential Revision: D22197030
Pulled By: NarineK
fbshipit-source-id: e6cf712103900190f46c5c1e9051519f3eaa933f
Related to the issue: #382

Asserting `grad_inputs` and `inputs` to have the same shape. More description about the workaround and why the issue happens can be found in the description of the assert. The error occurs when we attribute to the outputs of the layer because of the input or output tensor returned in the forward hook.

Added `forward_hook_with_return_excl_modules`, which contains the list of modules for which we don't want to have a return in the forward_hook. This is only used in DeepLift and can be used for any algorithm that attributes to MaxPool and at the same time has a backward hook set on it.

Added test cases for layer and neuron use cases.
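The Captum internals aren't reproduced in this thread, but as a rough sketch (the helper name and structure below are assumptions, not Captum's actual code), an exclusion list like `forward_hook_with_return_excl_modules` can be honored inside a forward hook roughly like this:

```python
from typing import Dict, List, Optional, Type

from torch import Tensor
from torch.nn import Module


def make_forward_hook(
    saved_activations: Dict[Module, Tensor],
    forward_hook_with_return_excl_modules: Optional[List[Type[Module]]] = None,
):
    """Build a forward hook that records each module's output.

    Returning a tensor from a forward hook replaces the module's output. That
    keeps the saved activation wired into the graph, but it also means a
    backward hook registered on the module may fire for the returned clone.
    For excluded module types (e.g. MaxPool, whose input and output shapes
    differ), we record the activation but return nothing, leaving the original
    output untouched.
    """

    def hook(module: Module, inputs, output: Tensor):
        saved_activations[module] = output
        if forward_hook_with_return_excl_modules is not None and isinstance(
            module, tuple(forward_hook_with_return_excl_modules)
        ):
            return None  # no replacement output for excluded modules
        return output.clone()

    return hook
```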