
Assert grad_input and inputs in DeepLift and fix the layer attribution issue for MaxPool #390

Closed
wants to merge 13 commits

Conversation

NarineK
Contributor

@NarineK NarineK commented May 25, 2020

Related to issue #382: asserting that grad_inputs and inputs have the same shape. More details about the workaround and why the issue happens can be found in the description of the assert. The error occurs when we attribute to the outputs of the layer, because of the input or output tensor returned in the forward hook.

Added forward_hook_with_return_excl_modules, which contains the list of modules for which we don't want the forward hook to return a value. This is only used in DeepLift, and it can be used for any algorithm that attributes to MaxPool while also having a backward hook set on it.

Added test cases for layer and neuron use cases.
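
A minimal sketch of the kind of shape check described above, assuming a hypothetical saved_input attribute and a simplified hook signature; this is illustrative only, not the actual Captum implementation:

def _deeplift_backward_hook_sketch(module, grad_input, grad_output):
    # Hypothetical: module.saved_input stands in for whatever input tensor the
    # forward pass stored for this layer.
    assert module.saved_input.shape == grad_input[0].shape, (
        "grad_input must have the same shape as the layer input; for MaxPool "
        "this fails when the hook fires while attributing to the layer output."
    )
    # ... rescale grad_input with the DeepLift multipliers here ...
    return grad_input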

@NarineK NarineK changed the title Assert grad inputs and inputs Assert grad_inputs and inputs in DeepLift May 25, 2020
@NarineK NarineK changed the title Assert grad_inputs and inputs in DeepLift Assert grad_input and inputs in DeepLift May 25, 2020
@NarineK NarineK changed the title Assert grad_input and inputs in DeepLift Assert grad_input and inputs in DeepLift and fix the layer attribution issue for MaxPool May 26, 2020
@NarineK NarineK requested review from orionr and vivekmig May 26, 2020 05:09
@vivekmig
Contributor

Thanks for addressing this issue @NarineK :) ! Sorry for the long delay in taking a look at this!

I was looking into the fix, and this problem may actually affect all non-linear modules, not just MaxPool. Here's some sample code that demonstrates the issue for a ReLU module:

import torch
import torch.nn as nn


class BasicModel_MultiLayer(nn.Module):
    def __init__(self, inplace=False):
        super().__init__()
        # Linear 0 is simply identity transform
        self.linear0 = nn.Linear(3, 3)
        self.linear0.weight = nn.Parameter(torch.eye(3))
        self.linear0.bias = nn.Parameter(torch.zeros(3))
        self.linear1 = nn.Linear(3, 4)
        self.linear1.weight = nn.Parameter(torch.ones(4, 3))
        self.linear1.bias = nn.Parameter(torch.tensor([-10.0, 1.0, 1.0, 1.0]))
        self.relu = nn.ReLU(inplace=inplace)

        self.linear2 = nn.Linear(4, 2)
        self.linear2.weight = nn.Parameter(torch.ones(2, 4))
        self.linear2.bias = nn.Parameter(torch.tensor([-1.0, 1.0]))

    def forward(self, x):
        input = x
        lin0_out = self.linear0(input)
        lin1_out = self.linear1(lin0_out)
        relu_out = self.relu(lin1_out)
        lin2_out = self.linear2(relu_out)
        return lin2_out

from captum.attr import LayerDeepLift

model = BasicModel_MultiLayer()
inp = torch.tensor([[1.0, -5.0, 17.0]])

layer_dl_linear = LayerDeepLift(model, model.linear2)
layer_dl_linear.attribute(inp, target=0, attribute_to_layer_input=True)
# returns: (tensor([[ 3., 13., 13., 13.]], grad_fn=<MulBackward0>),)

layer_dl_relu = LayerDeepLift(model, model.relu)
layer_dl_relu.attribute(inp, target=0)
# returns: tensor([[ 0.6923, 13.0000, 13.0000, 13.0000]], grad_fn=<MulBackward0>)

We would expect the attributions for linear2's input and the ReLU output to be the same, but they aren't.

I think the underlying issue is that, because of the intermediate clone within the module, we are taking gradients with respect to a tensor that is an intermediate value "within" the module rather than the final module output. This causes hooks on this module to still execute even though we are technically only backpropagating through the clone, not through the entire module, and we don't actually want this hook to execute. For MaxPool this causes an explicit error, since the input and output sizes are generally different; for other non-linear modules it silently produces incorrect attributions, essentially using multipliers corresponding to the layer input rather than the output.
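
A small illustration of why the symptom differs by module type (this snippet only shows the shape mismatch; it does not reproduce the hook mechanics):

import torch
import torch.nn as nn

x = torch.randn(1, 1, 4)
# MaxPool changes the shape, so a hook that sees gradients at the wrong "level"
# hits an explicit shape mismatch ...
print(nn.MaxPool1d(kernel_size=2)(x).shape)  # torch.Size([1, 1, 2])
# ... while ReLU preserves the shape, so the same situation silently yields
# incorrect multipliers instead of an error.
print(nn.ReLU()(x).shape)  # torch.Size([1, 1, 4])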

I think a general fix for the issue in LayerDeepLift might be to avoid putting backward hooks on the target layer's module when attributing to layer output, so this hook can't get executed, while still cloning as necessary to handle in-place modules.

Alternatively, we can generally avoid cloning in LayerDeepLift for all modules (not just maxpool) and then explicitly warning that this wouldn't work with in-place modules. What do you think?
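
A hedged sketch of the first option above, with hypothetical names (_register_deeplift_hooks_sketch, supported_non_linear, and a placeholder backward hook); the actual Captum registration logic is more involved:

def _placeholder_backward_hook(module, grad_input, grad_output):
    # Stand-in for the DeepLift non-linearity hook.
    return grad_input


def _register_deeplift_hooks_sketch(
    model, target_layer, supported_non_linear, attribute_to_layer_input
):
    handles = []
    for module in model.modules():
        # Skip the target layer itself when attributing to its output, so the
        # backward hook never fires on gradients taken w.r.t. the cloned output.
        skip = module is target_layer and not attribute_to_layer_input
        if type(module) in supported_non_linear and not skip:
            handles.append(module.register_backward_hook(_placeholder_backward_hook))
    return handles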

@NarineK
Contributor Author

NarineK commented Jun 13, 2020

Hi @vivekmig, that's a good point! In this PR I was focusing on fixing the issue with MaxPool.
If we remove the backward hook for the ReLU, it works fine, which means that the incorrect behavior comes from the hook. As a fix,

forward_hook_with_return_excl_modules=list(self.SUPPORTED_NON_LINEAR.keys())

will cover all non-linearities that we hook. Let me know what you think.
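
A minimal sketch of how such an exclusion list could gate the forward hook's return value (the saved_outputs dict and the hook body are illustrative assumptions, not the actual Captum code):

def make_forward_hook_with_return(saved_outputs, excl_modules=None):
    def forward_hook(module, inputs, output):
        saved_outputs[module] = output
        if excl_modules is not None and type(module) in excl_modules:
            # Modules that also carry a DeepLift backward hook (e.g. MaxPool)
            # are excluded: returning None leaves the original output in place,
            # so the backward hook is not triggered through a clone.
            return None
        # Otherwise return a cloned output so gradients can be taken with
        # respect to it when attributing to the layer output.
        return output.clone()
    return forward_hook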

@NarineK NarineK force-pushed the assert_grad_inputs_and_inputs branch from 4a3c6e7 to e536cbd on June 13, 2020 00:39
Contributor

@vivekmig vivekmig left a comment


Sounds good, yeah this fixes the issue for ReLU and other non-linear modules! It might cause incorrect results with in-place modules, particularly when an in-place module directly follows the target layer, so it might be good to add some notes in documentation explaining the new limitations?

Alternatively, not putting a hook on self.module in _can_register_hook when attributing to layer output might not add limitations regarding in-place modules but requires more modifications to the current fix. Whichever you prefer is fine, thanks for fixing this :)

@@ -155,6 +155,9 @@ def _forward_layer_distributed_eval(
target_ind: TargetType = None,
additional_forward_args: Any = None,
attribute_to_layer_input: bool = False,
forward_hook_with_return_excl_modules: Union[
None, List[typing.Type[Module]]
Contributor


nit: importing Type from typing above might make these expressions a little more concise
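
For example (assuming Type is imported from typing alongside the other names):

from typing import List, Type, Union

from torch.nn import Module

forward_hook_with_return_excl_modules: Union[None, List[Type[Module]]] = None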

Contributor

@facebook-github-bot facebook-github-bot left a comment


@NarineK has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@NarineK NarineK force-pushed the assert_grad_inputs_and_inputs branch from 2be73d6 to 50225da on June 24, 2020 01:09
Contributor

@facebook-github-bot facebook-github-bot left a comment


@NarineK has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@NarineK merged this pull request in 361a436.

edward-io pushed a commit to edward-io/captum that referenced this pull request Jun 30, 2020
Assert grad_input and inputs in DeepLift and fix the layer attribution issue for MaxPool (pytorch#390)

Summary:
Related to the issue: pytorch#382 asserting `grad_inputs` and `inputs` to have the same shape. More description about the workaround and why the issue happens can be found in the description of the assert. The error occurs when we attribute to the outputs of the layer because of the input or output tensor returned in the forward hook.

Added `forward_hook_with_return_excl_modules` that contains the list of modules for which we don't want to have a return in the forward_hook. This is only used in DeepLift and can be used for any algorithm that attributes to maxpool and at the same time has a backward hook set on it.

Added test cases for layer and neuron use cases.
Pull Request resolved: pytorch#390

Reviewed By: edward-io

Differential Revision: D22197030

Pulled By: NarineK

fbshipit-source-id: e6cf712103900190f46c5c1e9051519f3eaa933f
NarineK added a commit to NarineK/captum-1 that referenced this pull request Nov 19, 2020
Assert grad_input and inputs in DeepLift and fix the layer attribution issue for MaxPool (pytorch#390)
