Assert grad_input and inputs in DeepLift and fix the layer attribution issue for MaxPool #390
Conversation
Thanks for addressing this issue @NarineK :) ! Sorry for the long delay in taking a look at this! I was looking into the fix, and it's possible this problem may actually affect all non-linear modules, not just MaxPool modules. Here's some sample code that demonstrates the issue for a ReLU module:
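(The snippet itself isn't preserved in this thread; the following is a minimal sketch of the kind of comparison being described, with the model, shapes, and baselines assumed rather than taken from the original comment.)

```python
import torch
import torch.nn as nn
from captum.attr import LayerDeepLift


class SmallModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(3, 3)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(3, 1)

    def forward(self, x):
        return self.linear2(self.relu(self.linear1(x)))


model = SmallModel()
inputs = torch.randn(2, 3)
baselines = torch.zeros(2, 3)

# Attribution with respect to the ReLU output ...
attr_relu_out = LayerDeepLift(model, model.relu).attribute(inputs, baselines)

# ... should match attribution with respect to linear2's input, since they are
# the same tensor in the forward pass.
attr_linear2_in = LayerDeepLift(model, model.linear2).attribute(
    inputs, baselines, attribute_to_layer_input=True
)

print(attr_relu_out)
print(attr_linear2_in)
```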
We would expect the attributions for linear2's input and the ReLU output to be the same as well, but they aren't. I think the underlying issue is that, because of the intermediate clone within the module, we are taking gradients with respect to a tensor that is an intermediate value "within" the module rather than the final module output. This causes hooks on this module to still execute, even though we are not backpropagating through the entire module (technically only through the clone) and don't actually want this hook to execute. For MaxPool, this causes an explicit error, since the input and output sizes are generally different, but in other non-linear modules it just results in an incorrect attribution output, essentially using multipliers corresponding to the layer input rather than the output.

I think a general fix for the issue in LayerDeepLift might be to avoid putting backward hooks on the last (/ target-layer) module when attributing to layer output, so this hook doesn't potentially get executed, while still cloning as necessary to handle in-place modules. Alternatively, we can generally avoid cloning in LayerDeepLift for all modules (not just MaxPool) and then explicitly warn that this wouldn't work with in-place modules. What do you think?
Hi @vivekmig, that's a good point! In this PR I was focusing on fixing the issue with MaxPool. The updated change will cover all non-linearities that we hook. Let me know what you think.
Force-pushed from 4a3c6e7 to e536cbd.
Sounds good, yeah this fixes the issue for ReLU and other non-linear modules! It might cause incorrect results with in-place modules, particularly when an in-place module directly follows the target layer, so it might be good to add some notes in documentation explaining the new limitations? Alternatively, not putting a hook on self.module in `_can_register_hook` when attributing to layer output might not add limitations regarding in-place modules, but it requires more modifications to the current fix. Whichever you prefer is fine, thanks for fixing this :)
@@ -155,6 +155,9 @@ def _forward_layer_distributed_eval(
    target_ind: TargetType = None,
    additional_forward_args: Any = None,
    attribute_to_layer_input: bool = False,
    forward_hook_with_return_excl_modules: Union[
        None, List[typing.Type[Module]]
nit: importing Type from typing above might make these expressions a little more concise
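For illustration, with `Type` imported directly the annotation in the diff above could be written roughly as follows (a sketch of the nit, not the exact diff):

```python
from typing import List, Type, Union

from torch.nn import Module

forward_hook_with_return_excl_modules: Union[None, List[Type[Module]]] = None
```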
@NarineK has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Force-pushed from 2be73d6 to 50225da.
@NarineK has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
…n issue for MaxPool (pytorch#390)

Summary:
Related to the issue: pytorch#382

Asserting `grad_inputs` and `inputs` to have the same shape. More description about the workaround and why the issue happens can be found in the description of the assert. The error occurs when we attribute to the outputs of the layer because of the input or output tensor returned in the forward hook.

Added `forward_hook_with_return_excl_modules`, which contains the list of modules for which we don't want to have a return in the forward_hook. This is only used in DeepLift and can be used for any algorithm that attributes to MaxPool and at the same time has a backward hook set on it.

Added test cases for layer and neuron use cases.

Pull Request resolved: pytorch#390
Reviewed By: edward-io
Differential Revision: D22197030
Pulled By: NarineK
fbshipit-source-id: e6cf712103900190f46c5c1e9051519f3eaa933f
Related to the issue: #382

Asserting `grad_inputs` and `inputs` to have the same shape. More description about the workaround and why the issue happens can be found in the description of the assert. The error occurs when we attribute to the outputs of the layer because of the input or output tensor returned in the forward hook.

Added `forward_hook_with_return_excl_modules`, which contains the list of modules for which we don't want to have a return in the forward_hook. This is only used in DeepLift and can be used for any algorithm that attributes to MaxPool and at the same time has a backward hook set on it.

Added test cases for layer and neuron use cases.
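The Captum internals aren't reproduced in this thread, but as a rough sketch (the helper name and structure below are assumptions, not Captum's actual code), an exclusion list like `forward_hook_with_return_excl_modules` can be honored inside a forward hook roughly like this:

```python
from typing import Dict, List, Optional, Type

from torch import Tensor
from torch.nn import Module


def make_forward_hook(
    saved_activations: Dict[Module, Tensor],
    forward_hook_with_return_excl_modules: Optional[List[Type[Module]]] = None,
):
    """Build a forward hook that records each module's output.

    Returning a tensor from a forward hook replaces the module's output. That
    keeps the saved activation wired into the graph, but it also means a
    backward hook registered on the module may fire for the returned clone.
    For excluded module types (e.g. MaxPool, whose input and output shapes
    differ), we record the activation but return nothing, leaving the original
    output untouched.
    """

    def hook(module: Module, inputs, output: Tensor):
        saved_activations[module] = output
        if forward_hook_with_return_excl_modules is not None and isinstance(
            module, tuple(forward_hook_with_return_excl_modules)
        ):
            return None  # no replacement output for excluded modules
        return output.clone()

    return hook
```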