Avoid calling flatten in TracInCP.self_influence for efficiency #1087
Summary:
We currently make an unnecessary call to flatten in TracInCP.self_influence at https://www.internalfb.com/code/fbsource/[90a997b1774c]/fbcode/pytorch/captum/captum/influence/_core/tracincp.py?lines=1071-1073
The purpose of those 3 lines is to sum layer_jacobian ** 2 over all dimensions besides the 0-th (batch) dimension; the remaining dimensions correspond to a specific parameter's dimensions, since layer_jacobian stores the per-example jacobian of a given parameter, so that its dimension is 1 greater than that of the parameter, to account for the different examples in the batch. However, we currently do this by flattening all dimensions of layer_jacobian except the 0-th (batch) dimension, then summing across the resulting 1st dimension. Flattening can incur overhead when the data needs to be copied, i.e. when a view cannot be returned. Because the parameters can be arbitrary, we cannot guarantee that a view is returned, so there is always the possibility of unnecessary overhead.
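For illustration, the existing computation looks roughly like the following sketch (not the exact source; the shapes and the `layer_jacobian` tensor here are made up just to show the pattern):

```python
import torch

# illustrative shapes: a batch of 8 examples and a parameter of shape (16, 3, 5),
# so layer_jacobian has shape (batch_size, *param_shape)
layer_jacobian = torch.randn(8, 16, 3, 5)

# current approach: flatten all non-batch dimensions, square, then sum over the
# resulting 1st dimension. flatten may copy the data when a view cannot be
# returned, which is the overhead this diff avoids.
per_example_sum = torch.sum(layer_jacobian.flatten(start_dim=1) ** 2, dim=1)
# per_example_sum has shape (batch_size,): one squared norm per example
```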
Therefore, this diff addresses that inefficiency by making TracInCP.self_influence not use flatten. One option is to explicitly compute layer_jacobian ** 2 and then sum over all dimensions except the 0-th. Another option is to leverage torch.linalg.norm, choosing the dim option so that the norm is taken across all dimensions except the 0-th, and then squaring the result.
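Minimal sketches of the two options (variable names are illustrative; the norm-based option is written here with torch.linalg.vector_norm, the variant of torch.linalg.norm that accepts an arbitrary tuple of dimensions):

```python
import torch

layer_jacobian = torch.randn(8, 16, 3, 5)  # (batch_size, *param_shape), illustrative
non_batch_dims = tuple(range(1, layer_jacobian.dim()))  # (1, 2, 3)

# option 1: square explicitly, then sum over every dimension except the 0-th
per_example_sum_v1 = (layer_jacobian ** 2).sum(dim=non_batch_dims)

# option 2: take the 2-norm over all non-batch dimensions, then square it
per_example_sum_v2 = torch.linalg.vector_norm(layer_jacobian, dim=non_batch_dims) ** 2

# both produce one value per example and agree with each other
assert per_example_sum_v1.shape == (8,)
assert torch.allclose(per_example_sum_v1, per_example_sum_v2)
```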
No new tests are needed. It would be interesting to benchmark the efficiency gains from this change, but that is just a nice-to-have, as we already know that both of the above alternatives are better than using flatten.
Reviewed By: 99warriors
Differential Revision: D41140914