syncbn with "channel_last=True" produces wrong results when feature_num is not a power of two #1768

Open
Zehaos opened this issue Jan 14, 2024 · 1 comment
Labels
bug Something isn't working

Zehaos commented Jan 14, 2024

Describe the Bug
When the number of features (channels) is not a power of two, apex.parallel.SyncBatchNorm with channel_last=True produces wrong results.

I tested it step by step and found that it produces a wrong mean and variance once feature_h and feature_w are large enough (see the minimal reproduction code below).

Minimal Steps/Code to Reproduce the Bug

When feature_h and feature_w are small, it produces the correct result:

import torch
import syncbn  # apex's SyncBatchNorm CUDA extension; its kernels expect CUDA tensors

feature_size = 65  # not a power of two
feature_h = 10
feature_w = 10
# When feature_h and feature_w are small, mean and var are correct
input = torch.rand(1, feature_size, feature_h, feature_w, device="cuda")
input_clast = input.permute([0, 2, 3, 1]).contiguous()  # NCHW -> NHWC (channels last)
var, mean = torch.var_mean(input_clast, dim=[0, 1, 2], unbiased=False)  # reference
mean_apex, var_apex = syncbn.welford_mean_var_c_last(input_clast)
torch.allclose(mean, mean_apex)  # True
torch.allclose(var, var_apex)  # True

When feature_h and feature_w are large, it produces the wrong result:

import torch
import syncbn  # apex's SyncBatchNorm CUDA extension; its kernels expect CUDA tensors

feature_size = 65  # not a power of two
feature_h = 100
feature_w = 100
# When feature_h and feature_w are large, mean and var are wrong
input = torch.rand(1, feature_size, feature_h, feature_w, device="cuda")
input_clast = input.permute([0, 2, 3, 1]).contiguous()  # NCHW -> NHWC (channels last)
var, mean = torch.var_mean(input_clast, dim=[0, 1, 2], unbiased=False)  # reference
mean_apex, var_apex = syncbn.welford_mean_var_c_last(input_clast)
torch.allclose(mean, mean_apex)  # False
torch.allclose(var, var_apex)  # False
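To separate a bug in the reduction from a bug elsewhere, an independent CPU reference for the same per-channel statistics can help. The following is a minimal pure-Python sketch, not apex's implementation: it reduces a flattened channels-last buffer in fixed-size chunks with Welford's algorithm and then merges the partial results with Chan et al.'s pairwise combination formula, loosely mimicking how a parallel reduction combines partial results. The function names and the `chunk_rows` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple

# Per-channel running state: (sample count, means, M2 = sum of squared deviations).
State = Tuple[int, List[float], List[float]]


def welford_chunk(rows: Sequence[Sequence[float]], num_channels: int) -> State:
    """Serial Welford update over one chunk of a channels-last buffer.

    Each row is one (n, h, w) position holding one value per channel."""
    n = 0
    mean = [0.0] * num_channels
    m2 = [0.0] * num_channels
    for row in rows:
        n += 1
        for c in range(num_channels):
            x = row[c]
            delta = x - mean[c]
            mean[c] += delta / n
            m2[c] += delta * (x - mean[c])
    return n, mean, m2


def merge(a: State, b: State) -> State:
    """Combine two partial states with Chan et al.'s parallel formula."""
    na, mean_a, m2_a = a
    nb, mean_b, m2_b = b
    n = na + nb
    if n == 0:
        return a
    mean: List[float] = []
    m2: List[float] = []
    for ma, mb, sa, sb in zip(mean_a, mean_b, m2_a, m2_b):
        delta = mb - ma
        mean.append(ma + delta * nb / n)
        m2.append(sa + sb + delta * delta * na * nb / n)
    return n, mean, m2


def welford_mean_var_c_last_ref(
    flat: Sequence[float], num_channels: int, chunk_rows: int = 64
) -> Tuple[List[float], List[float]]:
    """Hypothetical CPU reference: per-channel mean and biased variance
    of a flattened channels-last (N, H, W, C) buffer."""
    if not flat or len(flat) % num_channels != 0:
        raise ValueError("buffer must be non-empty and a multiple of num_channels")
    rows = [flat[i:i + num_channels] for i in range(0, len(flat), num_channels)]
    state: State = (0, [0.0] * num_channels, [0.0] * num_channels)
    # Reduce each chunk independently, then fold its partial state into the total.
    for start in range(0, len(rows), chunk_rows):
        state = merge(state, welford_chunk(rows[start:start + chunk_rows], num_channels))
    n, mean, m2 = state
    var = [s / n for s in m2]  # biased, matching torch.var_mean(..., unbiased=False)
    return mean, var
```

For example, `welford_mean_var_c_last_ref(input_clast.flatten().tolist(), 65, chunk_rows=37)` should agree with `torch.var_mean(input_clast, dim=[0, 1, 2], unbiased=False)` to within floating-point tolerance for any channel count or chunk size, including values that are not powers of two.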

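Expected Behavior

syncbn.welford_mean_var_c_last should return the same per-channel mean and variance as torch.var_mean(..., unbiased=False), up to floating-point tolerance, for any channel count and spatial size.
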
Environment

ngc_23.11

Zehaos added the bug (Something isn't working) label Jan 14, 2024

Zehaos commented Jan 14, 2024

cc @jjsjann123
