
Improper normalization of the scores? #24

Open · marco-rudolph opened this issue Mar 28, 2022 · 9 comments

@marco-rudolph

In train.py, you normalize the scores according to:

test_map = [list() for p in pool_layers]
for l, p in enumerate(pool_layers):
    test_norm = torch.tensor(test_dist[l], dtype=torch.double)  # EHWx1
    test_norm -= torch.max(test_norm)  # normalize likelihoods to (-Inf:0] by subtracting a constant
    test_prob = torch.exp(test_norm)  # convert to probs in range [0:1]
    test_mask = test_prob.reshape(-1, height[l], width[l])
    # upsample
    test_map[l] = F.interpolate(test_mask.unsqueeze(1),
        size=c.crp_size, mode='bilinear', align_corners=True).squeeze().numpy()
# score aggregation
score_map = np.zeros_like(test_map[0])
for l, p in enumerate(pool_layers):
    score_map += test_map[l]

This normalization is fine as long as it is applied to only one map, since the function is monotonically increasing. But once the maps from the different layers are added up, it makes no sense to me: the relative weighting of the score maps in the aggregation (last line) depends on the test set, or more precisely on the maxima of the individual maps over the test set. Am I missing something here, or is this normalization improper?
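
To make the concern concrete (this is an illustration, not code from the repo): since exp(ll - max) = exp(ll) * exp(-max), subtracting each layer's own test-set maximum scales that layer's probability map by exp(-max_l), so the relative layer weights in the sum change whenever the test-set maxima change. A minimal sketch with made-up log-likelihoods:

import torch

# Toy per-layer log-likelihoods; in train.py these would come from test_dist.
layer_logliks = [torch.randn(1000) * 5 + 10, torch.randn(1000) * 5 + 40]

# exp(ll - max) == exp(ll) * exp(-max), so layer l is implicitly weighted by exp(-max_l),
# where max_l is taken over the test set.
implicit_weights = [torch.exp(-ll.max()) for ll in layer_logliks]
print(implicit_weights)  # these weights change whenever the test-set maxima change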

@gudovskiy
Owner

@marco-rudolph hmm, I think you are right that max is improper for the multi-scale case if we cannot use any statistics. In practice, we probably know past statistics and can assume a max.

@mjack3

mjack3 commented Mar 28, 2022

Hello guys.

I am actually working on an open-source implementation of FastFlow, which does what @marco-rudolph describes: adding up the maps from the different layers.

FastFlow uses a ResNet as encoder and takes its maps from layers [1, 2, 3]. Regarding this, the paper says:

Specifically, we sum the two-dimensional probabilities of each channel to get the final probability map
and upsample it to the input image resolution using bilinear interpolation

You can check my implementation; I'm making a big effort to provide an open-source solution, as I am new to normalizing flows. I am failing when I need to sum the two-dimensional probabilities of each channel. I have a variable called distribution with 3 anomaly score maps from different levels, and I need to normalize them into probabilities before adding them. I tried different approaches (yours included), and this is the one that works best (while still being wrong): link

# Chapter 3.3: Specifically, we sum the two-dimensional probabilities of each channel
# to get the final probability map and upsample it to the input image resolution using bilinear interpolation.
from typing import List, Union

import torch
from torch import Tensor
from torchvision.transforms import InterpolationMode
from torchvision.transforms.functional import resize

likelihood_map: Union[List[Tensor], Tensor] = []
for likelihood in distribution:
    likelihood = torch.sum(likelihood ** 2, 1)
    likelihood = resize(likelihood, [*size], InterpolationMode.BILINEAR)
    likelihood_map.append(likelihood)

# Chap 4.7: finally take the average value
likelihood_map = torch.cat(likelihood_map, 0)
likelihood_map = likelihood_map.mean(0, keepdim=True)

I would really appreciate feedback.

@gudovskiy
Owner

@mjack3 they just mean that they sum the log-likelihoods over the elements of the latent vector, which is a simplification everyone makes, i.e. we assume that each dimension is independent of the others. I am not sure why you square the likelihood.
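
As a rough sketch of that simplification (the names z and log_jac_det are placeholders, not variables from either repo): with a standard-normal base distribution and independent dimensions, the per-pixel log-likelihood is the sum of the per-channel log-densities plus the log-determinant of the flow's Jacobian, with no squaring of the likelihood itself.

import math
import torch

def per_pixel_log_likelihood(z: torch.Tensor, log_jac_det: torch.Tensor) -> torch.Tensor:
    # z: flow output of shape (B, C, H, W); log_jac_det: per-pixel log|det J| of shape (B, H, W).
    # Independence assumption: log p(z) factorizes into a sum over the C latent dimensions.
    C = z.shape[1]
    log_pz = -0.5 * (z ** 2).sum(dim=1) - 0.5 * C * math.log(2 * math.pi)  # (B, H, W)
    return log_pz + log_jac_det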

@mjack3

mjack3 commented Mar 28, 2022

The distribution variable holds the outputs of the 3 NFlows for layers 1, 2 and 3 of the ResNet. These are 3 tensors of shape (256, 64, 64), (512, 32, 32) and (1024, 16, 16). I square the log-likelihood similarly to what you or @marco-rudolph do. Am I wrong? If so, what could I do, or what reference could you suggest?

I am still learning about normalizing flows.

Thanks so much

@marco-rudolph
Author

> @marco-rudolph hmm, I think you are right that max is improper for the multi-scale case if we cannot use any statistics. In practice, we probably know past statistics and can assume a max.

This might hold in some practical cases, but it cannot be assumed in the anomaly detection setting. Furthermore, the max is very sensitive to the test set: the scores can explode, since the exponentiation can produce very large values. Using only train data would change the weighting a lot. In general, using the max is very sensitive to outliers.
In practice, I observed that a simple addition without weighting worsens the mean AUPRO score by about 3%, which is quite significant compared to other work.

@marco-rudolph
Author

> The distribution variable holds the outputs of the 3 NFlows for layers 1, 2 and 3 of the ResNet. These are 3 tensors of shape (256, 64, 64), (512, 32, 32) and (1024, 16, 16). I square the log-likelihood similarly to what you or @marco-rudolph do. Am I wrong? If so, what could I do, or what reference could you suggest?
>
> I am still learning about normalizing flows.
>
> Thanks so much

I think you are mixing up the likelihood, which is based on the sum of squares of the output, with the output itself.
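
A minimal illustration of the difference (shapes are made up, and the flow's log-det term is omitted for brevity): the flow output z is a latent tensor, while the likelihood is obtained by evaluating the base density on it.

import torch
from torch.distributions import Normal

z = torch.randn(4, 256, 16, 16)                   # flow output (a latent), not a likelihood
log_pz = Normal(0.0, 1.0).log_prob(z).sum(dim=1)  # per-pixel log-likelihood under the base:
                                                  # -0.5 * z**2 - 0.5 * log(2*pi), summed over channels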

@gudovskiy
Owner

@marco-rudolph I didn't experiment with this myself, but I'd check 3 cases: 1) removing the max, 2) taking the max from train data, and 3) the current max from test data. Maybe 1) << 3), but I am not sure about 2) vs. 3), because we just want to normalize the scores so that they are aligned across scales.
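
A rough sketch of those three variants for a single pooling layer (the function name and arguments are placeholders; train_ll would require collecting log-likelihoods on the training set, which the current train.py does not do):

import torch

def normalize_scores(test_ll: torch.Tensor, train_ll: torch.Tensor = None, use_max: bool = True) -> torch.Tensor:
    # 1) use_max=False               -> no shift (exponentiation may overflow)
    # 2) use_max=True, train_ll set  -> subtract the train-set max
    # 3) use_max=True, train_ll=None -> subtract the test-set max (current behavior)
    if not use_max:
        shift = torch.tensor(0.0, dtype=test_ll.dtype)
    elif train_ll is not None:
        shift = train_ll.max()
    else:
        shift = test_ll.max()
    return torch.exp(test_ll - shift)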

@alevangel

@gudovskiy any news about that?

@gudovskiy
Owner

@alevangel well, you can replace the test-set max with a train-set max.
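
For concreteness, a minimal sketch of that substitution in the quoted loop, assuming a hypothetical train_dist list of per-layer log-likelihoods collected on the training set (the current train.py does not build such a list):

import torch

# inside the per-layer loop of train.py; l, test_dist and the hypothetical train_dist come from the surrounding code
train_norm = torch.tensor(train_dist[l], dtype=torch.double)
test_norm = torch.tensor(test_dist[l], dtype=torch.double)  # EHWx1
test_norm -= torch.max(train_norm)  # subtract a train-set constant instead of the test-set max
test_prob = torch.exp(test_norm)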
