Inference time #8

Open
sangkyuleeKOR opened this issue Feb 2, 2021 · 25 comments

@sangkyuleeKOR

Thanks for your effort!
I have a question about PaDiM.
The paper reports an average inference time of 0.23 sec with R18-Rd100.
But in the test phase, computing the Mahalanobis distance between the train and test image vectors takes about 9 sec, even when I use a GPU.
Any comments? Thanks!

@xiahaifeng1995
Owner

Sorry, the implementation of the Mahalanobis distance is not elegant and takes up most of the inference time; there is probably still room for optimization.

@sangkyuleeKOR
Author

Thanks for the reply! I think it would be faster to compute the Mahalanobis distance with matrix multiplication instead of looping over the vectors.

@DeepKnowledge1

Do you think that could be improved with the multiprocessing or joblib packages?

@okokchoi

okokchoi commented Mar 25, 2021

Do you mean

for i in range(H * W):
    mean = train_outputs[0][:, i]
    conv_inv = np.linalg.inv(train_outputs[1][:, :, i])
    dist = [mahalanobis(sample[:, i], mean, conv_inv) for sample in embedding_vectors]
    dist_list.append(dist)

This part takes a lot of time, right?

@DeepKnowledge1

DeepKnowledge1 commented Mar 25, 2021

@xiahaifeng1995, @okokchoi, you could also move the following line into the training phase and save the result together with the mean.

conv_inv = np.linalg.inv(train_outputs[1][:, :, i])

So, in the training part:

train_outputs = [mean, conv_inv]
with open(train_feature_filepath, 'wb') as f:
    pickle.dump(train_outputs, f,protocol=pickle.HIGHEST_PROTOCOL)

I replaced the following:
dist = [mahalanobis(sample[:, i], mean, conv_inv) for sample in embedding_vectors]

with:
import scipy.spatial.distance as SSD

dist = SSD.cdist(embedding_vectors[:,:,i], mean[None, :], metric='mahalanobis', VI=conv_inv)
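
As a sanity check on the replacement above, here is a minimal sketch (not from the repository) that compares the per-sample `mahalanobis` loop with a single `cdist` call for one spatial position. The sizes `B` and `C` and the random data are assumptions for illustration only; `samples` stands in for `embedding_vectors[:, :, i]`.

```python
import numpy as np
from scipy.spatial.distance import cdist, mahalanobis

rng = np.random.default_rng(0)
B, C = 4, 6  # assumed batch size and embedding dimension

samples = rng.normal(size=(B, C))   # stands in for embedding_vectors[:, :, i]
mean = rng.normal(size=C)           # stands in for train_outputs[0][:, i]
cov = np.cov(rng.normal(size=(50, C)), rowvar=False) + 0.01 * np.eye(C)
cov_inv = np.linalg.inv(cov)        # inverse covariance, computed once

# Original approach: one scipy call per sample.
loop_dist = np.array([mahalanobis(s, mean, cov_inv) for s in samples])

# Replacement: one cdist call per position; the result has shape (B, 1).
cdist_out = cdist(samples, mean[None, :], metric="mahalanobis", VI=cov_inv)

same = np.allclose(loop_dist, cdist_out.ravel())
print(cdist_out.shape, same)
```

Note that `cdist` returns a 2-D array of shape `(B, 1)` here rather than a flat list of `B` values, so the result needs flattening before it is used like the original `dist`.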

@okokchoi

Thanks a lot for your reply!
I'm really sorry, but I think something is wrong with the code I modified:

            for i in range(H * W):
                mean = train_outputs[0][:, i]
                conv_inv = train_outputs[1][:, :, i]
                dist = cdist(embedding_vectors[:,:,i], mean[None, :], metric='mahalanobis', VI=conv_inv)
                dist_list.append(dist)
<Error>
Traceback (most recent call last):
  File "main_test.py", line 301, in <module>
    main()
  File "main_test.py", line 170, in main
    dist_list = np.array(dist_list).transpose(1, 0).reshape(B, H, W)
ValueError: axes don't match array

Each dist value has the same length, but something is wrong with dist_list.
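
One plausible reading of the traceback, offered as a guess rather than a confirmed diagnosis: each `cdist` call against a single mean returns an array of shape `(B, 1)`, so `np.array(dist_list)` has three axes and `transpose(1, 0)` no longer matches. The sketch below reproduces that with dummy arrays; the sizes are assumptions.

```python
import numpy as np

B, H, W = 2, 3, 3  # assumed small sizes for illustration

# Each cdist(...) call against one mean yields an array of shape (B, 1).
dist_list = [np.zeros((B, 1)) for _ in range(H * W)]
stacked = np.array(dist_list)  # shape (H*W, B, 1): three axes

try:
    stacked.transpose(1, 0)    # only two axes given for a 3-D array
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Flattening each result to shape (B,) before appending restores the 2-D layout.
flat_list = [d.ravel() for d in dist_list]
score = np.array(flat_list).transpose(1, 0).reshape(B, H, W)
print(stacked.shape, error_message, score.shape)
```

Under this reading, flattening each `dist` (for example with `.ravel()`) before `dist_list.append(dist)` would let the original `transpose(1, 0).reshape(B, H, W)` line run unchanged.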

@DeepKnowledge1

@okokchoi, did you compute conv_inv and save it?

In the training part, replace the covariance step with:

for i in range(H * W):
    cov[:, :, i] = np.cov(embedding_vectors[:, :, i].numpy(), rowvar=False) + 0.01 * I
    conv_inv[:, :, i] =  np.linalg.inv(cov[:, :, i])
# save learned distribution
train_outputs = [mean, conv_inv]
with open(train_feature_filepath, 'wb') as f:
    pickle.dump(train_outputs, f,protocol=pickle.HIGHEST_PROTOCOL)

and in testing:

dist_list = []    

for i in range(H * W):
    mean = train_outputs[0][:, i]
    conv_inv = train_outputs[1][:, :, i] #np.linalg.inv(train_outputs[1][:, :, i])#

    dist = SSD.cdist(embedding_vectors[:,:,i], mean[None, :], metric='mahalanobis', VI=conv_inv)
    dist = list(itertools.chain(*dist))
    dist_list.append(dist)

dist_list = np.array(dist_list).transpose(1, 0).reshape(B, H, W)

# upsample
# ... continue with the rest of the code

@okokchoi

okokchoi commented Mar 25, 2021

I solved the problem: I had simply been loading the pkl file from the non-modified version.
I have a question, @DeepKnowledge1: is the modified version faster than the original one?
(Anyway, thank you for your help :) You are the best!)

@DeepKnowledge1

I think so. Please try it and share your findings.

@okokchoi

Ok I will 👍

@ingbeeedd

@DeepKnowledge1 @okokchoi

I think it's pretty much the same. Given the size of the feature map, the lines below are still the heavy part:

dist = SSD.cdist(embedding_vectors[:,:,i], mean[None, :], metric='mahalanobis', VI=conv_inv)
dist = list(itertools.chain(*dist))

Is there a way to run it in parallel?

@ingbeeedd

I got a 3.5x speedup by using real process-based multiprocessing.

@fryegg

fryegg commented May 27, 2021

I got a 3.5x speedup by using real process-based multiprocessing.

Awesome! Did you use the multiprocessing module in PyTorch?

@GreatScherzo

Thank you for the code, @DeepKnowledge1. I tried your code and was able to reduce my inference time from 80 secs to 43 secs!

I tried using Cython with the code, but it didn't improve much (possibly because SSD.cdist is already optimised in C).
The bottleneck in this code is SSD.cdist, as it has several loops within it. I then tried eliminating the loops altogether with vectorization.

Based on the Mahalanobis equation (see SciPy's documentation for reference), I used einsum to multiply the 3D arrays (the mean, inv_cov, and embedding vectors) without any looping. I was able to reduce my inference time from 43 secs to 2 secs!

The code is below:

import numpy as np
from tqdm import tqdm

def calc_maha_dist_infer_vectorized(B, C, H, W, embedded_vector_model, embedding_vectors, dist_list):
    # embedded_vector_model = [mean, conv_inv]; embedding_vectors has shape (B, C, H * W)
    with tqdm(total=3, desc="Loading…", ascii=False, ncols=75) as pbar:
        # start = time.perf_counter()

        pbar.set_description("Extracting mean and cov from model...")
        pbar.refresh()
        mean = embedded_vector_model[0][:, :]
        mean_reshaped = np.reshape(mean, [1, C, H * W])  # add a leading axis to broadcast over images
        pbar.update(1)

        # checkpoint1 = time.perf_counter()
        conv_inv = embedded_vector_model[1][:, :, :]  # inverse covariance, already computed in training
        pbar.update(1)

        pbar.set_description("Calculating Mahalanobis Distance...")
        pbar.refresh()
        delta = embedding_vectors - mean_reshaped
        # n: image, j/k: channel, l: spatial position; the result has shape (B, H * W)
        dist_list = np.sqrt(np.einsum('njl,jkl,nkl->nl', delta, conv_inv, delta))
        pbar.update(1)
        # single-position equivalent: np.sqrt(np.einsum('nj,jk,nk->n', delta, conv_inv, delta))

    return dist_list

To improve further, maybe real process-based multiprocessing, as mentioned by @ingbeeedd, could be added?
I'd love to hear your thoughts.

By the way, I used this code for single-image inference, not for multiple images at a time, so the mean, inv_cov, and embedding_vectors arrays may be too large to compute the Mahalanobis distance in one go. Some modifications may be needed to process the data in batches.
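
The batching idea can be sketched roughly as follows. This is an illustrative sketch, not code from the thread: `maha_in_batches` and `batch_size` are hypothetical names, the array shapes follow the `(B, C, H*W)` layout used above, and the random data exists only to compare the result against a plain SciPy loop.

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

def maha_in_batches(embedding_vectors, mean, cov_inv, batch_size=8):
    """Hypothetical helper. Shapes: (B, C, L), (C, L), (C, C, L) -> (B, L)."""
    B = embedding_vectors.shape[0]
    L = embedding_vectors.shape[2]
    out = np.empty((B, L))
    for start in range(0, B, batch_size):
        stop = start + batch_size
        # Only this group of images is held as a delta array at any one time.
        delta = embedding_vectors[start:stop] - mean[None, :, :]
        sq = np.einsum('njl,jkl,nkl->nl', delta, cov_inv, delta)
        # Clip tiny negative values caused by round-off before the square root.
        out[start:stop] = np.sqrt(np.maximum(sq, 0.0))
    return out

rng = np.random.default_rng(0)
B, C, L = 5, 4, 6  # assumed small sizes for the check
emb = rng.normal(size=(B, C, L))
mean = rng.normal(size=(C, L))
cov_inv = np.empty((C, C, L))
for i in range(L):
    cov = np.cov(rng.normal(size=(40, C)), rowvar=False) + 0.01 * np.eye(C)
    cov_inv[:, :, i] = np.linalg.inv(cov)

fast = maha_in_batches(emb, mean, cov_inv, batch_size=2)
slow = np.array([[mahalanobis(emb[n, :, i], mean[:, i], cov_inv[:, :, i])
                  for i in range(L)] for n in range(B)])
matches = np.allclose(fast, slow)
print(fast.shape, matches)
```

The output already has one row per image, so it can be reshaped to `(B, H, W)` directly.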

@fryegg

fryegg commented May 28, 2021

@GreatScherzo That's exactly what I want to do: replace the loop with a matrix calculation. I will apply some modifications to this.

@ingbeeedd

ingbeeedd commented May 28, 2021

@fryegg @GreatScherzo
I have written it as follows.

manager = multiprocessing.Manager()
cpu_core = 8
dist_list = manager.list()
for number in range(cpu_core):
    dist_list.append(manager.list())

def calculate_distance(number, start, end, train_outputs, embedding_vectors):
    global dist_list
    for i in range(start, end):
        mean = train_outputs[0][:, i ]
        conv_inv = train_outputs[1][:, :, i] #np.linalg.inv(train_outputs[1][:, :, i])#
        dist = SSD.cdist(embedding_vectors[:,:,i], mean[None, :], metric='mahalanobis', VI=conv_inv)
        dist = list(itertools.chain(*dist))
        dist_list[number].append(dist)

Main function:

procs = []
start = time.time()
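for number in range(cpu_core):
    s = number * (H*W // cpu_core)
    # the last worker also takes the remainder when H*W is not divisible by cpu_core
    e = H*W if number == cpu_core - 1 else (number + 1) * (H*W // cpu_core)
    proc = Process(target=calculate_distance, args=(number, s, e, train_outputs, embedding_vectors))
    procs.append(proc)
    proc.start()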

for proc in procs:
    proc.join()

print("time :", time.time() - start)

global dist_list
final_list = []
for number in range(cpu_core):
    final_list.extend(dist_list[number])

final_list = np.array(final_list).transpose(1, 0).reshape(B, H, W)
final_list = torch.tensor(final_list)
score_map = F.interpolate(final_list.unsqueeze(1), size=x.size(2), mode='bilinear', align_corners=False).squeeze().numpy()

I'd appreciate it if you could give me your opinion.

@GreatScherzo

@ingbeeedd thank you very much for sharing your code!
I haven't had time to test it out yet, but I'll be sure to share the speed results once I've tried it!

@ingbeeedd

ingbeeedd commented Jun 2, 2021

@fryegg @GreatScherzo
I computed the Mahalanobis distance on the GPU, and it is 24 times faster than the original (CPU parallel processing gave 3.5 times).
So the GPU version is about 6 times faster than the CPU-parallel version.

@fryegg

fryegg commented Jun 3, 2021

@ingbeeedd
Nice work! How did you calculate the Mahalanobis distance on the GPU? Did you convert the embedding vectors into tensors?

@ingbeeedd

@fryegg
The code is being refactored. I'll leave a comment as soon as it's organized.

@ingbeeedd

ingbeeedd commented Jul 15, 2021

@GreatScherzo @fryegg @DeepKnowledge1 @okokchoi @xiahaifeng1995 @prob1995 @sangkyuleeKOR

https://github.com/ingbeeedd/PaDiM-EfficientNet I've uploaded the code :)
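
For readers asking how the distance could be moved to the GPU, here is a minimal sketch using `torch.einsum`. It is not taken from the linked repository, whose approach may differ; `maha_torch` is a hypothetical name, and the shapes follow the `(B, C, H*W)` layout used earlier in the thread. It falls back to the CPU when CUDA is unavailable.

```python
import torch

def maha_torch(embedding_vectors, mean, cov_inv, device=None):
    """Hypothetical helper. Shapes: (B, C, L), (C, L), (C, C, L) -> (B, L)."""
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    emb = torch.as_tensor(embedding_vectors, dtype=torch.float32, device=device)
    mu = torch.as_tensor(mean, dtype=torch.float32, device=device)
    vi = torch.as_tensor(cov_inv, dtype=torch.float32, device=device)
    delta = emb - mu.unsqueeze(0)                      # broadcast the mean over images
    sq = torch.einsum('njl,jkl,nkl->nl', delta, vi, delta)
    return sq.clamp_min(0).sqrt().cpu()                # clip round-off, return on CPU

torch.manual_seed(0)
B, C, L = 3, 4, 5  # assumed small sizes for the check
emb = torch.randn(B, C, L)
mean = torch.randn(C, L)
a = torch.randn(L, C, C)
# Any symmetric positive-definite matrix per position serves as a test inverse covariance.
cov_inv = (a @ a.transpose(1, 2) + 0.1 * torch.eye(C)).permute(1, 2, 0)

fast = maha_torch(emb, mean, cov_inv)

# Reference: explicit loop over images and positions.
ref = torch.empty(B, L)
for n in range(B):
    for i in range(L):
        d = emb[n, :, i] - mean[:, i]
        ref[n, i] = torch.sqrt(d @ cov_inv[:, :, i] @ d)

matches = torch.allclose(fast, ref, rtol=1e-4, atol=1e-4)
print(fast.shape, matches)
```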

@DeepKnowledge1

Hi @GreatScherzo ,

Thanks for your improvement. It is faster, but the scores are different: the scores for the normal images are higher than those for the defective images. Do you have any explanation?

@DeepKnowledge1

@GreatScherzo
Thanks for your code.

It works fine with only one image, but with a batch the scores are very different.
I think the error is in the einsum call, and I have no idea how to fix it :)
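
The thread does not say what the eventual fix was. One possible cause, stated here purely as an assumption, is that the caller kept the `transpose(1, 0)` from the loop version: the einsum result is already laid out as `(B, H*W)`, so transposing it before `reshape(B, H, W)` is harmless for one image but mixes images together for a batch. The sketch below shows that effect with made-up distances.

```python
import numpy as np

B, H, W = 2, 2, 3  # assumed small sizes for illustration
L = H * W

# Made-up distances: image 0 scores 0..5, image 1 scores 100..105.
# The layout (B, L) is what the 'njl,jkl,nkl->nl' einsum returns.
dist = np.stack([np.arange(L), 100 + np.arange(L)]).astype(float)

correct = dist.reshape(B, H, W)
scrambled = dist.transpose(1, 0).reshape(B, H, W)  # transpose kept from the loop version

# With a single image the extra transpose makes no difference ...
single = dist[:1]
single_ok = np.array_equal(single.reshape(1, H, W),
                           single.transpose(1, 0).reshape(1, H, W))
# ... but with a batch the two images get interleaved.
batch_ok = np.array_equal(correct, scrambled)
print(single_ok, batch_ok)
```

If this is what happened, dropping the transpose and reshaping the einsum output directly would be the change to try.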

@DeepKnowledge1

DeepKnowledge1 commented Dec 29, 2021

By the way, I fixed that.

The distance computation is now vectorized and works with one or many images.
The inference time improved a lot.

@leolv131

By the way, i fixed that,

So now, the distance is vectorized, works if you have one or many images The inference time was improved a lot

ok, thanks
