Inference time #8

sangkyuleeKOR · 2021-02-02T13:50:21Z

Thanks for your effort!
I have a question about PaDiM.
The paper reports an average inference time of 0.23 sec with R18-Rd100.
But in the test phase, calculating the Mahalanobis distance between the train and test image vectors takes about 9 sec, even when I use a GPU.
Any comments? Thanks!

xiahaifeng1995 · 2021-02-03T07:43:02Z

Sorry, the implementation of the Mahalanobis distance is not elegant and takes up most of the inference time. There is still room for optimization.

sangkyuleeKOR · 2021-02-04T01:44:32Z

Thanks for the reply! I think it would be faster to compute the Mahalanobis distance with matrix multiplication instead of calculating it vector by vector in a for loop.
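A minimal sketch of that idea, using random data and made-up sizes (`B`, `C` and all variable names here are assumptions, not values from the repository): for a single feature-map position, the per-sample loop and one matrix product should give the same Mahalanobis distances.

```python
import numpy as np

rng = np.random.default_rng(0)
B, C = 5, 4                        # hypothetical: 5 test samples, 4 channels
x = rng.normal(size=(B, C))        # embedding vectors at one position
mean = rng.normal(size=C)
a = rng.normal(size=(C, C))
cov = a @ a.T + 0.01 * np.eye(C)   # symmetric positive-definite covariance
cov_inv = np.linalg.inv(cov)

# Loop version: one distance per sample.
loop = np.array([np.sqrt((s - mean) @ cov_inv @ (s - mean)) for s in x])

# Matrix version: all samples at once.
delta = x - mean                                           # (B, C)
vec = np.sqrt(np.sum((delta @ cov_inv) * delta, axis=1))   # (B,)
```

The row-wise sum of `(delta @ cov_inv) * delta` is the quadratic form for each sample, so no Python-level loop over samples is needed.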

DeepKnowledge1 · 2021-02-15T09:20:58Z

Do you think that could be improved with the multiprocessing or joblib packages?

okokchoi · 2021-03-25T08:07:51Z

Do you mean

        for i in range(H * W):
            mean = train_outputs[0][:, i]
            conv_inv = np.linalg.inv(train_outputs[1][:, :, i])
            dist = [mahalanobis(sample[:, i], mean, conv_inv) for sample in embedding_vectors]
            dist_list.append(dist)

This part takes a lot of time, right?

DeepKnowledge1 · 2021-03-25T08:32:40Z

@xiahaifeng1995, @okokchoi, you could also move the following line into training and save its result together with the mean:

conv_inv = np.linalg.inv(train_outputs[1][:, :, i])

So, in the training part:

train_outputs = [mean, conv_inv]
with open(train_feature_filepath, 'wb') as f:
    pickle.dump(train_outputs, f,protocol=pickle.HIGHEST_PROTOCOL)

I replaced the following:
dist = [mahalanobis(sample[:, i], mean, conv_inv) for sample in embedding_vectors]

with:
import scipy.spatial.distance as SSD

dist = SSD.cdist(embedding_vectors[:,:,i], mean[None, :], metric='mahalanobis', VI=conv_inv)

okokchoi · 2021-03-25T10:59:31Z

Thanks a lot for your reply!
I'm really sorry, but I think something is wrong with the code I modified:

            for i in range(H * W):
                mean = train_outputs[0][:, i]
                conv_inv = train_outputs[1][:, :, i]
                dist = cdist(embedding_vectors[:,:,i], mean[None, :], metric='mahalanobis', VI=conv_inv)
                dist_list.append(dist)

<Error>
Traceback (most recent call last):
  File "main_test.py", line 301, in <module>
    main()
  File "main_test.py", line 170, in main
    dist_list = np.array(dist_list).transpose(1, 0).reshape(B, H, W)
ValueError: axes don't match array

Each dist value has the same length, but something is wrong with dist_list.
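A likely cause, sketched with stand-in data: `cdist` called with a `(B, C)` array and a `(1, C)` array returns a `(B, 1)` array, so stacking one result per position gives three axes and `transpose(1, 0)` no longer matches. The sizes below are hypothetical, and zero arrays stand in for the real `cdist` output.

```python
import numpy as np

B, H, W = 4, 2, 3  # hypothetical sizes

# Stand-in for the cdist output: one (B, 1) array per position.
dist_list = [np.zeros((B, 1)) for _ in range(H * W)]

stacked = np.array(dist_list)  # shape (H*W, B, 1): three axes

try:
    stacked.transpose(1, 0)    # two axes given for a 3-D array
    raised = False
except ValueError:
    raised = True

# Flattening each result to shape (B,) restores the expected 2-D layout.
fixed = np.array([d.ravel() for d in dist_list]).transpose(1, 0).reshape(B, H, W)
```

Flattening each per-position result before appending it is one way to get back to the `(H*W, B)` layout that the original reshape expects.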

DeepKnowledge1 · 2021-03-25T11:15:23Z

@okokchoi, did you compute conv_inv and save it?

In the training part, replace the covariance loop with:

for i in range(H * W):
    cov[:, :, i] = np.cov(embedding_vectors[:, :, i].numpy(), rowvar=False) + 0.01 * I
    conv_inv[:, :, i] =  np.linalg.inv(cov[:, :, i])
# save learned distribution
train_outputs = [mean, conv_inv]
with open(train_feature_filepath, 'wb') as f:
    pickle.dump(train_outputs, f,protocol=pickle.HIGHEST_PROTOCOL)

and in testing:

dist_list = []    

for i in range(H * W):
    mean = train_outputs[0][:, i]
    conv_inv = train_outputs[1][:, :, i] #np.linalg.inv(train_outputs[1][:, :, i])#

    dist = SSD.cdist(embedding_vectors[:,:,i], mean[None, :], metric='mahalanobis', VI=conv_inv)
    dist = list(itertools.chain(*dist))
    dist_list.append(dist)

dist_list = np.array(dist_list).transpose(1, 0).reshape(B, H, W)

# upsample
continue the rest of the code .......
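As a side note on the training loop above, the per-position inversion can probably be done without a Python loop, because `np.linalg.inv` accepts a stack of matrices in the last two axes. The following is only a sketch with random data; the sizes and the names `inv_loop` and `inv_batched` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
C, HW = 4, 6                                   # hypothetical sizes
a = rng.normal(size=(C, C, HW))
# One symmetric positive-definite matrix per position, shape (C, C, HW).
cov = np.einsum('ijl,kjl->ikl', a, a) + 0.01 * np.eye(C)[:, :, None]

# Loop version, as in the training snippet above.
inv_loop = np.zeros_like(cov)
for i in range(HW):
    inv_loop[:, :, i] = np.linalg.inv(cov[:, :, i])

# Batched version: move the position axis to the front, invert the whole
# stack in one call, then move the axis back.
inv_batched = np.linalg.inv(cov.transpose(2, 0, 1)).transpose(1, 2, 0)
```

Both arrays keep the `(C, C, H*W)` layout, so the test-time indexing `train_outputs[1][:, :, i]` would not need to change.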

okokchoi · 2021-03-25T11:52:01Z

I solved the problem: I had just loaded the pkl file from the non-modified version.
I have a question, @DeepKnowledge1: is the modified version faster than the original one?
(Anyway, thank you for your help :) You are the best!)

DeepKnowledge1 · 2021-03-25T12:09:04Z

I think so. Please try it and share your findings.

okokchoi · 2021-03-25T12:20:12Z

Ok I will 👍

ingbeeedd · 2021-05-26T08:01:50Z

@DeepKnowledge1 @okokchoi

I think it's pretty much the same. Besides the size of the feature map, the code below is heavy:

dist = SSD.cdist(embedding_vectors[:,:,i], mean[None, :], metric='mahalanobis', VI=conv_inv)
dist = list(itertools.chain(*dist))

Is there a way to run it in parallel?

ingbeeedd · 2021-05-27T06:10:01Z

I got a 3.5x speedup by using real process-based multiprocessing.

fryegg · 2021-05-27T11:55:06Z

Awesome! Did you use the multiprocessing module in Pytorch?

GreatScherzo · 2021-05-28T00:23:36Z

Thank you for the code, @DeepKnowledge1. I tried your code and was able to improve my inference time from 80 secs to 43 secs!

I tried to use Cython with the code, but it didn't improve by much (this may be because SSD.cdist is already implemented in optimized C).
The bottleneck in this code is SSD.cdist, as it has several loops within it. I then tried eliminating the loops altogether with vectorization.

Based on the Mahalanobis equation (see SciPy's documentation), I used einsum to multiply the 3D arrays (the mean, inv_cov, and embedding vectors) without any looping. I was able to reduce my inference time from 43 secs to 2 secs!

The code is below:

def calc_maha_dist_infer_vectorized(B, C, H, W, embedded_vector_model, embedding_vectors, dist_list):
    with tqdm(total=3, desc="Loading…", ascii=False, ncols=75) as pbar:
        # start = time.perf_counter()

        pbar.set_description("Extracting mean and cov from model...")
        pbar.refresh()
        mean = embedded_vector_model[0][:, :]
        mean_reshaped = np.reshape(mean, [1, C, H * W])
        pbar.update(1)

        # checkpoint1 = time.perf_counter()
        conv_inv = embedded_vector_model[1][:, :, :]  # np.linalg.inv(train_outputs[1][:, :, i])#
        pbar.update(1)

        pbar.set_description("Calculating Mahalanobis Distance...")
        pbar.refresh()
        delta = embedding_vectors - mean_reshaped
        dist_list = np.sqrt(np.einsum('njl,jkl,nkl->nl', delta, conv_inv, delta))
        pbar.update(1)
        # = np.sqrt(np.einsum('nj,jk,nk->n', delta, conv_inv, delta))

    return dist_list

To improve further, maybe real process-based multiprocessing, as mentioned by @ingbeeedd, could be implemented?
I'd love to hear your thoughts.

By the way, I used this code for single-image inference, not for multiple images at a time, so the mean, inv_cov, and embedding_vectors arrays may be too large to calculate the Mahalanobis distance in one pass. Some modifications may be needed to process the data in batches.
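For anyone who wants to check the einsum contraction on a small batch, here is a sketch that compares it against explicit loops. Everything in it is an assumption for illustration: the data are random, the sizes are tiny, and the embeddings are taken to have shape `(B, C, H*W)` with the inverse covariances in shape `(C, C, H*W)`.

```python
import numpy as np

rng = np.random.default_rng(0)
B, C, H, W = 3, 4, 2, 2                        # hypothetical sizes
emb = rng.normal(size=(B, C, H * W))           # test embeddings
mean = rng.normal(size=(C, H * W))
a = rng.normal(size=(C, C, H * W))
cov = np.einsum('ijl,kjl->ikl', a, a) + 0.01 * np.eye(C)[:, :, None]
cov_inv = np.linalg.inv(cov.transpose(2, 0, 1)).transpose(1, 2, 0)

# Reference: explicit loops over positions and samples.
ref = np.zeros((B, H * W))
for i in range(H * W):
    for n in range(B):
        d = emb[n, :, i] - mean[:, i]
        ref[n, i] = np.sqrt(d @ cov_inv[:, :, i] @ d)

# Vectorized: the same contraction as in the function above.
delta = emb - mean[None, :, :]
vec = np.sqrt(np.einsum('njl,jkl,nkl->nl', delta, cov_inv, delta))

# The result is already sample-major, so it reshapes directly.
score = vec.reshape(B, H, W)
```

Note that the vectorized result has shape `(B, H*W)`, whereas the loop version builds an `(H*W, B)` list, so the later `transpose(1, 0)` step would not carry over unchanged.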

fryegg · 2021-05-28T00:52:29Z

@GreatScherzo That's what I wanted to do: change the loop to a matrix calculation. I will apply some modifications to this.

ingbeeedd · 2021-05-28T03:49:55Z

@fryegg @GreatScherzo
I have written as follows.

manager = multiprocessing.Manager()
cpu_core = 8
dist_list = manager.list()
for number in range(cpu_core):
    dist_list.append(manager.list())

def calculate_distance(number, start, end, train_outputs, embedding_vectors):
    global dist_list
    for i in range(start, end):
        mean = train_outputs[0][:, i ]
        conv_inv = train_outputs[1][:, :, i] #np.linalg.inv(train_outputs[1][:, :, i])#
        dist = SSD.cdist(embedding_vectors[:,:,i], mean[None, :], metric='mahalanobis', VI=conv_inv)
        dist = list(itertools.chain(*dist))
        dist_list[number].append(dist)

In the main function:

procs = []
start = time.time()
for number in range(cpu_core):
    s = number * (H*W // cpu_core)
    e = (number + 1) * (H*W // cpu_core)
    proc = Process(target=calculate_distance, args=(number, s, e, train_outputs, embedding_vectors))
    procs.append(proc)
    proc.start()

for proc in procs:
    proc.join()

print("time :", time.time() - start)

global dist_list
final_list = []
for number in range(cpu_core):
    final_list.extend(dist_list[number])

final_list = np.array(final_list).transpose(1, 0).reshape(B, H, W)
final_list = torch.tensor(final_list)
score_map = F.interpolate(final_list.unsqueeze(1), size=x.size(2), mode='bilinear', align_corners=False).squeeze().numpy()

I'd appreciate it if you could give me your opinion.
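One thing to watch in the chunking: `H*W // cpu_core` drops the trailing positions whenever `H*W` is not a multiple of `cpu_core`. A small sketch of an alternative split, using `np.array_split` and hypothetical sizes chosen so that a remainder exists:

```python
import numpy as np

H, W, cpu_core = 5, 5, 8   # hypothetical: 25 positions, 8 workers

# Equal-width chunks from integer division cover only part of the range.
width = H * W // cpu_core
covered = cpu_core * width

# np.array_split spreads the remainder across the first chunks instead.
chunks = np.array_split(np.arange(H * W), cpu_core)
bounds = [(int(c[0]), int(c[-1]) + 1) for c in chunks]  # (start, end) per worker
```

Each `(start, end)` pair could then be passed to `calculate_distance` in place of `s` and `e`. When `H*W` is an exact multiple of `cpu_core`, both approaches give the same chunks.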

GreatScherzo · 2021-05-31T03:53:14Z

@ingbeeedd Thank you very much for sharing your code!
I haven't had time to test it out yet, but I'll be sure to share the speed results after I try it!

ingbeeedd · 2021-06-02T04:27:16Z

@fryegg @GreatScherzo
I calculated the Mahalanobis distance on the GPU, and it's 24 times faster than before (CPU parallel processing was 3.5 times faster).
So the GPU version is about 6 times faster than the CPU-parallel version.

fryegg · 2021-06-03T00:16:56Z

@ingbeeedd
Nice work! How did you calculate the Mahalanobis distance on the GPU? Did you convert the embedding vectors into tensors?

ingbeeedd · 2021-06-09T02:33:45Z

@fryegg
The code is being cleaned up. I'll leave a comment as soon as it's organized.

ingbeeedd · 2021-07-15T14:34:10Z

@GreatScherzo @fryegg @DeepKnowledge1 @okokchoi @xiahaifeng1995 @prob1995 @sangkyuleeKOR

I uploaded the code: https://github.com/ingbeeedd/PaDiM-EfficientNet :)

DeepKnowledge1 · 2021-11-20T08:49:15Z

Hi @GreatScherzo ,

Thanks for your improvement. It is faster, but the scores are different: the scores for normal images are higher than for defective images. Do you have any explanation?

DeepKnowledge1 · 2021-11-23T09:18:35Z

@GreatScherzo
Thanks for your code.

It works fine with only one image, but if you have a batch, the scores are very different.
I think the error is in the einsum call, but I have no idea how to fix it :)
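One possible explanation (a guess, not something confirmed in this thread): the einsum `'njl,jkl,nkl->nl'` already returns a `(B, H*W)` array, so if the loop version's `transpose(1, 0).reshape(B, H, W)` is still applied afterwards, the values stay in place for `B = 1` but get scrambled for larger batches. A sketch with dummy values and hypothetical sizes:

```python
import numpy as np

H, W = 2, 3
results = {}
for B in (1, 2):
    # Vectorized layout: entry [n, i] belongs to sample n, position i.
    dist = np.arange(B * H * W, dtype=float).reshape(B, H * W)
    correct = dist.reshape(B, H, W)
    # Loop-version post-processing applied to the sample-major array.
    legacy = dist.transpose(1, 0).reshape(B, H, W)
    results[B] = bool(np.array_equal(correct, legacy))
```

If this is the cause, dropping the transpose and reshaping the einsum output directly to `(B, H, W)` should be enough.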

DeepKnowledge1 · 2021-12-29T09:44:05Z

By the way, I fixed that.

The distance calculation is now vectorized and works whether you have one image or many.
The inference time improved a lot.

leolv131 · 2021-12-31T01:42:05Z

ok, thanks
