
inference time #6

Open · 2020zyc opened this issue Aug 1, 2019 · 4 comments

2020zyc commented Aug 1, 2019

Hi, I am puzzled about the inference time of the compressed model. Why is the compressed model more time-consuming? Shouldn't it be faster with fewer parameters (about half of the original)?

Thanks

saparina (Collaborator) commented Aug 1, 2019

Hi! My explanation is that tensor decomposition methods require more mathematical operations: instead of one matrix multiplication (which is highly optimized in PyTorch), we have several. I think it is possible to optimize our Tensor Train and Tucker code and make it faster, but it is not obvious how to do that efficiently.
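
To make that concrete, here is a toy sketch (not this repo's code; the mode sizes, ranks, and the 4-d core layout below are made up for illustration): a dense layer is a single fused matmul, while a TT-style forward pass is a chain of smaller tensordot contractions and reshapes, each with its own launch and memory-layout overhead, so fewer parameters do not automatically mean lower latency.

```python
import torch

batch = 32
n = (4, 8, 8)      # input modes, 4 * 8 * 8 = 256   (hypothetical sizes)
m = (4, 8, 8)      # output modes, also 256
r = (1, 4, 4, 1)   # TT ranks (boundary ranks are 1)

x = torch.randn(batch, 256)
W = torch.randn(256, 256)
y_dense = x @ W    # dense layer: one highly optimized matmul

# TT-matrix cores of shape (r_{k-1}, n_k, m_k, r_k)
cores = [torch.randn(r[k], n[k], m[k], r[k + 1]) for k in range(3)]

t = x.reshape(batch, n[0], n[1], n[2], 1)   # attach a trailing rank-1 dim
for G in cores:
    # contract the current input mode (dim 1) and the rank dim (dim 4) with core G
    t = torch.tensordot(t, G, dims=([1, 4], [1, 0]))
y_tt = t.reshape(batch, -1)                 # same output size, but several small ops
```

Each tensordot in the loop is a separate, relatively small kernel call, which is where the extra inference time tends to come from.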

@saareliad

It could be implemented in one operation with einsum; however, PyTorch does not fully support broadcasting for einsum (it did work for me in NumPy, though).

However, I assume that torch.einsum calls many matmul operations behind the scenes (like it does in TensorFlow), so it won't be much better.

I also thought about implementing it as a Numba kernel (but found that Numba does not support einsum either).
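
For reference, this is the kind of ellipsis broadcasting that NumPy's einsum handles (a minimal, made-up example unrelated to the repo's shapes):

```python
import numpy as np

# The dimensions hidden behind '...' are broadcast between operands:
# a batch of (2, 3) matrices times a single shared (3, 4) matrix.
a = np.random.rand(5, 2, 3)
b = np.random.rand(3, 4)
c = np.einsum('...ij,...jk->...ik', a, b)
print(c.shape)   # (5, 2, 4)
```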

2020zyc (Author) commented Aug 9, 2019

Thanks @saareliad.

Can einsum accelerate the many matmul operations produced by TT/Tucker decomposition?

> It could be implemented in one operation with einsum

And how would one implement it with einsum in one operation?

saareliad commented Aug 13, 2019

Most of the einsum code runs in C++, so it should be faster; I didn't check extensively.
I believe that for truly optimized code one should rewrite the C++/CUDA kernels.

I compared memory consumption against a Python loop of tensordots (the tt-pytorch implementation), and einsum is better.

I can't publish the full code yet because it's under active research. We changed the TT implementation quite a lot from the public GitHub repos and used 4-dimensional tensors as tt.cores.
(Note that in this repo the authors "squeezed" the cores into 2-dimensional tensors in order to use simple matmuls.)

Something like

torch.einsum('adcbr,rdxk,kcym,mbzn->axyzn', x, *tt.cores)

does the job for d=3. Note that it depends on the shapes of tt.cores. Something similar can be implemented for 2-d cores.
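
To check the shapes of that contraction, here is a minimal sketch with made-up sizes (it is not the research code; in this reading, a is the batch index, d/c/b the input modes, x/y/z the output modes, and r/k/m/n the TT ranks, with boundary ranks equal to 1):

```python
import torch

# Hypothetical sizes: input modes (n1, n2, n3), output modes (m1, m2, m3),
# TT ranks r0 = r3 = 1 and r1 = r2 = 2.
batch = 8
n1, n2, n3 = 4, 5, 6
m1, m2, m3 = 3, 3, 3
r0, r1, r2, r3 = 1, 2, 2, 1

# 'adcbr'; note the letter x in the equation is an output-mode index,
# unrelated to this input tensor also being named x.
x = torch.randn(batch, n1, n2, n3, r0)
cores = [
    torch.randn(r0, n1, m1, r1),   # 'rdxk'
    torch.randn(r1, n2, m2, r2),   # 'kcym'
    torch.randn(r2, n3, m3, r3),   # 'mbzn'
]

y = torch.einsum('adcbr,rdxk,kcym,mbzn->axyzn', x, *cores)
print(y.shape)   # torch.Size([8, 3, 3, 3, 1]) -> reshape to (batch, m1 * m2 * m3)
```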

I hope that when the research is done it will be published as part of a paper or integrated into nlp-architect.
