
inference time #6

Open · 2020zyc opened this issue Aug 1, 2019 · 4 comments

2020zyc commented Aug 1, 2019

Hi, I am puzzled about the inference time of the compressed model. Why is the compressed model more time-consuming? Shouldn't it be faster with fewer parameters (about half of the original)?

Thanks

saparina (Collaborator) commented Aug 1, 2019

Hi! My explanation is that tensor decomposition methods require more mathematical operations: instead of one matrix multiplication (which is highly optimized in PyTorch), we have several. I think it is possible to optimize our Tensor Train and Tucker code and make it faster, but it is not obvious how to do that efficiently.
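
To make that concrete, here is a toy sketch (not this repo's code; the mode sizes, ranks, and the 4-d core layout below are made up for illustration): a dense layer is a single fused matmul, while a TT-style forward pass is a chain of smaller tensordot contractions and reshapes, each with its own launch and memory-layout overhead, so fewer parameters do not automatically mean lower latency.

```python
import torch

batch = 32
n = (4, 8, 8)      # input modes, 4 * 8 * 8 = 256   (hypothetical sizes)
m = (4, 8, 8)      # output modes, also 256
r = (1, 4, 4, 1)   # TT ranks (boundary ranks are 1)

x = torch.randn(batch, 256)
W = torch.randn(256, 256)
y_dense = x @ W    # dense layer: one highly optimized matmul

# TT-matrix cores of shape (r_{k-1}, n_k, m_k, r_k)
cores = [torch.randn(r[k], n[k], m[k], r[k + 1]) for k in range(3)]

t = x.reshape(batch, n[0], n[1], n[2], 1)   # attach a trailing rank-1 dim
for G in cores:
    # contract the current input mode (dim 1) and the rank dim (dim 4) with core G
    t = torch.tensordot(t, G, dims=([1, 4], [1, 0]))
y_tt = t.reshape(batch, -1)                 # same output size, but several small ops
```

Each tensordot in the loop is a separate, relatively small kernel call, which is where the extra inference time tends to come from.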

@saareliad

It could be implemented in one operation with einsum; however, PyTorch does not fully support broadcasting for einsum (it did work for me in NumPy, though).

However, I assume that torch.einsum calls many matmul operations behind the scenes (like it does in TensorFlow), so it won't be much better.

I also thought about implementing it as a Numba kernel (but found that Numba does not support einsum either).
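
For reference, this is the kind of ellipsis broadcasting that NumPy's einsum handles (a minimal, made-up example unrelated to the repo's shapes):

```python
import numpy as np

# The dimensions hidden behind '...' are broadcast between operands:
# a batch of (2, 3) matrices times a single shared (3, 4) matrix.
a = np.random.rand(5, 2, 3)
b = np.random.rand(3, 4)
c = np.einsum('...ij,...jk->...ik', a, b)
print(c.shape)   # (5, 2, 4)
```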

2020zyc (Author) commented Aug 9, 2019

Thanks @saareliad.

Can einsum accelerate the many matmul operations produced by TT/Tucker decomposition?

> It could be implemented in one operation with einsum

And how would one implement it with einsum in one operation?

saareliad commented Aug 13, 2019

Most of the einsum code runs in C++, so it should be faster; I didn't check extensively.
I believe that for truly optimized code one should rewrite the C++/CUDA kernels.

I compared memory consumption against a Python loop of tensordots (the tt-pytorch implementation), and einsum is better.

I can't publish the full code yet because it's under active research. We changed the TT implementation quite a lot from the public GitHub repos and used 4-dimensional tensors as tt.cores.
(Note that in this repo the authors "squeezed" the cores into 2-dimensional tensors in order to use simple matmuls.)

Something like

torch.einsum('adcbr,rdxk,kcym,mbzn->axyzn', x, *tt.cores)

does the job for d=3. Note that it depends on the shapes of tt.cores. Something similar can be implemented for 2-d cores.
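
To check the shapes of that contraction, here is a minimal sketch with made-up sizes (it is not the research code; in this reading, a is the batch index, d/c/b the input modes, x/y/z the output modes, and r/k/m/n the TT ranks, with boundary ranks equal to 1):

```python
import torch

# Hypothetical sizes: input modes (n1, n2, n3), output modes (m1, m2, m3),
# TT ranks r0 = r3 = 1 and r1 = r2 = 2.
batch = 8
n1, n2, n3 = 4, 5, 6
m1, m2, m3 = 3, 3, 3
r0, r1, r2, r3 = 1, 2, 2, 1

# 'adcbr'; note the letter x in the equation is an output-mode index,
# unrelated to this input tensor also being named x.
x = torch.randn(batch, n1, n2, n3, r0)
cores = [
    torch.randn(r0, n1, m1, r1),   # 'rdxk'
    torch.randn(r1, n2, m2, r2),   # 'kcym'
    torch.randn(r2, n3, m3, r3),   # 'mbzn'
]

y = torch.einsum('adcbr,rdxk,kcym,mbzn->axyzn', x, *cores)
print(y.shape)   # torch.Size([8, 3, 3, 3, 1]) -> reshape to (batch, m1 * m2 * m3)
```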

I hope that when the research is done it will be published as part of a paper or integrated into nlp-architect.
