
Add scripts for benchmarking opt_einsum as well as manyinds benchmarks #3

Open
kshyatt wants to merge 1 commit into master

Conversation

@kshyatt commented Sep 21, 2021

Currently we don't benchmark computations like this with many indices. This PR adds benchmarks for them (see also the companion PR to OMEinsum.jl), along with scripts for benchmarking the Python library opt_einsum.
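For reference, a minimal sketch of the kind of many-index benchmark discussed here, using the contraction pattern quoted later in this thread. The PR's actual scripts use the opt_einsum package; this stand-in uses `np.einsum(..., optimize=True)`, which applies the same style of contraction-path optimization, and the uniform size of 2 per index matches the `uniformsize(code, 2)` setting below. Names and structure here are illustrative, not the PR's code.

```python
import time
import numpy as np

# The many-index pattern from the discussion; every index has dimension 2,
# matching uniformsize(code, 2) in the Julia benchmark.
eq = "abcdefghijklmnop,flnqrcipstujvgamdwxyz->bcdeghkmnopqrstuvwxyz"
lhs, rhs = eq.split("->")[0].split(",")
x = np.random.rand(*(2,) * len(lhs))   # 16 indices -> 2^16 elements
y = np.random.rand(*(2,) * len(rhs))   # 21 indices -> 2^21 elements

t0 = time.perf_counter()
out = np.einsum(eq, x, y, optimize=True)  # path-optimized contraction
elapsed = time.perf_counter() - t0
print(f"output rank {out.ndim}, {elapsed:.3f} s")
```

In a real benchmark script one would repeat the contraction and report the minimum time, as `@btime` does on the Julia side.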

@GiggleLiu (Collaborator) commented Sep 21, 2021

Thanks for the PR!
Regarding the following contraction pattern, on which OMEinsum performs poorly:

julia> code = ein"abcdefghijklmnop,flnqrcipstujvgamdwxyz->bcdeghkmnopqrstuvwxyz"
abcdefghijklmnop, flnqrcipstujvgamdwxyz -> bcdeghkmnopqrstuvwxyz

julia> OMEinsum.timespace_complexity(code, uniformsize(code, 2))
(26.0, 21.0)

julia> @btime code(x, y);
  24.924 ms (111 allocations: 48.51 MiB)

This is because OMEinsum uses permutedims + reshape + matmul to perform tensor contraction. In this pattern, the permutedims call takes about 90% of the time, because the time and space complexities are similar. If we switch to TensorOperations.tensorcopy, the time can be halved:

julia> using LinearAlgebra, TensorOperations

julia> LinearAlgebra.permutedims(a::Array{T,N}, perm::NTuple{N}) where {T,N} = (TensorOperations.tensorcopy(a, collect(1:ndims(a)), perm))

julia> LinearAlgebra.permutedims!(o::Array{T}, a::Array{T}, perm::Vector) where T = (TensorOperations.tensorcopy!(a, collect(1:ndims(a)), o, perm))

julia> @btime code(x, y);
  13.353 ms (398 allocations: 48.54 MiB)
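To illustrate the permutedims + reshape + matmul strategy described above, here is a small NumPy sketch (indices and sizes are illustrative, not OMEinsum's internals). The transpose copy plays the role of Julia's permutedims and is the step that dominates in the pattern above:

```python
import numpy as np

def contract_via_matmul(x, y):
    # einsum "acb,bcd->ad": the contracted indices (c, b) appear in the
    # wrong order in x to match y's (b, c) flattening, so a transpose
    # copy (Julia's permutedims) is needed before reshaping.
    a, c, b = x.shape
    d = y.shape[2]
    xt = np.ascontiguousarray(x.transpose(0, 2, 1))  # (a, b, c), materialized copy
    xm = xt.reshape(a, b * c)                        # (a, bc)
    ym = y.reshape(b * c, d)                         # (bc, d)
    return xm @ ym                                   # single BLAS gemm, shape (a, d)

x = np.random.rand(3, 4, 5)   # (a, c, b)
y = np.random.rand(5, 4, 6)   # (b, c, d)
result = contract_via_matmul(x, y)
```

When, as in the benchmark above, the contraction's time complexity is close to its space complexity, the memory traffic of the transpose copy rivals the gemm itself.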

I think the best way to solve this issue is to remove permutedims completely.
However, it is not easy to write a general contraction function with BLAS performance. I wonder what the performance gap is between OMEinsum and the other packages in this benchmark case.
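The trade-off mentioned above can be seen in a naive permutedims-free contraction: looping directly over the output and contracted indices avoids the transpose copy entirely, but gives up the cache blocking and vectorization that make BLAS fast. A toy sketch (my own illustration, not OMEinsum code):

```python
import numpy as np

def naive_contract(x, y):
    # einsum "acb,bcd->ad" with explicit loops: no permutedims, no copy,
    # but also none of BLAS's blocking/vectorization -- this is the
    # difficulty of removing permutedims while keeping BLAS performance.
    a, c, b = x.shape
    d = y.shape[2]
    out = np.zeros((a, d))
    for ia in range(a):
        for id_ in range(d):
            s = 0.0
            for ib in range(b):
                for ic in range(c):
                    s += x[ia, ic, ib] * y[ib, ic, id_]
            out[ia, id_] = s
    return out

x = np.random.rand(3, 4, 5)   # (a, c, b)
y = np.random.rand(5, 4, 6)   # (b, c, d)
result = naive_contract(x, y)
```

Libraries that remove the explicit permutation (e.g. via strided or batched gemm kernels) must essentially re-implement this blocking by hand for arbitrary index patterns.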

@GiggleLiu (Collaborator)

Great job! I'm wondering how the julia-gpu data was generated; I can't find the corresponding script anywhere.

And unfortunately, I don't have write access to this repo. @under-Peter, could you please give me write access or help merge this PR?

@under-Peter (Owner)

@GiggleLiu is it enough to add you as a collaborator, or do I have to grant write access separately? Thanks so much, and sorry for being short on time at the moment.

@GiggleLiu (Collaborator)

Now I have write access; thanks for reacting so quickly, @under-Peter.

@kshyatt (Author) commented Sep 22, 2021

Is this good to be merged?

@GiggleLiu (Collaborator) commented Sep 22, 2021

It looks good, but I want to make sure the GPU benchmark is correct (especially the many-index case); it differs from what I get running the same case on my own host. Could you please show me the script that generated the result?

After that, it should be good to merge.
