Performance #1
Timing this correctly (i.e. putting the code in a function and not counting the first run, which includes compilation and warm-up; see the performance tips in the Julia manual) brings this down to approximately 2 s per run on my machine, i.e. about 20 s for 10 runs. That is still slower than Python. (Is Python reporting the time per run, or the total time for the 10 runs?) Inspecting the resulting function with However, a more sensible initialization of
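For instance, something along these lines avoids measuring compilation (the array sizes and the particular contraction here are placeholders, not the ones from the original benchmark):

```julia
using Einsum, BenchmarkTools

A = randn(50, 50, 100)
B = randn(50, 100)

# Put the @einsum call in a function so it is compiled once and can be
# called repeatedly without global-scope overhead.
contract(A, B) = @einsum C[i, k] := A[i, j, k] * B[j, k]

contract(A, B)           # first call compiles; don't time this one
@btime contract($A, $B)  # reports the per-call time, excluding compilation
```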
Yes, sorry, this was on my to-do list. I just pushed some commits (0968d13) and it is much faster now for me; see https://github.com/ahwillia/Einsum.jl#benchmarks

If this works for you both, then I'll close this issue.
I tried your new implementation. While there is a significant speedup compared to the older version, it still depends on the actual summation. I therefore created some small benchmark scripts to compare the einsum methods. I've also included opt-einsum, which is an optimized numpy einsum package.
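Roughly, the comparisons in those scripts look like the following (the sizes and the particular contraction here are simplified placeholders, not the exact benchmark cases):

```julia
using Einsum, TensorOperations, BenchmarkTools

A = randn(40, 40, 40)
B = randn(40, 40, 40)

# Same contraction written with both macros; `:=` allocates the output.
einsum_contract(A, B) = @einsum D[a, d] := A[a, b, c] * B[b, c, d]
tensor_contract(A, B) = @tensor D[a, d] := A[a, b, c] * B[b, c, d]

@btime einsum_contract($A, $B)   # nested loops generated by Einsum.jl
@btime tensor_contract($A, $B)   # TensorOperations, which can call BLAS
```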
@Jutho, feel free to comment on the measured timings or send me a PR in the benchmark repo.
Wow, nice work! I didn't know about the
Any ideas on why we lose out on the
Looks like beating

Edit: Two nice Stack Overflow questions that are relevant: Q1, Q2
A brief update: adding

Even better would be to follow opt_einsum and figure out intermediate contractions. This is outside my bandwidth at the moment, but all input and PRs are welcome. I'll tag a new release somewhat soon, after I do more testing to make sure this doesn't break anything.
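To sketch what I mean by intermediates (this is not implemented in Einsum.jl; the tensors and sizes below are purely illustrative): a three-tensor sum done as one set of nested loops costs O(n^4), while contracting pairwise through an intermediate brings it down to two O(n^3) contractions.

```julia
using Einsum

n = 100
A, B, D = randn(n, n), randn(n, n), randn(n, n)

# One-shot version: four nested loops over (i, j, k, l), O(n^4) work.
@einsum C1[i, l] := A[i, j] * B[j, k] * D[k, l]

# opt_einsum-style version: build an intermediate T, then contract again.
# Each step is a plain O(n^3) matrix-multiply-like contraction.
@einsum T[i, k]  := A[i, j] * B[j, k]
@einsum C2[i, l] := T[i, k] * D[k, l]

C1 ≈ C2   # same result, very different cost
```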
Is it possible for the package to use TensorOperations' @tensor when possible? It seems like the difference between @tensor and @einsum is that @einsum allows pairs of indices on the RHS to also appear on the LHS. Since @tensor can be orders of magnitude faster than the loop approach for contraction (because it uses BLAS), is there a way to replace the inner loops with @tensor, keeping the broadcasting in the outer loops? I imagine that would greatly improve performance. For example, given

```julia
using TensorOperations   # for the @tensor macro used below

C = zeros(50, 100)
B = randn(50, 100)
A = randn(50, 50, 100)
```

the generated code could look like:

```julia
# @einsum C[i,k] = A[i,j,k] * B[j,k]
for k in 1:size(A, 3)
    # Slice out the k-th "page" of each array without copying ...
    vC = view(C, :, k)
    vA = view(A, :, :, k)
    vB = view(B, :, k)
    # ... and let TensorOperations handle the inner contraction via BLAS.
    @tensor vC[i] = vA[i, j] * vB[j]
end
```
This would be a great addition to this package! Feel free to open a PR. At the moment, I am working on other projects and do not have time to implement this. Similarly, you could think about swapping out operations for BLAS calls directly (e.g. identifying sub-problems that are matrix multiplies).
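As a rough sketch of that "identify matrix-multiply sub-problems" idea (not something Einsum.jl does today; the tensors, sizes, and indices are made up for illustration), a contraction whose summed indices are contiguous can be handed to BLAS with a couple of reshapes:

```julia
using LinearAlgebra

A = randn(30, 40, 50)
B = randn(40, 50, 60)

# C[i,l] = Σ_{j,k} A[i,j,k] * B[j,k,l] is a single matrix multiply once the
# contracted indices (j,k) are merged into one dimension. This works here
# because Julia arrays are column-major and (j,k) are the trailing dims of A
# and the leading dims of B, so reshape merges them consistently.
Amat = reshape(A, size(A, 1), :)   # 30 × (40*50)
Bmat = reshape(B, :, size(B, 3))   # (40*50) × 60
C    = Amat * Bmat                 # one BLAS gemm call

# For comparison, the naive loop version would be:
# @einsum Cref[i, l] := A[i, j, k] * B[j, k, l]
```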
Unreasonable allocation compared with @Jutho's TensorOperations (even with pre-allocation).

Update note: according to the benchmark results, there is a matrix-type-related type instability. With enhanced type stability, the performance of
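As a rough way to check for this kind of issue (the contraction and sizes below are just placeholders, not the benchmark above), one can time an in-place @einsum with a pre-allocated output and inspect the inferred types:

```julia
using Einsum, BenchmarkTools, InteractiveUtils

A = randn(50, 50, 100)
B = randn(50, 100)
C = zeros(50, 100)                  # pre-allocated output

# `=` (rather than `:=`) writes into the existing C, so absent any type
# instabilities this should allocate essentially nothing per call.
inplace!(C, A, B) = @einsum C[i, k] = A[i, j, k] * B[j, k]

@btime inplace!($C, $A, $B)         # watch the reported allocations
@code_warntype inplace!(C, A, B)    # `Any`/red entries indicate instability
```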
Compared to `numpy.einsum`, `Einsum.jl` seems to be quite slow. I wonder if there is some room for improvement on this side:

numpy:
Elapsed time: 1.44563603401s

Julia:
elapsed time: 85.405141333 seconds