Improves efficiency of extend_cost with an inf cost_limit #7
Conversation
The commit gatagat/lap@cc97e2e improved efficiency when cost_limit < np.inf (using (n + m, n + m) matrices instead of (2 max(n, m), 2 max(n, m)) matrices), but drastically reduced performance when cost_limit == np.inf: it also uses (n + m, n + m) matrices there, where (max(n, m), max(n, m)) matrices suffice. This PR fixes that case (up to 10x faster).
For instance, this is the run time vs the number of rows (the number of columns here is 20% larger). The overhead at low row counts comes from my own code that wraps yours. You can have a look at https://github.com/raphaelreme/pylapy, which is a wrapper around lapx, lapjv, etc. The code I used to benchmark this behavior is here: https://github.com/raphaelreme/pylapy/blob/main/benchmark/benchmark_shape_extension.py, but it's a bit of a mess.
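For readers without the benchmark plot, the size argument alone can be illustrated with a small back-of-the-envelope sketch (the dimensions below are illustrative, not taken from the benchmark):

```python
import numpy as np

n, m = 1000, 1200  # rows, columns (columns ~20% larger, as in the benchmark)

# Scheme used since gatagat/lap@cc97e2e, regardless of cost_limit:
full = (n + m) ** 2       # entries in an (n+m, n+m) matrix
# Scheme that suffices when cost_limit == np.inf:
small = max(n, m) ** 2    # entries in a (max(n,m), max(n,m)) matrix

print(full, small, full / small)  # ~3.4x more entries in the large scheme
```

The gap grows as n approaches m: for a square n x n problem the ratio is (2n)^2 / n^2 = 4, before even counting the extra work the solver does on the larger matrix.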
Hello @raphaelreme, thanks for your nice contribution! According to your benchmark, your PR seems much faster. I will check and run some tests on my part.
- Improve efficiency by raphaelreme -> #7
- Improve speed for `extend_cost=True`
- Update and improve tests and benchmarks
- Update and improve workflows
- Add minimum Python requirement
Hello @raphaelreme, I spent some time looking at the code, and I found that
This drastically improves the speed for
Hmm, I neglected this because I wanted to keep the same coding style as before, and because it should be negligible next to the assignment computation: all this setup should be O(n^2) while the assignment itself is O(n^3) (to be checked).

I'll try; it is possible that np.zeros is smarter than np.empty followed by assigning 0, but in theory they should be roughly equivalent. I would actually expect the fastest solution to be np.empty followed by assigning only the added values to 0 and the rest to c. I'll let you know if I'm able to see some speed-up on my side too.
So I've tried, and I also observe that np.zeros is much faster than np.empty plus filling. I dug a bit and found this: https://stackoverflow.com/questions/27574881/why-does-numpy-zeros-takes-up-little-space:
This explains pretty well what may be happening here. Nonetheless, in my benchmark I don't see any boost in overall computation time when using np.zeros vs np.empty + filling; I guess the assignment is indeed taking 99% of the time, so speeding up the initial allocation is not that useful. Still, I like it better that way (with np.zeros): it can only be faster and the code is nicer.
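The comparison being discussed can be sketched standalone (the matrix size and repeat count are illustrative; actual timings depend on the OS allocator, which is exactly the point of the linked answer, so no particular winner is asserted here):

```python
import timeit
import numpy as np

k = 2000  # illustrative matrix side length

# np.zeros can hand back lazily zeroed pages from the OS (calloc-style).
t_zeros = timeit.timeit(lambda: np.zeros((k, k)), number=20)

def empty_and_fill():
    # np.empty allocates uninitialized memory, then we touch every entry.
    c = np.empty((k, k))
    c[:] = 0.0
    return c

t_fill = timeit.timeit(empty_and_fill, number=20)
print(f"np.zeros: {t_zeros:.4f}s  np.empty + fill: {t_fill:.4f}s")
```

Both produce identical arrays; the difference is only in when (and whether) the memory is physically written, which is why the gap can vanish once the solver touches every entry anyway.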
OK, I'll keep np.zeros. Many thanks to you again @raphaelreme for this improvement. New v0.5.8 is on the way. 🎉
Hey,
First, thanks a lot for taking care of lap; lapx is so much more user-friendly to install now!
I had noticed in the initial repo (https://github.com/gatagat/lap) that the newest (undeployed) version introduced some unneeded computations when using extend_cost with an inf cost_limit. I never fixed it there because I use my own code that works around it, no one was maintaining lap, and the change was not deployed on PyPI (the latest release was 0.4.0).
This PR fixes it for lapx. I tested the code on my laptop (Ubuntu 20.04): it compiles well and does not change the results, only the performance (10x faster with extend_cost and cost_limit = inf).
You can have a look at the commit gatagat/lap@cc97e2e, which improved efficiency when cost_limit < np.inf (using (n + m, n + m) matrices instead of (2 max(n, m), 2 max(n, m)) matrices) but drastically reduced performance when cost_limit == np.inf.
Indeed, it also uses (n + m, n + m) matrices where (max(n, m), max(n, m)) matrices would suffice. This PR fixes that case (up to 50x faster).
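A minimal numpy sketch of the smaller extension scheme (the function name `extend_square` is hypothetical, not lapx's internal API; it assumes zero padding is valid here because with cost_limit == np.inf the dummy rows/columns only need to be assignable at no extra cost):

```python
import numpy as np

def extend_square(cost: np.ndarray) -> np.ndarray:
    """Pad an (n, m) cost matrix to (max(n, m), max(n, m)) with zeros.

    Hypothetical illustration: when cost_limit == np.inf, zero-cost dummy
    rows/columns are enough to make the problem square, so the larger
    (n + m, n + m) construction needed for a finite cost_limit is wasteful.
    """
    n, m = cost.shape
    k = max(n, m)
    out = np.zeros((k, k), dtype=cost.dtype)  # np.zeros, per the discussion above
    out[:n, :m] = cost                        # real costs in the top-left block
    return out

cost = np.array([[4.0, 1.0, 3.0],
                 [2.0, 0.0, 5.0]])  # n=2 rows, m=3 columns
square = extend_square(cost)
print(square.shape)  # (3, 3) instead of the (5, 5) an (n+m)-sized scheme builds
```

Any square solver can then run on `square`, and assignments landing in the zero-padded dummy row are simply unmatched columns of the original problem.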