
Improves efficiency of extend_cost with an inf cost_limit #7

Merged
merged 1 commit into from
Apr 29, 2024

Conversation

raphaelreme
Contributor

@raphaelreme raphaelreme commented Apr 29, 2024

Hey,

First, thanks a lot for taking care of lap; lapx is so much more user-friendly to install now!!

I had noticed in the initial repo (https://github.com/gatagat/lap) that the newest (undeployed) version had introduced some unneeded computations when using extend_cost with an inf cost_limit. I never fixed it upstream because I work around it in my own code, no one was maintaining lap, and the fix was not deployed on PyPI (the latest release was 0.4.0).

This PR fixes it for lapx. I tested the code on my laptop (Ubuntu 20.04): it compiles fine and does not change the results, only the performance (about 10x faster with extend_cost and cost_limit = inf).

You can have a look at the commit gatagat/lap@cc97e2e, which improved efficiency when cost_limit < np.inf (using (n + m, n + m) matrices instead of (2 max(n, m), 2 max(n, m)) matrices) but drastically reduced performance when cost_limit == np.inf.

Indeed, it also uses (n + m, n + m) matrices in that case, where (max(n, m), max(n, m)) matrices would suffice. This PR fixes that case (up to 50x faster).
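To illustrate the shape trick being discussed (a simplified sketch, not the actual lapx code): when cost_limit is infinite, a rectangular (n, m) problem only needs padding to the smallest enclosing square, with every padded entry costing 0, so the padding never changes which real entries the solver picks.

```python
import numpy as np

def pad_to_square(cost):
    """Pad a rectangular (n, m) cost matrix to the smallest square
    (max(n, m), max(n, m)) matrix; padded entries cost 0, so the
    optimal assignment over the real entries is unchanged.
    Illustrative helper, not part of lapx."""
    n, m = cost.shape
    k = max(n, m)
    out = np.zeros((k, k), dtype=np.double)
    out[:n, :m] = cost
    return out

cost = np.array([[4.0, 1.0, 3.0],
                 [2.0, 0.0, 5.0]])  # n = 2 rows, m = 3 columns
square = pad_to_square(cost)
print(square.shape)  # (3, 3), versus (5, 5) with the (n + m, n + m) scheme
```

A (3, 3) matrix holds roughly a third of the entries of the (5, 5) one here, and the gap grows with problem size, which is where the large speed-ups come from.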

The commit gatagat/lap@cc97e2e
improved the efficiency with cost_limit < np.inf (using (n + m, n + m) matrices instead of
(2 max(n, m), 2 max(n, m)) matrices) but drastically reduced performance when cost_limit == np.inf

Indeed, it also uses (n+m, n+m) matrices where one could only use (max(n, m), max(n,m)) matrices.
This fixes this case. (Up to 10x faster)
@raphaelreme
Contributor Author

raphaelreme commented Apr 29, 2024

[benchmark plot: run time vs number of rows for ref, smallest_fill_inf, and smallest_fill_0]

For instance: this is the run time vs the number of rows (the number of columns here is 20% larger). Ref is your current version, smallest_fill_inf is more or less what version 0.4.0 of lap did, and smallest_fill_0 is what I did here in the PR.

The overhead at low numbers of rows comes from my own code that wraps yours. You can have a look at https://github.com/raphaelreme/pylapy, which is a wrapper around lapx, lapjv, etc. The code I used to benchmark this behavior is here: https://github.com/raphaelreme/pylapy/blob/main/benchmark/benchmark_shape_extension.py, but it's a bit of a mess.

@rathaROG rathaROG added the enhancement New feature or request label Apr 29, 2024
@rathaROG
Owner

Hello @raphaelreme , thanks for your nice contribution! According to your benchmark, your PR seems much faster. I will check and run some tests on my part.

@rathaROG rathaROG merged commit 34487e0 into rathaROG:main Apr 29, 2024
rathaROG added a commit that referenced this pull request Apr 29, 2024
- Improve efficiency by raphaelreme -> #7
- Improve speed for `extend_cost=True`
- Update and improve test and benchmark
- Update and improve workflows
- Add minimum Python requirement
@rathaROG
Owner

rathaROG commented Apr 29, 2024

Hello @raphaelreme , I spent some time looking at the code, and I found that np.zeros((n, n), dtype=np.double) is a lot faster than lines #98-100:

cost_c_extended = np.empty((n, n), dtype=np.double)
cost_c_extended[:] = 0

[screenshot: timing comparison of np.zeros vs np.empty + fill]

This drastically improves the speed for extend_cost=True. Could you test the current main branch and share the result?
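A minimal timing sketch of this comparison (assuming a Linux machine; exact numbers will differ):

```python
import timeit
import numpy as np

n = 2000  # size of the square extended cost matrix

t_zeros = timeit.timeit(
    lambda: np.zeros((n, n), dtype=np.double), number=50)
t_empty_fill = timeit.timeit(
    lambda: np.empty((n, n), dtype=np.double).fill(0.0), number=50)

# On Linux, np.zeros is typically much faster here: calloc can hand back
# lazily zeroed pages, while empty + fill must touch every byte up front.
print(f"np.zeros:        {t_zeros:.4f}s")
print(f"np.empty + fill: {t_empty_fill:.4f}s")
```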

@raphaelreme
Contributor Author

raphaelreme commented Apr 29, 2024

Hmm, I neglected this because I wanted to keep the same coding style as before, and because it should be negligible compared to the assignment computation. In theory all of this setup is O(n^2), while the assignment itself is O(n^3) (to be checked).

I'll try it; it is possible that np.zeros is smarter than np.empty followed by assigning 0, but in theory these should be roughly equivalent. I would actually expect the fastest solution to be np.empty, then assigning 0 only to the added entries and copying c into the rest. I'll let you know if I see a speed-up on my side too.
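The three initialisation strategies under discussion can be sketched roughly as follows (hypothetical helper names for illustration, not the actual lapx internals):

```python
import numpy as np

def extend_zeros(cost, k):
    # Strategy A: calloc-backed allocation, then copy the real costs in.
    out = np.zeros((k, k), dtype=np.double)
    out[:cost.shape[0], :cost.shape[1]] = cost
    return out

def extend_empty_fill(cost, k):
    # Strategy B: uninitialised allocation, zero everything, then copy.
    out = np.empty((k, k), dtype=np.double)
    out[:] = 0
    out[:cost.shape[0], :cost.shape[1]] = cost
    return out

def extend_empty_partial(cost, k):
    # Strategy C: uninitialised allocation, copy the costs, and zero
    # only the padded border, so each entry is written exactly once.
    n, m = cost.shape
    out = np.empty((k, k), dtype=np.double)
    out[:n, :m] = cost
    out[n:, :] = 0
    out[:n, m:] = 0
    return out

cost = np.arange(6, dtype=np.double).reshape(2, 3)
assert np.array_equal(extend_zeros(cost, 3), extend_empty_partial(cost, 3))
```

All three produce identical matrices; they differ only in how much memory gets written (and, on Linux, whether the zeroing can be left to the kernel).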

@raphaelreme raphaelreme deleted the rreme/fix_cost_extend branch April 29, 2024 19:15
@raphaelreme
Contributor Author

raphaelreme commented Apr 29, 2024

So I've tried it, and I also observe that np.zeros is much faster than np.empty plus filling.

I dug a bit and found this: https://stackoverflow.com/questions/27574881/why-does-numpy-zeros-takes-up-little-space:

Are you using Linux? Linux has lazy allocation of memory. The underlying calls to malloc and calloc in
numpy always 'succeed'. No memory is actually allocated until the memory is first accessed.

The zeros function will use calloc which zeros any allocated memory before it is first accessed. Therefore,
numpy need not explicitly zero the array and so the array will be lazily initialised. Whereas, the repeat
function cannot rely on calloc to initialise the array. Instead it must use malloc and then copy the repeated
value to all elements in the array (thus forcing immediate allocation).

Which explains pretty well what may be happening here.

Nonetheless, in my benchmark I don't see any boost in computational time when using np.zeros vs np.empty + filling. I guess the assignment is indeed taking 99% of the computational time, so speeding up the initial allocation is not that useful. Still, I like it better this way (with np.zeros): it can only be faster, and the code is nicer.
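The asymptotic argument above can be sketched numerically: compare the O(n^2) setup cost against an O(n^3) workload. Here a matrix product stands in for the LAP solve itself (an assumption made only so the sketch runs without lapx; the point is the asymptotic gap, not the exact numbers).

```python
import timeit
import numpy as np

n = 800

# O(n^2): allocating (and implicitly zeroing) the extended cost matrix.
t_alloc = timeit.timeit(lambda: np.zeros((n, n), dtype=np.double), number=3)

# O(n^3): a stand-in for the assignment solve itself.
a = np.random.rand(n, n)
t_cubic = timeit.timeit(lambda: a @ a, number=3)

print(f"O(n^2) setup:    {t_alloc:.5f}s")
print(f"O(n^3) workload: {t_cubic:.5f}s")
# The cubic part dominates, so shaving the allocation changes little overall.
```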

@rathaROG
Owner

OK, I'll keep np.zeros. Many thanks again, @raphaelreme , for this improvement. The new v0.5.8 is on the way. 🎉
