
Improves efficiency of extend_cost with an inf cost_limit #7

Merged
merged 1 commit into from
Apr 29, 2024

Conversation

raphaelreme
Contributor

@raphaelreme raphaelreme commented Apr 29, 2024

Hey,

First, thanks a lot for taking care of lap; lapx is so much more user-friendly to install now!!

I had noticed in the initial repo (https://github.com/gatagat/lap) that the newest (undeployed) version had introduced some unneeded computations when using extend_cost with an inf cost_limit. I never fixed it upstream because I work around it in my own code, no one was maintaining lap, and the fix was not deployed on PyPI (the latest release was 0.4.0).

This PR fixes it for lapx. I tested the code on my laptop (Ubuntu 20.04): it compiles fine and does not change the results, only the performance (about 10x faster with extend_cost and cost_limit = inf).

You can have a look at the commit gatagat/lap@cc97e2e, which improved efficiency when cost_limit < np.inf (using (n + m, n + m) matrices instead of (2 max(n, m), 2 max(n, m)) matrices) but drastically reduced performance when cost_limit == np.inf.

Indeed, it also uses (n + m, n + m) matrices in that case, where (max(n, m), max(n, m)) matrices would suffice. This PR fixes that case (up to 50x faster).
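To illustrate the shape trick being discussed (a simplified sketch, not the actual lapx code): when cost_limit is infinite, a rectangular (n, m) problem only needs padding to the smallest enclosing square, with every padded entry costing 0, so the padding never changes which real entries the solver picks.

```python
import numpy as np

def pad_to_square(cost):
    """Pad a rectangular (n, m) cost matrix to the smallest square
    (max(n, m), max(n, m)) matrix; padded entries cost 0, so the
    optimal assignment over the real entries is unchanged.
    Illustrative helper, not part of lapx."""
    n, m = cost.shape
    k = max(n, m)
    out = np.zeros((k, k), dtype=np.double)
    out[:n, :m] = cost
    return out

cost = np.array([[4.0, 1.0, 3.0],
                 [2.0, 0.0, 5.0]])  # n = 2 rows, m = 3 columns
square = pad_to_square(cost)
print(square.shape)  # (3, 3), versus (5, 5) with the (n + m, n + m) scheme
```

A (3, 3) matrix holds roughly a third of the entries of the (5, 5) one here, and the gap grows with problem size, which is where the large speed-ups come from.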

The commit gatagat/lap@cc97e2e
improved the efficiency with cost_limit < np.inf (using (n + m, n + m) matrices instead of
(2 max(n, m), 2 max(n, m)) matrices) but drastically reduced performance when cost_limit == np.inf

Indeed, it also uses (n+m, n+m) matrices where one could only use (max(n, m), max(n,m)) matrices.
This fixes this case. (Up to 10x faster)
@raphaelreme
Contributor Author

raphaelreme commented Apr 29, 2024

[benchmark plot: run time vs number of rows for ref, smallest_fill_inf, and smallest_fill_0]

For instance: this is the run time vs the number of rows (the number of columns here is 20% larger). Ref is your current version, smallest_fill_inf is more or less what version 0.4.0 of lap did, and smallest_fill_0 is what I did here in the PR.

The overhead at low numbers of rows comes from my own code that wraps yours. You can have a look at https://github.com/raphaelreme/pylapy, which is a wrapper around lapx, lapjv, etc. The code I used to benchmark this behavior is here: https://github.com/raphaelreme/pylapy/blob/main/benchmark/benchmark_shape_extension.py, but it's a bit of a mess.

@rathaROG rathaROG added the enhancement New feature or request label Apr 29, 2024
@rathaROG
Owner

Hello @raphaelreme , thanks for your nice contribution! According to your benchmark, your PR seems much faster. I will check and run some tests on my part.

@rathaROG rathaROG merged commit 34487e0 into rathaROG:main Apr 29, 2024
rathaROG added a commit that referenced this pull request Apr 29, 2024
- Improve efficiency by raphaelreme -> #7
- Improve speed for `extend_cost=True`
- Update and improve test and benchmark
- Update and improve workflows
- Add minimum Python requirement
@rathaROG
Owner

rathaROG commented Apr 29, 2024

Hello @raphaelreme , I spent some time looking at the code, and I found that np.zeros((n, n), dtype=np.double) is a lot faster than lines #98-100:

cost_c_extended = np.empty((n, n), dtype=np.double)
cost_c_extended[:] = 0

[screenshot: timing comparison of np.zeros vs np.empty + fill]

This drastically improves the speed for extend_cost=True. Could you test the current main branch and share the result?
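A minimal timing sketch of this comparison (assuming a Linux machine; exact numbers will differ):

```python
import timeit
import numpy as np

n = 2000  # size of the square extended cost matrix

t_zeros = timeit.timeit(
    lambda: np.zeros((n, n), dtype=np.double), number=50)
t_empty_fill = timeit.timeit(
    lambda: np.empty((n, n), dtype=np.double).fill(0.0), number=50)

# On Linux, np.zeros is typically much faster here: calloc can hand back
# lazily zeroed pages, while empty + fill must touch every byte up front.
print(f"np.zeros:        {t_zeros:.4f}s")
print(f"np.empty + fill: {t_empty_fill:.4f}s")
```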

@raphaelreme
Contributor Author

raphaelreme commented Apr 29, 2024

Hmm, I neglected this because I wanted to keep the same coding style as before, and because it should be negligible compared to the assignment computation. In theory all of this setup is O(n^2), while the assignment itself is O(n^3) (to be checked).

I'll try it; it is possible that np.zeros is smarter than np.empty followed by assigning 0, but in theory these should be roughly equivalent. I would actually expect the fastest solution to be np.empty, then assigning 0 only to the added entries and copying c into the rest. I'll let you know if I see a speed-up on my side too.
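The three initialisation strategies under discussion can be sketched roughly as follows (hypothetical helper names for illustration, not the actual lapx internals):

```python
import numpy as np

def extend_zeros(cost, k):
    # Strategy A: calloc-backed allocation, then copy the real costs in.
    out = np.zeros((k, k), dtype=np.double)
    out[:cost.shape[0], :cost.shape[1]] = cost
    return out

def extend_empty_fill(cost, k):
    # Strategy B: uninitialised allocation, zero everything, then copy.
    out = np.empty((k, k), dtype=np.double)
    out[:] = 0
    out[:cost.shape[0], :cost.shape[1]] = cost
    return out

def extend_empty_partial(cost, k):
    # Strategy C: uninitialised allocation, copy the costs, and zero
    # only the padded border, so each entry is written exactly once.
    n, m = cost.shape
    out = np.empty((k, k), dtype=np.double)
    out[:n, :m] = cost
    out[n:, :] = 0
    out[:n, m:] = 0
    return out

cost = np.arange(6, dtype=np.double).reshape(2, 3)
assert np.array_equal(extend_zeros(cost, 3), extend_empty_partial(cost, 3))
```

All three produce identical matrices; they differ only in how much memory gets written (and, on Linux, whether the zeroing can be left to the kernel).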

@raphaelreme raphaelreme deleted the rreme/fix_cost_extend branch April 29, 2024 19:15
@raphaelreme
Contributor Author

raphaelreme commented Apr 29, 2024

So I've tried it, and I also observe that np.zeros is much faster than np.empty plus filling.

I dug a bit and found this: https://stackoverflow.com/questions/27574881/why-does-numpy-zeros-takes-up-little-space:

Are you using Linux? Linux has lazy allocation of memory. The underlying calls to malloc and calloc in
numpy always 'succeed'. No memory is actually allocated until the memory is first accessed.

The zeros function will use calloc which zeros any allocated memory before it is first accessed. Therefore,
numpy need not explicitly zero the array and so the array will be lazily initialised. Whereas, the repeat
function cannot rely on calloc to initialise the array. Instead it must use malloc and then copy the repeated
value to all elements in the array (thus forcing immediate allocation).

Which explains pretty well what may be happening here.

Nonetheless, in my benchmark I don't see any boost in computational time when using np.zeros vs np.empty + filling. I guess the assignment is indeed taking 99% of the computational time, so speeding up the initial allocation is not that useful. Still, I like it better this way (with np.zeros): it can only be faster, and the code is nicer.
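The asymptotic argument above can be sketched numerically: compare the O(n^2) setup cost against an O(n^3) workload. Here a matrix product stands in for the LAP solve itself (an assumption made only so the sketch runs without lapx; the point is the asymptotic gap, not the exact numbers).

```python
import timeit
import numpy as np

n = 800

# O(n^2): allocating (and implicitly zeroing) the extended cost matrix.
t_alloc = timeit.timeit(lambda: np.zeros((n, n), dtype=np.double), number=3)

# O(n^3): a stand-in for the assignment solve itself.
a = np.random.rand(n, n)
t_cubic = timeit.timeit(lambda: a @ a, number=3)

print(f"O(n^2) setup:    {t_alloc:.5f}s")
print(f"O(n^3) workload: {t_cubic:.5f}s")
# The cubic part dominates, so shaving the allocation changes little overall.
```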

@rathaROG
Owner

OK, I'll keep np.zeros. Many thanks again, @raphaelreme , for this improvement. The new v0.5.8 is on the way. 🎉
