Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix StackOverflowError in HotPixelIndex #605

Merged
merged 1 commit into from
Sep 30, 2020

Conversation

FObermaier
Copy link
Contributor

@FObermaier FObermaier commented Sep 30, 2020

Add nested CoordinateShuffler class and wire it add(Coordinate[] pts) in order to get a more balanced KdTree. The insertion speed for geometries with many coordinates greatly improves!

relates to #602

Signed-off-by: Felix Obermaier felix.obermaier@netcologne.de

Add nested CoordinateShuffler class and wire it `add(Coordinate[] pts)` in order to get a more balanced KdTree. The insertion speed greatly improves!

Signed-off-by: Felix Obermaier <felix.obermaier@netcologne.de>
@dr-jts
Copy link
Contributor

dr-jts commented Sep 30, 2020

Nice fix, @FObermaier ! I was starting to think about this myself, but this is a very elegant implementation. And it provides a dramatic performance boost.

@dr-jts
Copy link
Contributor

dr-jts commented Sep 30, 2020

I wonder if it would be even better to skip shuffling if there are only a small number of points (say <= 100) ?

@dr-jts
Copy link
Contributor

dr-jts commented Sep 30, 2020

Or another idea: instead of randomly shuffling the array indices, step through the coordinate array with a stride length which is coprime to the array size. That will visit every array entry once, but in a jumbled order.

This may not make a big peformance difference, but will use less memory.

The trick is finding a stride length which is relatively prime. One way to do it is given here (and a Java implementation of GCD here)

@dr-jts
Copy link
Contributor

dr-jts commented Sep 30, 2020

Further testing indicates that:

  • Gating the shuffling to only use it for larger arrays sizes doesn't make an appreciable difference in performance (this is often the case for indexes - there's not much advantage in not using them)
  • Using a coprime stride to step through the array is slightly better, except for some cases when it is MUCH worse. So that is too risky to use

Based on this, I'll leave this approach as it is.

@FObermaier FObermaier deleted the fix/stackoverflow_in_kdtree branch October 1, 2020 06:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants