Skip to content

Java implementation of an approximate nearest-neighbor search via space-filling curves

License

Notifications You must be signed in to change notification settings

hairbeRt/JSpaceFillingCurve

Repository files navigation

JDataTest

This repository captures the code progress of my project "Neighbor-preserving properties of space-filling curves" in the semester course "Project group: Computational geometry" at University of Bonn.

Aim and general idea

The nearest-neighbor-search-problem (NNS) is the problem of constructing an efficient data structure that models a set of points in some metric space (usually Euclidean space in dimension 2 or in very high dimension), such that you can quickly output the nearest member of your data structure to a given input point. It is evidently a problem with many applications, so clever approaches to this problem are of great interest.

The problem is fairly easy to solve in Euclidean space in dimension 2, however if you want to guarantee a good query time (of roughly O(log n) lookups in the data structure), the need for space grows quickly with larger dimension.

The general idea of our approach is an approximation algorithm:

  1. Model a comparison operator that orders points by their first appearance in some previously chosen space-filling curve
  2. Model the dataset as a sorted list P[-] of points (sorted by this comparison operator)
  3. Find an n such that inserting x into P at position n results in P still being sorted (i.e. Binary Search on the comparison operator).
  4. Return the closest of the points {P[n-1],P[n]}.

Our goal is to implement a small model to test this algorithm with multiple space-filling curves in 2D and compare their performance. The performance could for example be the probability of finding a nearest neighbor, or the average error.

The orders implemented include in 2D: Primitive orders like alphabetical ordering or abs-argument-ordering, ordering with a z-curve, ordering with a continuous Hilbert Curve or ordering via a curve generated by the Daun-Tiling ([1]).

In higher dimensions, ordering via one space-filling curve should grow the average error indefinitely, so we use the approach by Liao et al. in [2] to improve on the error bound. We use multiple shifts of the same space-filling curve to order the same dataset multiple times in completely different orders and we can actually guarantee a bounded error while the amount of copies of the dataset grows only linearly in the dimension. Even better, in growing dimension the average error seems to converge to 0. For this algorithm, we use a regularized multi-dimensional hilbert curve described in [3].

Example graphics

For visualisation, we used a very primitive framework that just plots points and curves into a vector graphic file. Here you can find some of the things I played around with.

This graphic shows a random data set sorted via the xyCompare procedure. Here all points are sorted by their x-coordinates, with sorting by y-coordinate occuring if the x-coordinates collide.

This graphic shows a similar procedure where the points are first sorted by their absolute value and then by geometric argument.

This Shows the ordering of points in a plane with an uncontinuous space-filling curve. The quadrants are ordered in cyclic order and the points within any quadrants are ordered recursively, where the orientation of the quadrant ordering is left unchanged in the recursive step (unlike in a "real" Hilbert-curve, where the ordering of the quadrant is rotated in the recursive step). The ordering of this uncontinuous curve is visualised in third and fourth order here and here.

This Image shows an interesting or annoying (you decide) side-effect of using whole-number-coordinates for the 2D-geometry. While the ordering of the points in the smallest scale should consist of a Z-like ordering, some rows show an N-ordering. This is due to the fact that the sizes of the quadrants (1000/64) are no longer whole numbers, so will be rounded, which "kicks" points on a small scale out of their intuitive quadrant (one could say that the algorithm still runs correctly, just that the realised decomposition of quadrants only turns one quadrant into four nearly equally-sized quadrants). A possible future solution is to use double-precision arithmetic at least for the quadrant-decomposition.

This Image shows how the uncontinuous Z-ordering curve looks in eigth order.

This Shows the continuous actual Hilbert Curve in fourth order.

Future Goals

Try to answer the following interesting questions:

  1. Which space-filling curve works best for this approach?
  2. Is the result by [2] that it suffices to have at least d+1 shifted copies in dimension d to guarantee a bounded error optimal?
  3. In practice, how many shifted copies to you need to still approach an average error of 0?
  4. Is it possible to make a non-approximative algorithm out of this?
  5. Is there a curve belonging to the Daun-Tiling that performs better than a Hilbert Curve in dimension 2?
  6. I noticed that recursive space-orderings that work by recursively permuting subsets of the search space are easier to implement if their group of permutations is abelian. Are there Hilbert Curves with abelian quadrant permutation group that behave nicely for this purpose?

Literature

[1] Herman Haverkort. Recursive tilings and space-filling curves with little fragmentation

[2] Liao et al. (2001) High Dimensional Similarity Search With Space Filling Curves

[3] Arie Bos, Herman Haverkort. Hyperorthogonal well-folded Hilbert curves

About

Java implementation of an approximate nearest-neighbor search via space-filling curves

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages