Skip to content

Graph Types

Masajiro Iwasaki edited this page Jul 28, 2020 · 4 revisions

If the highest performance is not required, it is easy to use NGT. However, to obtain the highest performance, it is necessary to know the difference between the two graph types of NGT.

  • ANNG (Approximate k-Nearest Neighbor Graph): An undirected and connected graph. An undirected graph consists of undirected (bidirectional) edges. A connected graph is one in which there is a path between two arbitrary nodes. The numbers of outgoing and incoming edges are the same for each node. An ANNG is a basic NGT graph type.

  • ONNG (Optimized Nearest Neighbors Graph): A directed graph. A directed graph consists of directed edges. Since the edges in the graph are optimized to obtain high search performance, its search performance is higher than that of an ANNG. However, objects cannot be removed from an ONNG.

In addition to the two types of graphs, a tree-based index is also used to search.

ANNG

Below is an example of constructing an ANNG.

ngt create -d 300 -D c -E 10 anng vecs.tsv

Here, -E specifies the number of edges for each appended node. Since an ANNG is constructed by incrementally appending a node with the same number of edges to the graph, the numbers of edges for existing nodes gradually increase, as more nodes are appended. Below is the flow of ANNG construction.

graph construction

As the number of edges increases, construction time increases and search accuracy improves. Since search time increases at the same time, an appropriate number of edges should be set to obtain better performance. However, not all edges are always used to explore the graph. By default, 40 edges are used to balance search accuracy and time relatively for most datasets. -S specifies the number of edges to explore instead of the default value of 40 as follows.

ngt create -d 300 -D c -E 20 -S 50 anng

The number of edges specified with -S is a fixed value. However, it can be automatically and adequately calculated from search_range_coefficient of search when -2 is specified with -S as follows or -E is specified in the search command. The coefficients of the formula to calculate the number of edges are set by default.

ngt create -d 300 -D c -E 50 -S -2 anng

Since it is difficult to adjust these parameters and coefficients above to fit your dataset, there are optimization functions (Command/Python/Cpp) to automatically optimize these parameters.

ONNG

Next is ONNG construction. As described above, an ONNG can be converted from an ANNG by using the following command.

ngt reconstruct-graph -m S -o 10 -i 100 anng onng

This command performs the following five steps.

  1. Graph construction
    First, an initial ONNG without edges is a graph in which the only nodes are extracted from the specified ANNG. In the example above, 10 outgoing edges that are extracted for each node in the ANNG are added to the ONNG, and 100 outgoing edges extracted in the same way are reversed in direction and added to the ONNG. The following figure shows the graph construction.

graph reconstruction

  1. Shortcut reduction (pass adjustment)
    Unnecessary shortcut paths are removed to decrease edges (see this paper for details).

  2. Search coefficient optimization
    The search coefficients to calculate the number of edges to explore are optimized .

  3. Memory prefetch optimization
    Th size and offset for the memory prefetch during the search processing are optimized.

  4. Accuracy table generation
    The accuracy table is a lookup table from accuracies to epsilons. When an expected accuracy is specified for search, the indispensable epsilon value for the actual search is converted from the specified accuracy with the table.

ONNG construction and optimization are completed with the above steps.