Python Quick Start

masajiro edited this page Nov 25, 2019 · 1 revision

Installation

The Python binding (ngtpy) can be installed with pip as follows.

pip3 install ngt
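
Note that the package is installed as ngt but imported as ngtpy. One way to confirm that the installation succeeded is to check whether the module can be located on the import path. The following is a minimal sketch using only the standard library; the helper name is_installed is hypothetical and not part of ngtpy.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the named top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == '__main__':
    # The pip package is named "ngt", but the importable module is "ngtpy".
    print('ngtpy available:', is_installed('ngtpy'))
```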

Index initialization and data registration

Below is an example of index initialization and data insertion.

# create-simple.py
import ngtpy

with open('./data/sift-dataset-5k.tsv', 'r', newline='\n') as fin:
    ngtpy.create('anng', 128)   # create an empty index
    index = ngtpy.Index('anng') # open the index
    for line in fin:
        object = list(map(float, line.rstrip().split('\t')))
        index.insert(object[:128]) # append objects to the index
index.build_index() # build index
index.save() # save the index

ngtpy.create() creates the specified directory anng and initializes it as an empty index for 128-dimensional objects. ngtpy.Index() opens the index. The file ./data/sift-dataset-5k.tsv holds the objects to be registered, one object per line. Each line contains additional data after the object, which must be discarded before insertion; this is why the example inserts only "object[:128]". ngtpy.Index.insert() appends an object to the index. ngtpy.Index.build_index() builds the index entries for the appended objects. ngtpy.Index.save() saves the index.
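
The line-parsing step described above can be isolated and checked without an index. The sketch below is an illustration only: the helper name parse_object and its dimension parameter are hypothetical, not part of ngtpy. Unlike the example script, it truncates the fields before converting them to floats, which also tolerates non-numeric trailing data; whether the dataset file actually contains such data is not stated here.

```python
def parse_object(line, dimension=128):
    """Parse one tab-separated line and keep only the first `dimension` values."""
    fields = line.rstrip().split('\t')
    if len(fields) < dimension:
        raise ValueError(
            'expected at least {} values, got {}'.format(dimension, len(fields)))
    # Truncate first, then convert, so trailing extra columns are never parsed.
    return [float(value) for value in fields[:dimension]]


if __name__ == '__main__':
    # A 3-dimensional toy line followed by one extra column.
    print(parse_object('1\t2\t3\tlabel\n', dimension=3))  # prints [1.0, 2.0, 3.0]
```

The returned list could then be passed to index.insert() in place of "object[:128]".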

Run the following commands to execute the example above.

cd (NGT_TOP_DIR)
python3 create-simple.py

Nearest neighbor search

Below is an example of a nearest neighbor search with ngtpy.Index.search().

# search-simple.py
import ngtpy

# open the saved index read-only; zero_based_numbering=False makes object IDs start at 1
index = ngtpy.Index('anng', read_only=True, zero_based_numbering=False)
with open('./data/sift-query-3.tsv', 'r', newline='\n') as fin:
    for i, line in enumerate(fin):
        query_object = list(map(float, line.rstrip().split('\t')))
        result = index.search(query_object, 5) # nearest neighbor search
        print('Query No.{}'.format(i + 1))
        print('Rank\tID\tDistance')
        # each result entry is an (ID, distance) pair
        for rank, (object_id, distance) in enumerate(result):
            print('{}\t{}\t{:.6f}'.format(rank + 1, object_id, distance))
        print()

The file ./data/sift-query-3.tsv contains query objects in the same format as the registration object file. Because it holds three query objects, three sets of results are displayed. The second argument of ngtpy.Index.search(), 5, specifies the number of objects returned for each query. The output begins as follows.

Query No.1
Rank    ID      Distance
1       3031    239.332397
2       4079    240.002090
3       3164    244.503586
4       3718    246.763046
5       157     251.093613
    ...
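
To sanity-check results like those above on a small dataset, an exact brute-force search can serve as a reference. The sketch below is not how NGT searches (NGT is an approximate, index-based method); it assumes Euclidean (L2) distance and IDs assigned in insertion order, both of which are assumptions here rather than statements from this page. The function name brute_force_search is hypothetical.

```python
import math


def brute_force_search(objects, query, k, zero_based_numbering=False):
    """Return the k objects nearest to `query` as (ID, distance) pairs, closest first."""
    offset = 0 if zero_based_numbering else 1  # assumed: IDs follow insertion order
    scored = []
    for position, vector in enumerate(objects):
        # Euclidean distance between the stored vector and the query (assumed metric).
        distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(vector, query)))
        scored.append((position + offset, distance))
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


if __name__ == '__main__':
    toy_objects = [[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]]
    for rank, (object_id, distance) in enumerate(
            brute_force_search(toy_objects, [0.0, 0.0], 2)):
        print('{}\t{}\t{:.6f}'.format(rank + 1, object_id, distance))
```

Because NGT's search is approximate, its results may differ slightly from such an exact reference, so a mismatch is not necessarily an error.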