We have slightly modified the baseline code to support single batching. Please refer to the original SONG repository for further details. For the moment, only the l2 similarity metric is supported.
The following versions are known to work:
g++ 5.4.0, 9.3.0
CUDA 10.0, 10.1
VEC_DIM is the vector dimension of the dataset, so its value depends on the dataset. PQ_SIZE is the size of the priority queue used in the approximate nearest neighbor search algorithm. Setting a higher PQ_SIZE increases recall; however, as mentioned in the README, PQ_SIZE cannot be increased arbitrarily, and its maximum value depends on memory size.
Usage: ./generate_template.sh && ./fill_parameters.sh PQ_SIZE VEC_DIM l2
For example: ./generate_template.sh && ./fill_parameters.sh 50 128 l2
BASE_VECTOR_PATH denotes the path of the vectors used to build the graph. This step yields bfsg.data and bfsg.graph, which are then used to run queries.
Usage: ./build_graph.sh BASE_VECTOR_PATH NUM_VEC VEC_DIM l2
For example: ./build_graph.sh sift1m_base.fvecs 1000000 128 l2
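If NUM_VEC and VEC_DIM are not known in advance, they can usually be read from the vector file itself. The sketch below assumes the common .fvecs layout, in which each record is a 4-byte little-endian integer (the dimension) followed by that many 4-byte floats, and it assumes a little-endian host. The helper name fvecs_shape is hypothetical and is not part of this repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): print "NUM_VEC VEC_DIM"
# for an .fvecs file. Assumes each record is a 4-byte int (the dimension)
# followed by that many 4-byte floats, and a little-endian host.
fvecs_shape() {
    local path="$1"
    local dim size record
    # Read the first 4 bytes as a signed 32-bit integer (host byte order).
    dim=$(od -An -t d4 -N 4 "$path" | tr -d '[:space:]')
    if [ -z "$dim" ] || [ "$dim" -le 0 ]; then
        echo "cannot read a valid dimension from: $path" >&2
        return 1
    fi
    size=$(wc -c < "$path" | tr -d '[:space:]')
    record=$(( 4 + 4 * dim ))
    # The file size must be a whole number of records.
    if [ $(( size % record )) -ne 0 ]; then
        echo "unexpected .fvecs layout: $path" >&2
        return 1
    fi
    echo "$(( size / record )) $dim"
}
```

One possible use is to capture both values and pass them on, for example: read -r NUM_VEC VEC_DIM < <(fvecs_shape sift1m_base.fvecs), followed by ./build_graph.sh sift1m_base.fvecs "$NUM_VEC" "$VEC_DIM" l2.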
We have added one more option, IS_SINGLE_BATCH, as the last argument of test_query.sh to enable single batching.
Usage: ./test_query.sh QUERY_PATH GROUNDTRUTH_PATH NUM_VEC VEC_DIM l2 TOP_K IS_SINGLE_BATCH
For example,
Run single batching: ./test_query.sh sift1m_query.fvecs sift1m_groundtruth.ivecs 1000000 128 l2 10 1
Run whole batching: ./test_query.sh sift1m_query.fvecs sift1m_groundtruth.ivecs 1000000 128 l2 10 0
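To compare the two modes on the same query set, the two commands above can be wrapped in a small function. This is only a sketch: the function name run_both_modes and the TEST_QUERY override (useful for a dry run) are assumptions and do not exist in this repository.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of this repository): run the same query set
# in single-batch mode (IS_SINGLE_BATCH=1) and then in whole-batch mode (0).
# TEST_QUERY is an assumed override so the command can be swapped out.
run_both_modes() {
    if [ "$#" -ne 6 ]; then
        echo "usage: run_both_modes QUERY_PATH GROUNDTRUTH_PATH NUM_VEC VEC_DIM l2 TOP_K" >&2
        return 1
    fi
    local cmd="${TEST_QUERY:-./test_query.sh}"
    local mode
    for mode in 1 0; do
        echo "== IS_SINGLE_BATCH=$mode =="
        # Append the mode as the last argument, as test_query.sh expects.
        "$cmd" "$@" "$mode" || return 1
    done
}
```

For example: run_both_modes sift1m_query.fvecs sift1m_groundtruth.ivecs 1000000 128 l2 10. Setting TEST_QUERY=echo prints the commands that would be run instead of executing them.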