Learning to Route in Similarity Graphs

Authors: Dmitry Baranchuk, Dmitry Persiyanov, Anton Sinitsin, Artem Babenko

ICML 2019

Reproducibility variables, each listed with its result and the supporting LLM response:
Research Type: Experimental
    By thorough experiments, we demonstrate that the proposed learnable routing successfully diminishes the local minima problem and significantly improves the overall search performance. We experimentally demonstrate that the proposed learnable routing substantially increases the search accuracy on three open-source datasets for the same runtime budget.
Researcher Affiliation: Collaboration
    Yandex, Russia; Lomonosov Moscow State University, Russia; Moscow Institute of Physics and Technology, Russia; National Research University Higher School of Economics, Russia.
Pseudocode: Yes
    Algorithm 1: Beam search. (A minimal Python sketch of this routine is given after the table.)
Open Source Code: Yes
    The PyTorch source code of our algorithm is available online: https://github.com/dbaranchuk/learning-to-route
Open Datasets: Yes
    We evaluate the proposed approach on three publicly available datasets, which are widely used as large-scale nearest neighbor search benchmarks:
    1. SIFT100K (Jégou et al., 2011) is sampled from one million 128-dimensional SIFT descriptors.
    2. DEEP100K (Babenko & Lempitsky, 2016) is a random 100,000 subset of one billion 96-dimensional CNN-produced feature vectors of natural images from the Web.
    3. GloVe100K (Pennington et al., 2014) is a collection of 300-dimensional normalized GloVe vector representations for Wikipedia 2014 + Gigaword 5.
Dataset Splits: Yes
    SIFT100K: we consider 100,000 learn vectors as train queries; the holdout 10,000 query vectors are used for evaluation. DEEP100K: we sample 100,000 train queries from the learn set and take the original 10,000 queries for evaluation. GloVe100K: we randomly split the original 400,000 word embeddings into base and learn sets, each containing 100,000 vectors; 10,000 queries are taken from the remaining vectors for evaluation. (A sketch of the GloVe100K split is given after the table.)
Hardware Specification: Yes
    The DNN training on a single 1080Ti GPU takes on average twelve hours for the architecture described in Section 3.4.
Software Dependencies: No
    Insufficient information: the paper mentions PyTorch source code but does not specify the PyTorch version or any other software dependencies with version numbers.
Experiment Setup: Yes
    In all the experiments we use the Adam algorithm (Kingma & Ba, 2014) for SGD training. We have also observed significantly faster convergence with the One Cycle learning rate schedule (Smith & Topin, 2017). For each dataset we construct the NSW graph on the base set with the optimal maximal vertex out-degree MaxM = 16 and learn the routing representations for DCS = 128, 256, 512 as described in Section 3. All ablation experiments were done on the GloVe100K dataset at the operating point DCS = 256, k = 32, with compression to d = D/4. (A minimal sketch of this optimization setup is given after the table.)
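
For reference, the routine behind Algorithm 1 is the classic beam search over a similarity graph. Below is a minimal NumPy sketch under assumed names (`beam_search`, `graph`, and `base` are illustrative, not the authors' API); in the paper's method, routing distances would be computed against the learned routing representations rather than the raw base vectors.

```python
import heapq
import numpy as np

def beam_search(graph, base, query, start, beam_size):
    """Beam search over a similarity graph.

    graph: dict mapping vertex id -> list of out-neighbor ids
    base:  array of vectors; base[v] is the vector stored at vertex v
    Returns up to beam_size (distance, vertex) pairs, closest first.
    """
    def dist(v):
        return float(np.linalg.norm(base[v] - query))

    d0 = dist(start)
    visited = {start}
    candidates = [(d0, start)]   # min-heap: closest unexpanded vertex on top
    beam = [(-d0, start)]        # max-heap via negation: worst kept vertex on top

    while candidates:
        d, v = heapq.heappop(candidates)
        if d > -beam[0][0]:      # best candidate is worse than the worst kept vertex
            break
        for u in graph[v]:
            if u in visited:
                continue
            visited.add(u)
            du = dist(u)
            if len(beam) < beam_size or du < -beam[0][0]:
                heapq.heappush(candidates, (du, u))
                heapq.heappush(beam, (-du, u))
                if len(beam) > beam_size:
                    heapq.heappop(beam)   # evict the current worst vertex

    return sorted((-nd, v) for nd, v in beam)

# Toy usage: a 4-vertex ring graph over random 8-dimensional vectors.
rng = np.random.default_rng(0)
base = rng.standard_normal((4, 8))
graph = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(beam_search(graph, base, rng.standard_normal(8), start=0, beam_size=2))
```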
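The GloVe100K split protocol quoted in the Dataset Splits row is simple enough to sketch. The array names and seed below are assumptions for illustration; the real vectors would come from the GloVe Wikipedia 2014 + Gigaword 5 release.

```python
import numpy as np

# Stand-in for the 400,000 GloVe word embeddings (300-dimensional).
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((400_000, 300)).astype(np.float32)

# Random split: 100K base vectors, 100K train queries, 10K eval queries.
perm = rng.permutation(len(embeddings))
base = embeddings[perm[:100_000]]                   # vectors indexed by the graph
train_queries = embeddings[perm[100_000:200_000]]   # learn set: queries for training the router
eval_queries = embeddings[perm[200_000:210_000]]    # held-out queries for evaluation
```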
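Finally, a minimal PyTorch sketch of the optimization setup quoted in the Experiment Setup row: Adam combined with PyTorch's built-in One Cycle schedule. The model, batch size, and loss here are placeholders; the actual routing network and loss are in the authors' repository.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)
num_epochs, batches_per_epoch = 3, 100
model = nn.Linear(96, 24)                      # placeholder: e.g. compress D=96 down to d=D/4
opt = optim.Adam(model.parameters(), lr=1e-3)  # Adam (Kingma & Ba, 2014)
sched = optim.lr_scheduler.OneCycleLR(         # One Cycle schedule (Smith & Topin, 2017)
    opt, max_lr=1e-3, total_steps=num_epochs * batches_per_epoch)

for _ in range(num_epochs):
    for _ in range(batches_per_epoch):
        x = torch.randn(32, 96)                # stand-in for a batch of training queries
        loss = model(x).pow(2).mean()          # stand-in for the paper's routing loss
        opt.zero_grad()
        loss.backward()
        opt.step()
        sched.step()                           # One Cycle steps once per batch
```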