Learning to Route in Similarity Graphs

Authors: Dmitry Baranchuk, Dmitry Persiyanov, Anton Sinitsin, Artem Babenko

ICML 2019

Reproducibility variables, each listed with its result and the supporting LLM response:
Research Type: Experimental
    By thorough experiments, we demonstrate that the proposed learnable routing successfully diminishes the local minima problem and significantly improves the overall search performance. We experimentally demonstrate that the proposed learnable routing substantially increases the search accuracy on three open-source datasets for the same runtime budget.
Researcher Affiliation: Collaboration
    Yandex, Russia; Lomonosov Moscow State University, Russia; Moscow Institute of Physics and Technology, Russia; National Research University Higher School of Economics, Russia.
Pseudocode: Yes
    Algorithm 1: Beam search. (A minimal Python sketch of this routine is given after the table.)
Open Source Code: Yes
    The PyTorch source code of our algorithm is available online: https://github.com/dbaranchuk/learning-to-route
Open Datasets: Yes
    We evaluate the proposed approach on three publicly available datasets, which are widely used as large-scale nearest neighbor search benchmarks:
    1. SIFT100K (Jégou et al., 2011) is sampled from one million 128-dimensional SIFT descriptors.
    2. DEEP100K (Babenko & Lempitsky, 2016) is a random 100,000 subset of one billion 96-dimensional CNN-produced feature vectors of natural images from the Web.
    3. GloVe100K (Pennington et al., 2014) is a collection of 300-dimensional normalized GloVe vector representations for Wikipedia 2014 + Gigaword 5.
Dataset Splits: Yes
    SIFT100K: we consider 100,000 learn vectors as train queries; the holdout 10,000 query vectors are used for evaluation. DEEP100K: we sample 100,000 train queries from the learn set and take the original 10,000 queries for evaluation. GloVe100K: we randomly split the original 400,000 word embeddings into base and learn sets, each containing 100,000 vectors; 10,000 queries are taken from the remaining vectors for evaluation. (A sketch of the GloVe100K split is given after the table.)
Hardware Specification: Yes
    The DNN training on a single 1080Ti GPU takes on average twelve hours for the architecture described in Section 3.4.
Software Dependencies: No
    Insufficient information: the paper mentions PyTorch source code but does not specify the PyTorch version or any other software dependencies with version numbers.
Experiment Setup: Yes
    In all the experiments we use the Adam algorithm (Kingma & Ba, 2014) for SGD training. We have also observed significantly faster convergence with the One Cycle learning rate schedule (Smith & Topin, 2017). For each dataset we construct the NSW graph on the base set with the optimal maximal vertex out-degree MaxM = 16 and learn the routing representations for DCS = 128, 256, 512 as described in Section 3. All ablation experiments were done on the GloVe100K dataset at the operating point DCS = 256, k = 32, with compression to d = D/4. (A minimal sketch of this optimization setup is given after the table.)
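
For reference, the routine behind Algorithm 1 is the classic beam search over a similarity graph. Below is a minimal NumPy sketch under assumed names (`beam_search`, `graph`, and `base` are illustrative, not the authors' API); in the paper's method, routing distances would be computed against the learned routing representations rather than the raw base vectors.

```python
import heapq
import numpy as np

def beam_search(graph, base, query, start, beam_size):
    """Beam search over a similarity graph.

    graph: dict mapping vertex id -> list of out-neighbor ids
    base:  array of vectors; base[v] is the vector stored at vertex v
    Returns up to beam_size (distance, vertex) pairs, closest first.
    """
    def dist(v):
        return float(np.linalg.norm(base[v] - query))

    d0 = dist(start)
    visited = {start}
    candidates = [(d0, start)]   # min-heap: closest unexpanded vertex on top
    beam = [(-d0, start)]        # max-heap via negation: worst kept vertex on top

    while candidates:
        d, v = heapq.heappop(candidates)
        if d > -beam[0][0]:      # best candidate is worse than the worst kept vertex
            break
        for u in graph[v]:
            if u in visited:
                continue
            visited.add(u)
            du = dist(u)
            if len(beam) < beam_size or du < -beam[0][0]:
                heapq.heappush(candidates, (du, u))
                heapq.heappush(beam, (-du, u))
                if len(beam) > beam_size:
                    heapq.heappop(beam)   # evict the current worst vertex

    return sorted((-nd, v) for nd, v in beam)

# Toy usage: a 4-vertex ring graph over random 8-dimensional vectors.
rng = np.random.default_rng(0)
base = rng.standard_normal((4, 8))
graph = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(beam_search(graph, base, rng.standard_normal(8), start=0, beam_size=2))
```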
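The GloVe100K split protocol quoted in the Dataset Splits row is simple enough to sketch. The array names and seed below are assumptions for illustration; the real vectors would come from the GloVe Wikipedia 2014 + Gigaword 5 release.

```python
import numpy as np

# Stand-in for the 400,000 GloVe word embeddings (300-dimensional).
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((400_000, 300)).astype(np.float32)

# Random split: 100K base vectors, 100K train queries, 10K eval queries.
perm = rng.permutation(len(embeddings))
base = embeddings[perm[:100_000]]                   # vectors indexed by the graph
train_queries = embeddings[perm[100_000:200_000]]   # learn set: queries for training the router
eval_queries = embeddings[perm[200_000:210_000]]    # held-out queries for evaluation
```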
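Finally, a minimal PyTorch sketch of the optimization setup quoted in the Experiment Setup row: Adam combined with PyTorch's built-in One Cycle schedule. The model, batch size, and loss here are placeholders; the actual routing network and loss are in the authors' repository.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)
num_epochs, batches_per_epoch = 3, 100
model = nn.Linear(96, 24)                      # placeholder: e.g. compress D=96 down to d=D/4
opt = optim.Adam(model.parameters(), lr=1e-3)  # Adam (Kingma & Ba, 2014)
sched = optim.lr_scheduler.OneCycleLR(         # One Cycle schedule (Smith & Topin, 2017)
    opt, max_lr=1e-3, total_steps=num_epochs * batches_per_epoch)

for _ in range(num_epochs):
    for _ in range(batches_per_epoch):
        x = torch.randn(32, 96)                # stand-in for a batch of training queries
        loss = model(x).pow(2).mean()          # stand-in for the paper's routing loss
        opt.zero_grad()
        loss.backward()
        opt.step()
        sched.step()                           # One Cycle steps once per batch
```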