Learning to Route in Similarity Graphs
Authors: Dmitry Baranchuk, Dmitry Persiyanov, Anton Sinitsin, Artem Babenko
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through thorough experiments, we demonstrate that the proposed learnable routing successfully diminishes the local minima problem and significantly improves the overall search performance. We experimentally demonstrate that the proposed learnable routing substantially increases the search accuracy on three open-source datasets for the same runtime budget. |
| Researcher Affiliation | Collaboration | Yandex, Russia; Lomonosov Moscow State University, Russia; Moscow Institute of Physics and Technology, Russia; National Research University Higher School of Economics, Russia. |
| Pseudocode | Yes | Algorithm 1: Beam search (a minimal sketch is given below the table). |
| Open Source Code | Yes | The PyTorch source code of our algorithm is available online: https://github.com/dbaranchuk/learning-to-route |
| Open Datasets | Yes | We evaluate the proposed approach on three publicly available datasets, which are widely used as large-scale nearest neighbor search benchmarks: 1. The SIFT100K dataset (Jégou et al., 2011) is sampled from one million 128-dimensional SIFT descriptors. 2. The DEEP100K dataset (Babenko & Lempitsky, 2016) is a random 100,000-vector subset of one billion 96-dimensional CNN-produced feature vectors of natural images from the Web. 3. The GloVe100K dataset (Pennington et al., 2014) is a collection of 300-dimensional normalized GloVe vector representations for Wikipedia 2014 + Gigaword 5. |
| Dataset Splits | Yes | SIFT100K dataset... We consider 100,000 learn vectors as train queries. The holdout 10,000 query vectors are used for evaluation. DEEP100K dataset... We sample 100,000 train queries from the learn set. For evaluation we take the original 10,000 queries. GloVe100K dataset... We randomly split the original 400,000 word embeddings into base and learn sets, each containing 100,000 vectors. 10,000 queries are taken from the remaining vectors for evaluation. (An illustrative split is sketched below the table.) |
| Hardware Specification | Yes | The DNN training on a single 1080Ti GPU takes on average twelve hours for the architecture described in Section 3.4. |
| Software Dependencies | No | Information is insufficient. The paper mentions 'PyTorch source code' but does not specify the version of PyTorch or any other software dependencies with version numbers. |
| Experiment Setup | Yes | In all experiments we use the Adam (Kingma & Ba, 2014) algorithm for SGD training. We have also observed significantly faster convergence with the One Cycle learning rate schedule (Smith & Topin, 2017). For each dataset we construct the NSW graph on the base set with the optimal maximal vertex out-degree MaxM = 16 and learn the routing representations for DCS = 128, 256, 512 as described in Section 3. All ablation experiments were done on the GloVe100K dataset in the operating point of DCS = 256, k = 32 with compression to d = D/4. (A hedged training-loop sketch follows the table.) |
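
The paper's Algorithm 1 is a beam search over a similarity graph with a fixed distance-computation budget (DCS). The sketch below is a minimal illustration of that search pattern, not the authors' implementation; `graph`, `vectors`, `budget`, and `beam_size` are assumed names, and the stopping rule follows standard graph-based nearest neighbor search.

```python
import heapq
import numpy as np

def beam_search(graph, vectors, query, start_vertex, budget, beam_size=32):
    """Beam search over a similarity graph, in the spirit of Algorithm 1.
    `graph` maps a vertex id to a list of neighbor ids, `vectors` holds the
    vertex representations, and `budget` caps distance computations (DCS)."""
    def dist(v):
        return float(np.linalg.norm(vectors[v] - query))

    visited = {start_vertex}
    dcs = 1                                # the start vertex costs one computation
    d0 = dist(start_vertex)
    candidates = [(d0, start_vertex)]      # min-heap: closest candidate first
    best = [(-d0, start_vertex)]           # max-heap via negation: worst kept first

    while candidates and dcs < budget:
        d, v = heapq.heappop(candidates)
        # Stop once the closest unexpanded candidate is worse than the
        # worst vertex already kept in a full beam.
        if len(best) >= beam_size and d > -best[0][0]:
            break
        for u in graph[v]:
            if u in visited or dcs >= budget:
                continue
            visited.add(u)
            dcs += 1
            du = dist(u)
            heapq.heappush(candidates, (du, u))
            heapq.heappush(best, (-du, u))
            if len(best) > beam_size:
                heapq.heappop(best)        # drop the current worst result
    return sorted((-negd, u) for negd, u in best)
```

With the paper's learnable routing, `vectors` would hold the compressed d = D/4 routing representations; with the baseline, the original base vectors.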
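
The quoted GloVe100K protocol (400,000 embeddings split into 100,000 base vectors, 100,000 train queries, and 10,000 evaluation queries) can be reproduced with a simple random permutation. The snippet below is illustrative only; the random data stands in for the actual GloVe embeddings, and the seed is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the 400,000 x 300 normalized GloVe embedding matrix.
embeddings = rng.standard_normal((400_000, 300)).astype(np.float32)

perm = rng.permutation(len(embeddings))
base = embeddings[perm[:100_000]]            # graph is built on these
learn = embeddings[perm[100_000:200_000]]    # train queries
queries = embeddings[perm[200_000:210_000]]  # held-out evaluation queries
```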
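
The experiment-setup row names Adam with a One Cycle learning rate schedule; both are available in PyTorch, which the authors use. The sketch below shows how that training configuration fits together, assuming a placeholder network and loss: the architecture, learning rate, batch size, and step count are not specified in the quoted text, so the values here are illustrative, with only the d = D/4 compression taken from the paper.

```python
import torch

D = 300  # e.g. GloVe100K dimensionality
# Hypothetical stand-in for the routing network of Section 3.4,
# compressing D-dimensional vectors to d = D/4 representations.
model = torch.nn.Sequential(
    torch.nn.Linear(D, D),
    torch.nn.ELU(),
    torch.nn.Linear(D, D // 4),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# One Cycle schedule (Smith & Topin, 2017); max_lr and total_steps
# are assumed values, not taken from the paper.
total_steps = 100_000
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-3, total_steps=total_steps)

for step in range(total_steps):
    batch = torch.randn(256, D)           # placeholder for real train queries
    loss = model(batch).pow(2).mean()     # placeholder for the routing loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                      # One Cycle steps once per batch
```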