Non-metric Similarity Graphs for Maximum Inner Product Search
Authors: Stanislav Morozov, Artem Babenko
NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | By an extensive comparison to the existing approaches, we show that the proposed method is a game-changer in terms of the runtime/accuracy trade-off for the MIPS problem. We introduce a new large-scale dataset for the MIPS algorithms evaluation to facilitate research in this direction. |
| Researcher Affiliation | Collaboration | Stanislav Morozov Yandex, Lomonosov Moscow State University stanis-morozov@yandex.ru Artem Babenko Yandex, National Research University Higher School of Economics artem.babenko@phystech.edu |
| Pseudocode | Yes | Algorithm 1 Greedy walk; Algorithm 2 NSW graph construction |
| Open Source Code | Yes | The dataset and the C++ implementation of our method are available online1. Footnote 1: https://github.com/stanis-morozov/ip-nsw |
| Open Datasets | Yes | Datasets. We summarize the information on the benchmark datasets in Table 1. The Netflix, Movie Lens and Yahoo!Music datasets are the established benchmarks for the MIPS problem. Music100 is a new dataset that we introduce to the community2. This dataset was obtained by IALSfactorization[26]... Footnote 2: https://github.com/stanis-morozov/ip-nsw |
| Dataset Splits | No | The paper mentions training, testing, and evaluation metrics but does not provide specific details on validation dataset splits, percentages, or methods for creating them. |
| Hardware Specification | Yes | All the experiments were performed on Intel Xeon E5-2650 machine in a single thread mode. |
| Software Dependencies | No | The paper mentions using the 'Intel MKL library' and provides a 'C++ implementation' but does not specify version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | The primary parameter of the NSW is the maximum vertex degree M... In our experiments we use M = 32 edges per vertex. We performed the experiments with K = 1 and K = 10. The recall values were averaged over 10, 000 randomly sampled queries. |