Faster Maximum Inner Product Search in High Dimensions

Authors: Mo Tiwari, Ryan Kang, Jaeyong Lee, Donghyun Lee, Christopher J Piech, Sebastian Thrun, Ilan Shomorony, Martin Jinye Zhang

ICML 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We validate the scaling of BanditMIPS and demonstrate that BanditMIPS outperforms prior state-of-the-art MIPS algorithms in sample complexity, wall-clock time, and precision/speedup tradeoff across a variety of experimental settings." |
| Researcher Affiliation | Academia | "1 Department of Computer Science, Stanford University, Stanford, CA; 2 Oxford University; 3 University College London; 4 Electrical and Computer Engineering, University of Illinois at Urbana-Champaign; 5 Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University." |
| Pseudocode | Yes | "Algorithm 1: BanditMIPS" |
| Open Source Code | Yes | "All of our experimental results are reproducible via a 1-line script at github.com/ThrunGroup/BanditMIPS." |
| Open Datasets | Yes | "We empirically evaluate the performance of BanditMIPS and BanditMIPS-α on two synthetic and four real-world datasets... the Netflix Prize dataset (Bennett et al., 2007), the MovieLens dataset (Harper & Konstan, 2015), and the SIFT-1M (Jégou et al., 2011) and Crypto Pairs (Carsten, 2022) datasets." |
| Dataset Splits | No | The paper discusses varying hyperparameters and error probabilities (e.g., δ and ϵ) to obtain speedups and trade-offs, and mentions using low-rank approximations or NMF for missing values, but it does not specify explicit training/validation/test dataset splits (e.g., in percentages or sample counts) for reproducibility. |
| Hardware Specification | Yes | "All experiments were run on a 2019 MacBook Pro with a 2.4 GHz 8-core Intel Core i9 CPU, 64 GB of 2667 MHz DDR4 RAM, and an Intel UHD Graphics 630 1536 MB graphics card." |
| Software Dependencies | No | The paper provides a GitHub link implying code availability but does not specify any software names with version numbers, such as Python versions or specific libraries like PyTorch or scikit-learn. |
| Experiment Setup | Yes | "In all scaling experiments, δ and ϵ were both set to 0.001 for BanditMIPS and BanditMIPS-α. For the NORMAL_CUSTOM and CORRELATED_NORMAL_CUSTOM datasets, the sub-Gaussianity parameter was set to σ = 1. For the Netflix and MovieLens datasets, the sub-Gaussianity parameter was set to σ = 25. For the Crypto Pairs, SIFT-1M, and Simple Song datasets described in Appendix D, the sub-Gaussianity parameters were set to σ = 2.5e9, σ = 6.25e5, and σ = 25, respectively... For the precision-versus-speedup tradeoff experiments, the number of dimensions was fixed to d = 10,000. The various speedups were obtained by varying the hyperparameters of each algorithm. For NAPG-MIPS and HNSW-MIPS, for example, M was varied from 4 to 32, efConstruction was varied from 2 to 500, and efSearch was varied from 2 to 500. For Greedy-MIPS, the budget varied from 2 to 999. For LSH-MIPS, the number of hash functions and hash values varied from 1 to 10. For H2ALSH, δ varies from 1/24 to 1/2, c0 varies from 1.2 to 5, and c varies from 0.9 to 2. For BanditMIPS, BanditMIPS-α, and BoundedME, speedups were obtained by varying δ from 1/10^10 to 0.99 and ϵ from 1/10^10 to 3." |
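The Experiment Setup row describes a PAC-style search governed by an error probability δ, an accuracy parameter ϵ, and a sub-Gaussianity parameter σ. The sketch below illustrates the general idea behind bandit-based MIPS — subsample coordinates to estimate each inner product, maintain sub-Gaussian confidence intervals, and eliminate candidates whose upper bound falls below the best lower bound. The function name, batching scheme, and confidence-interval constant are illustrative assumptions, not the paper's implementation; see Algorithm 1 in the paper and the linked repository for the authors' version.

```python
import numpy as np

def bandit_mips(atoms, query, delta=0.001, eps=0.001, sigma=1.0,
                batch=32, seed=0):
    """Illustrative UCB-style elimination for MIPS (a sketch, not the
    paper's code). Each atom's inner product with `query` is treated as
    a bandit arm; per-coordinate products are assumed sigma-sub-Gaussian.
    """
    rng = np.random.default_rng(seed)
    n, d = atoms.shape
    alive = np.arange(n)                 # indices of surviving candidates
    est = np.zeros(n)                    # running mean of sampled products
    pulls = np.zeros(n, dtype=int)       # coordinates sampled per arm
    while len(alive) > 1 and pulls[alive[0]] < d:
        # Sample a batch of coordinates (with replacement across rounds).
        idx = rng.choice(d, size=min(batch, d), replace=False)
        batch_mean = atoms[np.ix_(alive, idx)] @ query[idx] / len(idx)
        m = pulls[alive]
        est[alive] = (est[alive] * m + batch_mean * len(idx)) / (m + len(idx))
        pulls[alive] += len(idx)
        # Sub-Gaussian confidence radius (union bound over arms and rounds;
        # the exact constant here is an illustrative choice).
        ci = sigma * np.sqrt(2 * np.log(2 * n * d / delta) / pulls[alive])
        best_lcb = np.max(est[alive] - ci)
        alive = alive[est[alive] + ci >= best_lcb - eps]
    # Resolve remaining ties exactly among the survivors.
    exact = atoms[alive] @ query
    return alive[int(np.argmax(exact))]
```

Smaller δ and ϵ shrink the elimination tolerance (more samples, higher precision); larger values eliminate arms earlier, which is the precision/speedup trade-off the sweep in the Experiment Setup row explores.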