Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Learning Space Partitions for Nearest Neighbor Search
Authors: Yihe Dong, Piotr Indyk, Ilya Razenshteyn, Tal Wagner
ICLR 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On several standard benchmarks for NNS (Aum uller et al., 2017), our experiments show that the partitions obtained by Neural LSH consistently outperform partitions found by quantization-based and tree-based methods as well as classic, data-oblivious LSH. |
| Researcher Affiliation | Collaboration | Yihe Dong Microsoft Piotr Indyk MIT Ilya Razenshteyn Microsoft Research Tal Wagner MIT |
| Pseudocode | Yes | Algorithm 1: Nearest neighbor search with a learned space partition |
| Open Source Code | No | The paper mentions using an 'open-source solver Ka HIP' but does not provide a link or statement about the availability of their own source code for the proposed method. |
| Open Datasets | Yes | For the experimental evaluation, we use three standard ANN benchmarks (Aum uller et al., 2017): SIFT (image descriptors, 1M 128-dimensional points), Glo Ve (word embeddings (Pennington et al., 2014), approximately 1.2M 100-dimensional points, normalized), and MNIST (images of digits, 60K 784-dimensional points). |
| Dataset Splits | No | The paper mentions using the dataset P as a 'training set' and 'query points' for evaluation, but it does not explicitly describe a separate validation dataset split with specific percentages or counts. |
| Hardware Specification | No | The paper refers to general hardware like 'GPUs' but does not provide specific models or configurations (e.g., 'NVIDIA A100', 'Intel Xeon') used for running the experiments. |
| Software Dependencies | No | The paper mentions software components like 'Adam optimizer' and 'Ka HIP' but does not provide specific version numbers for them or any other software dependencies. |
| Experiment Setup | Yes | Both architectures consist of several blocks, where each block is a fully-connected layer + batch normalization (Ioffe & Szegedy, 2015) + Re LU activations. In the top-level network we use b = 3 and s = 512. In the second-level networks we use b = 2 and s = 390. To reduce overfitting, we use dropout with probability 0.1 during training. The networks are trained using the Adam optimizer (Kingma & Ba, 2015) for under 20 epochs on both levels. We reduce the learning rate multiplicatively at regular intervals. |