Learning Space Partitions for Nearest Neighbor Search
Authors: Yihe Dong, Piotr Indyk, Ilya Razenshteyn, Tal Wagner
ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On several standard benchmarks for NNS (Aumüller et al., 2017), our experiments show that the partitions obtained by Neural LSH consistently outperform partitions found by quantization-based and tree-based methods as well as classic, data-oblivious LSH. |
| Researcher Affiliation | Collaboration | Yihe Dong (Microsoft), Piotr Indyk (MIT), Ilya Razenshteyn (Microsoft Research), Tal Wagner (MIT) |
| Pseudocode | Yes | Algorithm 1: Nearest neighbor search with a learned space partition |
| Open Source Code | No | The paper mentions using the 'open-source solver KaHIP' but does not provide a link or statement about the availability of their own source code for the proposed method. |
| Open Datasets | Yes | For the experimental evaluation, we use three standard ANN benchmarks (Aumüller et al., 2017): SIFT (image descriptors, 1M 128-dimensional points), GloVe (word embeddings (Pennington et al., 2014), approximately 1.2M 100-dimensional points, normalized), and MNIST (images of digits, 60K 784-dimensional points). |
| Dataset Splits | No | The paper mentions using the dataset P as a 'training set' and separate 'query points' for evaluation, but it does not explicitly describe a validation split with specific percentages or counts. |
| Hardware Specification | No | The paper refers to general hardware like 'GPUs' but does not provide specific models or configurations (e.g., 'NVIDIA A100', 'Intel Xeon') used for running the experiments. |
| Software Dependencies | No | The paper mentions software components like the 'Adam optimizer' and 'KaHIP' but does not provide specific version numbers for them or for any other software dependencies. |
| Experiment Setup | Yes | Both architectures consist of several blocks, where each block is a fully-connected layer + batch normalization (Ioffe & Szegedy, 2015) + ReLU activations. In the top-level network we use b = 3 and s = 512. In the second-level networks we use b = 2 and s = 390. To reduce overfitting, we use dropout with probability 0.1 during training. The networks are trained using the Adam optimizer (Kingma & Ba, 2015) for under 20 epochs on both levels. We reduce the learning rate multiplicatively at regular intervals. (A hedged sketch of this architecture follows the table.) |
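
The Experiment Setup row describes the bin-classifier architecture only in prose. Below is a minimal, hypothetical PyTorch sketch, not the authors' released code: the function names `make_partition_net` and `multiprobe_search`, the `bins` layout, `num_probes`, and the exact candidate re-ranking step are illustrative assumptions. Only the block structure (fully-connected layer + batch normalization + ReLU), the default sizes b = 3 and s = 512, and the dropout probability 0.1 come from the paper's description; the search loop is a generic multi-probe flow in the spirit of the "nearest neighbor search with a learned space partition" named in Algorithm 1.

```python
# Hedged sketch under stated assumptions, not the authors' code.
import torch
import torch.nn as nn


def make_partition_net(dim, num_bins, num_blocks=3, hidden=512, dropout=0.1):
    """MLP of `num_blocks` blocks (Linear + BatchNorm1d + ReLU + Dropout),
    followed by a linear layer that scores the bins of the learned partition.
    Defaults mirror the paper's top-level settings (b = 3, s = 512, p = 0.1)."""
    layers, in_dim = [], dim
    for _ in range(num_blocks):
        layers += [nn.Linear(in_dim, hidden), nn.BatchNorm1d(hidden),
                   nn.ReLU(), nn.Dropout(dropout)]
        in_dim = hidden
    layers.append(nn.Linear(in_dim, num_bins))
    return nn.Sequential(*layers)


@torch.no_grad()
def multiprobe_search(query, net, bins, data, num_probes=3, k=10):
    """Score all bins with the learned model, visit the `num_probes` highest-
    scoring bins, and re-rank the collected candidates by exact distance.
    `bins` maps each bin id to a 1-D LongTensor of point indices into `data`."""
    net.eval()  # so BatchNorm1d uses running statistics on a single query
    scores = net(query.unsqueeze(0)).squeeze(0)           # one score per bin
    probe = torch.topk(scores, num_probes).indices.tolist()
    candidates = torch.cat([bins[b] for b in probe])      # candidate point ids
    dists = torch.cdist(query.unsqueeze(0), data[candidates]).squeeze(0)
    order = torch.topk(dists, min(k, len(candidates)), largest=False).indices
    return candidates[order]                              # approximate k-NN ids
```

Per the same row, a second-level network would be built with `num_blocks=2` and `hidden=390`, and training would use Adam with a multiplicatively decayed learning rate for under 20 epochs; that training loop is not shown here.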