Near Neighbor: Who is the Fairest of Them All?
Authors: Sariel Har-Peled, Sepideh Mahabadi
NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we run experiments to show performance of our approach on real data. |
| Researcher Affiliation | Academia | Sariel Har-Peled, University of Illinois at Urbana-Champaign, Champaign, IL 61801, sariel@illinois.edu; Sepideh Mahabadi, Toyota Technological Institute at Chicago, Chicago, IL 60637, mahabadi@ttic.edu |
| Pseudocode | No | The paper describes algorithms in text and refers to appendices for proofs, but does not include explicit pseudocode blocks or algorithm listings in the main body. |
| Open Source Code | No | The paper does not provide any specific links or explicit statements about releasing source code for the described methodology. |
| Open Datasets | Yes | We run our experiments on three datasets that are standard benchmarks in the context of Nearest Neighbor algorithms (see [ABF17]). (I) Our first data set contains a random subset of 10K points in the MNIST training data set [LBBH98]. ... (III) Finally, we take a random subset of 10K words from the GloVe data set [PSM14] and a random subset of 100 words as our query. |
| Dataset Splits | No | The paper mentions using a "MNIST training data set" and a "MNIST test data set" but does not specify a separate validation split or explicit percentages/counts for how the data is partitioned for training, validation, and testing. |
| Hardware Specification | No | The paper discusses experiment setup and parameters but does not specify any hardware details like GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper mentions using "locality sensitive hashing data structure for the L2 Euclidean distance [AI08]" and the "E2LSH library [And05]", but it does not specify version numbers for these software components. |
| Experiment Setup | Yes | For MNIST, the average distance of a query to its nearest neighbor in our data set is around 4.5. Thus we choose the near neighbor radius r = 5... we tune it between 3 and 5 and set its value to w = 3.1. We tune k and L so that the false negative rate (the near points that are not retrieved by LSH) is less than 10%, and moreover the cost of hashing (proportional to L) balances out the cost of scanning. We thus get k = 15 and L = 100. |
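
The experiment-setup row above describes an E2LSH-style Euclidean LSH index with bucket width w = 3.1, k = 15 hashes per table, L = 100 tables, and near-neighbor radius r = 5. The following is a minimal sketch of such an index, not the authors' code or the E2LSH library API; the class name, method names, and the toy data are assumptions made only for illustration.

```python
import numpy as np
from collections import defaultdict

class L2LSH:
    """Minimal p-stable (Gaussian) LSH for Euclidean distance, in the style of
    E2LSH: L tables, each keyed by k concatenated hashes h(x) = floor((a.x + b) / w).
    This is an illustrative sketch, not the paper's implementation."""

    def __init__(self, dim, k=15, L=100, w=3.1, seed=0):
        rng = np.random.default_rng(seed)
        self.w = w
        # One (k x dim) Gaussian projection matrix and k uniform offsets per table.
        self.A = rng.standard_normal((L, k, dim))
        self.b = rng.uniform(0, w, size=(L, k))
        self.tables = [defaultdict(list) for _ in range(L)]

    def _keys(self, x):
        # Concatenated integer hash values, one key tuple per table.
        h = np.floor((self.A @ x + self.b) / self.w).astype(int)
        return [tuple(row) for row in h]

    def insert(self, idx, x):
        for table, key in zip(self.tables, self._keys(x)):
            table[key].append(idx)

    def query(self, q, points, r=5.0):
        # Scan all colliding points and keep those within the near-neighbor radius r.
        candidates = set()
        for table, key in zip(self.tables, self._keys(q)):
            candidates.update(table[key])
        return [i for i in candidates if np.linalg.norm(points[i] - q) <= r]

# Toy usage with random vectors standing in for the 10K-point MNIST subset.
rng = np.random.default_rng(1)
points = rng.standard_normal((1000, 784))
index = L2LSH(dim=784)
for i, p in enumerate(points):
    index.insert(i, p)
near = index.query(points[0], points, r=5.0)
```

In this sketch, increasing L raises the hashing cost but reduces false negatives, while increasing k shrinks each bucket and reduces scanning cost, which is the trade-off the quoted setup balances when tuning to a false-negative rate below 10%.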