Near Neighbor: Who is the Fairest of Them All?

Authors: Sariel Har-Peled, Sepideh Mahabadi

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we run experiments to show performance of our approach on real data.
Researcher Affiliation | Academia | Sariel Har-Peled, University of Illinois at Urbana-Champaign, Champaign, IL 61801, sariel@illinois.edu; Sepideh Mahabadi, Toyota Technological Institute at Chicago, Chicago, IL 60637, mahabadi@ttic.edu
Pseudocode | No | The paper describes algorithms in text and refers to appendices for proofs, but does not include explicit pseudocode blocks or algorithm listings in the main body.
Open Source Code | No | The paper does not provide any specific links or explicit statements about releasing source code for the described methodology.
Open Datasets | Yes | We run our experiments on three datasets that are standard benchmarks in the context of Nearest Neighbor algorithms (see [ABF17]): (I) Our first data set contains a random subset of 10K points in the MNIST training data set [LBBH98]. ... (III) Finally, we take a random subset of 10K words from the GloVe data set [PSM14] and a random subset of 100 words as our query.
Dataset Splits | No | The paper mentions using a "MNIST training data set" and a "MNIST test data set" but does not specify a separate validation split or explicit percentages/counts for how the data is partitioned for training, validation, and testing.
Hardware Specification | No | The paper discusses the experiment setup and parameters but does not specify any hardware details such as GPU models, CPU types, or memory.
Software Dependencies | No | The paper mentions using the "locality sensitive hashing data structure for the L2 Euclidean distance [AI08]" and the "E2LSH library [And05]", but it does not specify version numbers for these software components.
Experiment Setup | Yes | For MNIST, the average distance of a query to its nearest neighbor in our data set is around 4.5. Thus we choose the near neighbor radius r = 5... we tune it [the bucket width w] between 3 and 5 and set its value to w = 3.1. We tune k and L so that the false negative rate (the near points that are not retrieved by LSH) is less than 10%, and moreover the cost of hashing (proportional to L) balances out the cost of scanning. We thus get k = 15 and L = 100.
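
To make the quoted setup concrete, the sketch below shows how r, w, k, and L interact in the standard p-stable LSH scheme for the L2 distance [AI08] that the E2LSH library implements: each of L tables keys a point by k concatenated hashes of the form h(v) = floor((a . v + b) / w), and a query scans its L buckets for points within radius r. This is an illustrative reconstruction under those assumptions, not the authors' code; the class name, function names, and toy data are invented stand-ins.

    # Minimal p-stable LSH sketch for L2 (assumed scheme, not the paper's code).
    import numpy as np
    from collections import defaultdict

    rng = np.random.default_rng(0)

    class L2LSH:
        def __init__(self, dim, r=5.0, w=3.1, k=15, L=100):
            # Parameter values quoted from the paper's MNIST setup.
            self.r, self.w, self.k, self.L = r, w, k, L
            self.A = rng.normal(size=(L, k, dim))      # Gaussian projections
            self.B = rng.uniform(0.0, w, size=(L, k))  # random offsets in [0, w)
            self.tables = [defaultdict(list) for _ in range(L)]
            self.points = None

        def _keys(self, v):
            # h(v) = floor((a . v + b) / w), concatenated k times per table.
            proj = np.floor((self.A @ v + self.B) / self.w).astype(int)
            return [tuple(row) for row in proj]  # one key per table

        def build(self, points):
            self.points = points
            for i, p in enumerate(points):
                for t, key in enumerate(self._keys(p)):
                    self.tables[t][key].append(i)

        def near_neighbors(self, q):
            # Scan the query's L buckets; keep candidates within radius r.
            cand = set()
            for t, key in enumerate(self._keys(q)):
                cand.update(self.tables[t].get(key, ()))
            return [i for i in cand
                    if np.linalg.norm(self.points[i] - q) <= self.r]

    # Toy usage: random vectors standing in for the 10K-point MNIST subset.
    X = rng.normal(size=(1000, 784))
    index = L2LSH(dim=784)
    index.build(X)
    print(index.near_neighbors(X[0]))  # reports at least the point itself

Under this reading, the tuning the quote describes amounts to raising k until spurious bucket collisions are rare and raising L until the false negative rate drops below 10%, while keeping the hashing cost (proportional to L) comparable to the bucket-scanning cost.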