reproducibilityindex.ai

A Theory-Based Evaluation of Nearest Neighbor Models Put Into Practice

Authors: Hendrik Fichtenberger, Dennis Rohde

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Finally, we provide an experimental evaluation of our property tester on approximate nearest neighbor (ANN) indices computed by various ANN algorithms. We implemented our property tester in C++ and integrated it into the Python framework ANN-Benchmarks [2, 4]. To evaluate our property tester, we chose three algorithms with the best performance observed in [2]: KGraph [1] and hnsw and SW-graph from the Non-Metric Space Library [7, 28]. All of the ANN algorithms are implemented in C / C++ and build upon nearest neighbor / proximity graphs. We computed the ground truth, i.e., a k-NN graph of the input data, for the Euclidean datasets MNIST (size 60 000, dimension 960, [25]), Fashion-MNIST (size 60 000, dimension 960, [31]) and SIFT (size 1 000 000, dimension 128, [19]) to evaluate the answers of the tester.
Researcher Affiliation	Academia	Hendrik Fichtenberger TU Dortmund Dortmund, Germany hendrik.fichtenberger@tu-dortmund.de Dennis Rohde TU Dortmund Dortmund, Germany dennis.rohde@cs.tu-dortmund.de
Pseudocode	Yes	Algorithm 1: Tester for k-nearest neighborhood Data: G = (V, E), d, k, ϵ Result: accept or reject S sample 100k n ϵ vertices from V u.a.r. without replacement; T sample ln(10) k ψδ n vertices from V u.a.r. with replacement; S {v S \| d(v) 100k/ϵ}; for v S, u T do if (u = v u knn(v) u N(v)) d(v) < k then reject; end end accept;
Open Source Code	Yes	The C++ source code of the property tester that was used for the experiments is available here [17].
Open Datasets	Yes	We computed the ground truth, i.e., a k-NN graph of the input data, for the Euclidean datasets MNIST (size 60 000, dimension 960, [25]), Fashion-MNIST (size 60 000, dimension 960, [31]) and SIFT (size 1 000 000, dimension 128, [19]) to evaluate the answers of the tester.
Dataset Splits	No	Not found. The paper mentions datasets used for evaluation (MNIST, Fashion-MNIST, SIFT) and refers to a conceptual "training set" and "test set" or "query domain" in the introduction, but it does not provide specific details on how these datasets were split into training, validation, or test sets for reproducibility of the experiments.
Hardware Specification	Yes	We ran our benchmarks on identical machines with 60 GB of free RAM guaranteed and an Intel Xeon E5-2640 v4 CPU running at 2.40 GHz (capable of running 20 concurrent threads) and measured CPU time.
Software Dependencies	No	We implemented our property tester in C++ and integrated it into the Python framework ANN-Benchmarks [2, 4]. The key-idea of ANN-Benchmarks is to compare the quality of the indices built by ANN implementations, with respect to their running times and query-answer times. To evaluate our property tester, we chose three algorithms with the best performance observed in [2]: KGraph [1] and hnsw and SW-graph from the Non-Metric Space Library [7, 28].
Experiment Setup	Yes	All ANN algorithms were run ten times for each choice of parameters built into ann-benchmarks (as listed in [5]) and every dataset. Then the tester was run once for each output and for each choice from {0.001, 0.01, 0.1} {0.05, 0.5, 5} for (c1, c2) in \|S \| = c1 8k n and \|T\| = c2 k n log(10), with oracle access to the resulting ANN index. We chose to evaluate the tester for k = 10 because indices that are very close to 10-NN graphs which is the hard case for the property tester to detect can be computed by the ANN algorithms in reasonable time, and we support this decision by an additional experiment for k = 50.