Faster cover trees

Authors: Mike Izbicki, Christian Shelton

ICML 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On standard benchmark datasets, we reduce the number of distance computations by 10-50%. On a large-scale bioinformatics dataset, we reduce the number of distance computations by 71%. On a large-scale image dataset, our parallel algorithm with 16 cores reduces tree construction time from 3.5 hours to 12 minutes.
Researcher Affiliation | Academia | Mike Izbicki (MIZBI001@UCR.EDU), Christian R. Shelton (CSHELTON@CS.UCR.EDU), University of California Riverside, 900 University Ave, Riverside, CA 92521
Pseudocode | Yes | Algorithm 1: Find nearest neighbor function...; Algorithm 2: Simplified cover tree insertion function...; Algorithm 3: Nearest ancestor cover tree insertion function...; Algorithm 4: Merging cover trees function... (a hedged sketch of the Algorithm 1 query logic appears after this table)
Open Source Code | Yes | Our code can be downloaded at http://github.com/mikeizbicki/hlearn#covertree.
Open Datasets | Yes | All MLPack benchmark datasets with at least 20 dimensions and 10000 points, arranged in descending order by runtime of all nearest neighbor search. ... The Protein Data Bank (Berman et al., 2000) contains information... In this experiment, we use the Yahoo! Flickr Creative Commons dataset.
Dataset Splits | No | The paper describes an "all nearest neighbors" experimental setup in which trees are constructed on the dataset and each point's nearest neighbor is then found. It does not specify distinct training, validation, or test splits, nor does it mention cross-validation.
Hardware Specification | Yes | All tests were run on an Amazon Web Services c3.8xlarge instance with 60 GB of RAM and 32 Intel Xeon E5-2680 CPU cores clocked at 2.80 GHz.
Software Dependencies | Yes | Both of these programs were written in C++ and compiled using g++ 4.4.7 with full optimizations. Our implementation was written in Haskell and compiled with ghc 7.8.4, also with full optimizations.
Experiment Setup | No | The paper describes the "all nearest neighbors" experimental setup, which specifies the task performed (tree construction followed by querying each point for its nearest neighbor). However, it does not provide hyperparameters such as learning rates, batch sizes, or optimizer settings, which are typically associated with training machine learning models.
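
The Algorithm 1 referenced above prunes a subtree rooted at a child q whenever d(x, q) - maxdist(q) is at least the distance from the query x to the best candidate found so far: by the triangle inequality, no descendant of q can then be closer. The Haskell sketch below is a toy reconstruction of that query logic, not the paper's HLearn implementation; the CoverTree type, the maxdist field, and the nearestNeighbor function are illustrative names chosen here, and the tree is assumed to be already built.

```haskell
import Data.List (sortOn)

-- A minimal cover-tree node (hypothetical type, not HLearn's):
-- a point, an upper bound on the distance from that point to any
-- descendant (maxdist), and the node's children.
data CoverTree a = Node
  { point    :: a
  , maxdist  :: Double
  , children :: [CoverTree a]
  }

-- Nearest-neighbor query in the style of the paper's Algorithm 1:
-- visit children in order of increasing distance to the query, and
-- skip any subtree whose bounding ball cannot contain a point closer
-- than the best candidate found so far.
nearestNeighbor :: (a -> a -> Double) -> CoverTree a -> a -> a
nearestNeighbor d tree x = go tree (point tree)
  where
    go p best =
      let best'  = if d (point p) x < d best x then point p else best
          sorted = sortOn (\q -> d (point q) x) (children p)
      in  foldl step best' sorted
    -- Descend into q only if some descendant of q could beat best;
    -- otherwise prune the subtree.
    step best q
      | d (point q) x - maxdist q < d best x = go q best
      | otherwise                            = best
```

For example, with d = \a b -> abs (a - b) and a tree t built over Doubles, nearestNeighbor d t 3.0 returns the stored point closest to 3.0. Sorting the children by distance to the query first, as the paper's Algorithm 1 does, tends to tighten the best-candidate bound early and so prunes more subtrees.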