Faster cover trees

Authors: Mike Izbicki, Christian Shelton

ICML 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On standard benchmark datasets, we reduce the number of distance computations by 10-50%. On a large-scale bioinformatics dataset, we reduce the number of distance computations by 71%. On a large-scale image dataset, our parallel algorithm with 16 cores reduces tree construction time from 3.5 hours to 12 minutes.
Researcher Affiliation | Academia | Mike Izbicki (MIZBI001@UCR.EDU), Christian R. Shelton (CSHELTON@CS.UCR.EDU), University of California Riverside, 900 University Ave, Riverside, CA 92521
Pseudocode | Yes | Algorithm 1: Find nearest neighbor function...; Algorithm 2: Simplified cover tree insertion function...; Algorithm 3: Nearest ancestor cover tree insertion function...; Algorithm 4: Merging cover trees function... (a hedged sketch of the Algorithm 1 query logic appears after this table)
Open Source Code | Yes | Our code can be downloaded at http://github.com/mikeizbicki/hlearn#covertree.
Open Datasets | Yes | All MLPack benchmark datasets with at least 20 dimensions and 10000 points, arranged in descending order by runtime of all nearest neighbor search. ... The Protein Data Bank (Berman et al., 2000) contains information... In this experiment, we use the Yahoo! Flickr Creative Commons dataset.
Dataset Splits | No | The paper describes an "all nearest neighbors" experimental setup in which trees are constructed on the dataset and each point's nearest neighbor is then found. It does not specify distinct training, validation, or test splits, nor does it mention cross-validation.
Hardware Specification | Yes | All tests were run on an Amazon Web Services c3.8xlarge instance with 60 GB of RAM and 32 Intel Xeon E5-2680 CPU cores clocked at 2.80 GHz.
Software Dependencies | Yes | Both of these programs were written in C++ and compiled using g++ 4.4.7 with full optimizations. Our implementation was written in Haskell and compiled with ghc 7.8.4, also with full optimizations.
Experiment Setup | No | The paper describes the "all nearest neighbors" experimental setup, which specifies the task performed (tree construction followed by querying each point for its nearest neighbor). However, it does not provide hyperparameters such as learning rates, batch sizes, or optimizer settings, which are typically associated with training machine learning models.
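
The Algorithm 1 referenced above prunes a subtree rooted at a child q whenever d(x, q) - maxdist(q) is at least the distance from the query x to the best candidate found so far: by the triangle inequality, no descendant of q can then be closer. The Haskell sketch below is a toy reconstruction of that query logic, not the paper's HLearn implementation; the CoverTree type, the maxdist field, and the nearestNeighbor function are illustrative names chosen here, and the tree is assumed to be already built.

```haskell
import Data.List (sortOn)

-- A minimal cover-tree node (hypothetical type, not HLearn's):
-- a point, an upper bound on the distance from that point to any
-- descendant (maxdist), and the node's children.
data CoverTree a = Node
  { point    :: a
  , maxdist  :: Double
  , children :: [CoverTree a]
  }

-- Nearest-neighbor query in the style of the paper's Algorithm 1:
-- visit children in order of increasing distance to the query, and
-- skip any subtree whose bounding ball cannot contain a point closer
-- than the best candidate found so far.
nearestNeighbor :: (a -> a -> Double) -> CoverTree a -> a -> a
nearestNeighbor d tree x = go tree (point tree)
  where
    go p best =
      let best'  = if d (point p) x < d best x then point p else best
          sorted = sortOn (\q -> d (point q) x) (children p)
      in  foldl step best' sorted
    -- Descend into q only if some descendant of q could beat best;
    -- otherwise prune the subtree.
    step best q
      | d (point q) x - maxdist q < d best x = go q best
      | otherwise                            = best
```

For example, with d = \a b -> abs (a - b) and a tree t built over Doubles, nearestNeighbor d t 3.0 returns the stored point closest to 3.0. Sorting the children by distance to the query first, as the paper's Algorithm 1 does, tends to tighten the best-candidate bound early and so prunes more subtrees.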