reproducibilityindex.ai

Canopy Fast Sampling with Cover Trees

Authors: Manzil Zaheer, Satwik Kottur, Amr Ahmed, José Moura, Alex Smola

ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this work, we propose Canopy, a sampler based on Cover Trees that is exact, has guaranteed runtime logarithmic in the number of atoms, and is provably polynomial in the inherent dimensionality of the underlying parameter space. In other words, the algorithm is as fast as search over a hierarchical data structure. We provide theory for Canopy and demonstrate its effectiveness on both synthetic and real datasets, consisting of over 100 million images. We now present empirical studies for our fast sampling techniques in order to establish that (i) Canopy is fast (Sec. 5.1), (ii) Canopy is accurate (Sec. 5.2), and (iii) it opens new avenues for data exploration and unsupervised learning (Sec. 5.3), previously unthinkable.
Researcher Affiliation	Collaboration	1Carnegie Mellon University, USA 2Amazon Web Services, USA 3Google Inc, USA. Correspondence to: Manzil Zaheer <manzil@cmu.edu>.
Pseudocode	No	The paper describes algorithms and procedures in text, but it does not include formally structured pseudocode or algorithm blocks.
Open Source Code	No	The paper refers to 'publicly available pre-trained model of 200 layers2' with a GitHub link (github.com/facebook/fb.resnet.torch), but this is for a third-party model used, not the source code for the Canopy method described in the paper. There is no statement indicating that the authors' own code for Canopy is open-sourced.
Open Datasets	Yes	Datasets We use two benchmark image datasets MNIST8m (Loosli et al., 2007) and CIFAR-100 (Krizhevsky & Hinton, 2009).
Dataset Splits	No	The paper mentions using a 'TRAIN' set to learn parameters and a 'TEST' set for evaluation, but it does not explicitly specify a separate validation set or provide details about validation splits.
Hardware Specification	Yes	We run our experiments on a cluster of 16 Amazon EC2 c4.8xlarge nodes connected through 10Gb/s Ethernet. There are 36 virtual threads per node and 60GB of memory.
Software Dependencies	No	The paper states: 'All the algorithms are implemented multithreaded in simple C++11'. While C++11 provides a version, it does not list any specific libraries, frameworks, or solvers with their version numbers beyond the programming language itself, which is insufficient for full reproducibility.
Experiment Setup	Yes	All our experiments use cover tree based initializations.