Stochastic Neighbor Compression

Authors: Matt Kusner, Stephen Tyree, Kilian Weinberger, Kunal Agrawal

ICML 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "on 4 of 7 data sets it yields lower test error than kNN on the entire training set, even at compression ratios as low as 2%; finally, the SNC compression leads to impressive speed-ups over kNN even when kNN and SNC are both used with ball-tree data structures, hashing, and LMNN dimensionality reduction, demonstrating that it is complementary to existing state-of-the-art algorithms to speed up kNN classification and leads to substantial further improvements."
Researcher Affiliation | Academia | Matt J. Kusner (MKUSNER@WUSTL.EDU), Stephen Tyree (SWTYREE@WUSTL.EDU), Kilian Weinberger (KILIAN@WUSTL.EDU), Kunal Agrawal (KUNAL@WUSTL.EDU); Washington University in St. Louis, 1 Brookings Dr., St. Louis, MO 63130
Pseudocode | Yes | "Algorithm 1 SNC in pseudo-code." (a sketch of the SNC objective and its optimization follows this table)
Open Source Code | Yes | "Implementation. We optimize Z by minimizing (5) with conjugate gradient descent (we use a freely-available Matlab implementation) and provide our implementation of SNC as open source, available for download at http://tinyurl.com/msovcfu."
Open Datasets | Yes | "Dataset descriptions. We evaluate SNC and other training set reduction baselines on seven classification datasets detailed in Table 1. Yale Faces (Georghiades et al., 2001)... Isolet [1]... Letters [1]... Adult [1]... W8a [2]... MNIST [3]... Forest [1]..." Footnote URLs: [1] http://tinyurl.com/uci-ml-data, [2] http://tinyurl.com/libsvm-data, [3] http://tinyurl.com/mnist-data, [4] http://tinyurl.com/usps-data
Dataset Splits | Yes | "Neither Yale Faces nor Forest have predefined test sets and so we report the average and standard deviations in performance over 5 and 10 splits, respectively. ... For LSH we cross-validate over the number of tables and hash functions and select the fastest setting that has equal or less leave-one-out error compared to kNN without LSH (for larger datasets, we performed the LSH cross-validation on class-balanced subsamples of the training set: 10% subsamples of Adult, W8a and MNIST, and 5% of Forest)." (see the LSH selection sketch after this table)
Hardware Specification | Yes | "All experiments were performed on an 8-core Intel L5520 CPU with 2.27GHz clock frequency."
Software Dependencies | No | The paper mentions using a 'Matlab implementation' for conjugate gradient descent, but does not provide specific version numbers for Matlab or any other software dependencies used in the experiments.
Experiment Setup | Yes | "In our experiments, we initialize γ² with cross-validation and optimize it prior to learning. We pick the initialization that yields minimal training error." (see the γ² selection sketch after this table)
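
The quotes above outline how SNC works: a small set of synthetic prototypes Z (with labels taken from a subsample of the training set) is learned by minimizing objective (5) with conjugate gradient descent, where γ² acts as a squared inverse kernel width. The sketch below is a minimal NumPy/SciPy reading of that setup, assuming the objective is the negative log-probability that each training point is labeled correctly by a stochastic nearest-neighbor rule over the prototypes; scipy.optimize.minimize with method="CG" stands in for the authors' Matlab conjugate-gradient routine, and the prototype count, fixed γ², and class-balanced subsampling initialization are illustrative assumptions, not the released code.

```python
import numpy as np
from scipy.optimize import minimize

def snc_objective_and_grad(z_flat, X, y, y_z, gamma2, m, d):
    """Negative log-likelihood of correct stochastic-neighbor labeling
    (an SNC-style objective in the spirit of Eq. (5)) and its gradient."""
    Z = z_flat.reshape(m, d)
    # squared distances between every training point and every prototype
    D2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)      # n x m
    A = np.exp(-gamma2 * D2)                                  # unnormalized affinities
    A_sum = A.sum(axis=1, keepdims=True) + 1e-12
    Q = A / A_sum                                             # stochastic-neighbor probabilities
    S = (y[:, None] == y_z[None, :]).astype(float)            # 1 if prototype shares the label
    p = (Q * S).sum(axis=1) + 1e-12                           # prob. each point is labeled correctly
    loss = -np.log(p).sum()
    # gradient w.r.t. prototype z_j:
    #   dL/dz_j = -sum_i (2*gamma2/p_i) * q_ij * (s_ij - p_i) * (x_i - z_j)
    W = (2.0 * gamma2 / p)[:, None] * Q * (S - p[:, None])    # n x m weights
    grad = -(W.T @ X - W.sum(axis=0)[:, None] * Z)            # m x d
    return loss, grad.ravel()

def snc_compress(X, y, m_per_class=10, gamma2=1.0, seed=0):
    """Learn a compressed prototype set (Z, y_z) for a 1-NN classifier."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    # initialize prototypes by class-balanced subsampling of the training inputs (an assumption)
    idx = np.concatenate([rng.choice(np.where(y == c)[0], m_per_class, replace=False)
                          for c in classes])
    Z0, y_z = X[idx].copy(), y[idx].copy()
    m, d = Z0.shape
    res = minimize(snc_objective_and_grad, Z0.ravel(), jac=True, method="CG",
                   args=(X, y, y_z, gamma2, m, d))
    return res.x.reshape(m, d), y_z
```

At test time the learned (Z, y_z) replaces the full training set in a standard 1-NN classifier, which is where the compression-driven speed-ups reported above come from.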
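The "Experiment Setup" row states that γ² is initialized by cross-validation and that the initialization yielding minimal training error is kept. Below is a hedged sketch of one way to realize that selection, reusing snc_compress from the sketch above; the candidate grid, prototype count, and use of 1-NN training error on the compressed set are assumptions about the protocol.

```python
import numpy as np

def one_nn_error(X, y, Z, y_z):
    """Training error of a 1-NN classifier that uses the compressed set (Z, y_z)."""
    D2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    pred = y_z[D2.argmin(axis=1)]
    return float((pred != y).mean())

def pick_gamma2(X, y, candidates=2.0 ** np.arange(-4, 5), m_per_class=10):
    """Try several gamma^2 initializations, compress with each, and keep the one
    with minimal training error (grid and error measure are assumptions)."""
    best = None
    for g in candidates:
        Z, y_z = snc_compress(X, y, m_per_class=m_per_class, gamma2=g)
        err = one_nn_error(X, y, Z, y_z)
        if best is None or err < best[0]:
            best = (err, g)
    return best[1]
```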
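The "Dataset Splits" row also describes how the LSH baseline was tuned: grid-search over the number of tables and hash functions, keeping the fastest setting whose leave-one-out error is no worse than exact kNN. The sketch below illustrates that selection rule with a toy random-hyperplane LSH; the parameter grids, hashing scheme, and no-candidate fallback are assumptions, not the paper's LSH implementation.

```python
import time
from collections import defaultdict
import numpy as np

def lsh_knn_loo(X, y, num_tables, num_hashes, seed=0):
    """Leave-one-out error of approximate 1-NN with random-hyperplane LSH
    (a toy stand-in for the LSH implementation used in the paper)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    tables = []
    for _ in range(num_tables):
        H = rng.standard_normal((d, num_hashes))   # random hyperplanes
        codes = (X @ H > 0)                        # n x num_hashes binary codes
        buckets = defaultdict(list)
        for i, code in enumerate(map(tuple, codes)):
            buckets[code].append(i)
        tables.append((H, buckets))
    errors = 0
    for i in range(n):
        cand = set()
        for H, buckets in tables:
            cand.update(buckets[tuple(X[i] @ H > 0)])
        cand.discard(i)                            # leave-one-out: drop the query itself
        if not cand:
            errors += 1                            # no candidates: count as an error (assumption)
            continue
        cand = np.fromiter(cand, dtype=int)
        j = cand[((X[cand] - X[i]) ** 2).sum(1).argmin()]
        errors += int(y[j] != y[i])
    return errors / n

def select_lsh_params(X, y, exact_loo_error, table_grid=(5, 10, 20), hash_grid=(8, 12, 16)):
    """Keep the fastest (tables, hashes) setting whose LOO error is no worse than exact kNN."""
    best = None  # (elapsed_seconds, num_tables, num_hashes)
    for t in table_grid:
        for h in hash_grid:
            start = time.perf_counter()
            err = lsh_knn_loo(X, y, num_tables=t, num_hashes=h)
            elapsed = time.perf_counter() - start
            if err <= exact_loo_error and (best is None or elapsed < best[0]):
                best = (elapsed, t, h)
    return best
```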