reproducibilityindex.ai

Sublinear Time Low-Rank Approximation of Distance Matrices

Authors: Ainesh Bakshi, David Woodruff

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Finally, we empirically compare our algorithm with the SVD and input sparsity time algorithms. Our algorithm is several hundred times faster than the SVD, and about 8-20 times faster than input sparsity methods on real-world and and synthetic datasets of size 108. Accuracy-wise, our algorithm is only slightly worse than that of the SVD (optimal) and input-sparsity time algorithms.
Researcher Affiliation	Academia	Ainesh Bakshi Department of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 abakshi@cs.cmu.edu David P. Woodruff Department of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 dwoodruf@cs.cmu.edu
Pseudocode	Yes	Algorithm 1 : Row Norm Estimation. Algorithm 2 : First Sublinear Time Algorithm.
Open Source Code	No	The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets	Yes	The second dataset we use is the popular MNIST dataset which is a collection of 70000 handwritten characters but sample 10000 points from it.
Dataset Splits	No	The paper describes generating synthetic data and sampling points from MNIST, but does not provide specific details about train/validation/test splits.
Hardware Specification	Yes	The experiments are run on a Macbook Pro 2017 with 16GB RAM, 2.8GHz quad-core 7th-generation Intel Core i7 processor.
Software Dependencies	No	The paper mentions software packages like "numpy's linear algebra package", "scipy's sparse linear algebra package", and "scikit-learn" but does not provide specific version numbers for these dependencies.
Experiment Setup	Yes	We generate 10000 points with 200 features split up into 20 clusters. ... sample 10000 points from it. ... Running Time (in seconds) on the Clustering Dataset for Rank = 20 ... Running Time (in seconds) on the MNIST Dataset for Rank = 40