Sublinear Time Low-Rank Approximation of Distance Matrices
Authors: Ainesh Bakshi, David Woodruff
NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we empirically compare our algorithm with the SVD and input sparsity time algorithms. Our algorithm is several hundred times faster than the SVD, and about 8-20 times faster than input sparsity methods on real-world and and synthetic datasets of size 108. Accuracy-wise, our algorithm is only slightly worse than that of the SVD (optimal) and input-sparsity time algorithms. |
| Researcher Affiliation | Academia | Ainesh Bakshi Department of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 abakshi@cs.cmu.edu David P. Woodruff Department of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 dwoodruf@cs.cmu.edu |
| Pseudocode | Yes | Algorithm 1 : Row Norm Estimation. Algorithm 2 : First Sublinear Time Algorithm. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | The second dataset we use is the popular MNIST dataset which is a collection of 70000 handwritten characters but sample 10000 points from it. |
| Dataset Splits | No | The paper describes generating synthetic data and sampling points from MNIST, but does not provide specific details about train/validation/test splits. |
| Hardware Specification | Yes | The experiments are run on a Macbook Pro 2017 with 16GB RAM, 2.8GHz quad-core 7th-generation Intel Core i7 processor. |
| Software Dependencies | No | The paper mentions software packages like "numpy's linear algebra package", "scipy's sparse linear algebra package", and "scikit-learn" but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | We generate 10000 points with 200 features split up into 20 clusters. ... sample 10000 points from it. ... Running Time (in seconds) on the Clustering Dataset for Rank = 20 ... Running Time (in seconds) on the MNIST Dataset for Rank = 40 |