Sketching Algorithms for Sparse Dictionary Learning: PTAS and Turnstile Streaming
Authors: Gregory Dexter, Petros Drineas, David Woodruff, Taisuke Yasuda
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Sketching algorithms have recently proven to be a powerful approach both for designing low-space streaming algorithms as well as fast polynomial time approximation schemes (PTASs). In this work, we develop new techniques to extend the applicability of sketching-based approaches to the sparse dictionary learning and the Euclidean k-means clustering problems. In particular, we initiate the study of the challenging setting where the dictionary/clustering assignment for each of the n input points must be output, which has surprisingly received little attention in prior work. On the fast algorithms front, we obtain a new approach for designing PTASs for the k-means clustering problem, which generalizes to the first PTAS for the sparse dictionary learning problem. On the streaming algorithms front, we obtain new upper bounds and lower bounds for dictionary learning and k-means clustering. |
| Researcher Affiliation | Academia | Gregory Dexter, Department of Computer Science, Purdue University (gdexter@purdue.edu); Petros Drineas, Department of Computer Science, Purdue University (pdrineas@purdue.edu); David P. Woodruff, Computer Science Department, Carnegie Mellon University (dwoodruf@cs.cmu.edu); Taisuke Yasuda, Computer Science Department, Carnegie Mellon University (taisukey@cs.cmu.edu) |
| Pseudocode | Yes | We give a full discussion of our results and techniques for our PTAS for sparse dictionary learning in Section 2. ... is formalized in Algorithm 1 in the appendix. ... Algorithm 2 will return a feasible solution to the k-means clustering problem (Definition 1.2), $(\tilde{X}, \tilde{D})$, satisfying $\\|\tilde{X}\tilde{D} - A\\|_F \le (1 + \varepsilon) \min_{D \in \mathbb{R}^{k \times d},\, X \in \mathcal{X}} \\|XD - A\\|_F$, with constant probability. Furthermore, Algorithm 2 runs in $n \cdot \mathrm{poly}(k/\varepsilon) + \exp(k \cdot \mathrm{polylog}(k/\varepsilon))$ time. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | No | The paper is theoretical and does not mention specific datasets (e.g., MNIST, CIFAR-10) or provide links/citations to public datasets used for training. |
| Dataset Splits | No | The paper does not provide specific dataset split information (e.g., percentages, sample counts, or citations to predefined splits) for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with versions) needed to replicate the experiment. |
| Experiment Setup | No | The paper does not contain specific experimental setup details (concrete hyperparameter values, training configurations, or system-level settings) in the main text, as it is a theoretical paper without empirical experiments. |
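Since the paper itself contains no empirical experiments, the sketch below is not the authors' Algorithm 2; it is only a minimal illustration of the general sketching idea underlying the approximation guarantee quoted above: compressing the data with a random Gaussian matrix (a Johnson-Lindenstrauss-style sketch) approximately preserves the k-means clustering cost of a fixed assignment. The synthetic data, sketch dimension `m`, and the `kmeans_cost` helper are all hypothetical choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: n points in d dimensions, drawn around k planted centers.
n, d, k, m = 300, 100, 3, 40  # m = sketch (target) dimension, chosen arbitrarily
centers = rng.normal(size=(k, d)) * 5.0
labels = rng.integers(0, k, size=n)
A = centers[labels] + rng.normal(size=(n, d))

# Sketch: right-multiply by a scaled random Gaussian matrix S.
S = rng.normal(size=(d, m)) / np.sqrt(m)
A_sk = A @ S

def kmeans_cost(X, assign, k):
    """Sum of squared distances from each point to its cluster's mean."""
    cost = 0.0
    for c in range(k):
        pts = X[assign == c]
        if len(pts):
            cost += np.sum((pts - pts.mean(axis=0)) ** 2)
    return cost

orig = kmeans_cost(A, labels, k)
sk = kmeans_cost(A_sk, labels, k)
print(f"cost ratio (sketched / original): {sk / orig:.3f}")
```

Because the sketch preserves squared Euclidean norms in expectation, the two costs should agree up to moderate distortion at this sketch dimension; the paper's actual algorithms combine such sketches with further machinery to output the assignments themselves with a $(1+\varepsilon)$ guarantee.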