Sketching Algorithms for Sparse Dictionary Learning: PTAS and Turnstile Streaming
Authors: Gregory Dexter, Petros Drineas, David Woodruff, Taisuke Yasuda
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Sketching algorithms have recently proven to be a powerful approach both for designing low-space streaming algorithms as well as fast polynomial time approximation schemes (PTASs). In this work, we develop new techniques to extend the applicability of sketching-based approaches to the sparse dictionary learning and the Euclidean k-means clustering problems. In particular, we initiate the study of the challenging setting where the dictionary/clustering assignment for each of the n input points must be output, which has surprisingly received little attention in prior work. On the fast algorithms front, we obtain a new approach for designing PTASs for the k-means clustering problem, which generalizes to the first PTAS for the sparse dictionary learning problem. On the streaming algorithms front, we obtain new upper bounds and lower bounds for dictionary learning and k-means clustering. |
| Researcher Affiliation | Academia | Gregory Dexter, Department of Computer Science, Purdue University (gdexter@purdue.edu); Petros Drineas, Department of Computer Science, Purdue University (pdrineas@purdue.edu); David P. Woodruff, Computer Science Department, Carnegie Mellon University (dwoodruf@cs.cmu.edu); Taisuke Yasuda, Computer Science Department, Carnegie Mellon University (taisukey@cs.cmu.edu) |
| Pseudocode | Yes | We give a full discussion of our results and techniques for our PTAS for sparse dictionary learning in Section 2. ... is formalized in Algorithm 1 in the appendix. ... Algorithm 2 will return a feasible solution to the k-means clustering problem (Definition 1.2), $(\tilde{X}, \tilde{D})$, satisfying $\\|\tilde{X}\tilde{D} - A\\|_F \le (1 + \varepsilon) \min_{D \in \mathbb{R}^{k \times d},\, X \in \mathcal{X}} \\|XD - A\\|_F$, with constant probability. Furthermore, Algorithm 2 runs in $n \cdot \mathrm{poly}(k/\varepsilon) + \exp(k \cdot \mathrm{polylog}(k/\varepsilon))$ time. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | No | The paper is theoretical and does not mention specific datasets (e.g., MNIST, CIFAR-10) or provide links/citations to public datasets used for training. |
| Dataset Splits | No | The paper does not provide specific dataset split information (e.g., percentages, sample counts, or citations to predefined splits) for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with versions) needed to replicate the experiment. |
| Experiment Setup | No | The paper does not contain specific experimental setup details (concrete hyperparameter values, training configurations, or system-level settings) in the main text, as it is a theoretical paper without empirical experiments. |
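Since the paper itself contains no empirical experiments, the sketch below is not the authors' Algorithm 2; it is only a minimal illustration of the general sketching idea underlying the approximation guarantee quoted above: compressing the data with a random Gaussian matrix (a Johnson-Lindenstrauss-style sketch) approximately preserves the k-means clustering cost of a fixed assignment. The synthetic data, sketch dimension `m`, and the `kmeans_cost` helper are all hypothetical choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: n points in d dimensions, drawn around k planted centers.
n, d, k, m = 300, 100, 3, 40  # m = sketch (target) dimension, chosen arbitrarily
centers = rng.normal(size=(k, d)) * 5.0
labels = rng.integers(0, k, size=n)
A = centers[labels] + rng.normal(size=(n, d))

# Sketch: right-multiply by a scaled random Gaussian matrix S.
S = rng.normal(size=(d, m)) / np.sqrt(m)
A_sk = A @ S

def kmeans_cost(X, assign, k):
    """Sum of squared distances from each point to its cluster's mean."""
    cost = 0.0
    for c in range(k):
        pts = X[assign == c]
        if len(pts):
            cost += np.sum((pts - pts.mean(axis=0)) ** 2)
    return cost

orig = kmeans_cost(A, labels, k)
sk = kmeans_cost(A_sk, labels, k)
print(f"cost ratio (sketched / original): {sk / orig:.3f}")
```

Because the sketch preserves squared Euclidean norms in expectation, the two costs should agree up to moderate distortion at this sketch dimension; the paper's actual algorithms combine such sketches with further machinery to output the assignments themselves with a $(1+\varepsilon)$ guarantee.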