Beyond Lazy Training for Over-parameterized Tensor Decomposition
Authors: Xiang Wang, Chenwei Wu, Jason D. Lee, Tengyu Ma, Rong Ge
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper we study a closely related tensor decomposition problem: given an l-th order tensor in (R^d)^{⊗l} of rank r (where r ≪ d), can variants of gradient descent find a rank m decomposition where m > r? We show that in a lazy training regime (similar to the NTK regime for neural networks) one needs at least m = Ω(d^{l−1}), while a variant of gradient descent can find an approximate tensor when m = O*(r^{2.5l} log d). Our results show that gradient descent on over-parametrized objective could go beyond the lazy training regime and utilize certain low-rank structure in the data. |
| Researcher Affiliation | Academia | Xiang Wang (Duke University) xwang@cs.duke.edu; Chenwei Wu (Duke University) cwwu@cs.duke.edu; Jason D. Lee (Princeton University) jasonlee@princeton.edu; Tengyu Ma (Stanford University) tengyuma@stanford.edu; Rong Ge (Duke University) rongge@cs.duke.edu |
| Pseudocode | Yes | Algorithm 1: Variant of Gradient Descent for Tensor Decomposition (an illustrative sketch of the over-parameterized objective appears after this table) |
| Open Source Code | No | The paper does not provide any statement or link indicating that source code for the described methodology is publicly available. |
| Open Datasets | No | This paper is theoretical and does not conduct experiments on datasets, so there is no mention of publicly available datasets for training. |
| Dataset Splits | No | As this is a theoretical paper, there are no empirical experiments with data splits for training, validation, or testing. |
| Hardware Specification | No | As this is a theoretical paper, no empirical experiments are conducted, and therefore, no hardware specifications are mentioned. |
| Software Dependencies | No | As this is a theoretical paper, no empirical experiments are conducted, and therefore, no specific software dependencies or versions are mentioned. |
| Experiment Setup | No | The paper describes theoretical parameters for the algorithm (e.g., K, H, δ, η), but these are part of the theoretical analysis and not concrete experimental setup details or hyperparameters for empirical reproduction. |
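
The paper's Algorithm 1 is a carefully initialized variant of gradient descent and is not reproduced here. As a rough illustration of the over-parameterized objective it works with, the sketch below runs plain gradient descent on f(A) = ||Σ_i a_i^{⊗3} − T||_F² for a synthetic symmetric third-order tensor. The dimensions, initialization scale, learning rate, and step count are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (not from the paper): dimension d, true rank r,
# over-parameterized width m > r, tensor order l = 3.
d, r, m = 20, 3, 30

# Ground-truth rank-r symmetric third-order tensor T = sum_j w_j^{⊗3}.
W = rng.standard_normal((r, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)
T = np.einsum('ja,jb,jc->abc', W, W, W)

# Over-parameterized components a_1, ..., a_m with small random initialization.
A = 0.1 * rng.standard_normal((m, d))

lr, steps = 0.02, 3000
for _ in range(steps):
    # Residual R = sum_i a_i^{⊗3} - T and loss f(A) = ||R||_F^2.
    R = np.einsum('ia,ib,ic->abc', A, A, A) - T
    loss = float(np.sum(R ** 2))
    # Since R is symmetric, the gradient of f w.r.t. a_i is 6 * R(., a_i, a_i).
    grad = 6.0 * np.einsum('abc,ib,ic->ia', R, A, A)
    A -= lr * grad

print(f"final loss: {loss:.3e}")
```

With a small random initialization, plain gradient descent on this objective can plateau before the loss drops; the paper's Algorithm 1 modifies the update to obtain its guarantees, so this sketch only illustrates the objective being optimized, not the paper's method or its analysis.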