Beyond Lazy Training for Over-parameterized Tensor Decomposition

Authors: Xiang Wang, Chenwei Wu, Jason D. Lee, Tengyu Ma, Rong Ge

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper we study a closely related tensor decomposition problem: given an l-th order tensor in (R^d)^{⊗l} of rank r (where r ≪ d), can variants of gradient descent find a rank m decomposition where m > r? We show that in a lazy training regime (similar to the NTK regime for neural networks) one needs at least m = Ω(d^{l-1}), while a variant of gradient descent can find an approximate tensor when m = O*(r^{2.5l} log d). Our results show that gradient descent on over-parametrized objective could go beyond the lazy training regime and utilize certain low-rank structure in the data.
Researcher Affiliation | Academia | Xiang Wang (Duke University, xwang@cs.duke.edu); Chenwei Wu (Duke University, cwwu@cs.duke.edu); Jason D. Lee (Princeton University, jasonlee@princeton.edu); Tengyu Ma (Stanford University, tengyuma@stanford.edu); Rong Ge (Duke University, rongge@cs.duke.edu)
Pseudocode | Yes | Algorithm 1: Variant of Gradient Descent for Tensor Decomposition (a simplified, hedged sketch of a gradient-descent decomposition loop is given after this table)
Open Source Code | No | The paper does not provide any statement or link indicating that source code for the described methodology is publicly available.
Open Datasets | No | This paper is theoretical and does not conduct experiments on datasets, so there is no mention of publicly available datasets for training.
Dataset Splits | No | As this is a theoretical paper, there are no empirical experiments with data splits for training, validation, or testing.
Hardware Specification | No | As this is a theoretical paper, no empirical experiments are conducted and therefore no hardware specifications are mentioned.
Software Dependencies | No | As this is a theoretical paper, no empirical experiments are conducted and therefore no specific software dependencies or versions are mentioned.
Experiment Setup | No | The paper describes theoretical parameters for the algorithm (e.g., K, H, δ, η), but these are part of the theoretical analysis and not concrete experimental setup details or hyperparameters for empirical reproduction.
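
To make the setting described in the Research Type and Pseudocode rows concrete, the sketch below runs plain gradient descent on an over-parameterized rank-m decomposition of a synthetic symmetric rank-r tensor. This is an illustrative assumption rather than the paper's Algorithm 1 (a modified variant of gradient descent with additional parameters such as K, H, δ, and η); the order l, dimension d, ranks r and m, step size, initialization scale, and iteration count are all hypothetical choices.

```python
# Hedged sketch: plain gradient descent on the over-parameterized symmetric
# decomposition objective 0.5 * ||sum_i a_i^{⊗l} - T||_F^2.
# This is NOT the paper's Algorithm 1; all hyperparameters are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
l, d, r, m = 3, 8, 2, 16        # tensor order, dimension, true rank, over-parameterized rank
lr, steps = 0.02, 2000          # step size and iteration count (not tuned)

def rank_one(a, order):
    """Symmetric rank-one tensor a ⊗ a ⊗ ... ⊗ a of the given order."""
    t = a
    for _ in range(order - 1):
        t = np.tensordot(t, a, axes=0)
    return t

# Ground-truth tensor: sum of r symmetric rank-one components.
true_components = [rng.standard_normal(d) for _ in range(r)]
T = sum(rank_one(a, l) for a in true_components)

# Over-parameterized model with m > r components, small random initialization.
A = 0.1 * rng.standard_normal((m, d))

def residual(A):
    return sum(rank_one(A[i], l) for i in range(m)) - T

for _ in range(steps):
    R = residual(A)
    grad = np.zeros_like(A)
    for i in range(m):
        # Gradient of the loss w.r.t. a_i: l * (R contracted with a_i on l-1 modes).
        g = R
        for _ in range(l - 1):
            g = np.tensordot(g, A[i], axes=([g.ndim - 1], [0]))
        grad[i] = l * g
    A -= lr * grad

print("final loss:", 0.5 * np.linalg.norm(residual(A)) ** 2)
```

Because m exceeds the true rank r, this objective is over-parameterized in the sense studied by the paper. The loss of this plain loop typically decreases on small instances, but the paper's guarantees (the m = Ω(d^{l-1}) lower bound in the lazy regime and the m = O*(r^{2.5l} log d) upper bound for its modified algorithm) do not apply to this simplified code.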