Understanding Deflation Process in Over-parametrized Tensor Decomposition

Authors: Rong Ge, Yunwei Ren, Xiang Wang, Mo Zhou

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | "We prove that for orthogonally decomposable tensor, a slightly modified version of gradient flow would follow a tensor deflation process and recover all the tensor components. Our proof suggests that for orthogonal tensors, gradient flow dynamics works similarly as greedy low-rank learning in the matrix setting, which is a first step towards understanding the implicit regularization effect of over-parametrized models for low-rank tensors."
Researcher Affiliation | Academia | Rong Ge (Duke University, rongge@cs.duke.edu); Yunwei Ren* (Shanghai Jiao Tong University, 2016renyunwei@sjtu.edu.cn); Xiang Wang* (Duke University, xwang@cs.duke.edu); Mo Zhou* (Duke University, mozhou@cs.duke.edu)
Pseudocode | Yes | Algorithm 1: Tensor Deflation Process (a code sketch of the deflation idea is given after this table).
Open Source Code | No | The paper does not contain any explicit statements or links indicating that source code for the described methodology is publicly available.
Open Datasets | No | The paper constructs a synthetic tensor (T = Σ_{i∈[5]} a_i e_i^{⊗4}) for illustrative purposes in Figure 1 (see the construction sketch after this table), but does not use or provide access information for any publicly available or open datasets.
Dataset Splits | No | The paper is theoretical and does not conduct experiments that require specifying training, validation, or test dataset splits.
Hardware Specification | No | The paper does not specify any hardware used for experiments, such as specific GPU or CPU models.
Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers required to replicate the work.
Experiment Setup | Yes | "Input: Number of components m, initialization scale δ0, re-initialization threshold δ1, increasing rate of epoch length γ, target accuracy ϵ, regularization coefficient λ" and "Theorem 1. For any ϵ ≥ exp(−o(d/log d)), there exists γ = Θ(1), m = poly(d), λ = min{O(log d/d), O(ϵ/d^{1/2})}, α = min{O(λ/d^{3/2}), O(λ^2), O(ϵ^2/d^4)}, δ1 = O(α^{3/2}/m^{1/2}), δ0 = Θ(δ1 α/log^{1/2}(d)) such that with probability 1 − 1/poly(d) in the (re)-initializations, Algorithm 2 terminates in O(log(d/ϵ)) epochs and returns a tensor T such that [...]" (a simplified gradient-descent sketch of this setup follows this table).
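
To make the deflation idea concrete, here is a minimal NumPy sketch of deflation for a symmetric fourth-order tensor: repeatedly fit a single rank-1 component and subtract it from the residual. The rank-1 fitting below uses tensor power iteration with random restarts, which is our own choice for illustration; the paper's Algorithm 1 instead describes the idealized process that the modified gradient flow is shown to follow.

```python
import numpy as np

def deflate(T, rank, restarts=10, iters=100, seed=0):
    """Deflation sketch for a symmetric 4th-order tensor T (shape d x d x d x d).

    Each round estimates one rank-1 component weight * u^{(x)4} via tensor
    power iteration (keeping the restart with the largest fitted weight)
    and subtracts it from the residual.
    """
    rng = np.random.default_rng(seed)
    d = T.shape[0]
    residual = T.copy()
    components = []
    for _ in range(rank):
        best_weight, best_u = 0.0, None
        for _ in range(restarts):
            u = rng.standard_normal(d)
            u /= np.linalg.norm(u)
            for _ in range(iters):
                # v_i = sum_{j,k,l} residual[i,j,k,l] * u_j * u_k * u_l
                v = np.einsum('ijkl,j,k,l->i', residual, u, u, u)
                u = v / np.linalg.norm(v)
            weight = np.einsum('ijkl,i,j,k,l->', residual, u, u, u, u)
            if abs(weight) > abs(best_weight):
                best_weight, best_u = weight, u
        components.append((best_weight, best_u))
        # Remove the fitted rank-1 term best_weight * u (x) u (x) u (x) u.
        residual -= best_weight * np.einsum('i,j,k,l->ijkl', best_u, best_u, best_u, best_u)
    return components, residual
```

Applied to an orthogonally decomposable tensor such as the one constructed in the next sketch, each round should peel off roughly one term a_i e_i^{⊗4}, leaving a small residual after `rank` rounds.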
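
The synthetic tensor used for illustration in Figure 1 has the form T = Σ_{i∈[5]} a_i e_i^{⊗4}. A construction along these lines is shown below; the dimension d and the weights a_i are placeholders, since the values used in the figure are not recorded here.

```python
import numpy as np

d, rank = 10, 5                            # ambient dimension and number of components (our choice)
a = np.array([2.0, 1.6, 1.2, 0.8, 0.4])    # hypothetical ground-truth weights a_1 > ... > a_5 > 0
E = np.eye(d)[:rank]                       # orthonormal directions e_1, ..., e_5 as rows
# T[i, j, k, l] = sum_r a_r * E[r, i] * E[r, j] * E[r, k] * E[r, l]
T = np.einsum('r,ri,rj,rk,rl->ijkl', a, E, E, E, E)
```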
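
Algorithm 2 is only summarized above through its inputs and the guarantee of Theorem 1. As a rough illustration of the objective it optimizes, the sketch below runs plain gradient descent over m over-parameterized rank-1 components on an ℓ2-regularized square loss. This is our own simplified stand-in: it omits the re-initialization step and the epochs of increasing length γ, and the step size, iteration count, and initialization scale (far larger than the theorem's δ0) are all our choices, so it illustrates the update rule rather than the regime covered by the theorem.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target tensor T = sum_i a_i e_i^{(x)4}, as in the previous sketch.
d, rank = 10, 5
a = np.array([2.0, 1.6, 1.2, 0.8, 0.4])
E = np.eye(d)[:rank]
T = np.einsum('r,ri,rj,rk,rl->ijkl', a, E, E, E, E)

# Over-parameterized model sum_{p=1}^m w_p^{(x)4} with m > rank components.
m, lam = 50, 1e-3                        # number of components and regularization coefficient
lr, steps = 0.02, 8000                   # plain gradient descent instead of Algorithm 2's epochs
W = 0.1 * rng.standard_normal((m, d))    # init scale chosen much larger than the theorem's δ0

for step in range(steps + 1):
    T_hat = np.einsum('pi,pj,pk,pl->ijkl', W, W, W, W, optimize=True)
    R = T_hat - T
    # Gradient of 0.5 * ||T_hat - T||_F^2 + (lam / 2) * sum_p ||w_p||^2 with respect to W.
    grad = 4.0 * np.einsum('ijkl,pj,pk,pl->pi', R, W, W, W, optimize=True) + lam * W
    W -= lr * grad
    if step % 2000 == 0:
        print(f"step {step:5d}   ||T_hat - T||_F = {np.linalg.norm(R):.4f}")
```

Tracking the norms of the individual components w_p over training (rather than only the residual printed above) is what makes the deflation behavior visible: in the small-initialization regime the paper analyzes, they are expected to grow roughly one component at a time, mirroring the deflation process.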