Understanding Deflation Process in Over-parametrized Tensor Decomposition
Authors: Rong Ge, Yunwei Ren, Xiang Wang, Mo Zhou
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We prove that for orthogonally decomposable tensor, a slightly modified version of gradient flow would follow a tensor deflation process and recover all the tensor components. Our proof suggests that for orthogonal tensors, gradient flow dynamics works similarly as greedy low-rank learning in the matrix setting, which is a first step towards understanding the implicit regularization effect of over-parametrized models for low-rank tensors. |
| Researcher Affiliation | Academia | Rong Ge (Duke University, rongge@cs.duke.edu); Yunwei Ren* (Shanghai Jiao Tong University, 2016renyunwei@sjtu.edu.cn); Xiang Wang* (Duke University, xwang@cs.duke.edu); Mo Zhou* (Duke University, mozhou@cs.duke.edu) |
| Pseudocode | Yes | Algorithm 1 Tensor Deflation Process |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that source code for the described methodology is publicly available. |
| Open Datasets | No | The paper constructs a synthetic tensor ($T = \sum_{i \in [5]} a_i e_i^{\otimes 4}$) for illustrative purposes in Figure 1, but does not use or provide access information for any publicly available or open datasets. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments that require specifying training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not specify any hardware used for experiments, such as specific GPU or CPU models. |
| Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers required to replicate the work. |
| Experiment Setup | Yes | "Input: Number of components m, initialization scale δ_0, re-initialization threshold δ_1, increasing rate of epoch length γ, target accuracy ϵ, regularization coefficient λ" and "Theorem 1. For any ϵ ≥ exp(−o(d/log d)), there exists γ = Θ(1), m = poly(d), λ = min{O(log d/d), O(ϵ/d^{1/2})}, α = min{O(λ/d^{3/2}), O(λ^2), O(ϵ^2/d^4)}, δ_1 = O(α^{3/2}/m^{1/2}), δ_0 = Θ(δ_1 α/log^{1/2}(d)) such that with probability 1 − 1/poly(d) in the (re)-initializations, Algorithm 2 terminates in O(log(d/ϵ)) epochs and returns a tensor T such that …" |
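
To make the table concrete, here is a minimal, hypothetical Python/NumPy sketch that builds the synthetic orthogonal tensor $T = \sum_{i \in [5]} a_i e_i^{\otimes 4}$ mentioned in the Open Datasets row and runs a greedy deflation loop in the spirit of the "Tensor Deflation Process" named in the Pseudocode row. Everything below is an illustrative assumption: the dimension d, the weights a_i, and the use of tensor power iteration as the component-fitting step are ours, not the paper's; the paper's Algorithm 2 instead analyses a slightly modified gradient flow on an over-parametrized model with re-initialization.

```python
# Illustrative sketch only (assumptions): the paper analyses a modified
# gradient flow (Algorithm 2); here a simple tensor power iteration plays
# the role of the "fit the top remaining component" step so the deflation
# process can be seen end to end. d, r, and a_i are made-up values.
import numpy as np

rng = np.random.default_rng(0)
d, r = 10, 5                                      # ambient dimension, rank
a = np.sort(rng.uniform(1.0, 2.0, size=r))[::-1]  # assumed weights a_1 >= ... >= a_5
E = np.eye(d)[:, :r]                              # orthogonal components e_1, ..., e_5

def rank_one(u):
    """Fourth-order rank-one tensor u ⊗ u ⊗ u ⊗ u."""
    return np.einsum('i,j,k,l->ijkl', u, u, u, u)

# Synthetic tensor from Figure 1: T = sum_{i in [5]} a_i e_i^{⊗4}.
T = sum(a[i] * rank_one(E[:, i]) for i in range(r))

def deflate(T, n_components, n_iters=100):
    """Greedy deflation: recover the largest remaining component, subtract it, repeat."""
    residual = T.copy()
    recovered = []
    for _ in range(n_components):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)
        for _ in range(n_iters):
            # Contract the residual along three modes (power-iteration
            # stand-in for the gradient-flow fitting phase).
            u = np.einsum('ijkl,j,k,l->i', residual, u, u, u)
            u /= np.linalg.norm(u)
        w = np.einsum('ijkl,i,j,k,l->', residual, u, u, u, u)
        recovered.append((w, u))
        residual = residual - w * rank_one(u)     # deflation step
    return recovered, residual

components, residual = deflate(T, r)
print("recovered weights:", np.round([w for w, _ in components], 3))
print("true weights:     ", np.round(a, 3))
print("residual Frobenius norm:", np.linalg.norm(residual))
```

For an orthogonal tensor with well-separated weights, the recovered weights should match a_1, …, a_5 up to small numerical error and the residual norm should be near zero, mirroring the sequential, one-component-at-a-time recovery that the paper establishes for its gradient-flow dynamics.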