Global Convergence of Gradient Descent for Asymmetric Low-Rank Matrix Factorization

Authors: Tian Ye, Simon S. Du

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | "This paper presents the first proof that shows randomly initialized gradient descent converges to a global minimum of the asymmetric low-rank factorization problem with a polynomial rate. For the proof, we develop 1) a new symmetrization technique to capture the magnitudes of the symmetry and asymmetry, and 2) a quantitative perturbation analysis to approximate matrix derivatives. We believe both are useful for other related non-convex problems. ... The goal of this paper is not to provide new provably efficient algorithms to solve Problem (1), but to provide a rigorous analysis of an intriguing and practically relevant phenomenon on gradient descent."
Researcher Affiliation | Academia | Tian Ye, Institute for Interdisciplinary Information Sciences, Tsinghua University (yet17@mails.tsinghua.edu.cn); Simon S. Du, Paul G. Allen School of Computer Science and Engineering, University of Washington (ssdu@cs.washington.edu)
Pseudocode | No | The paper presents equations for the gradient descent updates (e.g., Equations 2 and 3) but does not present them in a formally structured pseudocode or algorithm block; a hedged sketch of these updates appears after this table.
Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository for the methodology described.
Open Datasets | No | The paper is theoretical and does not report experiments using datasets for training or evaluation. It references empirical observations from prior work but does not use data of its own.
Dataset Splits | No | The paper is theoretical and does not involve empirical validation on datasets, so no dataset split information is provided.
Hardware Specification | No | The paper is theoretical and does not describe any experimental setup, including hardware specifications.
Software Dependencies | No | The paper is theoretical and does not mention any specific software or libraries with version numbers used for experiments.
Experiment Setup | No | The paper is theoretical and does not describe experimental setup details such as hyperparameters or training configurations.
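Since the paper states its gradient descent updates only as equations, the following is a minimal NumPy sketch of the setting it analyzes: randomly initialized gradient descent on the asymmetric low-rank factorization objective f(U, V) = (1/2)‖UVᵀ − Σ‖²_F, the standard form of Problem (1). The dimensions, target construction, step size, and stopping tolerance below are illustrative assumptions chosen for this sketch, not values taken from the paper.

```python
import numpy as np

# Sketch of randomly initialized gradient descent on
#   f(U, V) = 0.5 * ||U @ V.T - Sigma||_F^2
# (the asymmetric low-rank factorization objective, Problem (1)).
# Dimensions, step size, and tolerance are illustrative assumptions.

rng = np.random.default_rng(0)
m, n, r = 50, 40, 3

# Hypothetical rank-r target matrix Sigma, built for illustration.
Sigma = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

# Small random initialization, matching the randomly initialized setting.
scale = 1e-3
U = scale * rng.standard_normal((m, r))
V = scale * rng.standard_normal((n, r))

# Constant step size scaled by the largest singular value of Sigma
# (an assumed heuristic for stability, not the paper's prescription).
eta = 0.1 / np.linalg.norm(Sigma, 2)

for t in range(20000):
    R = U @ V.T - Sigma          # residual U V^T - Sigma
    grad_U = R @ V               # gradient w.r.t. U: (U V^T - Sigma) V
    grad_V = R.T @ U             # gradient w.r.t. V: (U V^T - Sigma)^T U
    U, V = U - eta * grad_U, V - eta * grad_V
    if np.linalg.norm(R, "fro") < 1e-8:
        break

print(f"iterations: {t + 1}, "
      f"final error: {np.linalg.norm(U @ V.T - Sigma, 'fro'):.2e}")
```

Under these assumptions the iterates typically drive the Frobenius error to near machine precision without any balancing regularization on U and V, which is the phenomenon whose convergence the paper proves; the sketch is only a way to observe it empirically, not the authors' code.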