Preconditioned Gradient Descent for Over-Parameterized Nonconvex Matrix Factorization
Authors: Jialun Zhang, Salar Fattahi, Richard Y. Zhang
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our numerical experiments find that PrecGD works equally well in restoring the linear convergence of other variants of nonconvex matrix factorization in the over-parameterized regime. ... Finally, we numerically compare PrecGD on other matrix factorization problems that fall outside of the matrix sensing framework. |
| Researcher Affiliation | Academia | Gavin Zhang, University of Illinois at Urbana-Champaign, jialun2@illinois.edu; Salar Fattahi, University of Michigan, fattahi@umich.edu; Richard Y. Zhang, University of Illinois at Urbana-Champaign, ryz@illinois.edu |
| Pseudocode | No | The paper describes its algorithms through mathematical equations, such as the PrecGD update X_{k+1} = X_k − α∇f(X_k)(X_kᵀX_k + η_k I_r)⁻¹, but does not provide a formal pseudocode block or algorithm box. A hedged sketch of this update appears after the table. |
| Open Source Code | No | The paper does not provide any statement or link regarding the open-sourcing of the code for the methodology described. |
| Open Datasets | Yes | The data matrices A_1, ..., A_m were taken from [13, Example 12]; the ground truth M = ZZᵀ was constructed by sampling each column of Z ∈ ℝ^{n×r} from the standard Gaussian and then rescaling the last column to achieve a desired condition number. (A hedged data-generation sketch appears after the table.) |
| Dataset Splits | No | The paper discusses problem dimensions (n, r) and initial conditions but does not specify training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not provide any specific hardware specifications used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers used for the experiments. |
| Experiment Setup | Yes | For ScaledGD and PrecGD, we used a modified version of the Polyak step-size where α_k = ‖∇f(X_k)‖_p / ‖∇f(X_k)‖_P. For GD we use a decaying step-size. ... using the same learning rate α = 2 × 10⁻². |
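
Since the paper provides neither pseudocode nor open-source code, the following is a minimal NumPy sketch of one iteration of the PrecGD update quoted in the Pseudocode row, X_{k+1} = X_k − α∇f(X_k)(X_kᵀX_k + η_k I_r)⁻¹, specialized to a matrix-sensing loss f(X) = (1/2m) Σ_i (⟨A_i, XXᵀ⟩ − b_i)². This is not the authors' implementation: the function name `precgd_step`, the 1/(2m) loss scaling, the symmetrization of the A_i, and the damping choice η_k = √f(X_k) are illustrative assumptions; only the default learning rate matches the α = 2 × 10⁻² reported in the Experiment Setup row.

```python
import numpy as np

def precgd_step(X, A_list, b, alpha=2e-2):
    """One PrecGD-style iteration, X <- X - alpha * grad_f(X) (X^T X + eta I_r)^{-1},
    for the matrix-sensing loss f(X) = (1/2m) * sum_i (<A_i, X X^T> - b_i)^2.
    The damping eta = sqrt(f(X)) is an illustrative assumption, not a paper detail."""
    r = X.shape[1]
    m = len(A_list)
    XXt = X @ X.T
    residuals = np.array([np.sum(A * XXt) - bi for A, bi in zip(A_list, b)])
    f_val = 0.5 * np.mean(residuals**2)
    # grad f(X) = (1/m) * sum_i r_i (A_i + A_i^T) X  (A_i need not be symmetric)
    grad = sum(ri * (A + A.T) @ X for ri, A in zip(residuals, A_list)) / m
    eta = np.sqrt(f_val)  # assumption: damping tied to the current loss value
    precond = np.linalg.inv(X.T @ X + eta * np.eye(r))  # r x r inverse, cheap for small r
    return X - alpha * grad @ precond, f_val
```

Repeatedly calling `precgd_step` from a random initialization mimics only the shape of the reported experiments; the exact sensing matrices and step-size schedules come from the paper and its reference [13].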
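The ground-truth construction quoted in the Open Datasets row (M = ZZᵀ with Gaussian Z and a rescaled last column) can likewise be sketched. The excerpt does not state the exact rescaling rule, so dividing the last column by √κ to target a condition number κ is an assumption, and `make_ground_truth` is a hypothetical helper name; the sensing matrices A_1, ..., A_m come from [13, Example 12] and are not reconstructed here.

```python
import numpy as np

def make_ground_truth(n, r, kappa, seed=0):
    """Ground truth M = Z Z^T with the columns of Z drawn i.i.d. standard Gaussian;
    the last column is shrunk by 1/sqrt(kappa) to approximately reach a target
    condition number kappa -- this rescaling rule is an assumption."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n, r))
    Z[:, -1] /= np.sqrt(kappa)  # assumption: smallest eigenvalue of M ~ largest / kappa
    return Z @ Z.T, Z
```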