Understanding the Dynamics of Gradient Flow in Overparameterized Linear Models

Authors: Salma Tarmoun, Guilherme Franca, Benjamin D Haeffele, Rene Vidal

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental 6. Numerical Experiments: Here we provide numerical evidence to our theoretical results. First, we generate a random matrix Y with Yij ~ N(0, 1) and set m = 5, n = 10 and k = 50. We approximate the dynamics of gradient flow for one-layer and two-layer linear models by using gradient descent with a step size η = 10⁻³ (smaller step sizes did not lead to a discernible change). We evaluate the reconstruction error ‖Y − X(t)‖_F / ‖Y‖_F, where X(t) = U(t)V(t)ᵀ, and compare the evolution of the singular values of X(t).
Researcher Affiliation Academia 1Mathematical Institute for Data Science, Johns Hopkins University, 2Department of Applied Mathematics and Statistics, Johns Hopkins University, 3Computer Science Division, University of California, Berkeley, 4Department of Biomedical Engineering, Johns Hopkins University.
Pseudocode No No pseudocode or algorithm blocks were found in the paper.
Open Source Code No No statement regarding the release or availability of open-source code for the described methodology was found.
Open Datasets No The paper uses generated synthetic data, not a publicly accessible dataset with concrete access information. 'First, we generate a random matrix Y with Yij ~ N(0, 1)' and 'We generated the matrices W and X with entries drawn from N(0, 1) and Y = φ(XW) + ϵ where ϵ ~ 10⁻³ N(0, I).'
Dataset Splits No The paper uses generated synthetic data and does not explicitly mention or provide details for training, validation, or test dataset splits. It only states 'We train the two networks'.
Hardware Specification No The paper does not provide any specific hardware details such as GPU models, CPU models, or cloud computing instance specifications used for running the experiments.
Software Dependencies No The paper does not mention specific software dependencies with version numbers (e.g., programming languages or libraries with their versions).
Experiment Setup Yes We approximate the dynamics of gradient flow for one-layer and two-layer linear models by using gradient descent with a step size η = 10⁻³ (smaller step sizes did not lead to a discernible change). We consider Gaussian initializations, i.e., U0 and V0 have entries ~ N(0, σ²) where σ is varied to obtain different degrees of imbalance. ... We set η = 10⁻⁵, Y ~ N(0, 1), m = 5, n = 10 and vary k. ... Initial weights are drawn from a normal distribution N(0, 10⁻¹).
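The setup quoted above is simple enough to sketch directly. The following NumPy snippet is a minimal, unofficial reconstruction of the two-layer experiment under stated assumptions: plain gradient descent on the squared Frobenius loss ½‖Y − UVᵀ‖²_F, the dimensions m = 5, n = 10, k = 50 and step size η = 10⁻³ from the paper's text, and an initialization scale σ = 10⁻¹ (the paper varies σ to change the degree of imbalance; the iteration count and random seed are choices made here, not values from the paper).

```python
import numpy as np

# Unofficial sketch of the paper's two-layer numerical experiment.
rng = np.random.default_rng(0)
m, n, k = 5, 10, 50          # target size and overparameterized width (from the text)
eta = 1e-3                   # step size from the text
sigma = 1e-1                 # init scale; the paper varies this to tune imbalance

Y = rng.standard_normal((m, n))          # Y_ij ~ N(0, 1)
U = sigma * rng.standard_normal((m, k))  # U0, V0 entries ~ N(0, sigma^2)
V = sigma * rng.standard_normal((n, k))

errors = []
for _ in range(10_000):      # iteration count is an assumption, not from the paper
    R = U @ V.T - Y          # residual X(t) - Y
    # Gradients of 0.5 * ||Y - U V^T||_F^2 w.r.t. U and V; simultaneous update.
    U, V = U - eta * R @ V, V - eta * R.T @ U
    # Relative reconstruction error ||Y - X(t)||_F / ||Y||_F
    errors.append(np.linalg.norm(R, "fro") / np.linalg.norm(Y, "fro"))
```

Tracking `errors` over iterations reproduces the kind of reconstruction-error curve the paper evaluates; the singular values of `U @ V.T` can be monitored with `np.linalg.svd` to compare their evolution as well.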