Understanding the Dynamics of Gradient Flow in Overparameterized Linear Models
Authors: Salma Tarmoun, Guilherme Franca, Benjamin D Haeffele, Rene Vidal
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6. Numerical Experiments: Here we provide numerical evidence to our theoretical results. First, we generate a random matrix Y with Y_ij ~ N(0, 1) and set m = 5, n = 10 and k = 50. We approximate the dynamics of gradient flow for one-layer and two-layer linear models by using gradient descent with a step size η = 10^-3 (smaller step sizes did not lead to a discernible change). We evaluate the reconstruction error ‖Y - X(t)‖_F / ‖Y‖_F, where X(t) = U(t)V^T(t), and compare the evolution of the singular values of X(t). (A minimal numerical sketch of this setup is given after the table.) |
| Researcher Affiliation | Academia | 1Mathematical Institute for Data Science, Johns Hopkins University, 2Department of Applied Mathematics and Statistics, Johns Hopkins University, 3Computer Science Division, University of California, Berkeley, 4Department of Biomedical Engineering, Johns Hopkins University. |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | No statement regarding the release or availability of open-source code for the described methodology was found. |
| Open Datasets | No | The paper uses generated synthetic data, not a publicly accessible dataset with concrete access information. 'First, we generate a random matrix Y with Y_ij ~ N(0, 1)' and 'We generated the matrices W and X with entries drawn from N(0, 1) and Y = φ(XW) + ε where ε ~ 10^-3 · N(0, I).' |
| Dataset Splits | No | The paper uses generated synthetic data and does not explicitly mention or provide details for training, validation, or test dataset splits. It only states 'We train the two networks'. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU models, or cloud computing instance specifications used for running the experiments. |
| Software Dependencies | No | The paper does not mention specific software dependencies with version numbers (e.g., programming languages or libraries with their versions). |
| Experiment Setup | Yes | We approximate the dynamics of gradient flow for one-layer and two-layer linear models by using gradient descent with a step size η = 10^-3 (smaller step sizes did not lead to a discernible change). We consider Gaussian initializations, i.e., U_0 and V_0 have entries ~ N(0, σ^2), where σ is varied to obtain different degrees of imbalance. ... We set η = 10^-5, Y ~ N(0, 1), m = 5, n = 10 and vary k. ... Initial weights are drawn from a normal distribution N(0, 10^-1). (See the initialization sketch after the table.) |
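
Since no code is released for this paper, the following is a minimal NumPy sketch (not the authors' implementation) of the quoted setup: gradient descent with step size η = 10^-3 on a two-layer factorization X = U V^T of a random Gaussian target Y, tracking the relative reconstruction error ‖Y - X(t)‖_F / ‖Y‖_F and the singular values of X(t). The iteration count, random seed, and the particular σ shown are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (not the authors' code): approximate gradient flow for the
# two-layer linear model X(t) = U(t) V(t)^T by running gradient descent on
# L(U, V) = 0.5 * ||Y - U V^T||_F^2 with the quoted sizes and step size.
rng = np.random.default_rng(0)           # seed is an illustrative choice

m, n, k = 5, 10, 50                      # Y is m x n; hidden width k (overparameterized)
eta = 1e-3                               # step size quoted in the paper
sigma = 1e-1                             # initialization scale (varied in the paper)

Y = rng.standard_normal((m, n))          # Y_ij ~ N(0, 1)
U = sigma * rng.standard_normal((m, k))  # U_0 entries ~ N(0, sigma^2)
V = sigma * rng.standard_normal((n, k))  # V_0 entries ~ N(0, sigma^2)

rel_errors, sing_vals = [], []
for _ in range(20000):                   # iteration count is an assumption
    X = U @ V.T
    R = X - Y                            # residual U V^T - Y
    # Explicit Euler step on the gradient-flow ODEs:
    #   dU/dt = -(U V^T - Y) V,   dV/dt = -(U V^T - Y)^T U
    U, V = U - eta * R @ V, V - eta * R.T @ U
    rel_errors.append(np.linalg.norm(R, "fro") / np.linalg.norm(Y, "fro"))
    sing_vals.append(np.linalg.svd(X, compute_uv=False))
```

Plotting `rel_errors` and the columns of `sing_vals` over iterations reproduces the kind of curves described in the quote; the one-layer baseline corresponds to running gradient descent directly on X with the same loss.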
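
The quoted setup varies the initialization scale σ to obtain different degrees of imbalance between the two layers. The sketch below illustrates that dependence under the assumption that imbalance is measured by ‖U_0^T U_0 - V_0^T V_0‖_F, a common choice for two-layer linear models; the quoted text does not specify the metric, and the σ values shown are examples, not values from the paper.

```python
import numpy as np

# Sketch (assumption-labeled): how the Gaussian initialization scale sigma
# controls the imbalance ||U0^T U0 - V0^T V0||_F between the two layers.
rng = np.random.default_rng(0)
m, n, k = 5, 10, 50

for sigma in (1e-3, 1e-1, 1.0):                # example scales, not from the paper
    U0 = sigma * rng.standard_normal((m, k))   # entries ~ N(0, sigma^2)
    V0 = sigma * rng.standard_normal((n, k))
    imbalance = np.linalg.norm(U0.T @ U0 - V0.T @ V0, "fro")
    print(f"sigma = {sigma:g}  ->  imbalance ~ {imbalance:.3f}")
```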