Training Linear Neural Networks: Non-Local Convergence and Complexity Results
Authors: Armin Eftekhari
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper identifies conditions under which gradient flow provably trains a linear network, in spite of the non-strict saddle points present in the optimization landscape. The paper also provides the computational complexity of training linear networks with gradient flow. To achieve these results, the work develops machinery to provably identify the stable set of gradient flow, which in turn improves over the state of the art in the literature on linear networks (Bah et al., 2019; Arora et al., 2018a). |
| Researcher Affiliation | Academia | Department of Mathematics and Mathematical Statistics, Umeå University, Sweden. AE is indebted to Holger Rauhut, Ulrich Terstiege and Gongguo Tang for insightful discussions. Correspondence to: Armin Eftekhari <armin.eftekhari@umu.se>. |
| Pseudocode | No | No pseudocode or algorithm blocks are provided in the paper. |
| Open Source Code | No | The paper does not provide any links or explicit statements about the availability of source code for the described methodology. |
| Open Datasets | No | The paper describes a "randomly-generated whitened training dataset" for a numerical example, but it is not a publicly available dataset with concrete access information (link, citation, or repository). |
| Dataset Splits | No | The paper does not specify training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory) used for running the numerical example. |
| Software Dependencies | No | The paper mentions implementing "the discretization of (17) obtained from the explicit (or forward) Euler method" but does not specify any software names or version numbers. |
| Experiment Setup | Yes | Suppose that the sample size is $m = 50$, and consider a randomly-generated whitened training dataset... with $d_x = 5$ and $d_y = 1$. ...We also set $\|W_0\|_2 = 10\|Z\|_2$. Instead of the induced flow (17), we implemented the discretization of (17) obtained from the explicit (or forward) Euler method with a step size of $10^{-6}$ over $10^5$ steps. |
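
For concreteness, the sketch below reproduces the quoted setup under stated assumptions: it applies the explicit (forward) Euler discretization of gradient flow to a standard least-squares loss $L(W) = \|WX - Y\|_F^2 / (2m)$ on a randomly generated whitened dataset. This is not the paper's induced flow (17), whose exact form is not reproduced here; the dimensions, step size, iteration count, and initialization scale are taken from the row above, while the loss, whitening procedure, and matrix $Z$ (the least-squares minimizer) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

m, dx, dy = 50, 5, 1        # sample size and dimensions from the paper
eta = 1e-6                  # Euler step size from the paper
n_steps = 10**5             # number of Euler steps from the paper

# Randomly generated dataset; X is whitened so that X X^T / m = I.
X = rng.standard_normal((dx, m))
L_chol = np.linalg.cholesky(np.linalg.inv(X @ X.T / m))
X = L_chol.T @ X            # now X X^T / m = I (whitened)
Y = rng.standard_normal((dy, m))

# For whitened data the least-squares minimizer is Z = Y X^T / m
# (an assumption of this sketch, not a quantity defined in the row).
Z = Y @ X.T / m

# Initialization scaled so that ||W_0||_2 = 10 ||Z||_2, as in the row.
W = rng.standard_normal((dy, dx))
W *= 10 * np.linalg.norm(Z, 2) / np.linalg.norm(W, 2)

for _ in range(n_steps):
    grad = (W @ X - Y) @ X.T / m   # gradient of the least-squares loss
    W -= eta * grad                # explicit (forward) Euler step

print("final distance to minimizer:", np.linalg.norm(W - Z))
```

Note that with a step size of $10^{-6}$ over $10^5$ steps the total integrated time is only $0.1$, so the iterate moves only a short distance along the flow; the sketch illustrates the discretization scheme and the quoted hyperparameters rather than the paper's convergence results.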