Training Linear Neural Networks: Non-Local Convergence and Complexity Results

Authors: Armin Eftekhari

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This paper identifies conditions under which gradient flow provably trains a linear network, despite the non-strict saddle points present in the optimization landscape. It also establishes the computational complexity of training linear networks with gradient flow. To achieve these results, the work develops machinery to provably identify the stable set of gradient flow, which in turn improves on the state of the art in the linear-network literature (Bah et al., 2019; Arora et al., 2018a).
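For context, the gradient flow referred to above is the continuous-time limit of gradient descent. For a depth-N linear network with weights W_1, …, W_N and training loss L, it can be sketched as follows (a standard formulation in this literature; the paper's own induced flow, its equation (17), is not reproduced here):

```latex
\dot{W}_j(t) = -\nabla_{W_j}\, L\big(W_N(t) \cdots W_1(t)\big),
\qquad j = 1, \dots, N .
```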
Researcher Affiliation | Academia | Department of Mathematics and Mathematical Statistics, Umeå University, Sweden. AE is indebted to Holger Rauhut, Ulrich Terstiege, and Gongguo Tang for insightful discussions. Correspondence to: Armin Eftekhari <armin.eftekhari@umu.se>.
Pseudocode | No | No pseudocode or algorithm blocks are provided in the paper.
Open Source Code | No | The paper does not provide any links or explicit statements about the availability of source code for the described methodology.
Open Datasets | No | The paper describes a "randomly-generated whitened training dataset" for a numerical example, but this is not a publicly available dataset with concrete access information (link, citation, or repository).
Dataset Splits | No | The paper does not specify training, validation, or test dataset splits.
Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory) used for running the numerical example.
Software Dependencies | No | The paper mentions implementing the "discretization of (17) obtained from the explicit (or forward) Euler method" but does not name any software packages or version numbers.
Experiment Setup | Yes | Suppose that the sample size is m = 50, and consider a randomly-generated whitened training dataset ... with d_x = 5 and d_y = 1. ... We also set ‖W_0‖_2 = 10‖Z‖_2. Instead of the induced flow (17), we implemented the discretization of (17) obtained from the explicit (or forward) Euler method with a step size of 10^-6 for 10^5 steps.
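The setup above can be sketched in code. Since the paper's induced flow (17) is not available here, the sketch below substitutes plain gradient flow on the squared loss of a two-layer linear network, discretized with the explicit (forward) Euler method at the stated step size and iteration count; the data dimensions (m = 50, d_x = 5, d_y = 1), the whitening of X, and the initialization scale are assumptions reconstructed from the quoted description.

```python
import numpy as np

rng = np.random.default_rng(0)

# Problem sizes quoted from the paper's numerical example.
m, dx, dy = 50, 5, 1

# Randomly generated, then whitened, training inputs: enforce X @ X.T / m = I.
X = rng.standard_normal((dx, m))
U, _, Vt = np.linalg.svd(X, full_matrices=False)
X = np.sqrt(m) * U @ Vt
Y = rng.standard_normal((dy, m))

# Two-layer linear network f(x) = W2 @ W1 @ x (hypothetical choice of depth;
# the 0.1 initialization scale is also an assumption).
W1 = 0.1 * rng.standard_normal((dx, dx))
W2 = 0.1 * rng.standard_normal((dy, dx))

step, n_steps = 1e-6, 10**5  # step size and iteration count from the paper

def loss(W1, W2):
    """Squared loss 0.5 * ||W2 @ W1 @ X - Y||_F^2."""
    R = W2 @ W1 @ X - Y
    return 0.5 * np.sum(R**2)

loss0 = loss(W1, W2)
for _ in range(n_steps):
    R = W2 @ W1 @ X - Y       # residual
    G2 = R @ (W1 @ X).T       # gradient w.r.t. W2
    G1 = W2.T @ R @ X.T       # gradient w.r.t. W1
    W1 -= step * G1           # one explicit (forward) Euler step
    W2 -= step * G2
```

With such a small step size, the loss decreases slowly but monotonically, which is the regime in which the Euler discretization tracks the underlying flow.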