On the Convergence of Gradient Flow on Multi-layer Linear Models

Authors: Hancheng Min, Rene Vidal, Enrique Mallada

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Section 4.1, we have shown a rate bound for three-layer networks under general initialization in Theorem 2. However, due to its complicated expression, it is less clear under what initialization the bound is positive. Through some numerical experiments, we show that our bound is very likely to be positive under various random initialization schemes.
Researcher Affiliation | Academia | (1) Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, U.S.A.; (2) Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA, U.S.A.; (3) Department of Radiology, University of Pennsylvania, Philadelphia, PA, U.S.A.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete access (link, explicit statement of release) to open-source code for the described methodology.
Open Datasets | No | The paper describes using a loss function L = ||Y − W1W2W3||_F^2 / 2 and various initialization schemes, but it does not specify a publicly available or open dataset with access information.
Dataset Splits | No | The paper does not provide specific training/test/validation dataset splits. The numerical experiments are conducted on synthetically generated weights and a loss function, not pre-split datasets.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running its experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) that would be needed to replicate the experiments.
Experiment Setup | Yes | Next, we run gradient descent on three-layer networks under Fanout initialization with the loss function L = ||Y − W1W2W3||_F^2 / 2 and step size η: (1) Middle plot: n = 1, m = 1, Y = 2, η = 5e-6; (2) Right plot: n = 5, m = 1, Y = [1, 1, 1, 1, 1]^T, η = 5e-6. We consider networks with different widths: (h1, h2) = (100, 200), (200, 300), (300, 500). A minimal reproduction sketch of this setup is given after the table.
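
Below is a minimal sketch of the experiment setup quoted in the last table row: plain gradient descent on L = ||Y − W1W2W3||_F^2 / 2 for a three-layer linear model. The paper excerpt does not spell out the exact Fanout scaling, the number of iterations, or the weight-matrix shapes, so the 1/sqrt(fan-out) Gaussian initialization, the step count, and the dimension layout W1 (n × h1), W2 (h1 × h2), W3 (h2 × m) used here are assumptions, not the authors' implementation.

```python
import numpy as np

def fan_out_init(rows, cols, rng):
    # Assumed form of the paper's "Fanout" initialization: i.i.d. Gaussian
    # entries scaled by 1/sqrt(fan-out); the exact scaling may differ.
    return rng.standard_normal((rows, cols)) / np.sqrt(cols)

def run_gd(Y, h1, h2, eta=5e-6, steps=20000, seed=0):
    """Gradient descent on L = ||Y - W1 W2 W3||_F^2 / 2 (three-layer linear model)."""
    n, m = Y.shape
    rng = np.random.default_rng(seed)
    W1 = fan_out_init(n, h1, rng)    # assumed shapes: W1 (n x h1),
    W2 = fan_out_init(h1, h2, rng)   #                 W2 (h1 x h2),
    W3 = fan_out_init(h2, m, rng)    #                 W3 (h2 x m)
    losses = []
    for _ in range(steps):
        E = W1 @ W2 @ W3 - Y                    # residual
        losses.append(0.5 * np.linalg.norm(E) ** 2)
        G1 = E @ (W2 @ W3).T                    # dL/dW1
        G2 = W1.T @ E @ W3.T                    # dL/dW2
        G3 = (W1 @ W2).T @ E                    # dL/dW3
        W1, W2, W3 = W1 - eta * G1, W2 - eta * G2, W3 - eta * G3
    return losses

# "Right plot" configuration: n = 5, m = 1, Y = [1, 1, 1, 1, 1]^T, eta = 5e-6,
# swept over the three widths listed in the setup row.
Y = np.ones((5, 1))
for h1, h2 in [(100, 200), (200, 300), (300, 500)]:
    losses = run_gd(Y, h1, h2, eta=5e-6)
    print(f"(h1, h2)=({h1}, {h2})  final loss = {losses[-1]:.3e}")
```

The "middle plot" configuration uses the same routine with Y = np.array([[2.0]]). The iteration count here is chosen only to make the loss decay visible; the paper excerpt does not report one.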