On the Convergence of Gradient Flow on Multi-layer Linear Models
Authors: Hancheng Min, Rene Vidal, Enrique Mallada
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 4.1, we have shown a rate bound for three-layer networks under general initialization in Theorem 2. However, due to its complicated expression, it is less clear under what initialization the bound is positive. Through some numerical experiments, we show that our bound is very likely to be positive under various random initialization schemes. |
| Researcher Affiliation | Academia | 1 Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, U.S.A. 2 Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA, U.S.A. 3 Department of Radiology, University of Pennsylvania, Philadelphia, PA, U.S.A. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access to open-source code for the described methodology (e.g., a link or an explicit statement of release). |
| Open Datasets | No | The paper describes using a loss function $L = \|Y - W_1 W_2 W_3\|_F^2 / 2$ and various initialization schemes, but it does not specify a publicly available or open dataset with access information. |
| Dataset Splits | No | The paper does not provide specific training/test/validation dataset splits. The numerical experiments are conducted on synthetically generated weights and a loss function, not pre-split datasets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running its experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) that would be needed to replicate the experiments. |
| Experiment Setup | Yes | Next, we run gradient descent on three-layer networks under Fan-out initialization with the loss function $L = \|Y - W_1 W_2 W_3\|_F^2 / 2$ and step size $\eta$: 1. Middle plot: $n = 1$, $m = 1$, $Y = 2$, $\eta = 5\times 10^{-6}$; 2. Right plot: $n = 5$, $m = 1$, $Y = [1, 1, 1, 1, 1]^T$, $\eta = 5\times 10^{-6}$. We consider networks with different widths: $(h_1, h_2) = (100, 200), (200, 300), (300, 500)$. |
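
For concreteness, below is a minimal NumPy sketch of the gradient-descent setup quoted in the Experiment Setup row (three-layer linear model, loss $L = \|Y - W_1 W_2 W_3\|_F^2 / 2$, step size $\eta = 5\times 10^{-6}$, widths $(h_1, h_2)$ as listed). The `fanout_init` helper and the step count are assumptions for illustration: the helper stands in for the paper's Fan-out initialization scheme and simply scales Gaussian entries by $1/\sqrt{\text{fan-out}}$, which may differ from the authors' exact definition.

```python
import numpy as np

def fanout_init(fan_in, fan_out, rng):
    # Hypothetical stand-in for the paper's "Fan-out" initialization:
    # Gaussian entries scaled by 1/sqrt(fan_out). Adjust to match the
    # scheme defined in the paper if it differs.
    return rng.standard_normal((fan_in, fan_out)) / np.sqrt(fan_out)

def train_three_layer(Y, h1, h2, eta=5e-6, steps=10_000, seed=0):
    """Gradient descent on L = ||Y - W1 W2 W3||_F^2 / 2 for a three-layer
    linear model with W1: n x h1, W2: h1 x h2, W3: h2 x m."""
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    W1 = fanout_init(n, h1, rng)
    W2 = fanout_init(h1, h2, rng)
    W3 = fanout_init(h2, m, rng)
    losses = []
    for _ in range(steps):
        E = Y - W1 @ W2 @ W3              # residual Y - W1 W2 W3
        losses.append(0.5 * np.sum(E ** 2))
        # Gradients of L with respect to each factor
        G1 = -E @ (W2 @ W3).T
        G2 = -W1.T @ E @ W3.T
        G3 = -(W1 @ W2).T @ E
        W1 -= eta * G1
        W2 -= eta * G2
        W3 -= eta * G3
    return losses

# Right-plot settings quoted above: n = 5, m = 1, Y = [1, 1, 1, 1, 1]^T
Y = np.ones((5, 1))
for h1, h2 in [(100, 200), (200, 300), (300, 500)]:
    losses = train_three_layer(Y, h1, h2, eta=5e-6)
    print(f"(h1, h2) = ({h1}, {h2}): final loss = {losses[-1]:.3e}")
```

The middle-plot setting ($n = 1$, $m = 1$, $Y = 2$) can be reproduced with the same function by passing `Y = np.array([[2.0]])`.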