On the Convergence of Gradient Flow on Multi-layer Linear Models

Authors: Hancheng Min, Rene Vidal, Enrique Mallada

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Section 4.1, we have shown a rate bound for three-layer networks under general initialization in Theorem 2. However, due to its complicated expression, it is less clear under what initialization the bound is positive. Through some numerical experiments, we show that our bound is very likely to be positive under various random initialization schemes.
Researcher Affiliation | Academia | (1) Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, U.S.A.; (2) Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA, U.S.A.; (3) Department of Radiology, University of Pennsylvania, Philadelphia, PA, U.S.A.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete access (link, explicit statement of release) to open-source code for the described methodology.
Open Datasets | No | The paper describes using a loss function L = ||Y − W1W2W3||_F^2 / 2 and various initialization schemes, but it does not specify a publicly available or open dataset with access information.
Dataset Splits | No | The paper does not provide specific training/test/validation dataset splits. The numerical experiments are conducted on synthetically generated weights and a loss function, not pre-split datasets.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running its experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) that would be needed to replicate the experiments.
Experiment Setup | Yes | Next, we run gradient descent on three-layer networks under Fanout initialization with the loss function L = ||Y − W1W2W3||_F^2 / 2 and step size η: (1) Middle plot: n = 1, m = 1, Y = 2, η = 5e-6; (2) Right plot: n = 5, m = 1, Y = [1, 1, 1, 1, 1]^T, η = 5e-6. We consider networks with different widths: (h1, h2) = (100, 200), (200, 300), (300, 500). A minimal reproduction sketch of this setup is given after the table.
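
Below is a minimal sketch of the experiment setup quoted in the last table row: plain gradient descent on L = ||Y − W1W2W3||_F^2 / 2 for a three-layer linear model. The paper excerpt does not spell out the exact Fanout scaling, the number of iterations, or the weight-matrix shapes, so the 1/sqrt(fan-out) Gaussian initialization, the step count, and the dimension layout W1 (n × h1), W2 (h1 × h2), W3 (h2 × m) used here are assumptions, not the authors' implementation.

```python
import numpy as np

def fan_out_init(rows, cols, rng):
    # Assumed form of the paper's "Fanout" initialization: i.i.d. Gaussian
    # entries scaled by 1/sqrt(fan-out); the exact scaling may differ.
    return rng.standard_normal((rows, cols)) / np.sqrt(cols)

def run_gd(Y, h1, h2, eta=5e-6, steps=20000, seed=0):
    """Gradient descent on L = ||Y - W1 W2 W3||_F^2 / 2 (three-layer linear model)."""
    n, m = Y.shape
    rng = np.random.default_rng(seed)
    W1 = fan_out_init(n, h1, rng)    # assumed shapes: W1 (n x h1),
    W2 = fan_out_init(h1, h2, rng)   #                 W2 (h1 x h2),
    W3 = fan_out_init(h2, m, rng)    #                 W3 (h2 x m)
    losses = []
    for _ in range(steps):
        E = W1 @ W2 @ W3 - Y                    # residual
        losses.append(0.5 * np.linalg.norm(E) ** 2)
        G1 = E @ (W2 @ W3).T                    # dL/dW1
        G2 = W1.T @ E @ W3.T                    # dL/dW2
        G3 = (W1 @ W2).T @ E                    # dL/dW3
        W1, W2, W3 = W1 - eta * G1, W2 - eta * G2, W3 - eta * G3
    return losses

# "Right plot" configuration: n = 5, m = 1, Y = [1, 1, 1, 1, 1]^T, eta = 5e-6,
# swept over the three widths listed in the setup row.
Y = np.ones((5, 1))
for h1, h2 in [(100, 200), (200, 300), (300, 500)]:
    losses = run_gd(Y, h1, h2, eta=5e-6)
    print(f"(h1, h2)=({h1}, {h2})  final loss = {losses[-1]:.3e}")
```

The "middle plot" configuration uses the same routine with Y = np.array([[2.0]]). The iteration count here is chosen only to make the loss decay visible; the paper excerpt does not report one.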