Gradient descent aligns the layers of deep linear networks
Authors: Ziwei Ji, Matus Telgarsky
ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | a preliminary experiment on CIFAR-10 which establishes empirically that a form of the alignment phenomenon occurs on the standard nonlinear network AlexNet. Figure 1: Visualization of margin maximization and self-regularization of layers on synthetic data with a 4-layer linear network compared to a 1-layer network (a linear predictor). Figure 1a shows the convergence of the 1-layer and 4-layer networks to the same margin-maximizing linear predictor on positive (blue) and negative (red) separable data. Figure 1b shows the convergence of ||W_i||_2 / ||W_i||_F to 1 for each layer, plotted against the risk. Figure 3: Risk and alignment of dense layers (the ratio ||W_i||_2 / ||W_i||_F) of (nonlinear!) AlexNet on CIFAR-10. (See the norm-ratio sketch after this table.) |
| Researcher Affiliation | Academia | Ziwei Ji & Matus Telgarsky, Department of Computer Science, University of Illinois at Urbana-Champaign, {ziweiji2,mjt}@illinois.edu |
| Pseudocode | No | The paper describes mathematical proofs and derivations but does not include any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not include an explicit statement about releasing source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | a preliminary experiment on CIFAR-10 which establishes empirically that a form of the alignment phenomenon occurs on the standard nonlinear network AlexNet. Figure 3: Risk and alignment of dense layers (the ratio ||W_i||_2 / ||W_i||_F) of (nonlinear!) AlexNet on CIFAR-10. |
| Dataset Splits | No | The paper mentions using 'synthetic data' and 'CIFAR-10' but does not provide specific details on dataset splits (e.g., percentages or sample counts for training, validation, or testing). |
| Hardware Specification | No | The Acknowledgements mention an NVIDIA GPU grant that led to the creation of their beloved GPU machine DUTCHCRUNCH, but no specific GPU model, CPU, or other detailed hardware specifications are provided for the experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch initialization' for the AlexNet experiments, but it does not specify version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | Two initializations were tried: the default PyTorch initialization, and a Gaussian initialization forcing all initial Frobenius norms to be just 4, which is suggested by the norm-preservation property in the analysis and removes noise in the weights. (See the initialization sketch after this table.) |
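
The alignment quantity quoted above (Figures 1b and 3) is the spectral-to-Frobenius norm ratio ||W_i||_2 / ||W_i||_F of each dense layer, which approaches 1 as a layer becomes effectively rank one. A minimal sketch of how that ratio could be measured in PyTorch is given below; the helper name and the small 4-layer network are illustrative assumptions, not the authors' code.

```python
import torch

def alignment_ratios(model):
    """Spectral-to-Frobenius norm ratio ||W||_2 / ||W||_F for each Linear layer.

    A ratio close to 1 indicates the layer is approximately rank one, i.e. the
    alignment behavior the paper reports in Figures 1b and 3.
    """
    ratios = {}
    for name, module in model.named_modules():
        if isinstance(module, torch.nn.Linear):
            W = module.weight.detach()
            spectral = torch.linalg.matrix_norm(W, ord=2)      # largest singular value
            frobenius = torch.linalg.matrix_norm(W, ord="fro")  # sqrt of sum of squares
            ratios[name] = (spectral / frobenius).item()
    return ratios

# Example: a 4-layer linear network as in Figure 1 (layer sizes are hypothetical).
net = torch.nn.Sequential(
    *[torch.nn.Linear(2, 2, bias=False) for _ in range(3)],
    torch.nn.Linear(2, 1, bias=False),
)
print(alignment_ratios(net))
```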
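
The Experiment Setup row also describes a Gaussian initialization forcing all initial Frobenius norms to a fixed value (4 in the quoted text). A minimal sketch of one way to enforce that constraint is below; sampling Gaussian weights and then rescaling each layer is an assumption about the implementation, not a confirmed detail from the paper.

```python
import torch

def gaussian_fixed_frobenius_init(model, target_norm=4.0):
    """Reinitialize every Linear layer with Gaussian weights rescaled so that
    the layer's Frobenius norm equals `target_norm` (assumed scheme)."""
    with torch.no_grad():
        for module in model.modules():
            if isinstance(module, torch.nn.Linear):
                torch.nn.init.normal_(module.weight)
                # Rescale so ||W||_F == target_norm.
                module.weight.mul_(target_norm / module.weight.norm(p="fro"))
                if module.bias is not None:
                    module.bias.zero_()
    return model
```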