Gradient descent aligns the layers of deep linear networks

Authors: Ziwei Ji, Matus Telgarsky

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | a preliminary experiment on CIFAR-10 which establishes empirically that a form of the alignment phenomenon occurs on the standard nonlinear network AlexNet. Figure 1: Visualization of margin maximization and self-regularization of layers on synthetic data with a 4-layer linear network compared to a 1-layer network (a linear predictor). Figure 1a shows the convergence of 1-layer and 4-layer networks to the same margin-maximizing linear predictor on positive (blue) and negative (red) separable data. Figure 1b shows the convergence of ‖W_i‖_2 / ‖W_i‖_F to 1 on each layer, plotted against the risk. Figure 3: Risk and alignment of dense layers (the ratio ‖W_i‖_2 / ‖W_i‖_F) of (nonlinear!) AlexNet on CIFAR-10. (A sketch of this per-layer alignment measurement appears after this table.)
Researcher Affiliation | Academia | Ziwei Ji & Matus Telgarsky, Department of Computer Science, University of Illinois at Urbana-Champaign, {ziweiji2,mjt}@illinois.edu
Pseudocode | No | The paper describes mathematical proofs and derivations but does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not include an explicit statement about releasing source code for the described methodology, nor a link to a code repository.
Open Datasets | Yes | a preliminary experiment on CIFAR-10 which establishes empirically that a form of the alignment phenomenon occurs on the standard nonlinear network AlexNet. Figure 3: Risk and alignment of dense layers (the ratio ‖W_i‖_2 / ‖W_i‖_F) of (nonlinear!) AlexNet on CIFAR-10. (A sketch of loading CIFAR-10 and measuring dense-layer alignment appears after this table.)
Dataset Splits | No | The paper mentions using 'synthetic data' and 'CIFAR-10' but does not provide specific details on dataset splits (e.g., percentages or sample counts for training, validation, or testing).
Hardware Specification | No | The Acknowledgements mention an NVIDIA GPU grant that 'led to the creation of their beloved GPU machine DUTCHCRUNCH', but no specific GPU model, CPU, or other detailed hardware specification is provided for the experiments.
Software Dependencies | No | The paper mentions the default 'PyTorch initialization' used for the AlexNet experiments, but it does not specify version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | Two initializations were tried: the default PyTorch initialization, and a Gaussian initialization forcing all initial Frobenius norms to be just 4, which is suggested by the norm preservation property in the analysis and removes noise in the weights. (A sketch of such a Frobenius-norm-controlled Gaussian initialization appears after this table.)
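
To make the alignment metric in Figures 1 and 3 concrete, here is a minimal PyTorch sketch that trains a 4-layer deep linear network on separable synthetic data and tracks the ratio ‖W_i‖_2 / ‖W_i‖_F for each layer. The data, layer widths, learning rate, and step count are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch (assumed setup, not the paper's exact configuration):
# train a 4-layer deep linear network on separable synthetic data and
# track the per-layer alignment ratio ||W_i||_2 / ||W_i||_F.
import torch

torch.manual_seed(0)

# Linearly separable 2-D data: positive (label +1) and negative (label -1) clusters.
n = 100
X = torch.cat([torch.randn(n, 2) + torch.tensor([3.0, 0.0]),
               torch.randn(n, 2) - torch.tensor([3.0, 0.0])])
y = torch.cat([torch.ones(n), -torch.ones(n)])

# 4-layer deep linear network (no nonlinearities, no biases).
layers = torch.nn.ModuleList(
    [torch.nn.Linear(2, 2, bias=False) for _ in range(3)]
    + [torch.nn.Linear(2, 1, bias=False)])

def forward(x):
    for layer in layers:
        x = layer(x)
    return x.squeeze(-1)

def alignment_ratios():
    # Spectral norm divided by Frobenius norm for each weight matrix;
    # the paper shows this ratio converging to 1 for every layer.
    return [(torch.linalg.matrix_norm(l.weight, ord=2)
             / torch.linalg.matrix_norm(l.weight, ord='fro')).item()
            for l in layers]

opt = torch.optim.SGD(layers.parameters(), lr=0.01)
for step in range(20001):
    opt.zero_grad()
    risk = torch.nn.functional.softplus(-y * forward(X)).mean()  # logistic risk
    risk.backward()
    opt.step()
    if step % 2000 == 0:
        print(f"step {step:6d}  risk {risk.item():.4f}  ratios {alignment_ratios()}")
```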
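
For the CIFAR-10 experiment, the same ratio can be read off the dense (fully connected) layers of AlexNet. The sketch below uses torchvision's stock AlexNet and CIFAR-10 loader as stand-ins; the paper's exact architecture variant, preprocessing, and training loop are not reproduced here.

```python
# Minimal sketch (torchvision stand-ins, not the paper's exact recipe):
# load CIFAR-10 and report the alignment ratio ||W||_2 / ||W||_F of
# AlexNet's dense (classifier) layers, which the paper plots against risk.
import torch
import torchvision

model = torchvision.models.alexnet(num_classes=10)

# CIFAR-10 via torchvision; images resized to 224x224 so the stock AlexNet
# input size matches (an assumption, not necessarily the paper's preprocessing).
transform = torchvision.transforms.Compose([
    torchvision.transforms.Resize(224),
    torchvision.transforms.ToTensor(),
])
train_set = torchvision.datasets.CIFAR10("./data", train=True, download=True,
                                         transform=transform)
# train_set would feed the (omitted) training loop.

def dense_alignment(model):
    # Ratio ||W||_2 / ||W||_F for each fully connected layer in the classifier.
    ratios = {}
    for name, module in model.classifier.named_children():
        if isinstance(module, torch.nn.Linear):
            W = module.weight
            ratios[name] = (torch.linalg.matrix_norm(W, ord=2)
                            / torch.linalg.matrix_norm(W, ord='fro')).item()
    return ratios

print(dense_alignment(model))  # would be logged periodically during training
```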
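
The second initialization in the Experiment Setup row (Gaussian weights with every layer's Frobenius norm forced to a fixed value) could be implemented along the following lines; the target norm of 4 is taken from the quoted setup, and the layer shapes are placeholders.

```python
# Minimal sketch of a Gaussian initialization that forces every layer's
# Frobenius norm to a fixed target (the quoted setup uses 4; treat the value
# and the layer shapes as placeholders).
import torch

def gaussian_init_fixed_fro(layers, target_fro=4.0):
    """Fill each weight with i.i.d. Gaussians, then rescale so ||W||_F == target_fro."""
    with torch.no_grad():
        for layer in layers:
            torch.nn.init.normal_(layer.weight)
            layer.weight.mul_(target_fro
                              / torch.linalg.matrix_norm(layer.weight, ord='fro'))

# Usage on a 4-layer linear network with illustrative shapes.
layers = [torch.nn.Linear(2, 2, bias=False) for _ in range(3)] \
         + [torch.nn.Linear(2, 1, bias=False)]
gaussian_init_fixed_fro(layers)
print([round(torch.linalg.matrix_norm(l.weight, ord='fro').item(), 3) for l in layers])
```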