Gradient descent aligns the layers of deep linear networks
Authors: Ziwei Ji, Matus Telgarsky
ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | a preliminary experiment on CIFAR-10 which establishes empirically that a form of the alignment phenomenon occurs on the standard nonlinear network AlexNet. Figure 1: Visualization of margin maximization and self-regularization of layers on synthetic data with a 4-layer linear network compared to a 1-layer network (a linear predictor). Figure 1a shows the convergence of the 1-layer and 4-layer networks to the same margin-maximizing linear predictor on positive (blue) and negative (red) separable data. Figure 1b shows the convergence of ||W_i||_2 / ||W_i||_F to 1 for each layer, plotted against the risk. Figure 3: Risk and alignment of dense layers (the ratio ||W_i||_2 / ||W_i||_F) of (nonlinear!) AlexNet on CIFAR-10. (See the norm-ratio sketch after this table.) |
| Researcher Affiliation | Academia | Ziwei Ji & Matus Telgarsky, Department of Computer Science, University of Illinois at Urbana-Champaign, {ziweiji2,mjt}@illinois.edu |
| Pseudocode | No | The paper describes mathematical proofs and derivations but does not include any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not include an explicit statement about releasing source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | a preliminary experiment on CIFAR-10 which establishes empirically that a form of the alignment phenomenon occurs on the standard nonlinear network AlexNet. Figure 3: Risk and alignment of dense layers (the ratio ||W_i||_2 / ||W_i||_F) of (nonlinear!) AlexNet on CIFAR-10. |
| Dataset Splits | No | The paper mentions using 'synthetic data' and 'CIFAR-10' but does not provide specific details on dataset splits (e.g., percentages or sample counts for training, validation, or testing). |
| Hardware Specification | No | The Acknowledgements mention an NVIDIA GPU grant that led to the creation of their beloved GPU machine DUTCHCRUNCH, but no specific GPU model, CPU, or other detailed hardware specifications are provided for the experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch initialization' for the AlexNet experiments, but it does not specify version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | Two initializations were tried: the default PyTorch initialization, and a Gaussian initialization forcing all initial Frobenius norms to be just 4, which is suggested by the norm-preservation property in the analysis and removes noise in the weights. (See the initialization sketch after this table.) |
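
The alignment quantity quoted above (Figures 1b and 3) is the spectral-to-Frobenius norm ratio ||W_i||_2 / ||W_i||_F of each dense layer, which approaches 1 as a layer becomes effectively rank one. A minimal sketch of how that ratio could be measured in PyTorch is given below; the helper name and the small 4-layer network are illustrative assumptions, not the authors' code.

```python
import torch

def alignment_ratios(model):
    """Spectral-to-Frobenius norm ratio ||W||_2 / ||W||_F for each Linear layer.

    A ratio close to 1 indicates the layer is approximately rank one, i.e. the
    alignment behavior the paper reports in Figures 1b and 3.
    """
    ratios = {}
    for name, module in model.named_modules():
        if isinstance(module, torch.nn.Linear):
            W = module.weight.detach()
            spectral = torch.linalg.matrix_norm(W, ord=2)      # largest singular value
            frobenius = torch.linalg.matrix_norm(W, ord="fro")  # sqrt of sum of squares
            ratios[name] = (spectral / frobenius).item()
    return ratios

# Example: a 4-layer linear network as in Figure 1 (layer sizes are hypothetical).
net = torch.nn.Sequential(
    *[torch.nn.Linear(2, 2, bias=False) for _ in range(3)],
    torch.nn.Linear(2, 1, bias=False),
)
print(alignment_ratios(net))
```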
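
The Experiment Setup row also describes a Gaussian initialization forcing all initial Frobenius norms to a fixed value (4 in the quoted text). A minimal sketch of one way to enforce that constraint is below; sampling Gaussian weights and then rescaling each layer is an assumption about the implementation, not a confirmed detail from the paper.

```python
import torch

def gaussian_fixed_frobenius_init(model, target_norm=4.0):
    """Reinitialize every Linear layer with Gaussian weights rescaled so that
    the layer's Frobenius norm equals `target_norm` (assumed scheme)."""
    with torch.no_grad():
        for module in model.modules():
            if isinstance(module, torch.nn.Linear):
                torch.nn.init.normal_(module.weight)
                # Rescale so ||W||_F == target_norm.
                module.weight.mul_(target_norm / module.weight.norm(p="fro"))
                if module.bias is not None:
                    module.bias.zero_()
    return model
```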