Directional convergence and alignment in deep learning

Authors: Ziwei Ji, Matus Telgarsky

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "we additionally provide empirical support not just close to the theory (e.g., the AlexNet), but also on non-homogeneous networks (e.g., the DenseNet). The experiments in Figures 1 and 2 are performed in as standard a way as possible to highlight that directional convergence is a reliable property; full details are in Appendix A. Briefly, Figure 1 uses synthetic data and vanilla gradient descent... Figure 2 uses standard CIFAR firstly with a modified homogeneous AlexNet and secondly with an unmodified DenseNet"
Researcher Affiliation | Academia | Ziwei Ji, Matus Telgarsky ({ziweiji2,mjt}@illinois.edu), University of Illinois, Urbana-Champaign
Pseudocode | No | The paper contains mathematical theorems and proofs but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper neither states that the source code for the described methodology is released nor links to a code repository.
Open Datasets | Yes | "Figure 2 uses standard CIFAR firstly with a modified homogeneous AlexNet and secondly with an unmodified DenseNet"
Dataset Splits | No | The paper mentions using synthetic data and standard CIFAR, but does not provide details on training, validation, or test splits (e.g., percentages or sample counts).
Hardware Specification | No | "All computations were performed on standard CPUs." This statement is vague and omits specifics such as CPU model, core count, or memory.
Software Dependencies | No | "PyTorch [Paszke et al., 2019] was used for implementation." Only the software name is given, with no version number; no other versioned dependencies are listed.
Experiment Setup | Yes | "Figure 1 uses synthetic data and vanilla gradient descent (no momentum, no weight decay, etc.) on a 10,000 node wide 2-layer squared ReLU network. Figure 2 uses standard CIFAR firstly with a modified homogeneous AlexNet and secondly with an unmodified DenseNet; SGD was used on CIFAR due to training set size."
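
The quoted Figure 1 setup can be illustrated concretely. Below is a minimal sketch, not the authors' code: it assumes PyTorch (the framework the paper names) and matches the quoted 10,000-node width and vanilla gradient descent, while the input dimension, sample count, learning rate, loss, and step budget are illustrative assumptions. It also tracks the cosine similarity between successive normalized parameter vectors, one simple way to observe the directional convergence the paper studies.

    # Minimal sketch of the Figure 1 setup (not the authors' code). Width and
    # plain gradient descent match the paper's description; the data, learning
    # rate, loss, and step counts are illustrative assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    torch.manual_seed(0)

    class SquaredReLUNet(nn.Module):
        """2-layer net with squared-ReLU activation; bias-free layers keep it homogeneous."""
        def __init__(self, in_dim, width=10_000):
            super().__init__()
            self.hidden = nn.Linear(in_dim, width, bias=False)
            self.out = nn.Linear(width, 1, bias=False)

        def forward(self, x):
            return self.out(F.relu(self.hidden(x)) ** 2)

    def direction(model):
        # Flatten all parameters and normalize: the direction w / ||w||.
        w = torch.cat([p.detach().reshape(-1) for p in model.parameters()])
        return w / w.norm()

    # Synthetic binary-classification data (the paper's exact setup is in its Appendix A).
    n, d = 100, 10
    X = torch.randn(n, d)
    y = torch.sign(torch.randn(n))

    model = SquaredReLUNet(d)
    # Full-batch SGD with no momentum or weight decay is plain gradient descent.
    opt = torch.optim.SGD(model.parameters(), lr=0.01)

    prev = direction(model)
    for step in range(1, 5001):
        opt.zero_grad()
        margins = y * model(X).squeeze(-1)
        loss = F.softplus(-margins).mean()  # logistic loss log(1 + exp(-margin))
        loss.backward()
        opt.step()
        if step % 1000 == 0:
            cur = direction(model)
            # Cosine similarity of successive directions approaches 1 under
            # directional convergence.
            print(f"step {step}: loss {loss.item():.4f}, cos {(prev @ cur).item():.6f}")
            prev = cur

The Figure 2 runs described above would follow the same pattern, swapping in the modified homogeneous AlexNet or unmodified DenseNet and minibatch SGD over CIFAR.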