Directional convergence and alignment in deep learning
Authors: Ziwei Ji, Matus Telgarsky
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | we additionally provide empirical support not just close to the theory (e.g., the AlexNet), but also on non-homogeneous networks (e.g., the DenseNet). The experiments in Figures 1 and 2 are performed in as standard a way as possible to highlight that directional convergence is a reliable property; full details are in Appendix A. Briefly, Figure 1 uses synthetic data and vanilla gradient descent... Figure 2 uses standard CIFAR firstly with a modified homogeneous AlexNet and secondly with an unmodified DenseNet |
| Researcher Affiliation | Academia | Ziwei Ji, Matus Telgarsky, {ziweiji2,mjt}@illinois.edu, University of Illinois, Urbana-Champaign |
| Pseudocode | No | The paper contains mathematical theorems and proofs but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing the source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | Figure 2 uses standard CIFAR firstly with a modified homogeneous AlexNet and secondly with an unmodified DenseNet |
| Dataset Splits | No | The paper mentions using synthetic data and standard CIFAR, but does not provide specific details on training, validation, or test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | All computations were performed on standard CPUs. This is a vague statement and does not provide specific hardware details such as CPU models, number of cores, or memory specifications. |
| Software Dependencies | No | PyTorch [Paszke et al., 2019] was used for implementation. This mentions a software name but does not provide a specific version number. No other software with version numbers is listed. |
| Experiment Setup | Yes | Figure 1 uses synthetic data and vanilla gradient descent (no momentum, no weight decay, etc.) on a 10,000-node-wide 2-layer squared ReLU network. Figure 2 uses standard CIFAR firstly with a modified homogeneous AlexNet and secondly with an unmodified DenseNet; SGD was used on CIFAR due to training set size. (Illustrative sketches of both setups follow the table.) |
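
To make the Figure 1 setup in the last row concrete, here is a minimal PyTorch sketch of vanilla gradient descent (no momentum, no weight decay) on a wide 2-layer squared-ReLU network. The synthetic data distribution, labels, learning rate, and step count are illustrative assumptions, not the authors' exact configuration (which is given in their Appendix A).

```python
import torch

torch.manual_seed(0)

# Illustrative dimensions; the paper uses a 10,000-node hidden layer.
d, width, n = 2, 10_000, 100
X = torch.randn(n, d)                 # synthetic inputs (assumed Gaussian)
y = torch.sign(X[:, :1])              # hypothetical binary labels in {-1, +1}

W = torch.randn(width, d, requires_grad=True)
a = torch.randn(width, 1, requires_grad=True)

def f(x):
    # Squared ReLU keeps the network positively homogeneous in (W, a).
    return torch.relu(x @ W.t()).pow(2) @ a

lr = 1e-2                             # assumed step size
for _ in range(1000):
    # Logistic loss log(1 + exp(-y * f(x))); plain gradient descent updates.
    loss = torch.nn.functional.softplus(-y * f(X)).mean()
    loss.backward()
    with torch.no_grad():
        W -= lr * W.grad
        a -= lr * a.grad
        W.grad.zero_()
        a.grad.zero_()

# Directional convergence: track the normalized parameter vector over time.
theta = torch.cat([W.detach().flatten(), a.detach().flatten()])
print(theta.norm(), theta[:5] / theta.norm())
```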
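Similarly, a hypothetical sketch of the Figure 2 pipeline: CIFAR-10 loaded via torchvision and trained with plain SGD, per the row above. Since the paper releases no code, the transform, batch size, learning rate, and the use of torchvision's stock AlexNet (rather than the authors' modified homogeneous variant, or a DenseNet) are all assumptions.

```python
import torch
import torchvision
import torchvision.transforms as T

# Assumed pipeline: resize CIFAR-10 up to 224x224 so torchvision's stock
# AlexNet accepts it; the authors instead modified AlexNet for CIFAR.
transform = T.Compose([T.Resize(224), T.ToTensor()])
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

# Stand-in for the "modified homogeneous AlexNet" (details not reproduced).
model = torchvision.models.alexnet(num_classes=10)

# Plain SGD, matching the stated choice; the learning rate is an assumption.
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.CrossEntropyLoss()

for x, y in loader:                   # one pass over the training set
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()
```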