Subquadratic Overparameterization for Shallow Neural Networks
Authors: ChaeHwan Song, Ali Ramezani-Kebrya, Thomas Pethick, Armin Eftekhari, Volkan Cevher
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we provide an analytical framework that allows us to adopt standard initialization strategies, possibly avoid lazy training, and train all layers simultaneously in basic shallow neural networks while attaining a desirable subquadratic scaling on the network width. We achieve the desiderata via Polyak-Łojasiewicz condition, smoothness, and standard assumptions on data, and use tools from random matrix theory. ... In Figure 1, we observe that while SGD achieves zero training error for every ω2, as suggested by Theorem 3 applicable in the full batch setting, the generalization ability increases as the ratio ω2/ω1 grows. |
| Researcher Affiliation | Academia | ¹Laboratory for Information and Inference Systems (LIONS), EPFL; ²Umeå University |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See https://github.com/LIONS-EPFL/Subquadratic-Overparameterization |
| Open Datasets | Yes | To ensure that perfect generalization is possible, we adopt the teacher-student setup, where, for the teacher network, we train a two-layer fully connected neural network on MNIST [25] |
| Dataset Splits | No | The provided text mentions using MNIST but does not specify train/validation/test splits with percentages or counts. |
| Hardware Specification | No | The provided text does not contain specific hardware details like CPU/GPU models or memory amounts. Appendix G is referenced for setup details, which might include this information, but it is not available in the provided text. |
| Software Dependencies | No | The provided text does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | The student networks are trained for 300 epochs to ensure convergence. ... We use mean-square loss and a smooth activation function (GeLU [18]) for the student network to match the problem setup as closely as possible. ... Specifically, we fix the product of the weight initialization ω1ω2 and then proceed by varying ω2. (A minimal sketch of this setup follows the table.) |
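
As a reading aid for the quoted experiment setup, the following is a minimal PyTorch-style sketch of a two-layer student network with a GeLU activation, mean-square loss, SGD over 300 epochs, and Gaussian initialization scaled by ω1 and ω2 so that their product can be held fixed while ω2 is varied. The widths, input/output dimensions, learning rate, and the helper names `TwoLayerStudent`/`train` are illustrative assumptions, not taken from the paper or its repository.

```python
import torch
import torch.nn as nn

class TwoLayerStudent(nn.Module):
    """Shallow fully connected student network with a GeLU activation.

    First- and second-layer weights are drawn i.i.d. Gaussian with standard
    deviations omega1 and omega2, so the product omega1 * omega2 can be held
    fixed while the ratio omega2 / omega1 is varied, as in the quoted setup.
    (Architecture details here are assumptions, not the authors' exact code.)
    """
    def __init__(self, d_in, width, d_out, omega1, omega2):
        super().__init__()
        self.W1 = nn.Parameter(omega1 * torch.randn(width, d_in))
        self.W2 = nn.Parameter(omega2 * torch.randn(d_out, width))
        self.act = nn.GELU()

    def forward(self, x):
        return self.act(x @ self.W1.T) @ self.W2.T


def train(student, loader, epochs=300, lr=1e-2):
    """Train all layers jointly with SGD on mean-square loss."""
    opt = torch.optim.SGD(student.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(student(x), y).backward()
            opt.step()
    return student


# Fix the product omega1 * omega2 = c and sweep omega2 (values illustrative).
c = 1.0
students = [
    TwoLayerStudent(d_in=784, width=2048, d_out=10, omega1=c / w2, omega2=w2)
    for w2 in (0.25, 0.5, 1.0, 2.0)
]
```

The sweep at the end mirrors the quoted protocol of fixing ω1ω2 and varying ω2; data loading (e.g., MNIST batches labeled by a teacher network) is omitted, and the actual hyperparameters are in the authors' repository linked above.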