Quadratic models for understanding catapult dynamics of neural networks

Authors: Libin Zhu, Chaoyue Liu, Adityanarayanan Radhakrishnan, Mikhail Belkin

ICLR 2024

Reproducibility assessment. Each entry gives the variable, the assessed result, and the supporting LLM response (quoted from the paper where applicable):
Research Type: Experimental
    "We then empirically show that the behaviour of neural quadratic models parallels that of neural networks in generalization, especially in the catapult phase regime. Our analysis further demonstrates that quadratic models can be an effective tool for analysis of neural networks."
    "We provide a number of experimental results corroborating our theoretical analysis (See Section 3)."
    "In this section, we empirically compare the test performance of three different models considered in this paper upon varying learning rate."
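The "neural quadratic model" at the center of these claims is, going by the title and the quoted passages, the second-order Taylor expansion of a network's output in its weights. The sketch below is a minimal illustration of that construction under this assumption; the toy tanh model, dimensions, and all names are illustrative stand-ins, not the paper's code.

```python
import numpy as np

# Minimal sketch of a "neural quadratic model": the second-order Taylor
# expansion of a scalar model f(w; x) around its initialization w0.
# The toy tanh model below is an illustrative assumption, not the paper's setup.
rng = np.random.default_rng(0)
d = 50
w0 = rng.normal(size=d) / np.sqrt(d)  # initialization
x = rng.normal(size=d)                # one fixed input

def f(w):
    return np.tanh(w @ x)             # toy nonlinear scalar model

def grad_f(w):
    return (1 - np.tanh(w @ x) ** 2) * x

def hess_f(w):
    t = np.tanh(w @ x)
    return -2 * t * (1 - t ** 2) * np.outer(x, x)

g0, H0 = grad_f(w0), hess_f(w0)

def f_quad(w):
    # f(w0) + g0^T (w - w0) + 0.5 (w - w0)^T H0 (w - w0).
    # Dropping the H0 term recovers the linear (NTK/linearized) model;
    # the second-order term is what admits catapult dynamics.
    dw = w - w0
    return f(w0) + g0 @ dw + 0.5 * dw @ H0 @ dw
```

Per the paper's framing, training such a quadratic model on the squared loss with a learning rate large enough to destabilize the linearized dynamics reproduces the spike-then-drop ("catapult") loss behavior of neural networks, which the purely linear model cannot exhibit.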
Researcher Affiliation: Academia
    "(1) Department of Computer Science, UC San Diego; (2) Halıcıoğlu Data Science Institute, UC San Diego; (3) Harvard & Broad Institute of MIT and Harvard"
Pseudocode: No
    No pseudocode or algorithm blocks are explicitly labeled or presented in the paper; the methodology is described through mathematical derivations and textual explanations.
Open Source Code: No
    No statement about releasing open-source code for the described methodology, and no link to a code repository, was found in the paper.
Open Datasets: Yes
    "We implement our experiments on 3 vision datasets: CIFAR-2 (a 2-class subset of CIFAR-10 (Krizhevsky et al., 2009)), MNIST (LeCun et al., 1998), and SVHN (The Street View House Numbers) (Netzer et al., 2011), 1 speech dataset: Free Spoken Digit Dataset (FSDD) (Jakobovski), and 1 text dataset: AG NEWS (Gulli)."
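Of these, only CIFAR-2 is a derived dataset: the paper describes it as a 2-class subset of CIFAR-10, but the quoted text does not say which two classes. A minimal construction along those lines, with the class pair (0, 1) and the storage path chosen arbitrarily for illustration, might look like:

```python
import torch
from torchvision import datasets, transforms

# Hypothetical CIFAR-2 construction: a 2-class subset of CIFAR-10.
# The class pair (0, 1) and root path are illustrative assumptions.
cifar10 = datasets.CIFAR10(root="./data", train=True, download=True,
                           transform=transforms.ToTensor())
targets = torch.tensor(cifar10.targets)
keep = ((targets == 0) | (targets == 1)).nonzero().squeeze(1)
cifar2 = torch.utils.data.Subset(cifar10, keep.tolist())
```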
Dataset Splits: No
    No specific train/validation/test splits (e.g., percentages or sample counts) are provided. The paper refers to 'training data' and a 'testing set' but gives no details on the partitioning.
Hardware Specification: Yes
    "This work used NVIDIA V100 GPUs NVLINK and HDR IB (Expanse GPU) at SDSC Dell Cluster through allocation TG-CIS220009 and also the Delta system at the National Center for Supercomputing Applications through allocation bbjr-delta-gpu from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program, which is supported by National Science Foundation grants #2138259, #2138286, #2138307, #2137603, and #2138296."
Software Dependencies: No
    No specific software dependencies with version numbers (e.g., libraries, frameworks, or language versions) are given; the paper only mentions general training methods such as GD/SGD and activation functions.
Experiment Setup: Yes
    "In all experiments, we train the models by minimizing the squared loss using standard GD/SGD with constant learning rate η. We report the best test loss achieved during the training process with each learning rate. Experimental details can be found in Appendix N.5."
    "For the architectures of two-layer fully connected neural network and two-layer convolutional neural network, we set the width to be 5,000 and 1,000 respectively."
    "When implementing SGD, we choose batch size to be 32."
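Taken together, the quoted setup pins down the loss, optimizer, widths, and batch size, but not the input pipeline or label encoding. A sketch of the two-layer fully connected case under those reported values, where the input dimension, activation, label encoding, and data loader are assumptions for illustration:

```python
import torch

# Reported setup: squared loss, constant-learning-rate SGD, batch size 32,
# two-layer fully connected network of width 5,000. Input dimension,
# activation, and the data loader are placeholder assumptions.
width, in_dim = 5000, 3 * 32 * 32
lr = 0.1  # the paper sweeps the constant learning rate and reports best test loss

model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(in_dim, width),
    torch.nn.ReLU(),
    torch.nn.Linear(width, 1),
)
opt = torch.optim.SGD(model.parameters(), lr=lr)  # constant learning rate
loss_fn = torch.nn.MSELoss()  # squared loss

def train_epoch(loader):
    # loader is assumed to yield (images, float labels) in batches of 32
    for xb, yb in loader:
        opt.zero_grad()
        loss = loss_fn(model(xb).squeeze(1), yb.float())
        loss.backward()
        opt.step()
```

Repeating this sweep over η and recording the best test loss per learning rate is, per the quoted text, how the paper compares the three models' test performance.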