Towards understanding the role of over-parametrization in generalization of neural networks

Authors: Behnam Neyshabur, Zhiyuan Li, Srinadh Bhojanapalli, Yann LeCun, Nathan Srebro

ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically investigate the role of over-parametrization in generalization of neural networks on 3 different datasets (MNIST, CIFAR-10 and SVHN), and show that the existing complexity measures increase with the number of hidden units and hence do not explain the generalization behavior with over-parametrization. (See the capacity-measure sketch after this table.)
Researcher Affiliation | Academia | Behnam Neyshabur, School of Mathematics, Institute for Advanced Study, Princeton, NJ 08540, bneyshabur@gmail.com; Zhiyuan Li, Department of Computer Science, Princeton University, Princeton, NJ 08540, zhiyuanli@princeton.edu; Srinadh Bhojanapalli, Toyota Technological Institute at Chicago, Chicago, IL 60637, srinadh@ttic.edu; Yann LeCun, Department of Computer Science, New York University, New York, NY 10012, yann@cs.nyu.edu; Nathan Srebro, Toyota Technological Institute at Chicago, Chicago, IL 60637, nati@ttic.edu
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about, or link to, open-source code for the described methodology.
Open Datasets | Yes | We empirically investigate the role of over-parametrization in generalization of neural networks on 3 different datasets (MNIST, CIFAR-10 and SVHN)... We train two-layer ReLU networks of size h on the CIFAR-10 and SVHN datasets. (A sketch of this architecture appears after the table.)
Dataset Splits | No | The paper mentions 'training and test error' and uses a stopping criterion based on the cross-entropy loss, but does not explicitly describe a separate validation split or dataset.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments, mentioning only that training was done with SGD.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, TensorFlow, PyTorch versions).
Experiment Setup | Yes | In each experiment we train using SGD with mini-batch size 64, momentum 0.9 and initial learning rate 0.1, where we reduce the learning rate to 0.01 when the cross-entropy loss reaches 0.01 and stop when the loss reaches 0.001 or if the number of epochs reaches 1000. We do not use weight decay or dropout but perform data augmentation by random horizontal flip of the image and random crop of size 28x28 followed by zero padding. (A code sketch of this recipe appears after the table.)
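
The Research Type row refers to "existing complexity measures" without naming them; the paper evaluates norm-based capacity measures such as the Frobenius norm, the ℓ2 path norm, and the spectral norm of the weight matrices. The sketch below shows one way such measures can be computed for a bias-free two-layer ReLU network f(x) = W2·ReLU(W1·x). It is illustrative only: PyTorch is an assumption on our part (the paper does not state a framework), and the paper additionally normalizes these quantities (e.g., by the output margin), which we omit here.

```python
import torch

def frobenius_measure(w1: torch.Tensor, w2: torch.Tensor) -> torch.Tensor:
    # Product of per-layer Frobenius norms.
    return w1.norm() * w2.norm()

def spectral_measure(w1: torch.Tensor, w2: torch.Tensor) -> torch.Tensor:
    # Product of per-layer spectral norms (largest singular values).
    return torch.linalg.matrix_norm(w1, ord=2) * torch.linalg.matrix_norm(w2, ord=2)

def l2_path_norm(w1: torch.Tensor, w2: torch.Tensor) -> torch.Tensor:
    # Square root of the sum, over all input -> hidden -> output paths,
    # of the product of squared weights along each path. For a two-layer
    # net this collapses to a single matrix product:
    # ((w2**2) @ (w1**2))[j, k] = sum_i w2[j, i]**2 * w1[i, k]**2
    return torch.sqrt(((w2 ** 2) @ (w1 ** 2)).sum())
```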
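The Open Datasets row quotes the paper's two-layer ReLU networks "of size h". A minimal sketch of such a network follows, again assuming PyTorch; the input dimension is inferred from 32x32 RGB CIFAR-10/SVHN images (the 28x28 crops in the augmentation are zero-padded back to 32x32), not quoted from the paper.

```python
import torch.nn as nn

def two_layer_relu_net(h: int, in_dim: int = 3 * 32 * 32, num_classes: int = 10) -> nn.Sequential:
    """A two-layer ReLU network with a single hidden layer of h units."""
    return nn.Sequential(
        nn.Flatten(),                 # 3x32x32 image -> 3072-dim vector
        nn.Linear(in_dim, h),         # hidden layer of size h
        nn.ReLU(),
        nn.Linear(h, num_classes),    # linear output layer
    )
```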
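The Experiment Setup row is precise enough to render as code. The loop below is a non-authoritative PyTorch sketch of that recipe: SGD with batch size 64 and momentum 0.9, learning rate 0.1 reduced to 0.01 when the cross-entropy loss reaches 0.01, stopping at loss 0.001 or 1000 epochs, with horizontal flips and 28x28 crops zero-padded back to 32x32. The helper names (train, augment) are ours, and we read the loss thresholds as applying to the average training cross-entropy, which is an assumption.

```python
import torch
from torch.utils.data import DataLoader
import torchvision.transforms as T

# Augmentation as quoted: random horizontal flip, then a random 28x28
# crop zero-padded back to the native 32x32 resolution.
augment = T.Compose([
    T.RandomHorizontalFlip(),
    T.RandomCrop(28),
    T.Pad(2),      # constant (zero) padding: 28 + 2 + 2 = 32
    T.ToTensor(),
])

def train(model, dataset, device="cpu"):
    """Train with the quoted recipe; no weight decay, no dropout."""
    loader = DataLoader(dataset, batch_size=64, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.to(device)
    for epoch in range(1000):                 # hard cap at 1000 epochs
        total, count = 0.0, 0
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
            total += loss.item() * x.size(0)
            count += x.size(0)
        avg_loss = total / count
        if avg_loss <= 0.01:                  # reduce lr once loss hits 0.01
            for group in opt.param_groups:
                group["lr"] = 0.01
        if avg_loss <= 0.001:                 # stop once loss hits 0.001
            break

# Hypothetical usage, with two_layer_relu_net from the previous sketch:
# data = torchvision.datasets.CIFAR10(".", train=True, download=True, transform=augment)
# train(two_layer_relu_net(h=1024), data)
```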