Towards understanding the role of over-parametrization in generalization of neural networks

Authors: Behnam Neyshabur, Zhiyuan Li, Srinadh Bhojanapalli, Yann LeCun, Nathan Srebro

ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically investigate the role of over-parametrization in generalization of neural networks on 3 different datasets (MNIST, CIFAR-10 and SVHN), and show that the existing complexity measures increase with the number of hidden units and hence do not explain the generalization behavior with over-parametrization. (See the capacity-measure sketch after this table.)
Researcher Affiliation | Academia | Behnam Neyshabur, School of Mathematics, Institute for Advanced Study, Princeton, NJ 08540, bneyshabur@gmail.com; Zhiyuan Li, Department of Computer Science, Princeton University, Princeton, NJ 08540, zhiyuanli@princeton.edu; Srinadh Bhojanapalli, Toyota Technological Institute at Chicago, Chicago, IL 60637, srinadh@ttic.edu; Yann LeCun, Department of Computer Science, New York University, New York, NY 10012, yann@cs.nyu.edu; Nathan Srebro, Toyota Technological Institute at Chicago, Chicago, IL 60637, nati@ttic.edu
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about, or link to, open-source code for the described methodology.
Open Datasets | Yes | We empirically investigate the role of over-parametrization in generalization of neural networks on 3 different datasets (MNIST, CIFAR-10 and SVHN)... We train two-layer ReLU networks of size h on the CIFAR-10 and SVHN datasets. (A sketch of this architecture appears after the table.)
Dataset Splits | No | The paper mentions 'training and test error' and uses a stopping criterion based on the cross-entropy loss, but does not explicitly describe a separate validation split or dataset.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments, mentioning only that training was done with SGD.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, TensorFlow, PyTorch versions).
Experiment Setup | Yes | In each experiment we train using SGD with mini-batch size 64, momentum 0.9 and initial learning rate 0.1, where we reduce the learning rate to 0.01 when the cross-entropy loss reaches 0.01 and stop when the loss reaches 0.001 or if the number of epochs reaches 1000. We do not use weight decay or dropout but perform data augmentation by random horizontal flip of the image and random crop of size 28x28 followed by zero padding. (A code sketch of this recipe appears after the table.)
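
The Research Type row refers to "existing complexity measures" without naming them; the paper evaluates norm-based capacity measures such as the Frobenius norm, the ℓ2 path norm, and the spectral norm of the weight matrices. The sketch below shows one way such measures can be computed for a bias-free two-layer ReLU network f(x) = W2·ReLU(W1·x). It is illustrative only: PyTorch is an assumption on our part (the paper does not state a framework), and the paper additionally normalizes these quantities (e.g., by the output margin), which we omit here.

```python
import torch

def frobenius_measure(w1: torch.Tensor, w2: torch.Tensor) -> torch.Tensor:
    # Product of per-layer Frobenius norms.
    return w1.norm() * w2.norm()

def spectral_measure(w1: torch.Tensor, w2: torch.Tensor) -> torch.Tensor:
    # Product of per-layer spectral norms (largest singular values).
    return torch.linalg.matrix_norm(w1, ord=2) * torch.linalg.matrix_norm(w2, ord=2)

def l2_path_norm(w1: torch.Tensor, w2: torch.Tensor) -> torch.Tensor:
    # Square root of the sum, over all input -> hidden -> output paths,
    # of the product of squared weights along each path. For a two-layer
    # net this collapses to a single matrix product:
    # ((w2**2) @ (w1**2))[j, k] = sum_i w2[j, i]**2 * w1[i, k]**2
    return torch.sqrt(((w2 ** 2) @ (w1 ** 2)).sum())
```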
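The Open Datasets row quotes the paper's two-layer ReLU networks "of size h". A minimal sketch of such a network follows, again assuming PyTorch; the input dimension is inferred from 32x32 RGB CIFAR-10/SVHN images (the 28x28 crops in the augmentation are zero-padded back to 32x32), not quoted from the paper.

```python
import torch.nn as nn

def two_layer_relu_net(h: int, in_dim: int = 3 * 32 * 32, num_classes: int = 10) -> nn.Sequential:
    """A two-layer ReLU network with a single hidden layer of h units."""
    return nn.Sequential(
        nn.Flatten(),                 # 3x32x32 image -> 3072-dim vector
        nn.Linear(in_dim, h),         # hidden layer of size h
        nn.ReLU(),
        nn.Linear(h, num_classes),    # linear output layer
    )
```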
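The Experiment Setup row is precise enough to render as code. The loop below is a non-authoritative PyTorch sketch of that recipe: SGD with batch size 64 and momentum 0.9, learning rate 0.1 reduced to 0.01 when the cross-entropy loss reaches 0.01, stopping at loss 0.001 or 1000 epochs, with horizontal flips and 28x28 crops zero-padded back to 32x32. The helper names (train, augment) are ours, and we read the loss thresholds as applying to the average training cross-entropy, which is an assumption.

```python
import torch
from torch.utils.data import DataLoader
import torchvision.transforms as T

# Augmentation as quoted: random horizontal flip, then a random 28x28
# crop zero-padded back to the native 32x32 resolution.
augment = T.Compose([
    T.RandomHorizontalFlip(),
    T.RandomCrop(28),
    T.Pad(2),      # constant (zero) padding: 28 + 2 + 2 = 32
    T.ToTensor(),
])

def train(model, dataset, device="cpu"):
    """Train with the quoted recipe; no weight decay, no dropout."""
    loader = DataLoader(dataset, batch_size=64, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.to(device)
    for epoch in range(1000):                 # hard cap at 1000 epochs
        total, count = 0.0, 0
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
            total += loss.item() * x.size(0)
            count += x.size(0)
        avg_loss = total / count
        if avg_loss <= 0.01:                  # reduce lr once loss hits 0.01
            for group in opt.param_groups:
                group["lr"] = 0.01
        if avg_loss <= 0.001:                 # stop once loss hits 0.001
            break

# Hypothetical usage, with two_layer_relu_net from the previous sketch:
# data = torchvision.datasets.CIFAR10(".", train=True, download=True, transform=augment)
# train(two_layer_relu_net(h=1024), data)
```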