The role of over-parametrization in generalization of neural networks
Authors: Behnam Neyshabur, Zhiyuan Li, Srinadh Bhojanapalli, Yann LeCun, Nathan Srebro
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically investigate the role of over-parametrization in generalization of neural networks on 3 different datasets (MNIST, CIFAR10 and SVHN), and show that the existing complexity measures increase with the number of hidden units and hence do not explain the generalization behavior with over-parametrization. |
| Researcher Affiliation | Academia | Behnam Neyshabur, School of Mathematics, Institute for Advanced Study, Princeton, NJ 08540, bneyshabur@gmail.com; Zhiyuan Li, Department of Computer Science, Princeton University, Princeton, NJ 08540, zhiyuanli@princeton.edu; Srinadh Bhojanapalli, Toyota Technological Institute at Chicago, Chicago, IL 60637, srinadh@ttic.edu; Yann LeCun, Department of Computer Science, New York University, New York, NY 10012, yann@cs.nyu.edu; Nathan Srebro, Toyota Technological Institute at Chicago, Chicago, IL 60637, nati@ttic.edu |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the methodology described. |
| Open Datasets | Yes | We empirically investigate the role of over-parametrization in generalization of neural networks on 3 different datasets (MNIST, CIFAR10 and SVHN)... We train two-layer ReLU networks of size h on CIFAR-10 and SVHN datasets. |
| Dataset Splits | No | The paper mentions 'training and test error' and uses stopping criteria based on cross-entropy loss, but does not explicitly describe a separate validation split or dataset. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments, only mentioning training with SGD. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, TensorFlow, PyTorch versions). |
| Experiment Setup | Yes | In each experiment we train using SGD with mini-batch size 64, momentum 0.9 and initial learning rate 0.1 where we reduce the learning rate to 0.01 when the cross-entropy loss reaches 0.01 and stop when the loss reaches 0.001 or if the number of epochs reaches 1000. We do not use weight decay or dropout but perform data augmentation by random horizontal flip of the image and random crop of size 28x28 followed by zero padding. |
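
The experiment-setup row is concrete enough to sketch in code. Below is a minimal PyTorch reconstruction of the described two-layer ReLU experiment on CIFAR-10. Since the paper releases no code, the framework choice, the hidden width `H`, the flattened-input architecture, and the exact zero padding applied after the 28x28 crop are assumptions made for illustration, not the authors' implementation.

```python
# Minimal sketch of the quoted training setup, assuming PyTorch and a
# fully connected two-layer ReLU network on flattened CIFAR-10 images.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

H = 1024           # hidden width h (varied across the paper's experiments; value assumed here)
BATCH_SIZE = 64    # mini-batch size from the paper
MAX_EPOCHS = 1000  # hard stopping criterion from the paper

# Augmentation described in the paper: random horizontal flip and a random
# 28x28 crop followed by zero padding (padded back to 32x32 here - an assumption).
train_tf = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(28),
    transforms.Pad(2),
    transforms.ToTensor(),
])
train_set = datasets.CIFAR10(root="data", train=True, download=True, transform=train_tf)
train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True)

# Two-layer ReLU network of width H on flattened 3x32x32 inputs.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, H),
    nn.ReLU(),
    nn.Linear(H, 10),
)

criterion = nn.CrossEntropyLoss()
# SGD with momentum 0.9 and initial learning rate 0.1; no weight decay or dropout.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

for epoch in range(MAX_EPOCHS):
    total_loss, n = 0.0, 0
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * x.size(0)
        n += x.size(0)
    epoch_loss = total_loss / n

    # Schedule and stopping rule quoted in the table: drop the learning rate to
    # 0.01 once the cross-entropy loss reaches 0.01, and stop at 0.001.
    if epoch_loss <= 0.01:
        for group in optimizer.param_groups:
            group["lr"] = 0.01
    if epoch_loss <= 0.001:
        break
```

The loss-based learning-rate drop and stopping rule are checked on the epoch-average training cross-entropy here; the paper does not specify whether the criterion is evaluated per batch or per epoch, so this is one plausible reading.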