Deep learning generalizes because the parameter-function map is biased towards simple functions

Authors: Guillermo Valle-Perez, Chico Q. Camargo, Ard A. Louis

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We then provide clear evidence for this strong bias in a model DNN for Boolean functions, as well as in much larger fully connected and convolutional networks trained on CIFAR10 and MNIST." and "We tested the expected generalization error bounds described in the previous section in a variety of networks trained on binarized versions of MNIST (LeCun et al. (1998)), fashion-MNIST (Xiao et al. (2017)), and CIFAR10 (Krizhevsky & Hinton (2009))."
Researcher Affiliation | Academia | Guillermo Valle Pérez, University of Oxford, guillermo.valle@dtc.ox.ac.uk; Chico Q. Camargo, University of Oxford; Ard A. Louis, University of Oxford, ard.louis@physics.ox.ac.uk
Pseudocode | No | The paper describes algorithms such as advSGD and Adam in textual form but does not provide structured pseudocode or algorithm blocks.
Open Source Code | No | The paper references third-party codebases such as 'GPy' and 'the code from Garriga-Alonso et al. (2018)' but does not provide an explicit statement or link for the authors' own implementation of the described methodology.
Open Datasets | Yes | We tested the expected generalization error bounds described in the previous section in a variety of networks trained on binarized versions of MNIST (LeCun et al. (1998)), fashion-MNIST (Xiao et al. (2017)), and CIFAR10 (Krizhevsky & Hinton (2009)). (See the data-loading sketch below the table.)
Dataset Splits | No | The paper mentions training on a 'training set of size 10000' and 'early stopping when the accuracy on the whole training set reaches 100%', but does not provide specific details on training/validation/test splits or cross-validation for reproducing the data partitioning.
Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., GPU/CPU models, memory specifications) used to run the experiments.
Software Dependencies | No | The paper mentions software components such as 'Keras settings', 'Adam', 'GPy (since 2012)', and 'SciPy implementation', but does not provide specific version numbers for these tools to ensure reproducibility.
Experiment Setup | Yes | In all experiments in Section 6 we trained with SGD with a learning rate of 0.01, and early stopping when the accuracy on the whole training set reaches 100%. For advSGD, we also used a batch size of 10. The Gaussian process parameters were σw = 1.0, σb = 1.0 for the CNN and σw = 10.0, σb = 10.0 for the FC.
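
The Open Datasets row quotes the use of binarized versions of MNIST, fashion-MNIST, and CIFAR10, but the paper's exact binarization rule is not reproduced above. The sketch below is a minimal data-loading example for a reproduction attempt, assuming MNIST and a simple two-class relabelling (digits 0-4 vs 5-9); that grouping is an illustrative assumption, not the paper's confirmed rule.

```python
# Hypothetical data-loading sketch: binarize MNIST labels into two classes.
# The grouping of the ten classes into two is an assumption for illustration;
# the paper's own binarization rule may differ.
import numpy as np
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Flatten and rescale images for a fully connected network.
x_train = x_train.reshape(len(x_train), -1).astype("float32") / 255.0
x_test = x_test.reshape(len(x_test), -1).astype("float32") / 255.0

# Assumed binarization: label 1 if the digit is >= 5, else 0.
y_train_bin = (y_train >= 5).astype("int32")
y_test_bin = (y_test >= 5).astype("int32")

# The Dataset Splits row notes a training set of size 10000.
x_train, y_train_bin = x_train[:10000], y_train_bin[:10000]
```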
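The Experiment Setup row quotes SGD with a learning rate of 0.01, a batch size of 10, and early stopping once accuracy on the whole training set reaches 100%. Continuing from the data-loading sketch above, the following is a minimal Keras training sketch of that setup; the two-layer architecture and the stopping callback are assumptions (the authors' own code is not linked), and the epoch-level training accuracy is used here as a proxy for accuracy on the whole training set.

```python
# Minimal training sketch matching the quoted setup: SGD, lr = 0.01,
# stop once training accuracy reaches 100%.
# The architecture below is illustrative, not taken from the paper.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(200, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

class StopAtPerfectTrainAccuracy(tf.keras.callbacks.Callback):
    """Hypothetical early-stopping rule: halt when the epoch's training
    accuracy reaches 1.0 (a proxy for accuracy on the whole training set)."""
    def on_epoch_end(self, epoch, logs=None):
        if logs and logs.get("accuracy", 0.0) >= 1.0:
            self.model.stop_training = True

model.fit(
    x_train, y_train_bin,
    batch_size=10,            # batch size quoted for the advSGD runs
    epochs=1000,
    callbacks=[StopAtPerfectTrainAccuracy()],
)
```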