Sensitivity and Generalization in Neural Networks: an Empirical Study

Authors: Roman Novak, Yasaman Bahri, Daniel A. Abolafia, Jeffrey Pennington, Jascha Sohl-Dickstein

ICLR 2018

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments survey thousands of models with various fully-connected architectures, optimizers, and other hyper-parameters, as well as four different image classification datasets. |
| Researcher Affiliation | Industry | Roman Novak, Yasaman Bahri, Daniel A. Abolafia, Jeffrey Pennington, Jascha Sohl-Dickstein, Google Brain {romann, yasamanb, danabo, jpennin, jaschasd}@google.com |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a link to, or an explicit statement about, released source code for the described methodology. |
| Open Datasets | Yes | 2160 networks trained to 100% training accuracy on CIFAR10 (see A.5.5 for experimental details). ...on CIFAR10, FASHION_MNIST, CIFAR100 and MNIST. |
| Dataset Splits | Yes | All reported values, when applicable, were evaluated on the whole training and test sets of sizes 50000 and 10000 respectively. E.g. generalization gap is defined as the difference between train and test accuracies evaluated on the whole train and test sets. (See the first sketch below.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, or memory) used for running the experiments. |
| Software Dependencies | No | All experiments were implemented in Tensorflow (Abadi et al., 2016) and executed with the help of Vizier (Golovin et al., 2017). The paper names the software tools used but gives no version numbers for them. |
| Experiment Setup | Yes | A.5 EXPERIMENTAL SETUP: All experiments were implemented in Tensorflow (Abadi et al., 2016) and executed with the help of Vizier (Golovin et al., 2017). All networks were trained with cross-entropy loss. All networks were trained without biases. All computations were done with 32-bit precision. Learning rate decayed by a factor of 0.1 every 500 epochs. ... All inputs were normalized to have zero mean and unit variance... (See the second sketch below.) |
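The Dataset Splits row pins down the paper's definition of the generalization gap: train accuracy minus test accuracy, each evaluated on the full 50,000-example training set and 10,000-example test set. A minimal sketch of that computation, assuming a `tf.keras` model compiled with an accuracy metric (the helper name and the TF 2.x Keras API are our assumptions, not the authors' code):

```python
import tensorflow as tf

def generalization_gap(model, x_train, y_train, x_test, y_test):
    """Train accuracy minus test accuracy, each on the full split.

    Assumes `model` was compiled with metrics=["accuracy"], so that
    model.evaluate returns [loss, accuracy].
    """
    _, train_acc = model.evaluate(x_train, y_train, verbose=0)
    _, test_acc = model.evaluate(x_test, y_test, verbose=0)
    return train_acc - test_acc
```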
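The Experiment Setup row gives enough detail to sketch the training configuration: bias-free fully-connected layers, cross-entropy loss, inputs normalized to zero mean and unit variance, and a learning rate decayed by a factor of 0.1 every 500 epochs. Below is a hedged sketch using the TF 2.x Keras API (the paper predates TF 2); the network width and depth, the optimizer, the initial learning rate, and the batch size are placeholder assumptions, since the paper sweeps thousands of such configurations rather than fixing one.

```python
import numpy as np
import tensorflow as tf

# CIFAR10 is one of the four datasets used (alongside FASHION_MNIST,
# CIFAR100, and MNIST).
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

# Flatten images for a fully-connected network and normalize inputs to
# zero mean and unit variance, per A.5. Global train-set statistics are
# used here; per-pixel normalization is another plausible reading.
x_train = x_train.reshape(len(x_train), -1).astype(np.float32)
x_test = x_test.reshape(len(x_test), -1).astype(np.float32)
mean, std = x_train.mean(), x_train.std()
x_train = (x_train - mean) / std
x_test = (x_test - mean) / std

# Fully-connected network trained without biases. Width 512 and depth 3
# are placeholders for the swept architectures.
model = tf.keras.Sequential(
    [tf.keras.layers.Dense(512, activation="relu", use_bias=False)
     for _ in range(3)]
    + [tf.keras.layers.Dense(10, use_bias=False)]
)

# Learning rate decays by a factor of 0.1 every 500 epochs. With 50,000
# training examples and an assumed batch size of 128, 500 epochs is
# roughly 500 * (50000 // 128) optimizer steps.
steps_per_epoch = 50000 // 128
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3,  # assumed; the paper sweeps this
    decay_steps=500 * steps_per_epoch,
    decay_rate=0.1,
    staircase=True,
)

# Cross-entropy loss, as stated in A.5; SGD stands in for the swept
# optimizers.
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=schedule),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
```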