Sensitivity and Generalization in Neural Networks: an Empirical Study

Authors: Roman Novak, Yasaman Bahri, Daniel A. Abolafia, Jeffrey Pennington, Jascha Sohl-Dickstein

ICLR 2018

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments survey thousands of models with various fully-connected architectures, optimizers, and other hyper-parameters, as well as four different image classification datasets. |
| Researcher Affiliation | Industry | Roman Novak, Yasaman Bahri, Daniel A. Abolafia, Jeffrey Pennington, Jascha Sohl-Dickstein, Google Brain {romann, yasamanb, danabo, jpennin, jaschasd}@google.com |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a link to, or an explicit statement about, released source code for the described methodology. |
| Open Datasets | Yes | 2160 networks trained to 100% training accuracy on CIFAR10 (see A.5.5 for experimental details). ...on CIFAR10, FASHION_MNIST, CIFAR100 and MNIST. |
| Dataset Splits | Yes | All reported values, when applicable, were evaluated on the whole training and test sets of sizes 50000 and 10000 respectively. E.g. generalization gap is defined as the difference between train and test accuracies evaluated on the whole train and test sets. (See the first sketch below.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, or memory) used for running the experiments. |
| Software Dependencies | No | All experiments were implemented in Tensorflow (Abadi et al., 2016) and executed with the help of Vizier (Golovin et al., 2017). The paper names the software tools used but gives no version numbers for them. |
| Experiment Setup | Yes | A.5 EXPERIMENTAL SETUP: All experiments were implemented in Tensorflow (Abadi et al., 2016) and executed with the help of Vizier (Golovin et al., 2017). All networks were trained with cross-entropy loss. All networks were trained without biases. All computations were done with 32-bit precision. Learning rate decayed by a factor of 0.1 every 500 epochs. ... All inputs were normalized to have zero mean and unit variance... (See the second sketch below.) |
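The Dataset Splits row pins down the paper's definition of the generalization gap: train accuracy minus test accuracy, each evaluated on the full 50,000-example training set and 10,000-example test set. A minimal sketch of that computation, assuming a `tf.keras` model compiled with an accuracy metric (the helper name and the TF 2.x Keras API are our assumptions, not the authors' code):

```python
import tensorflow as tf

def generalization_gap(model, x_train, y_train, x_test, y_test):
    """Train accuracy minus test accuracy, each on the full split.

    Assumes `model` was compiled with metrics=["accuracy"], so that
    model.evaluate returns [loss, accuracy].
    """
    _, train_acc = model.evaluate(x_train, y_train, verbose=0)
    _, test_acc = model.evaluate(x_test, y_test, verbose=0)
    return train_acc - test_acc
```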
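The Experiment Setup row gives enough detail to sketch the training configuration: bias-free fully-connected layers, cross-entropy loss, inputs normalized to zero mean and unit variance, and a learning rate decayed by a factor of 0.1 every 500 epochs. Below is a hedged sketch using the TF 2.x Keras API (the paper predates TF 2); the network width and depth, the optimizer, the initial learning rate, and the batch size are placeholder assumptions, since the paper sweeps thousands of such configurations rather than fixing one.

```python
import numpy as np
import tensorflow as tf

# CIFAR10 is one of the four datasets used (alongside FASHION_MNIST,
# CIFAR100, and MNIST).
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

# Flatten images for a fully-connected network and normalize inputs to
# zero mean and unit variance, per A.5. Global train-set statistics are
# used here; per-pixel normalization is another plausible reading.
x_train = x_train.reshape(len(x_train), -1).astype(np.float32)
x_test = x_test.reshape(len(x_test), -1).astype(np.float32)
mean, std = x_train.mean(), x_train.std()
x_train = (x_train - mean) / std
x_test = (x_test - mean) / std

# Fully-connected network trained without biases. Width 512 and depth 3
# are placeholders for the swept architectures.
model = tf.keras.Sequential(
    [tf.keras.layers.Dense(512, activation="relu", use_bias=False)
     for _ in range(3)]
    + [tf.keras.layers.Dense(10, use_bias=False)]
)

# Learning rate decays by a factor of 0.1 every 500 epochs. With 50,000
# training examples and an assumed batch size of 128, 500 epochs is
# roughly 500 * (50000 // 128) optimizer steps.
steps_per_epoch = 50000 // 128
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3,  # assumed; the paper sweeps this
    decay_steps=500 * steps_per_epoch,
    decay_rate=0.1,
    staircase=True,
)

# Cross-entropy loss, as stated in A.5; SGD stands in for the swept
# optimizers.
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=schedule),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
```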