Sensitivity and Generalization in Neural Networks: an Empirical Study
Authors: Roman Novak, Yasaman Bahri, Daniel A. Abolafia, Jeffrey Pennington, Jascha Sohl-Dickstein
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments survey thousands of models with various fully-connected architectures, optimizers, and other hyper-parameters, as well as four different image classification datasets. |
| Researcher Affiliation | Industry | Roman Novak, Yasaman Bahri, Daniel A. Abolafia, Jeffrey Pennington, Jascha Sohl-Dickstein Google Brain {romann, yasamanb, danabo, jpennin, jaschasd}@google.com |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a specific link or explicit statement about the release of source code for the methodology described. |
| Open Datasets | Yes | 2160 networks trained to 100% training accuracy on CIFAR10 (see A.5.5 for experimental details). ...on CIFAR10, FASHION_MNIST, CIFAR100 and MNIST. |
| Dataset Splits | Yes | All reported values, when applicable, were evaluated on the whole training and test sets of sizes 50000 and 10000 respectively. E.g. generalization gap is defined as the difference between train and test accuracies evaluated on the whole train and test sets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory) used for running the experiments. |
| Software Dependencies | No | All experiments were implemented in Tensorflow (Abadi et al., 2016) and executed with the help of Vizier (Golovin et al., 2017). The paper mentions software tools used but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | A.5 EXPERIMENTAL SETUP All experiments were implemented in Tensorflow (Abadi et al., 2016) and executed with the help of Vizier (Golovin et al., 2017). All networks were trained with cross-entropy loss. All networks were trained without biases. All computations were done with 32-bit precision. Learning rate decayed by a factor of 0.1 every 500 epochs. ... All inputs were normalized to have zero mean and unit variance... |
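
The Experiment Setup row only quotes fragments of Appendix A.5. The sketch below is not the authors' released code (none is provided); it is a minimal TensorFlow/Keras illustration of how the quoted choices (cross-entropy loss, no biases, 32-bit precision, learning rate decayed by 0.1 every 500 epochs, zero-mean unit-variance inputs) could be wired together. The layer width, depth, ReLU activation, SGD optimizer, initial learning rate, batch size, and epoch count are placeholder assumptions; the paper sweeps these hyper-parameters across thousands of models.

```python
import tensorflow as tf

# Load one of the four datasets named in the Open Datasets row (CIFAR10 here).
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

# "All inputs were normalized to have zero mean and unit variance."
# Flatten images for a fully-connected network; 32-bit precision is TF's default.
x_train = x_train.reshape(len(x_train), -1).astype("float32")
x_test = x_test.reshape(len(x_test), -1).astype("float32")
mean, std = x_train.mean(), x_train.std()
x_train, x_test = (x_train - mean) / std, (x_test - mean) / std

# "All networks were trained without biases" -> use_bias=False throughout.
# Width 512 and depth 4 are illustrative placeholders, not values from the paper.
model = tf.keras.Sequential(
    [tf.keras.layers.Dense(512, activation="relu", use_bias=False) for _ in range(4)]
    + [tf.keras.layers.Dense(10, use_bias=False)]
)

# "Learning rate decayed by a factor of 0.1 every 500 epochs."
# The 0.01 initial rate is a placeholder assumption.
def schedule(epoch, lr):
    return 0.01 * (0.1 ** (epoch // 500))

# "All networks were trained with cross-entropy loss."
# The optimizer choice is one of the hyper-parameters the paper varies; SGD is assumed here.
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

model.fit(
    x_train, y_train,
    epochs=100, batch_size=128,  # illustrative values only
    callbacks=[tf.keras.callbacks.LearningRateScheduler(schedule)],
)

# Dataset Splits row: the generalization gap is the difference between train and
# test accuracies, each evaluated on the whole 50000/10000 splits.
train_acc = model.evaluate(x_train, y_train, verbose=0)[1]
test_acc = model.evaluate(x_test, y_test, verbose=0)[1]
generalization_gap = train_acc - test_acc
```

The final two `evaluate` calls mirror the Dataset Splits row: accuracies are computed over the full training and test sets rather than mini-batches, and the gap is their difference.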