On the importance of single directions for generalization
Authors: Ari S. Morcos, David G.T. Barrett, Neil C. Rabinowitz, Matthew Botvinick
ICLR 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here, we connect these lines of inquiry to demonstrate that a network's reliance on single directions is a good predictor of its generalization performance, across networks trained on datasets with different fractions of corrupted labels, across ensembles of networks trained on datasets with unmodified labels, across different hyperparameters, and over the course of training. We analyzed three models: a 2-hidden layer MLP trained on MNIST, an 11-layer convolutional network trained on CIFAR-10, and a 50-layer residual network trained on ImageNet. (An illustrative sketch of single-direction ablation follows the table.) |
| Researcher Affiliation | Industry | Ari S. Morcos, David G.T. Barrett, Neil C. Rabinowitz, & Matthew Botvinick, DeepMind, London, UK. {arimorcos,barrettdavid,ncr,botvinick}@google.com |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | We analyzed three models: a 2-hidden layer MLP trained on MNIST, an 11-layer convolutional network trained on CIFAR-10, and a 50-layer residual network trained on ImageNet. |
| Dataset Splits | No | The paper refers to using 'test loss' for early stopping and 'test accuracy' for hyperparameter selection, implying that the test set served these purposes. However, it does not explicitly define distinct train/validation/test splits with percentages or example counts, nor does it cite standard splits that include a dedicated validation set. |
| Hardware Specification | No | The paper mentions 'distributed training with 32 workers' for ImageNet, but it does not specify any particular hardware components such as GPU models, CPU types, or memory specifications used for the experiments. |
| Software Dependencies | No | The paper describes the neural network architectures (MLP, convolutional network, residual network) and some training parameters, but it does not specify any software dependencies or libraries with version numbers (e.g., TensorFlow version, PyTorch version, Python version). |
| Experiment Setup | Yes | MNIST MLPs: For class selectivity, generalization, early stopping, and dropout experiments, each layer contained 128, 512, 2048 and 2048 units, respectively. All networks were trained for 640 epochs, with the exception of dropout networks, which were trained for 5000 epochs. CIFAR-10 ConvNets: Convolutional networks were all trained on CIFAR-10 for 100 epochs. Layer sizes were 64, 64, 128, 128, 128, 256, 256, 256, 512, 512, 512, with strides of 1, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1, respectively. All kernels were 3x3. For the hyperparameter sweep used in Section 3.2, learning rate and batch size were evaluated using a grid search. ImageNet ResNet: 50-layer residual networks (He et al., 2015) were trained on ImageNet using distributed training with 32 workers and a batch size of 32 for 200,000 steps. Blocks were structured as follows (stride, filter sizes, output channels): (1x1, 64, 64, 256) x 2, (2x2, 64, 64, 256), (1x1, 128, 128, 512) x 3, (2x2, 128, 128, 512), (1x1, 256, 256, 1024) x 5, (2x2, 256, 256, 1024), (1x1, 512, 512, 2048) x 3. (An illustrative reconstruction of the CIFAR-10 architecture follows the table.) |
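
The paper quantifies a network's reliance on single directions by ablating individual units (or feature maps) and measuring the resulting change in performance. The sketch below illustrates one way such an ablation could be implemented; PyTorch, the names `AblatableLayer` and `unit_mask`, the zero-clamping convention, and the index-order cumulative ablation loop are all assumptions made for illustration, since the paper releases no code and does not name its software stack.

```python
import torch
import torch.nn as nn

class AblatableLayer(nn.Module):
    """Fully connected layer + ReLU whose output units can be clamped to zero.

    Illustrative sketch only: the paper releases no code, so the layer name,
    the mask buffer, and the zero-clamping convention are assumptions.
    """
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # 1.0 = unit active, 0.0 = unit ablated
        self.register_buffer("unit_mask", torch.ones(out_features))

    def ablate(self, unit_index):
        self.unit_mask[unit_index] = 0.0

    def reset(self):
        self.unit_mask.fill_(1.0)

    def forward(self, x):
        return torch.relu(self.linear(x)) * self.unit_mask


def cumulative_ablation_curve(model, layer, loader, device="cpu"):
    """Accuracy as units of `layer` are cumulatively clamped to zero (index order)."""
    model.eval()
    accuracies = []
    for unit in range(layer.unit_mask.numel()):
        layer.ablate(unit)  # masks accumulate: units 0..unit are now ablated
        correct = total = 0
        with torch.no_grad():
            for x, y in loader:
                pred = model(x.to(device)).argmax(dim=1)
                correct += (pred == y.to(device)).sum().item()
                total += y.numel()
        accuracies.append(correct / total)
    layer.reset()
    return accuracies
```

A network that relies heavily on single directions shows a sharp accuracy drop early in such a curve; the paper's claim, quoted in the Research Type row, is that this reliance predicts generalization performance.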
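
The Experiment Setup row specifies the CIFAR-10 network only as a list of channel widths, strides, and 3x3 kernels. The sketch below reconstructs a stack consistent with those numbers; the padding, the ReLU nonlinearity, the pooled linear classifier head, and the use of PyTorch are assumptions, as the paper does not state them at this level of detail.

```python
import torch.nn as nn

# Channel widths and strides as quoted in the Experiment Setup row.
CHANNELS = [64, 64, 128, 128, 128, 256, 256, 256, 512, 512, 512]
STRIDES  = [1, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1]

def build_cifar10_convnet(num_classes: int = 10) -> nn.Sequential:
    """11-layer 3x3 convolutional stack; head and nonlinearity are assumptions."""
    layers, in_ch = [], 3
    for out_ch, stride in zip(CHANNELS, STRIDES):
        layers += [
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1),
            nn.ReLU(inplace=True),
        ]
        in_ch = out_ch
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(in_ch, num_classes)]
    return nn.Sequential(*layers)


# Example: a batch of CIFAR-10-sized inputs maps to 10 class logits.
# net = build_cifar10_convnet()
# logits = net(torch.randn(8, 3, 32, 32))   # shape: (8, 10)
```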