On the importance of single directions for generalization
Authors: Ari S. Morcos, David G.T. Barrett, Neil C. Rabinowitz, Matthew Botvinick
ICLR 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here, we connect these lines of inquiry to demonstrate that a network's reliance on single directions is a good predictor of its generalization performance, across networks trained on datasets with different fractions of corrupted labels, across ensembles of networks trained on datasets with unmodified labels, across different hyperparameters, and over the course of training. We analyzed three models: a 2-hidden layer MLP trained on MNIST, an 11-layer convolutional network trained on CIFAR-10, and a 50-layer residual network trained on ImageNet. (An illustrative sketch of single-direction ablation follows the table.) |
| Researcher Affiliation | Industry | Ari S. Morcos, David G.T. Barrett, Neil C. Rabinowitz, & Matthew Botvinick, DeepMind, London, UK. {arimorcos,barrettdavid,ncr,botvinick}@google.com |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | We analyzed three models: a 2-hidden layer MLP trained on MNIST, an 11-layer convolutional network trained on CIFAR-10, and a 50-layer residual network trained on ImageNet. |
| Dataset Splits | No | The paper refers to using 'test loss' for early stopping and 'test accuracy' for hyperparameter selection, implying that the test set served these purposes. However, it does not explicitly define distinct train/validation/test splits with percentages or example counts, nor does it cite standard splits that include a dedicated validation set. |
| Hardware Specification | No | The paper mentions 'distributed training with 32 workers' for ImageNet, but it does not specify any particular hardware components such as GPU models, CPU types, or memory specifications used for the experiments. |
| Software Dependencies | No | The paper describes the neural network architectures (MLP, convolutional network, residual network) and some training parameters, but it does not specify any software dependencies or libraries with version numbers (e.g., TensorFlow version, PyTorch version, Python version). |
| Experiment Setup | Yes | MNIST MLPs: For class selectivity, generalization, early stopping, and dropout experiments, each layer contained 128, 512, 2048 and 2048 units, respectively. All networks were trained for 640 epochs, with the exception of dropout networks, which were trained for 5000 epochs. CIFAR-10 ConvNets: Convolutional networks were all trained on CIFAR-10 for 100 epochs. Layer sizes were 64, 64, 128, 128, 128, 256, 256, 256, 512, 512, 512, with strides of 1, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1, respectively. All kernels were 3x3. For the hyperparameter sweep used in Section 3.2, learning rate and batch size were evaluated using a grid search. ImageNet ResNet: 50-layer residual networks (He et al., 2015) were trained on ImageNet using distributed training with 32 workers and a batch size of 32 for 200,000 steps. Blocks were structured as follows (stride, filter sizes, output channels): (1x1, 64, 64, 256) x 2, (2x2, 64, 64, 256), (1x1, 128, 128, 512) x 3, (2x2, 128, 128, 512), (1x1, 256, 256, 1024) x 5, (2x2, 256, 256, 1024), (1x1, 512, 512, 2048) x 3. (An illustrative reconstruction of the CIFAR-10 architecture follows the table.) |
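
The paper quantifies a network's reliance on single directions by ablating individual units (or feature maps) and measuring the resulting change in performance. The sketch below illustrates one way such an ablation could be implemented; PyTorch, the names `AblatableLayer` and `unit_mask`, the zero-clamping convention, and the index-order cumulative ablation loop are all assumptions made for illustration, since the paper releases no code and does not name its software stack.

```python
import torch
import torch.nn as nn

class AblatableLayer(nn.Module):
    """Fully connected layer + ReLU whose output units can be clamped to zero.

    Illustrative sketch only: the paper releases no code, so the layer name,
    the mask buffer, and the zero-clamping convention are assumptions.
    """
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # 1.0 = unit active, 0.0 = unit ablated
        self.register_buffer("unit_mask", torch.ones(out_features))

    def ablate(self, unit_index):
        self.unit_mask[unit_index] = 0.0

    def reset(self):
        self.unit_mask.fill_(1.0)

    def forward(self, x):
        return torch.relu(self.linear(x)) * self.unit_mask


def cumulative_ablation_curve(model, layer, loader, device="cpu"):
    """Accuracy as units of `layer` are cumulatively clamped to zero (index order)."""
    model.eval()
    accuracies = []
    for unit in range(layer.unit_mask.numel()):
        layer.ablate(unit)  # masks accumulate: units 0..unit are now ablated
        correct = total = 0
        with torch.no_grad():
            for x, y in loader:
                pred = model(x.to(device)).argmax(dim=1)
                correct += (pred == y.to(device)).sum().item()
                total += y.numel()
        accuracies.append(correct / total)
    layer.reset()
    return accuracies
```

A network that relies heavily on single directions shows a sharp accuracy drop early in such a curve; the paper's claim, quoted in the Research Type row, is that this reliance predicts generalization performance.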
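
The Experiment Setup row specifies the CIFAR-10 network only as a list of channel widths, strides, and 3x3 kernels. The sketch below reconstructs a stack consistent with those numbers; the padding, the ReLU nonlinearity, the pooled linear classifier head, and the use of PyTorch are assumptions, as the paper does not state them at this level of detail.

```python
import torch.nn as nn

# Channel widths and strides as quoted in the Experiment Setup row.
CHANNELS = [64, 64, 128, 128, 128, 256, 256, 256, 512, 512, 512]
STRIDES  = [1, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1]

def build_cifar10_convnet(num_classes: int = 10) -> nn.Sequential:
    """11-layer 3x3 convolutional stack; head and nonlinearity are assumptions."""
    layers, in_ch = [], 3
    for out_ch, stride in zip(CHANNELS, STRIDES):
        layers += [
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1),
            nn.ReLU(inplace=True),
        ]
        in_ch = out_ch
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(in_ch, num_classes)]
    return nn.Sequential(*layers)


# Example: a batch of CIFAR-10-sized inputs maps to 10 class logits.
# net = build_cifar10_convnet()
# logits = net(torch.randn(8, 3, 32, 32))   # shape: (8, 10)
```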