The Tunnel Effect: Building Data Representations in Deep Neural Networks
Authors: Wojciech Masarczyk, Mateusz Ostaszewski, Ehsan Imani, Razvan Pascanu, Piotr Miłoś, Tomasz Trzcinski
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper shows that sufficiently deep networks trained for supervised image classification split into two distinct parts that contribute to the resulting data representations differently. We explore the tunnel's behavior through comprehensive empirical studies, highlighting that it emerges early in the training process. To investigate the tunnel effect, we conduct multiple experiments that support our findings and shed some light on the potential source of this behavior. |
| Researcher Affiliation | Collaboration | 1Warsaw University of Technology, Poland 2University of Alberta, Canada 3University College London, UK 4IDEAS NCBR, Poland 5Institute of Mathematics, Polish Academy of Sciences, Poland 6Tooploox, Poland |
| Pseudocode | No | The paper describes experimental setups and methods in text and figures, but it does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | We use three image classification tasks to study the tunnel effect: CIFAR-10, CIFAR-100, and CINIC-10. [61] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, in Advances in Neural Information Processing Systems 25, Curran Associates, Inc., 2012. [62] L. N. Darlow, E. J. Crowley, A. Antoniou, and A. J. Storkey, CINIC-10 is not ImageNet or CIFAR-10, arXiv preprint arXiv:1810.03505, 2018. |
| Dataset Splits | No | The paper states the training and test splits for datasets (e.g., '50,000 training images and 10,000 test images' for CIFAR-10), but does not explicitly mention a validation set or its proportion. |
| Hardware Specification | Yes | We conducted approximately 300 experiments to finalize our work, each taking about three wall-clock hours on a single NVIDIA A5000 GPU. We had access to a server with eight NVIDIA A5000 GPUs, enabling us to parallelize our experiments and reduce total computation time. |
| Software Dependencies | No | The paper lists hyperparameters used for training but does not provide specific software dependencies or their version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Hyperparameters used for neural network training are presented in the leftmost Table A.1. Each column shows the values of the hyperparameters corresponding to a different architecture. The presented hyperparameters are recommended for the best performance of these models on the CIFAR-10 dataset [60]. Parameter (VGG / ResNet / MLP): Learning rate (LR) 0.1 / 0.1 / 0.05; SGD momentum 0.9 / 0.9 / 0.0; Weight decay 10⁻⁴ / 10⁻⁴ / 0; Number of epochs 160 / 164 / 1000; Mini-batch size 128 / 128 / 128; LR-decay milestones 80, 120 / 82, 123 / —; LR-decay gamma 0.1 / 0.1 / 0.0. |
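The hyperparameters quoted from Table A.1 can be captured as plain configuration dictionaries. The sketch below is illustrative, not the authors' code: the config values are copied from the table, while `lr_at_epoch` is an assumed helper that mirrors the usual milestone step-decay semantics (as in PyTorch's `MultiStepLR`), since the paper does not state the scheduler implementation.

```python
# Hyperparameters copied from Table A.1 of the paper (VGG / ResNet / MLP on CIFAR-10).
VGG_CONFIG = {
    "lr": 0.1, "momentum": 0.9, "weight_decay": 1e-4,
    "epochs": 160, "batch_size": 128, "milestones": [80, 120], "gamma": 0.1,
}
RESNET_CONFIG = {
    "lr": 0.1, "momentum": 0.9, "weight_decay": 1e-4,
    "epochs": 164, "batch_size": 128, "milestones": [82, 123], "gamma": 0.1,
}
MLP_CONFIG = {
    "lr": 0.05, "momentum": 0.0, "weight_decay": 0.0,
    "epochs": 1000, "batch_size": 128, "milestones": [], "gamma": 0.0,
}


def lr_at_epoch(epoch: int, base_lr: float, milestones: list, gamma: float) -> float:
    """Step decay: multiply the base LR by gamma once per milestone already
    passed. This is an assumed reading of the 'LR-decay' rows, matching the
    standard MultiStepLR behaviour."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed
```

For example, under the VGG config the learning rate starts at 0.1, drops to 0.01 at epoch 80, and to 0.001 at epoch 120; the MLP config has no milestones, so its rate stays at 0.05 throughout.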