The Tunnel Effect: Building Data Representations in Deep Neural Networks
Authors: Wojciech Masarczyk, Mateusz Ostaszewski, Ehsan Imani, Razvan Pascanu, Piotr Miłoś, Tomasz Trzcinski
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper shows that sufficiently deep networks trained for supervised image classification split into two distinct parts that contribute to the resulting data representations differently. We explore the tunnel's behavior through comprehensive empirical studies, highlighting that it emerges early in the training process. To investigate the tunnel effect, we conduct multiple experiments that support our findings and shed some light on the potential source of this behavior. |
| Researcher Affiliation | Collaboration | 1Warsaw University of Technology, Poland 2University of Alberta, Canada 3University College London, UK 4IDEAS NCBR, Poland 5Institute of Mathematics, Polish Academy of Sciences, Poland 6Tooploox, Poland |
| Pseudocode | No | The paper describes experimental setups and methods in text and figures, but it does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | We use three image classification tasks to study the tunnel effect: CIFAR-10, CIFAR-100, and CINIC-10. [61] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, in Advances in Neural Information Processing Systems 25, Curran Associates, Inc., 2012. [62] L. N. Darlow, E. J. Crowley, A. Antoniou, and A. J. Storkey, CINIC-10 is not ImageNet or CIFAR-10, arXiv preprint arXiv:1810.03505, 2018. |
| Dataset Splits | No | The paper states the training and test splits for datasets (e.g., '50,000 training images and 10,000 test images' for CIFAR-10), but does not explicitly mention a validation set or its proportion. |
| Hardware Specification | Yes | We conducted approximately 300 experiments to finalize our work, each taking about three wall-clock hours on a single NVIDIA A5000 GPU. We had access to a server with eight NVIDIA A5000 GPUs, enabling us to parallelize our experiments and reduce total computation time. |
| Software Dependencies | No | The paper lists hyperparameters used for training but does not provide specific software dependencies or their version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Hyperparameters used for neural network training are presented in the leftmost Table A.1. Each column shows the values of the hyperparameters corresponding to a different architecture. The presented hyperparameters are recommended for the best performance of these models on the CIFAR-10 dataset [60]. Parameter (VGG / ResNet / MLP): Learning rate (LR) 0.1 / 0.1 / 0.05; SGD momentum 0.9 / 0.9 / 0.0; Weight decay 10⁻⁴ / 10⁻⁴ / 0; Number of epochs 160 / 164 / 1000; Mini-batch size 128 / 128 / 128; LR-decay milestones 80, 120 / 82, 123 / —; LR-decay gamma 0.1 / 0.1 / 0.0. |
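The hyperparameters quoted from Table A.1 can be captured as plain configuration dictionaries. The sketch below is illustrative, not the authors' code: the config values are copied from the table, while `lr_at_epoch` is an assumed helper that mirrors the usual milestone step-decay semantics (as in PyTorch's `MultiStepLR`), since the paper does not state the scheduler implementation.

```python
# Hyperparameters copied from Table A.1 of the paper (VGG / ResNet / MLP on CIFAR-10).
VGG_CONFIG = {
    "lr": 0.1, "momentum": 0.9, "weight_decay": 1e-4,
    "epochs": 160, "batch_size": 128, "milestones": [80, 120], "gamma": 0.1,
}
RESNET_CONFIG = {
    "lr": 0.1, "momentum": 0.9, "weight_decay": 1e-4,
    "epochs": 164, "batch_size": 128, "milestones": [82, 123], "gamma": 0.1,
}
MLP_CONFIG = {
    "lr": 0.05, "momentum": 0.0, "weight_decay": 0.0,
    "epochs": 1000, "batch_size": 128, "milestones": [], "gamma": 0.0,
}


def lr_at_epoch(epoch: int, base_lr: float, milestones: list, gamma: float) -> float:
    """Step decay: multiply the base LR by gamma once per milestone already
    passed. This is an assumed reading of the 'LR-decay' rows, matching the
    standard MultiStepLR behaviour."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed
```

For example, under the VGG config the learning rate starts at 0.1, drops to 0.01 at epoch 80, and to 0.001 at epoch 120; the MLP config has no milestones, so its rate stays at 0.05 throughout.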