Convergent Learning: Do different neural networks learn the same representations?

Authors: Yixuan Li, Jason Yosinski, Jeff Clune, Hod Lipson, John Hopcroft

ICLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We employ an architecture derived from AlexNet (Krizhevsky et al., 2012) and train multiple networks on the ImageNet dataset (Deng et al., 2009) (details in Section 2). We then compare the representations learned across different networks. We trained four networks in the above manner using four different random initializations. We refer to these as Net1, Net2, Net3, and Net4. The four networks perform very similarly on the validation set, achieving top-1 accuracies of 58.65%, 58.73%, 58.79%, and 58.84%... (see the correlation-matching sketch after the table)
Researcher Affiliation | Academia | ¹Cornell University, ²University of Wyoming, ³Columbia University; {yli,yosinski,jeh}@cs.cornell.edu, jeffclune@uwyo.edu, hod.lipson@columbia.edu
Pseudocode | No | The paper describes algorithmic steps for methods like Hierarchical Agglomerative Clustering (HAC) in a numbered list format, but it does not provide formal pseudocode blocks or labeled algorithm sections. (A clustering sketch follows the table.)
Open Source Code | Yes | Further details and the complete code necessary to reproduce these experiments is available at https://github.com/yixuanli/convergent_learning.
Open Datasets | Yes | Networks are trained using Caffe on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012 dataset (Deng et al., 2009).
Dataset Splits | Yes | The four networks perform very similarly on the validation set, achieving top-1 accuracies of 58.65%, 58.73%, 58.79%, and 58.84%, which are similar to the top-1 performance of 59.3% reported in the original study (Krizhevsky et al., 2012).
Hardware Specification | No | The paper notes that the original AlexNet architecture used limited connectivity 'to enable splitting the model across two GPUs' but does not specify the hardware used for the experiments conducted in this paper.
Software Dependencies | No | The paper mentions using 'Caffe' for training and 'Scikit-learn' for Hierarchical Agglomerative Clustering but does not provide specific version numbers for these software components.
Experiment Setup | Yes | All networks in this study follow the basic architecture laid out by Krizhevsky et al. (2012), with parameters learned in five convolutional layers (conv1-conv5) followed by three fully connected layers (fc6-fc8). The structure is modified slightly in two ways. First, Krizhevsky et al. (2012) employed limited connectivity... Here we remove this artificial group structure... Second, we place the local response normalization layers after the pooling layers following the defaults released with the Caffe framework... We trained four networks in the above manner using four different random initializations. The paper also provides specific L1 penalty values (decay terms) for the mapping layer training in Table 1: 'decay 0', 'decay 10^-5', 'decay 10^-4', 'decay 10^-3', 'decay 10^-2', 'decay 10^-1'. (A sparse-mapping sketch follows the table.)
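
The cross-network comparison described in the Research Type row hinges on correlating unit activations between independently trained networks and then finding a one-to-one (bipartite) matching. Below is a minimal sketch of that idea, assuming layer activations have already been extracted into NumPy arrays; the array names, the random stand-in data, and the use of SciPy's Hungarian solver are illustrative assumptions, not the authors' released code.

import numpy as np
from scipy.optimize import linear_sum_assignment

def cross_net_correlation(acts_a, acts_b):
    # acts_a: (num_images, units_a), acts_b: (num_images, units_b)
    # Standardize each unit's activations, then compute Pearson correlations
    # between every unit of net A and every unit of net B.
    a = (acts_a - acts_a.mean(0)) / (acts_a.std(0) + 1e-8)
    b = (acts_b - acts_b.mean(0)) / (acts_b.std(0) + 1e-8)
    return a.T @ b / acts_a.shape[0]          # shape: (units_a, units_b)

def one_to_one_match(corr):
    # Bipartite matching that maximizes total correlation; the Hungarian
    # solver minimizes cost, so negate the correlation matrix.
    rows, cols = linear_sum_assignment(-corr)
    return list(zip(rows, cols, corr[rows, cols]))

# Illustrative usage with random data standing in for real conv1 activations.
rng = np.random.default_rng(0)
acts_net1 = rng.normal(size=(1000, 96))
acts_net2 = rng.normal(size=(1000, 96))
corr = cross_net_correlation(acts_net1, acts_net2)
print(one_to_one_match(corr)[:5])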
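
The paper's clustering analysis groups units by hierarchical agglomerative clustering on their correlations, using scikit-learn. The sketch below uses SciPy's equivalent linkage interface instead; treating 1 - |correlation| as the distance and choosing average linkage are assumptions made here for illustration, not the authors' exact settings.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_units(within_net_corr, n_clusters=10):
    # within_net_corr: (units, units) correlation matrix of a layer with itself.
    # Convert correlation to a distance, run average-linkage HAC, and cut the
    # resulting tree into n_clusters groups.
    dist = 1.0 - np.abs(within_net_corr)
    np.fill_diagonal(dist, 0.0)                 # exact zeros on the diagonal
    condensed = squareform(dist, checks=False)  # condensed form required by linkage
    tree = linkage(condensed, method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")

# Illustrative usage with a random correlation-like matrix.
rng = np.random.default_rng(1)
x = rng.normal(size=(500, 96))
corr = np.corrcoef(x, rowvar=False)
labels = cluster_units(corr, n_clusters=8)
print(np.bincount(labels))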
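
The decay values quoted in the Experiment Setup row are L1 penalties on a learned linear mapping layer that predicts one network's activations from another's. As a rough stand-in for the Caffe mapping layer the authors trained, the sketch below fits an L1-regularized linear map with scikit-learn's Lasso; the activation matrices, the penalty value, and the synthetic data are assumptions for illustration only.

import numpy as np
from sklearn.linear_model import Lasso

def fit_sparse_mapping(acts_src, acts_tgt, l1_penalty=1e-3):
    # acts_src: (num_images, units_src), acts_tgt: (num_images, units_tgt).
    # Larger l1_penalty values drive more weights to exactly zero, i.e. a
    # sparser, closer-to-one-to-one mapping between units.
    model = Lasso(alpha=l1_penalty, max_iter=5000)
    model.fit(acts_src, acts_tgt)
    return model.coef_                          # shape: (units_tgt, units_src)

# Illustrative usage with random data standing in for real conv activations.
rng = np.random.default_rng(2)
src = rng.normal(size=(2000, 96))
tgt = src @ rng.normal(size=(96, 96)) * 0.1 + rng.normal(size=(2000, 96)) * 0.01
weights = fit_sparse_mapping(src, tgt, l1_penalty=1e-3)
print("fraction of zero weights:", np.mean(weights == 0.0))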