Convergent Learning: Do different neural networks learn the same representations?

Authors: Yixuan Li, Jason Yosinski, Jeff Clune, Hod Lipson, John Hopcroft

ICLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We employ an architecture derived from AlexNet (Krizhevsky et al., 2012) and train multiple networks on the ImageNet dataset (Deng et al., 2009) (details in Section 2). We then compare the representations learned across different networks. We trained four networks in the above manner using four different random initializations. We refer to these as Net1, Net2, Net3, and Net4. The four networks perform very similarly on the validation set, achieving top-1 accuracies of 58.65%, 58.73%, 58.79%, and 58.84%... (see the correlation-matching sketch after the table)
Researcher Affiliation | Academia | ¹Cornell University, ²University of Wyoming, ³Columbia University; {yli,yosinski,jeh}@cs.cornell.edu, jeffclune@uwyo.edu, hod.lipson@columbia.edu
Pseudocode | No | The paper describes algorithmic steps for methods like Hierarchical Agglomerative Clustering (HAC) in a numbered list format, but it does not provide formal pseudocode blocks or labeled algorithm sections. (A clustering sketch follows the table.)
Open Source Code | Yes | Further details and the complete code necessary to reproduce these experiments is available at https://github.com/yixuanli/convergent_learning.
Open Datasets | Yes | Networks are trained using Caffe on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012 dataset (Deng et al., 2009).
Dataset Splits | Yes | The four networks perform very similarly on the validation set, achieving top-1 accuracies of 58.65%, 58.73%, 58.79%, and 58.84%, which are similar to the top-1 performance of 59.3% reported in the original study (Krizhevsky et al., 2012).
Hardware Specification | No | The paper notes that the original AlexNet architecture used limited connectivity 'to enable splitting the model across two GPUs' but does not specify the hardware used for the experiments conducted in this paper.
Software Dependencies | No | The paper mentions using 'Caffe' for training and 'Scikit-learn' for Hierarchical Agglomerative Clustering but does not provide specific version numbers for these software components.
Experiment Setup | Yes | All networks in this study follow the basic architecture laid out by Krizhevsky et al. (2012), with parameters learned in five convolutional layers (conv1-conv5) followed by three fully connected layers (fc6-fc8). The structure is modified slightly in two ways. First, Krizhevsky et al. (2012) employed limited connectivity... Here we remove this artificial group structure... Second, we place the local response normalization layers after the pooling layers following the defaults released with the Caffe framework... We trained four networks in the above manner using four different random initializations. The paper also provides specific L1 penalty values (decay terms) for the mapping layer training in Table 1: 'decay 0', 'decay 10^-5', 'decay 10^-4', 'decay 10^-3', 'decay 10^-2', 'decay 10^-1'. (A sparse-mapping sketch follows the table.)
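
The cross-network comparison described in the Research Type row hinges on correlating unit activations between independently trained networks and then finding a one-to-one (bipartite) matching. Below is a minimal sketch of that idea, assuming layer activations have already been extracted into NumPy arrays; the array names, the random stand-in data, and the use of SciPy's Hungarian solver are illustrative assumptions, not the authors' released code.

import numpy as np
from scipy.optimize import linear_sum_assignment

def cross_net_correlation(acts_a, acts_b):
    # acts_a: (num_images, units_a), acts_b: (num_images, units_b)
    # Standardize each unit's activations, then compute Pearson correlations
    # between every unit of net A and every unit of net B.
    a = (acts_a - acts_a.mean(0)) / (acts_a.std(0) + 1e-8)
    b = (acts_b - acts_b.mean(0)) / (acts_b.std(0) + 1e-8)
    return a.T @ b / acts_a.shape[0]          # shape: (units_a, units_b)

def one_to_one_match(corr):
    # Bipartite matching that maximizes total correlation; the Hungarian
    # solver minimizes cost, so negate the correlation matrix.
    rows, cols = linear_sum_assignment(-corr)
    return list(zip(rows, cols, corr[rows, cols]))

# Illustrative usage with random data standing in for real conv1 activations.
rng = np.random.default_rng(0)
acts_net1 = rng.normal(size=(1000, 96))
acts_net2 = rng.normal(size=(1000, 96))
corr = cross_net_correlation(acts_net1, acts_net2)
print(one_to_one_match(corr)[:5])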
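
The paper's clustering analysis groups units by hierarchical agglomerative clustering on their correlations, using scikit-learn. The sketch below uses SciPy's equivalent linkage interface instead; treating 1 - |correlation| as the distance and choosing average linkage are assumptions made here for illustration, not the authors' exact settings.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_units(within_net_corr, n_clusters=10):
    # within_net_corr: (units, units) correlation matrix of a layer with itself.
    # Convert correlation to a distance, run average-linkage HAC, and cut the
    # resulting tree into n_clusters groups.
    dist = 1.0 - np.abs(within_net_corr)
    np.fill_diagonal(dist, 0.0)                 # exact zeros on the diagonal
    condensed = squareform(dist, checks=False)  # condensed form required by linkage
    tree = linkage(condensed, method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")

# Illustrative usage with a random correlation-like matrix.
rng = np.random.default_rng(1)
x = rng.normal(size=(500, 96))
corr = np.corrcoef(x, rowvar=False)
labels = cluster_units(corr, n_clusters=8)
print(np.bincount(labels))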
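
The decay values quoted in the Experiment Setup row are L1 penalties on a learned linear mapping layer that predicts one network's activations from another's. As a rough stand-in for the Caffe mapping layer the authors trained, the sketch below fits an L1-regularized linear map with scikit-learn's Lasso; the activation matrices, the penalty value, and the synthetic data are assumptions for illustration only.

import numpy as np
from sklearn.linear_model import Lasso

def fit_sparse_mapping(acts_src, acts_tgt, l1_penalty=1e-3):
    # acts_src: (num_images, units_src), acts_tgt: (num_images, units_tgt).
    # Larger l1_penalty values drive more weights to exactly zero, i.e. a
    # sparser, closer-to-one-to-one mapping between units.
    model = Lasso(alpha=l1_penalty, max_iter=5000)
    model.fit(acts_src, acts_tgt)
    return model.coef_                          # shape: (units_tgt, units_src)

# Illustrative usage with random data standing in for real conv activations.
rng = np.random.default_rng(2)
src = rng.normal(size=(2000, 96))
tgt = src @ rng.normal(size=(96, 96)) * 0.1 + rng.normal(size=(2000, 96)) * 0.01
weights = fit_sparse_mapping(src, tgt, l1_penalty=1e-3)
print("fraction of zero weights:", np.mean(weights == 0.0))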