Similarity of Neural Network Representations Revisited

Authors: Simon Kornblith, Mohammad Norouzi, Honglak Lee, Geoffrey Hinton

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility variables, results, and supporting LLM responses:
Research Type: Experimental
LLM Response: Despite impressive empirical advances of deep neural networks in solving various tasks, the problem of understanding and characterizing the neural network representations learned from data remains relatively under-explored. We verify that wider networks learn more similar representations, and show that the similarity of early layers saturates at fewer channels than later layers. (Figure 2: CKA reveals consistent relationships between layers of CNNs trained with different random initializations, whereas CCA, linear regression, and SVCCA do not.)
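The paper's central tool is centered kernel alignment (CKA). As an illustration only (not the authors' released code), a minimal numpy sketch of linear CKA between two activation matrices could look like the following; the function name and the toy activations are assumptions made for this example.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between activation matrices X, Y of shape (examples, features)."""
    # Center each feature so the comparison corresponds to centered Gram matrices.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    numerator = np.linalg.norm(Y.T @ X, ord='fro') ** 2
    denominator = (np.linalg.norm(X.T @ X, ord='fro') *
                   np.linalg.norm(Y.T @ Y, ord='fro'))
    return numerator / denominator

# Toy check: CKA is invariant to orthogonal transformations of the features,
# so a rotated copy of the same representation scores close to 1.0.
rng = np.random.default_rng(0)
acts_a = rng.standard_normal((100, 64))
q, _ = np.linalg.qr(rng.standard_normal((64, 64)))   # random orthogonal matrix
print(linear_cka(acts_a, acts_a @ q))                 # close to 1.0
```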
Researcher Affiliation: Industry
LLM Response: Google Brain. Correspondence to: Simon Kornblith <skornblith@google.com>.

Pseudocode: No
LLM Response: The paper does not contain any structured pseudocode or algorithm blocks.

Open Source Code: No
LLM Response: The paper does not explicitly state that source code is provided or offer a link to a code repository.
Open Datasets: Yes
LLM Response: Each example from the CIFAR-10 test set is shown as a dot colored according to the value of the first two principal components of an intermediate layer of one network (left) and plotted on the first two principal components of the same layer of an architecturally identical network trained from a different initialization (right). CKA can also be used to compare networks trained on different datasets. In Figure 7, we show that models trained on CIFAR-10 and CIFAR-100 develop similar representations in their early layers.
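Cross-network comparisons like the CIFAR-10 vs. CIFAR-100 heatmaps described above come from evaluating every pair of layers on the same examples. Below is a hedged sketch of that pairwise comparison; the helper names, layer sizes, and random toy activations are illustrative assumptions, not the paper's code.

```python
import numpy as np

def _linear_cka(X, Y):
    # Same linear CKA computation as the earlier sketch (features centered).
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    return (np.linalg.norm(Y.T @ X) ** 2 /
            (np.linalg.norm(X.T @ X) * np.linalg.norm(Y.T @ Y)))

def cka_heatmap(layers_a, layers_b):
    """Layer-by-layer CKA matrix for two networks evaluated on the same examples."""
    return np.array([[_linear_cka(X, Y) for Y in layers_b] for X in layers_a])

# Toy usage: two hypothetical 4-layer networks, 200 shared evaluation examples.
rng = np.random.default_rng(1)
net_a = [rng.standard_normal((200, d)) for d in (32, 64, 128, 256)]
net_b = [rng.standard_normal((200, d)) for d in (32, 64, 128, 256)]
print(cka_heatmap(net_a, net_b).round(2))
```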
Dataset Splits: No
LLM Response: The paper mentions using datasets like CIFAR-10 and CIFAR-100, and refers to a 'test set', but does not explicitly specify the training, validation, and test dataset splits (e.g., percentages or sample counts) needed for reproduction.

Hardware Specification: No
LLM Response: The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments.

Software Dependencies: No
LLM Response: The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup: Yes
LLM Response: We train 10 networks and, for each layer of each network, we compute the accuracy with which we can find the corresponding layer in each of the other networks by maximum similarity. We first investigate a simple VGG-like convolutional network based on All-CNN-C (Springenberg et al., 2015) (see Appendix E for architecture details). For the RBF kernel, there are several possible strategies for selecting the bandwidth σ... We set σ as a fraction of the median distance between examples.
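To make the quoted setup concrete, here is a hedged sketch (not the authors' implementation) of the two pieces it describes: an RBF-kernel CKA with σ set to a fraction of the median pairwise distance between examples (the fraction of 0.5 used below is an arbitrary illustrative value), and a layer-identification accuracy that checks whether the most similar layer in another network is the architecturally corresponding one. All function names are assumptions for this example.

```python
import numpy as np

def rbf_gram(X, fraction=0.5):
    """RBF Gram matrix with sigma = fraction * median pairwise distance."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    dists = np.sqrt(sq_dists[np.triu_indices_from(sq_dists, k=1)])
    sigma = fraction * np.median(dists)
    return np.exp(-sq_dists / (2 * sigma ** 2))

def kernel_cka(K, L):
    """CKA between two Gram matrices via (biased) centered HSIC."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    Kc, Lc = H @ K @ H, H @ L @ H
    return np.trace(Kc @ Lc) / np.sqrt(np.trace(Kc @ Kc) * np.trace(Lc @ Lc))

def layer_identification_accuracy(layers_a, layers_b, fraction=0.5):
    """Fraction of layers in network A whose most-similar layer in B has the same index."""
    grams_a = [rbf_gram(X, fraction) for X in layers_a]
    grams_b = [rbf_gram(Y, fraction) for Y in layers_b]
    sims = np.array([[kernel_cka(Ka, Kb) for Kb in grams_b] for Ka in grams_a])
    return float(np.mean(np.argmax(sims, axis=1) == np.arange(len(layers_a))))
```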