When Representations Align: Universality in Representation Learning Dynamics
Authors: Loek Van Rossem, Andrew M Saxe
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show through experiments that the effective theory describes aspects of representation learning dynamics across a range of deep networks with different activation functions and architectures, and exhibits phenomena similar to the rich and lazy regime. |
| Researcher Affiliation | Academia | (1) Gatsby Computational Neuroscience Unit, University College London; (2) Sainsbury Wellcome Centre, University College London. |
| Pseudocode | No | The paper contains mathematical equations and derivations, along with figures illustrating concepts and results, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code or provide links to a code repository for the described methodology. |
| Open Datasets | Yes | To investigate the validity of the theory in this setting, we trained a model on the MNIST dataset, and tracked two distinguishable datapoints. |
| Dataset Splits | No | The paper describes using the full MNIST training set or specific subsets for experiments, but it does not explicitly specify train/validation/test dataset splits with percentages, sample counts, or references to predefined splits for reproducibility. |
| Hardware Specification | No | The paper does not explicitly mention any specific hardware used for running the experiments, such as GPU models, CPU specifications, or memory details. |
| Software Dependencies | No | For all experiments we used the open-source library PyTorch. This mentions a software component but does not provide a specific version number, which is required for reproducibility. |
| Experiment Setup | Yes | The hyperparameters used can be found in Table 2 (middle). For all experiments we used the open-source library PyTorch. We chose stochastic gradient descent as an optimizer, as it was used for the theory derivation. All models are initialized using the Xavier normal initialization with gain parameter chosen to display rich learning behavior. Each layer has biases and these are initialized at zero. Learning rates are chosen to produce smooth loss curves while still converging within the 6000 epochs. The different hyperparameters can be found in Table 1. |
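
For context, the quoted experiment setup (stochastic gradient descent, Xavier normal initialization with a gain chosen for rich learning, zero-initialized biases, MNIST, PyTorch) could be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the architecture, gain, learning rate, batch size, and loss function are placeholders, since the actual values live in the paper's Tables 1 and 2, which are not reproduced in this report.

```python
import torch
import torch.nn as nn
from torchvision import datasets, transforms

# Hypothetical hyperparameters: the paper's Table 1/2 values are not quoted in
# this report, so width, gain, learning rate, and batch size are assumptions.
HIDDEN_WIDTH = 100
GAIN = 3.0           # assumed; the paper only says the gain is chosen for rich learning
LEARNING_RATE = 0.01
EPOCHS = 6000        # the paper reports convergence within 6000 epochs
BATCH_SIZE = 64

def init_weights(module: nn.Module) -> None:
    """Xavier normal initialization for weights; biases initialized at zero, as described."""
    if isinstance(module, nn.Linear):
        nn.init.xavier_normal_(module.weight, gain=GAIN)
        nn.init.zeros_(module.bias)

# Assumed two-layer architecture for illustration only.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, HIDDEN_WIDTH),
    nn.Tanh(),
    nn.Linear(HIDDEN_WIDTH, 10),
)
model.apply(init_weights)

# Stochastic gradient descent, as stated in the experiment setup.
optimizer = torch.optim.SGD(model.parameters(), lr=LEARNING_RATE)
loss_fn = nn.CrossEntropyLoss()  # assumed loss; not specified in the quoted excerpt

# Full MNIST training set; the paper mentions MNIST but the exact subset used
# for tracking the two distinguishable datapoints is not specified here.
train_set = datasets.MNIST(root="data", train=True, download=True,
                           transform=transforms.ToTensor())
loader = torch.utils.data.DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True)

for epoch in range(EPOCHS):
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
```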