Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks

Authors: Clementine Domine, Nicolas Anguita, Alexandra M Proca, Lukas Braun, Daniel Kunin, Pedro Mediano, Andrew Saxe

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Figure 1: A minimal model of the rich and lazy regimes. B. Network output for an example task over training time... Solid lines represent simulations, while dotted lines indicate the analytical solutions derived in this work. We derive explicit solutions for the gradient flow, internal representational similarity, and finite-width NTK in unequal-input-output two-layer deep linear networks... Implementation and simulation. One issue with the expression we derived in Theorem 4.3 is that it can be numerically unstable when simulating it for long times t...
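The quoted evidence mentions explicit solutions for the finite-width NTK of two-layer linear networks. As a sketch only (not the paper's exact expression), the standard closed-form NTK for f(x) = W2 @ W1 @ x can be written down and verified against an explicit Jacobian contraction in NumPy:

```python
import numpy as np

rng = np.random.default_rng(1)
Ni, Nh, No = 3, 2, 2
W1 = rng.standard_normal((Nh, Ni))
W2 = rng.standard_normal((No, Nh))
x, xp = rng.standard_normal(Ni), rng.standard_normal(Ni)

# Closed-form finite-width NTK of f(x) = W2 @ W1 @ x: the (a, b) entry is
# (x . x') (W2 W2^T)_{ab} + delta_{ab} x^T W1^T W1 x'
K = (x @ xp) * (W2 @ W2.T) + (x @ W1.T @ W1 @ xp) * np.eye(No)

# Cross-check against the explicit Jacobian w.r.t. all parameters
def jac(v):
    J1 = np.kron(W2, v[None, :])                  # d f / d vec(W1), (No, Nh*Ni)
    J2 = np.kron(np.eye(No), (W1 @ v)[None, :])   # d f / d vec(W2), (No, No*Nh)
    return np.hstack([J1, J2])

K_num = jac(x) @ jac(xp).T
```

Here `K` and `K_num` agree entry-by-entry; the paper's Theorem 4.3 concerns the dynamics of this quantity under gradient flow, which this sketch does not reproduce.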
Researcher Affiliation Academia 1 Gatsby Computational Neuroscience Unit, University College London, UK; 2 Department of Computing, Imperial College London, UK; 3 Department of Experimental Psychology, University of Oxford, UK; 4 Institute for Computational and Mathematical Engineering, Stanford University, USA; 5 Division of Psychology and Language Sciences, University College London, UK; 6 Sainsbury Wellcome Centre, University College London, UK; 7 CIFAR Azrieli Global Scholar, CIFAR, Toronto, Canada
Pseudocode Yes Algorithm 1 Get λ-balanced
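The cited Algorithm 1 constructs λ-balanced weights. In this line of work, λ-balanced means W2ᵀW2 − W1W1ᵀ = λI on the hidden layer; the paper's algorithm may differ in details, but a minimal NumPy construction satisfying that condition (with a hypothetical function name) looks like:

```python
import numpy as np

def lambda_balanced_init(Ni, Nh, No, lam, scale=1e-2, seed=0):
    """Sketch: return W1 (Nh x Ni) and W2 (No x Nh) satisfying
    W2.T @ W2 - W1 @ W1.T = lam * I_Nh.
    Assumes Ni >= Nh, No >= Nh, and lam >= -scale**2."""
    rng = np.random.default_rng(seed)
    # Orthonormal-column factors via reduced QR
    V, _ = np.linalg.qr(rng.standard_normal((Ni, Nh)))  # V.T @ V = I_Nh
    U, _ = np.linalg.qr(rng.standard_normal((No, Nh)))  # U.T @ U = I_Nh
    s1 = np.full(Nh, scale)          # singular values of W1
    s2 = np.sqrt(s1**2 + lam)        # enforce s2^2 - s1^2 = lam
    W1 = np.diag(s1) @ V.T           # W1 @ W1.T = diag(s1^2)
    W2 = U @ np.diag(s2)             # W2.T @ W2 = diag(s2^2)
    return W1, W2

W1, W2 = lambda_balanced_init(Ni=3, Nh=2, No=2, lam=0.5)
```

Setting `lam=0` recovers the zero-balanced initialization used in the paper's simulations.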
Open Source Code No The paper does not provide an explicit statement of code release, a link to a repository for the described methodology, or indicate that code is in supplementary materials.
Open Datasets Yes We use the same task as in Braun et al. (2022) and modify it to match the theoretical dynamics... In the semantic hierarchy task, input items are represented as one-hot vectors, i.e., X = I8. The corresponding target vectors, yi, encode the item's position within the hierarchical tree... The labels for all objects in the semantic tree, as shown in Figure 4A, are given by: [matrix of labels]
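The quoted label matrix is elided above. For illustration only (this is not the paper's exact matrix), a semantic hierarchy dataset of this shape can be built by pairing one-hot inputs X = I₈ with targets that concatenate one-hot codes of each item's ancestors in a depth-3 binary tree:

```python
import numpy as np

# One-hot inputs: one column per item, as in the quoted setup (X = I_8)
X = np.eye(8)

def tree_labels(n_leaves=8):
    """Illustrative hierarchical targets (NOT the paper's exact matrix):
    row i concatenates one-hot codes of item i's ancestor at each tree
    level, giving 2 + 4 + 8 = 14 features for a depth-3 binary tree."""
    depth = int(np.log2(n_leaves))
    rows = []
    for i in range(n_leaves):
        feats = []
        for level in range(1, depth + 1):
            node = i >> (depth - level)   # ancestor index at this level
            onehot = np.zeros(2 ** level)
            onehot[node] = 1.0
            feats.append(onehot)
        rows.append(np.concatenate(feats))
    return np.stack(rows)

Y = tree_labels()   # shape (8, 14); sibling leaves share ancestor features
```

Items under the same subtree share their higher-level features, which is what drives the hierarchical learning dynamics the paper analyzes.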
Dataset Splits No The paper mentions 'full batch gradient descent' and 'batch size is N = 10' in simulation details, but does not specify any training/test/validation dataset splits or splitting methodology.
Hardware Specification No The paper does not provide any specific hardware details such as GPU models, CPU types, or detailed computer specifications used for running experiments.
Software Dependencies No The paper does not provide specific software dependencies with version numbers (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup Yes The regression task parameters were set with (σ = 10). The network architecture consisted of Ni = 3, Nh = 2, No = 2, with a learning rate of η = 0.0002. The batch size is N = 10. The zero-balanced weights are initialized with variance σ = 0.00001.
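The quoted hyperparameters can be wired into a minimal full-batch gradient-descent simulation of the two-layer linear network. This is a sketch under assumptions: the regression data below are placeholder Gaussians, since the paper's exact task (and the role of its σ = 10 parameter) is not restated in the evidence.

```python
import numpy as np

rng = np.random.default_rng(0)
Ni, Nh, No = 3, 2, 2      # architecture from the quoted setup
eta, N = 2e-4, 10         # learning rate and batch size from the quote
init_var = 1e-5           # small-variance, zero-balanced-style initialization

# Placeholder regression data (NOT the paper's task)
X = rng.standard_normal((Ni, N))
Y = rng.standard_normal((No, N))

W1 = np.sqrt(init_var) * rng.standard_normal((Nh, Ni))
W2 = np.sqrt(init_var) * rng.standard_normal((No, Nh))

def loss(W1, W2):
    return 0.5 * np.mean((W2 @ W1 @ X - Y) ** 2)

loss_init = loss(W1, W2)
for _ in range(20_000):                    # full-batch gradient descent
    err = W2 @ W1 @ X - Y                  # (No, N) residual
    g1 = W2.T @ err @ X.T / N              # grad w.r.t. W1 (computed first,
    g2 = err @ (W1 @ X).T / N              # both from the same old weights)
    W1 -= eta * g1
    W2 -= eta * g2
loss_final = loss(W1, W2)
```

With such a small initialization the loss plateaus before dropping, the signature of the rich regime the paper analyzes; a large initialization variance would instead approach the lazy regime.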