Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks
Authors: Clementine Domine, Nicolas Anguita, Alexandra M Proca, Lukas Braun, Daniel Kunin, Pedro Mediano, Andrew Saxe
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Figure 1: A minimal model of the rich and lazy regimes. B. Network output for an example task over training time... Solid lines represent simulations, while dotted lines indicate the analytical solutions derived in this work. We derive explicit solutions for the gradient flow, internal representational similarity, and finite-width NTK in unequal-input-output two-layer deep linear networks... Implementation and simulation. One issue with the expression we derived in Theorem 4.3 is that it can be numerically unstable when simulating it for long times t... |
| Researcher Affiliation | Academia | 1 Gatsby Computational Neuroscience Unit, University College London, UK 2 Department of Computing, Imperial College London, UK 3 Department of Experimental Psychology, University of Oxford, UK 4 Institute for Computational and Mathematical Engineering, Stanford University, USA 5 Division of Psychology and Language Sciences, University College London, UK 6 Sainsbury Wellcome Centre, University College London, UK 7 CIFAR Azrieli Global Scholar, CIFAR, Toronto, Canada |
| Pseudocode | Yes | Algorithm 1 Get λ-balanced |
| Open Source Code | No | The paper does not explicitly state that code is released, link to a repository for the described methodology, or indicate that code is included in supplementary materials. |
| Open Datasets | Yes | We use the same task as in Braun et al. (2022) and modify it to match the theoretical dynamics... In the semantic hierarchy task, input items are represented as one-hot vectors, i.e., X = I₈. The corresponding target vectors, yᵢ, encode the item's position within the hierarchical tree... The labels for all objects in the semantic tree, as shown in Figure 4 A, are given by: [matrix of labels] |
| Dataset Splits | No | The paper mentions 'full batch gradient descent' and 'batch size is N = 10' in simulation details, but does not specify any training/test/validation dataset splits or splitting methodology. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or detailed computer specifications used for running experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | The regression task parameters were set with (σ = 10). The network architecture consisted of Ni = 3, Nh = 2, No = 2, with a learning rate of η = 0.0002. The batch size is N = 10. The zero-balanced weights are initialized with variance σ = 0.00001. |
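The experiment-setup details extracted above (a two-layer deep linear network with Ni = 3, Nh = 2, No = 2, learning rate η = 2e-4, batch size N = 10, and near-zero "balanced" initialization) can be sketched as follows. This is a minimal, hedged reconstruction for illustration only: the regression data here is synthetic, and the paper's actual task, balancing procedure, and analytical solutions are not reproduced.

```python
import numpy as np

# Minimal sketch of the reported setup: a two-layer deep linear network
# f(x) = W2 @ W1 @ x trained by full-batch gradient descent on a random
# regression task. Dimensions and hyperparameters (Ni=3, Nh=2, No=2,
# eta=2e-4, N=10, init variance 1e-5) are taken from the table above;
# the data itself is synthetic and purely illustrative.
rng = np.random.default_rng(0)

Ni, Nh, No = 3, 2, 2      # input, hidden, output widths
N, eta = 10, 2e-4         # batch size, learning rate
sigma_init = 1e-5         # variance of the small ("zero-balanced") init

X = rng.standard_normal((Ni, N))   # inputs, one column per sample
Y = rng.standard_normal((No, N))   # regression targets

# Small initialization: the init scale is what controls lazy vs. rich dynamics
W1 = rng.standard_normal((Nh, Ni)) * np.sqrt(sigma_init)
W2 = rng.standard_normal((No, Nh)) * np.sqrt(sigma_init)

def loss(W1, W2):
    """Mean squared error of the linear network over the batch."""
    E = W2 @ W1 @ X - Y
    return 0.5 * np.mean(np.sum(E**2, axis=0))

losses = [loss(W1, W2)]
for _ in range(5000):
    E = (W2 @ W1 @ X - Y) / N      # per-sample error, batch-averaged
    gW2 = E @ (W1 @ X).T           # dL/dW2
    gW1 = W2.T @ E @ X.T           # dL/dW1
    W2 -= eta * gW2
    W1 -= eta * gW1
    losses.append(loss(W1, W2))
```

With this tiny initialization and learning rate, the loss decreases only slowly at first, consistent with the sigmoidal learning dynamics that deep linear network theory predicts for small-scale initializations.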