Exact learning dynamics of deep linear networks with prior knowledge

Authors: Lukas Braun, Clémentine Dominé, James Fitzgerald, Andrew Saxe

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Here we derive exact solutions to the dynamics of learning with rich prior knowledge in deep linear networks by generalising Fukumizu's matrix Riccati solution [1]. We obtain explicit expressions for the evolving network function, hidden representational similarity, and neural tangent kernel over training for a broad class of initialisations and tasks. The expressions reveal a class of task-independent initialisations that radically alter learning dynamics from slow non-linear dynamics to fast exponential trajectories while converging to a global optimum with identical representational similarity, dissociating learning trajectories from the structure of initial internal representations. We characterise how network weights dynamically align with task structure, rigorously justifying why previous solutions successfully described learning from small initial weights without incorporating their fine-scale structure. Finally, we discuss the implications of these findings for continual learning, reversal learning and learning of structured knowledge. Taken together, our results provide a mathematical toolkit for understanding the impact of prior knowledge on deep learning. (A toy gradient-descent simulation illustrating these dynamics is sketched below the table.)
Researcher Affiliation | Academia | 1. Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom 2. Gatsby Computational Neuroscience Unit, University College London, London, United Kingdom 3. Howard Hughes Medical Institute, Janelia Research Campus, Ashburn, USA 4. Sainsbury Wellcome Centre, University College London, London, United Kingdom 5. CIFAR Azrieli Global Scholar, CIFAR, Toronto, Canada
Pseudocode | No | The paper does not include any sections or figures explicitly labeled as "Pseudocode" or "Algorithm" containing structured steps for a method.
Open Source Code | Yes | Code to replicate all simulations and plots is available online¹ under a GPLv3 license and requires <6 hours to execute on a single AMD Ryzen 5950X. ¹https://github.com/saxelab/deep-linear-networks-with-prior-knowledge
Open Datasets | No | The paper describes a "supervised learning task in which input vectors $x_n \in \mathbb{R}^{N_i}$ from a set of $P$ training pairs $\{(x_n, y_n)\}_{n=1,\ldots,P}$ have to be associated with their target output vectors $y_n \in \mathbb{R}^{N_o}$" and later refers to a "semantic learning task" in Section 4. However, it does not provide concrete access information (link, DOI, or a specific citation with author/year for public access) for this dataset, nor does it name a well-known public dataset. (The objective this task setup implies is written out below the table.)
Dataset Splits | No | The paper does not provide specific details on validation dataset splits, such as percentages, sample counts, or explicit cross-validation setups.
Hardware Specification | Yes | Code to replicate all simulations and plots is available online¹ under a GPLv3 license and requires <6 hours to execute on a single AMD Ryzen 5950X.
Software Dependencies | No | The paper does not explicitly list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x) that were used for the experiments. It only mentions that code is available online.
Experiment Setup | Yes | Simulation details are in Appendix H. Code to replicate all simulations and plots is available online¹ under a GPLv3 license and requires <6 hours to execute on a single AMD Ryzen 5950X.
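
The following is a minimal sketch of the setting the abstract describes, assuming a depth-two linear network, Gaussian data generated by a linear teacher, and illustrative hyperparameters; it is not the authors' released code (see the repository linked above for that). It shows the slow plateau followed by rapid convergence that arises from small random weights, i.e. the trajectories the paper's exact solutions generalise.

```python
# Minimal sketch (not the authors' released code): full-batch gradient descent
# on a depth-two *linear* network, the model class analysed in the paper.
# Dimensions, learning rate, and the linear teacher are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
N_i, N_h, N_o, P = 8, 16, 4, 32              # input, hidden, output dims; #pairs

X = rng.standard_normal((N_i, P))            # training inputs x_n as columns
T = rng.standard_normal((N_o, N_i))          # linear teacher
Y = T @ X                                    # target outputs y_n

W1 = 1e-3 * rng.standard_normal((N_h, N_i))  # small random initialisation
W2 = 1e-3 * rng.standard_normal((N_o, N_h))

lr = 1e-2 / P
for step in range(5000):
    E = Y - W2 @ W1 @ X                      # full-batch residual
    W1 += lr * W2.T @ E @ X.T                # -grad of 0.5*||E||_F^2 w.r.t. W1
    W2 += lr * E @ X.T @ W1.T                # ... and w.r.t. W2
    if step % 1000 == 0:
        # From small weights the loss plateaus, then drops rapidly: the slow
        # non-linear dynamics that the paper contrasts with fast exponential
        # trajectories under its task-independent initialisations.
        print(step, 0.5 * np.sum(E**2) / P)

# The network function W2 @ W1, whose exact time course the paper solves for,
# converges to the least-squares map of the task:
print(np.linalg.norm(W2 @ W1 - Y @ np.linalg.pinv(X)))
```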
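
For reference, the task format quoted in the Open Datasets row implies the usual full-batch squared-error objective for a deep linear network; the depth-two factorisation $W_2 W_1$ and the $\tfrac{1}{2}$ scaling below are conventional assumptions on our part, not text from the paper:

$$\mathcal{L}(W_1, W_2) = \frac{1}{2} \sum_{n=1}^{P} \left\lVert y_n - W_2 W_1 x_n \right\rVert^2$$

For linear networks this objective couples to the training set only through the correlation matrices $\Sigma_x = \sum_n x_n x_n^\top$ and $\Sigma_{yx} = \sum_n y_n x_n^\top$, which is why exact learning trajectories can be derived without reference to a particular named dataset.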