Learning dynamics of deep linear networks with multiple pathways

Authors: Jianghong Shi, Eric Shea-Brown, Michael Buice

NeurIPS 2022

Reproducibility Variable Result LLM Response
Research Type Experimental This result is derived analytically and demonstrated with numerical simulations of both linear and non-linear networks. We demonstrate our results with numerical simulations of networks with two pathways and multiple depths.
Researcher Affiliation Collaboration Jianghong Shi, Department of Applied Mathematics, University of Washington, Seattle, WA 98195, jhshi@uw.edu; Eric Shea-Brown, Department of Applied Mathematics, University of Washington, Seattle, WA 98195, etsb@uw.edu; Michael A. Buice, Allen Institute Mind Scope Program, Seattle, WA 98109, michaelbu@alleninstitute.org
Pseudocode No The paper describes mathematical derivations and simulation procedures through narrative and equations, but it does not include any explicitly labeled pseudocode or algorithm blocks, nor any structured, code-like steps.
Open Source Code Yes Code for simulations and figures is available at https://github.com/AllenInstitute/Multipathway_NeurIPS2022.
Open Datasets No The input vectors x are 8-dimensional and are the rows of the 8-dimensional identity matrix. The output vectors y are 15-dimensional and are the rows of the matrix: [matrix provided in paper] (The paper defines the data used for training internally rather than citing or linking to an external public dataset.)
Dataset Splits No We train the network with a set of P examples {xi, yi}, i = 1, 2, . . . , P with gradient descent on the squared loss. (The paper does not provide explicit details regarding train/validation/test splits, such as percentages or sample counts, nor does it refer to predefined splits from known datasets.)
Hardware Specification No These simulations are not compute intensive and are easily performed on a standard modern desktop or laptop. (This statement is too general and does not provide specific hardware details such as CPU/GPU models, memory, or cloud resources used.)
Software Dependencies No The paper does not provide specific software dependencies with version numbers (e.g., programming languages, libraries, or frameworks with their exact versions) that were used for the experiments.
Experiment Setup Yes For these examples we use the same number of layers per pathway and N1 = N2 = 1000. The initial state of the weight matrices is drawn from a zero mean normal distribution with a fixed standard deviation σ = 0.01. Gradient descent is performed over 1000 epochs with learning rate lr = 0.01.
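The setup described above (identity-matrix inputs, two parallel pathways, Gaussian initialization with σ = 0.01, full-batch gradient descent on the squared loss with lr = 0.01 over 1000 epochs) can be sketched in NumPy. This is a minimal illustration, not the authors' code: the paper's fixed 15-dimensional output matrix is replaced by a random placeholder, each pathway is shown with a single hidden layer, and the hidden width is reduced from N1 = N2 = 1000 to 100 to keep the example fast.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training data as described in the paper: inputs are the rows of the
# 8-dimensional identity matrix. The paper's fixed 15-dimensional output
# matrix is not reproduced here, so a random placeholder stands in for it.
X = np.eye(8)                       # P = 8 examples, 8-dim inputs
Y = rng.standard_normal((8, 15))    # placeholder for the paper's output matrix

# Two parallel linear pathways, each with one hidden layer; the network
# output is the sum of the pathway outputs. Hidden width is reduced from
# the paper's N1 = N2 = 1000 to keep the sketch fast.
N, sigma, lr, epochs = 100, 0.01, 0.01, 1000
W1_in = rng.normal(0, sigma, (8, N));  W1_out = rng.normal(0, sigma, (N, 15))
W2_in = rng.normal(0, sigma, (8, N));  W2_out = rng.normal(0, sigma, (N, 15))

def loss():
    pred = X @ W1_in @ W1_out + X @ W2_in @ W2_out
    return 0.5 * np.sum((pred - Y) ** 2)

init_loss = loss()
for _ in range(epochs):
    err = X @ W1_in @ W1_out + X @ W2_in @ W2_out - Y  # d(loss)/d(pred)
    # Full-batch gradient descent on the squared loss, per pathway.
    g1_out = (X @ W1_in).T @ err
    g1_in = X.T @ err @ W1_out.T
    g2_out = (X @ W2_in).T @ err
    g2_in = X.T @ err @ W2_out.T
    W1_in -= lr * g1_in;  W1_out -= lr * g1_out
    W2_in -= lr * g2_in;  W2_out -= lr * g2_out

final_loss = loss()
```

With the small initialization the loss stays on a plateau early in training before the pathway weights grow, which is the regime whose dynamics the paper analyzes.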