Learning dynamics of deep linear networks with multiple pathways
Authors: Jianghong Shi, Eric Shea-Brown, Michael Buice
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This result is derived analytically and demonstrated with numerical simulations of both linear and non-linear networks. We demonstrate our results with numerical simulations of networks with two pathways and multiple depths. |
| Researcher Affiliation | Collaboration | Jianghong Shi, Department of Applied Mathematics, University of Washington, Seattle, WA 98195, jhshi@uw.edu; Eric Shea-Brown, Department of Applied Mathematics, University of Washington, Seattle, WA 98195, etsb@uw.edu; Michael A. Buice, Allen Institute Mind Scope Program, Seattle, WA 98109, michaelbu@alleninstitute.org |
| Pseudocode | No | The paper describes mathematical derivations and simulation procedures through narrative and equations, but it does not include any explicitly labeled pseudocode or algorithm blocks, nor any structured, code-like steps. |
| Open Source Code | Yes | Code for simulations and figures is available at https://github.com/AllenInstitute/Multipathway_NeurIPS2022. |
| Open Datasets | No | The input vectors x are 8-dimensional and are the rows of the 8-dimensional identity matrix. The output vectors y are 15-dimensional and are the rows of the matrix: [matrix provided in paper] (The paper defines the data used for training internally rather than citing or linking to an external public dataset.) |
| Dataset Splits | No | We train the network with a set of P examples {xi, yi}, i = 1, 2, . . . , P with gradient descent on the squared loss. (The paper does not provide explicit details regarding train/validation/test splits, such as percentages or sample counts, nor does it refer to predefined splits from known datasets.) |
| Hardware Specification | No | These simulations are not compute intensive and are easily performed on a standard modern desktop or laptop. (This statement is too general and does not provide specific hardware details such as CPU/GPU models, memory, or cloud resources used.) |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., programming languages, libraries, or frameworks with their exact versions) that were used for the experiments. |
| Experiment Setup | Yes | For these examples we use the same number of layers per pathway and N1 = N2 = 1000. The initial state of the weight matrices is drawn from a zero mean normal distribution with a fixed standard deviation σ = 0.01. Gradient descent is performed over 1000 epochs with learning rate lr = 0.01. |
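The experiment setup quoted in the table (8-dimensional identity inputs, 15-dimensional targets, two pathways, σ = 0.01 Gaussian initialization, 1000 epochs of gradient descent at lr = 0.01 on the squared loss) can be sketched with a minimal NumPy simulation. This is an illustrative reconstruction, not the authors' released code: the paper's fixed 15-dimensional target matrix is replaced by a random placeholder, and each pathway is assumed to be a two-layer linear chain whose outputs sum.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions from the report: 8-D inputs, 15-D outputs, N1 = N2 = 1000 hidden units.
d_in, d_out, N = 8, 15, 1000
X = np.eye(d_in)                          # inputs: rows of the 8-dim identity matrix
Y = rng.standard_normal((d_in, d_out))    # placeholder targets (paper specifies a fixed matrix)

sigma, lr, epochs = 0.01, 0.01, 1000      # values quoted in the Experiment Setup row

# Two pathways a and b, each a two-layer linear chain; the network output is their sum:
#   pred = X @ (W1a @ W2a + W1b @ W2b)
def init(shape):
    return sigma * rng.standard_normal(shape)

W1a, W2a = init((d_in, N)), init((N, d_out))
W1b, W2b = init((d_in, N)), init((N, d_out))

losses = []
for _ in range(epochs):
    pred = X @ (W1a @ W2a + W1b @ W2b)
    err = pred - Y                        # gradient of 0.5 * ||pred - Y||^2 w.r.t. pred
    losses.append(0.5 * np.sum(err ** 2))
    # Chain-rule gradients for each pathway's two weight matrices.
    gW2a = W1a.T @ X.T @ err
    gW1a = X.T @ err @ W2a.T
    gW2b = W1b.T @ X.T @ err
    gW1b = X.T @ err @ W2b.T
    W1a -= lr * gW1a; W2a -= lr * gW2a
    W1b -= lr * gW1b; W2b -= lr * gW2b
```

With the small initialization the combined input-output map starts near zero, so the loss curve shows the slow initial phase followed by faster learning that the paper analyzes; plotting `losses` over epochs reproduces that qualitative shape.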