The interplay between randomness and structure during learning in RNNs

Authors: Friedrich Schuessler, Francesca Mastrogiuseppe, Alexis Dubreuil, Srdjan Ostojic, Omri Barak

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Here, we examine RNNs trained using gradient descent on different tasks inspired by the neuroscience literature. We find that the changes in recurrent connectivity can be described by low-rank matrices, despite the unconstrained nature of the learning algorithm. (See the low-rank check sketched after the table.)
Researcher Affiliation | Academia | Friedrich Schuessler (Technion, schuessler@campus.technion.ac.il); Francesca Mastrogiuseppe (Gatsby Unit, UCL, f.mastrogiuseppe@ucl.ac.uk); Alexis Dubreuil (ENS Paris, alexis.dubreuil@gmail.com); Srdjan Ostojic (ENS Paris, srdjan.ostojic@ens.fr); Omri Barak (Technion, omri.barak@gmail.com)
Pseudocode | No | The paper describes methods through narrative text and equations but does not include any explicit pseudocode blocks or algorithm listings.
Open Source Code | No | The paper does not contain an explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | We therefore trained a two-layer LSTM network on a natural language processing task, sentiment analysis of movie reviews [30] (details in supplementary). (A hypothetical LSTM classifier is sketched after the table.)
Dataset Splits | No | The paper refers to 'Details can be found in the supplementary' for task parameters and discusses training loss, but does not explicitly state train/validation/test splits (e.g., percentages or sample counts) in the main text.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU/GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions algorithms like 'Adam [15]' but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, or specific library versions).
Experiment Setup | Yes | For training the RNNs, we formulated a quadratic cost in z_i(t) and applied the gradient descent method Adam [15] to the internal connectivity W as well as to the input and output vectors m_i, w_i. The initial input and output vectors were drawn independently from N(0, 1/N). We initialized the internal weights as a random matrix W0 with independent elements drawn from N(0, g^2/N). The parameter g thus scales the strength of the initial connectivity. For the simulation, we chose N to be large enough so that learning dynamics become invariant under changes in N (see supplementary Fig. S1). Parameters: N = 256, learning rate η = 0.05/N. (A minimal training sketch follows the table.)
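
The quoted setup is concrete enough to sketch in code. The following is a minimal, non-authoritative sketch, not the authors' implementation: the leaky tanh rate dynamics dx/dt = -x + W tanh(x) + m u(t), the readout z(t) = w·tanh(x(t)), the Euler integration step, the trial length, and the toy input/target are assumptions; N = 256, the learning rate 0.05/N, the quadratic cost in z(t), and the N(0, g^2/N) and N(0, 1/N) initializations follow the quoted text.

```python
# Minimal sketch of the quoted training setup, not the authors' code.
# Assumptions (not in the quoted text): leaky tanh rate dynamics
# dx/dt = -x + W*tanh(x) + m*u(t), readout z(t) = w^T tanh(x(t)),
# Euler integration with step dt, and a toy constant-input task.
import torch

N = 256                      # network size (from the paper)
g = 0.9                      # initial connectivity strength (example value)
lr = 0.05 / N                # learning rate eta = 0.05/N (from the paper)
dt, T = 0.1, 100             # integration step and trial length (assumed)

# Initialization as described: W0 ~ N(0, g^2/N), m, w ~ N(0, 1/N)
W = torch.randn(N, N) * g / N**0.5
m = torch.randn(N) / N**0.5
w = torch.randn(N) / N**0.5
W0 = W.clone()               # keep a copy to inspect Delta W = W - W0 later
for p in (W, m, w):
    p.requires_grad_(True)

opt = torch.optim.Adam([W, m, w], lr=lr)
u = torch.ones(T)            # toy constant input (placeholder task)
z_target = torch.ones(T)     # toy target output (placeholder task)

for step in range(200):
    x = torch.zeros(N)
    zs = []
    for t in range(T):
        r = torch.tanh(x)
        x = x + dt * (-x + W @ r + m * u[t])   # Euler step of the rate dynamics
        zs.append(w @ torch.tanh(x))           # scalar readout at time t
    z = torch.stack(zs)
    loss = ((z - z_target) ** 2).mean()        # quadratic cost in z(t)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Keeping a copy of W0 makes it easy to inspect the learning-induced change Delta W = W - W0 afterwards, which is what the low-rank claim in the Research Type row refers to.

The low-rank claim can then be probed through the singular values of Delta W. The snippet below is only illustrative: to run standalone it builds a synthetic "trained" matrix as W0 plus a rank-2 update; with real training output, dW would instead be computed from matrices like W and W0 in the sketch above.

```python
# Illustrative low-rank check on the connectivity change Delta W = W_trained - W0.
# W_trained here is a synthetic stand-in (W0 plus a rank-2 update) so the
# snippet runs on its own; replace it with an actually trained matrix.
import numpy as np

N, g = 256, 0.9
rng = np.random.default_rng(0)
W0 = rng.normal(0.0, g / np.sqrt(N), size=(N, N))
U = rng.normal(0.0, 1.0 / np.sqrt(N), size=(N, 2))
V = rng.normal(0.0, 1.0 / np.sqrt(N), size=(N, 2))
W_trained = W0 + U @ V.T                 # synthetic rank-2 change

dW = W_trained - W0
s = np.linalg.svd(dW, compute_uv=False)  # singular values, descending
print("leading singular values of Delta W:", np.round(s[:5], 3))

# effective rank via the participation ratio of the singular values
eff_rank = s.sum() ** 2 / (s ** 2).sum()
print("participation-ratio effective rank:", round(eff_rank, 2))
```

For a genuinely low-rank change, the spectrum is dominated by a few large singular values and the participation-ratio estimate stays small even though the matrix is 256 x 256.

Finally, the Open Datasets row mentions a two-layer LSTM trained on sentiment analysis of movie reviews [30], with details deferred to the supplementary. The sketch below is a generic, hypothetical stand-in for such a model; the vocabulary size, embedding and hidden dimensions, and the single-logit binary readout are assumptions, not the paper's settings.

```python
# Hypothetical two-layer LSTM sentiment classifier, included only to make the
# "two-layer LSTM ... sentiment analysis of movie reviews" description concrete.
# All hyperparameters below are assumptions, not taken from the paper.
import torch
import torch.nn as nn

class SentimentLSTM(nn.Module):
    def __init__(self, vocab_size=20000, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=2, batch_first=True)
        self.readout = nn.Linear(hidden_dim, 1)    # binary sentiment logit

    def forward(self, tokens):                     # tokens: (batch, seq_len) int64
        h, _ = self.lstm(self.embed(tokens))
        return self.readout(h[:, -1])              # use the last time step

model = SentimentLSTM()
logits = model(torch.randint(0, 20000, (4, 50)))   # toy batch of token ids
print(logits.shape)                                # torch.Size([4, 1])
```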