The interplay between randomness and structure during learning in RNNs
Authors: Friedrich Schuessler, Francesca Mastrogiuseppe, Alexis Dubreuil, Srdjan Ostojic, Omri Barak
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here, we examine RNNs trained using gradient descent on different tasks inspired by the neuroscience literature. We find that the changes in recurrent connectivity can be described by low-rank matrices, despite the unconstrained nature of the learning algorithm. (See the rank-analysis sketch after the table.) |
| Researcher Affiliation | Academia | Friedrich Schuessler, Technion, schuessler@campus.technion.ac.il; Francesca Mastrogiuseppe, Gatsby Unit, UCL, f.mastrogiuseppe@ucl.ac.uk; Alexis Dubreuil, ENS Paris, alexis.dubreuil@gmail.com; Srdjan Ostojic, ENS Paris, srdjan.ostojic@ens.fr; Omri Barak, Technion, omri.barak@gmail.com |
| Pseudocode | No | The paper describes methods through narrative text and equations but does not include any explicit pseudocode blocks or algorithm listings. |
| Open Source Code | No | The paper does not contain an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We therefore trained a two-layer LSTM network on a natural language processing task, sentiment analysis of movie reviews [30] (details in supplementary). (An illustrative LSTM sketch follows the table.) |
| Dataset Splits | No | The paper refers to 'Details can be found in the supplementary' for task parameters, and discusses training loss, but does not explicitly state train/validation/test splits (e.g., percentages or sample counts) in the main text. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU/GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions algorithms like 'Adam [15]' but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, or specific library versions). |
| Experiment Setup | Yes | For training the RNNs, we formulated a quadratic cost in z_i(t) and applied the gradient descent method Adam [15] to the internal connectivity W as well as to the input and output vectors m_i, w_i. The initial input and output vectors were drawn independently from N(0, 1/N). We initialized the internal weights as a random matrix W_0 with independent elements drawn from N(0, g²/N). The parameter g thus scales the strength of the initial connectivity. For the simulation, we chose N to be large enough so that learning dynamics become invariant under changes in N (see supplementary Fig. S1). Parameters: N = 256, learning rate η = 0.05/N. (A minimal training-setup sketch follows the table.) |
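The sketches below are illustrative only. First, the low-rank claim quoted in the "Research Type" row can be probed by taking the singular value decomposition of the connectivity change ΔW = W − W_0. The function here is a minimal sketch assuming a simple variance-based effective-rank criterion; the names `W_trained`, `W_0`, and the threshold are illustrative, not the authors' analysis code.

```python
# Sketch: measuring the effective rank of the connectivity change Delta_W = W_trained - W_0.
import numpy as np

def effective_rank(W_trained, W_0, var_threshold=0.95):
    """Number of singular values of Delta_W needed to capture `var_threshold`
    of its squared Frobenius norm (one common effective-rank proxy)."""
    delta = W_trained - W_0
    s = np.linalg.svd(delta, compute_uv=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(cumulative, var_threshold) + 1)

# Example with random matrices as stand-ins for an actual trained network:
N = 256
rng = np.random.default_rng(0)
W_0 = rng.normal(0.0, 1.0 / np.sqrt(N), size=(N, N))                     # W0_ij ~ N(0, g^2/N), g = 1
low_rank_update = rng.normal(size=(N, 2)) @ rng.normal(size=(2, N)) / N  # a rank-2 change
print(effective_rank(W_0 + low_rank_update, W_0))                        # typically 2 for this example
```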
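For the "Open Datasets" row, the quoted model is a two-layer LSTM trained on sentiment analysis of movie reviews. The sketch below shows what such a model could look like in PyTorch; the vocabulary size, embedding and hidden dimensions, and final-step readout are assumptions for illustration, since the paper defers architectural details to its supplementary.

```python
# Illustrative two-layer LSTM for binary sentiment classification (not the paper's exact model).
import torch
import torch.nn as nn

class SentimentLSTM(nn.Module):
    def __init__(self, vocab_size=20_000, embed_dim=128, hidden_dim=256, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=2, batch_first=True)
        self.readout = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):                 # token_ids: (batch, seq_len) of int64
        h, _ = self.lstm(self.embed(token_ids))   # h: (batch, seq_len, hidden_dim)
        return self.readout(h[:, -1, :])          # classify from the final time step

model = SentimentLSTM()
logits = model(torch.randint(0, 20_000, (4, 50)))  # dummy batch: 4 reviews, 50 tokens each
print(logits.shape)                                # torch.Size([4, 2])
```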
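Finally, the "Experiment Setup" row can be read as a concrete training recipe: a recurrent matrix W initialized from N(0, g²/N), input and output vectors drawn from N(0, 1/N), all trained with Adam on a quadratic cost, with N = 256 and η = 0.05/N. The sketch below follows those quoted scales but assumes a standard tanh rate network with Euler integration, a single input/output channel, a coupling strength g = 1.5, and a toy target; the paper's actual tasks and dynamics parameters are in its supplementary.

```python
# Minimal PyTorch sketch of the quoted training setup (task and dynamics details are assumptions).
import torch

N, g, T = 256, 1.5, 100          # network size, initial coupling strength (assumed), trial length
dt_over_tau = 0.1                # Euler step relative to the neuronal time constant (assumed)

W = torch.nn.Parameter(torch.randn(N, N) * g / N**0.5)   # W0_ij ~ N(0, g^2/N)
m = torch.nn.Parameter(torch.randn(N) / N**0.5)          # input vector ~ N(0, 1/N)
w = torch.nn.Parameter(torch.randn(N) / N**0.5)          # output vector ~ N(0, 1/N)

optimizer = torch.optim.Adam([W, m, w], lr=0.05 / N)     # learning rate eta = 0.05/N

def run_trial(u):
    """Simulate x <- x + dt/tau * (-x + W @ tanh(x) + m * u_t) and read out z_t = w . tanh(x)."""
    x = torch.zeros(N)
    zs = []
    for t in range(T):
        x = x + dt_over_tau * (-x + W @ torch.tanh(x) + m * u[t])
        zs.append(w @ torch.tanh(x))
    return torch.stack(zs)

# One illustrative gradient step on a toy target (placeholder for the paper's actual tasks):
u = torch.zeros(T)
target = torch.ones(T)
loss = ((run_trial(u) - target) ** 2).mean()             # quadratic cost in z(t)
optimizer.zero_grad(); loss.backward(); optimizer.step()
```

Because W, m, and w are plain parameters rather than a packaged RNN module, the gradient of the quadratic cost flows through the unrolled dynamics, matching the unconstrained training of the internal connectivity described in the quoted setup.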