Beyond accuracy: generalization properties of bio-plausible temporal credit assignment rules

Authors: Yuhan Helena Liu, Arna Ghosh, Blake Richards, Eric Shea-Brown, Guillaume Lajoie

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Leveraging results from deep learning theory based on loss landscape curvature, we ask: how do biologically-plausible gradient approximations affect generalization? We first demonstrate that state-of-the-art biologically-plausible learning rules for training RNNs exhibit worse and more variable generalization performance compared to their machine learning counterparts that follow the true gradient more closely. Next, we verify that such generalization performance is correlated significantly with loss landscape curvature, and we show that biologically-plausible learning rules tend to approach high-curvature regions in synaptic weight space. Using tools from dynamical systems, we derive theoretical arguments and present a theorem explaining this phenomenon. This predicts our numerical results, and explains why biologically-plausible rules lead to worse and more variable generalization properties.
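The abstract ties generalization to loss-landscape curvature, typically summarized by the largest Hessian eigenvalue at a trained solution. A minimal sketch of how such a curvature measure can be estimated is given below, using power iteration on Hessian-vector products obtained by finite-differencing the gradient; the quadratic toy loss is a stand-in assumption, not the paper's RNN loss.

```python
import numpy as np

# Hypothetical sketch: estimate loss-landscape curvature as the top
# Hessian eigenvalue via power iteration on Hessian-vector products.
# L(w) = 0.5 * w^T A w is a toy loss whose Hessian is exactly A.

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 10))
A = A @ A.T  # symmetric PSD Hessian for the toy loss

def grad(w):
    # Gradient of the toy quadratic loss L(w) = 0.5 * w^T A w.
    return A @ w

def top_curvature(w, n_iter=500, eps=1e-4):
    # Hessian-vector product via finite differences of the gradient:
    # Hv ~ (grad(w + eps*v) - grad(w)) / eps, then power iteration.
    v = rng.standard_normal(w.shape)
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        hv = (grad(w + eps * v) - grad(w)) / eps
        v = hv / np.linalg.norm(hv)
    # Rayleigh quotient of the converged direction = top eigenvalue.
    return v @ ((grad(w + eps * v) - grad(w)) / eps)

w = rng.standard_normal(10)
est = top_curvature(w)
exact = np.linalg.eigvalsh(A).max()
```

For a real network, `grad` would be replaced by the training-loss gradient at the learned weights; the same power-iteration loop then reports the sharpness measure that the paper correlates with generalization.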
Researcher Affiliation | Collaboration | Yuhan Helena Liu1,2,3,*, Arna Ghosh4,5, Blake A. Richards4,5,6,7, Eric Shea-Brown1,2,3, and Guillaume Lajoie5,7,8,* 1Department of Applied Mathematics, University of Washington, Seattle, WA, USA 2Allen Institute for Brain Science, 615 Westlake Ave N, Seattle, WA, USA 3Computational Neuroscience Center, University of Washington, Seattle, WA, USA 4School of Computer Science, McGill University, Montreal, QC, Canada 5Mila Quebec AI Institute, Montreal, QC, Canada 6Department of Neurology and Neurosurgery, Montreal Neurological Institute, McGill University, Montreal, QC, Canada 7Canada CIFAR AI Chair, CIFAR, Toronto, ON, Canada 8Dept. de Mathématiques et Statistiques, Université de Montréal, Montreal, QC, Canada *Correspondence: hyliu24@uw.edu, g.lajoie@umontreal.ca
Pseudocode | No | The paper describes mathematical equations and theoretical concepts but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | An anonymized code link is provided in Appendix A.3.
Open Datasets | Yes | We performed experiments on three tasks: sequential MNIST [137], pattern generation [138] and delayed match-to-sample tasks [139]. The MNIST database of handwritten digits: http://yann.lecun.com/exdb/mnist/, 1998.
Dataset Splits | No | The paper mentions training and test accuracy, but does not explicitly detail training, validation, and test dataset splits with percentages or sample counts. It refers to Appendix A.3 for training details, but the main text does not contain this specific information.
Hardware Specification | No | The paper states: 'Information pertaining to computing resources and simulation time can be found in Appendix A.3.' However, Appendix A.3 is not provided, and the main text does not contain specific hardware details like GPU/CPU models or memory.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies. It only mentions TensorFlow in the references: '{TensorFlow}: A system for {Large-Scale} machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pages 265–283, 2016.'
Experiment Setup | Yes | The detailed governing equations of our setup can be found in Methods (Appendix A). We consider an RNN with $N_{in}$ input units, $N$ hidden units and $N_{out}$ readout units (Figure 1A). We verified that trends hold for different network sizes and refer the reader to Appendix A.3 for more details. The update formula for $h_t \in \mathbb{R}^N$ (the hidden state at time $t$) is governed by: $h_{t+1} = \phi(W_h f(h_t), W_x x_t)$, (1) where $\phi(\cdot): \mathbb{R}^N \to \mathbb{R}^N$ is the hidden state update function, $f(\cdot): \mathbb{R}^N \to \mathbb{R}^N$ is the activation function, $W_h \in \mathbb{R}^{N \times N}$ (resp. $W_x \in \mathbb{R}^{N_{in} \times N}$) is the recurrent (resp. input) weight matrix and $x \in \mathbb{R}^{N_{in}}$ is the input. For $\phi$, we consider a discrete-time implementation of a rate-based recurrent neural network (RNN) similar to the form in [136] (details in Appendix A). Readout $\hat{y} \in \mathbb{R}^{N_{out}}$, with readout weights $w \in \mathbb{R}^{N_{out} \times N}$, is defined as $\hat{y} = \langle w, f(h_t) \rangle$. (2) We performed experiments on three tasks: sequential MNIST [137], pattern generation [138] and delayed match-to-sample tasks [139]. The objective is to minimize scalar loss $L \in \mathbb{R}$, which is defined as ... Different learning algorithms examined in this work are BPTT (our benchmark), which updates weights by computing the exact gradient ($\nabla L(W_h) \in \mathbb{R}^{N \times N}$): $\Delta W_h = -\eta \nabla L(W_h)$, (4) and three SoTA bio-plausible learning rules that update weights using an approximate gradient: $\Delta W_h = -\eta \hat{\nabla} L(W_h)$, (5) where $\hat{\nabla} L(W_h) \in \mathbb{R}^{N \times N}$ denotes a gradient approximation and $\eta \in \mathbb{R}$ denotes the learning rate.
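The setup in Eqs. (1)–(5) can be sketched in a few lines of numpy. The leaky update rule for $\phi$, the tanh activation, the random target, and the one-step truncated gradient standing in for a bio-plausible approximation $\hat{\nabla} L$ are all illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

# Hedged sketch of Eqs. (1)-(5): a discrete-time rate RNN with hidden
# update h_{t+1} = phi(W_h f(h_t), W_x x_t) and readout y = <w, f(h_t)>.
# Choices of phi, f, task, and gradient approximation are assumptions.

rng = np.random.default_rng(1)
N_in, N, N_out, T = 3, 20, 2, 10
alpha, eta = 0.5, 1e-2  # leak rate and learning rate (illustrative)

W_x = rng.standard_normal((N, N_in)) / np.sqrt(N_in)
W_h = rng.standard_normal((N, N)) / np.sqrt(N)
w_out = rng.standard_normal((N_out, N)) / np.sqrt(N)

f = np.tanh  # activation f(.)

def phi(h, rec_in, ext_in):
    # One common discrete-time rate update: a leaky combination of the
    # previous state with recurrent plus external input.
    return (1 - alpha) * h + alpha * (rec_in + ext_in)

def run(x_seq):
    h = np.zeros(N)
    hs = []
    for x_t in x_seq:
        h = phi(h, W_h @ f(h), W_x @ x_t)  # Eq. (1)
        hs.append(h)
    return np.array(hs), w_out @ f(h)      # Eq. (2), readout at final t

x_seq = rng.standard_normal((T, N_in))
y_target = rng.standard_normal(N_out)      # stand-in task target
hs, y_hat = run(x_seq)
loss = 0.5 * np.sum((y_hat - y_target) ** 2)

# Eq. (4) would use the exact gradient dL/dW_h from BPTT. Eq. (5)
# substitutes an approximation; a crude stand-in here is the one-step
# (truncated) gradient through only the final update.
err = w_out.T @ (y_hat - y_target) * (1 - f(hs[-1]) ** 2)
grad_approx = alpha * np.outer(err, f(hs[-2]))
W_h = W_h - eta * grad_approx              # Eq. (5) with approx. gradient
```

The three bio-plausible rules studied in the paper (see Appendix A) differ precisely in how `grad_approx` is constructed; BPTT replaces it with the exact unrolled gradient.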