Feedback control guides credit assignment in recurrent neural networks

Authors: Klara Kaleb, Barbara Feulner, Juan A. Gallego, Claudia Clopath

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we investigate the mechanistic properties of such recurrent networks pre-trained with feedback control on a stereotypical motor task. Our key findings are as follows: (1) feedback control allows for approximate learning in the activity space (Section 3.2); (2) feedback control enables increased accuracy of approximate, local learning rules in the recurrent layer due to the decoupling of the network from its past activity (Section 3.4); (3) feedback control enables more efficient weight updates during task adaptation due to the implicit incorporation of an adaptive, second-order gradient into the network dynamics (Section 3.5). ... We empirically validate this hypothesis by comparing the gradients of the local learning rule with those of online BPTT during task adaptation, with (RFLO+c) and without (RFLO) feedback control. (A minimal RFLO+c sketch follows the table.)
Researcher Affiliation | Academia | Klara Kaleb (klara.kaleb18@imperial.ac.uk), Barbara Feulner, Juan A. Gallego, and Claudia Clopath, all at the Department of Bioengineering, Imperial College London, London, UK.
Pseudocode | No | The paper provides equations and describes procedures in text, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | The code, adapted from (Feulner et al., 2022), is available at: https://github.com/klarakaleb/feedback-control.
Open Datasets | No | The paper describes a 'synthetic instructed delay centre-out reaching task' in which data are generated by randomly sampling target coordinates and interpolating velocity trajectories. This is a generative process described in the paper, not a link to a pre-existing publicly available dataset. (A trial-generation sketch follows the table.)
Dataset Splits | No | The paper describes pre-training on '5000 independently drawn data batches i.e. trials' and fine-tuning with a 'batch size of 1', indicating that trials are generated continuously and learned online rather than drawn from fixed train/validation/test splits.
Hardware Specification | Yes | We ran all our experiments using NVIDIA GeForce GPUs (RTX 2080 Ti).
Software Dependencies | No | The paper mentions optimizers like Adam and SGD and the software used (e.g., code adapted from Feulner et al.), but it does not provide specific version numbers for software components (e.g., 'Python 3.x', 'PyTorch 1.x').
Experiment Setup | Yes | We pre-train the network parameters using the Adam optimizer (Kingma & Ba, 2014) with learning rate η1 = 0.001 (β1 = 0.9, β2 = 0.999), batch size of 256, and L2 regularization for network parameters (β = 1e-3) and activations (γ = 2e-3), for 5000 independently drawn data batches, i.e. trials. The total norm of the gradients is clipped to 0.2. ... The fine-tuning learning rates η2 used for each learning algorithm are: 1e-4 (BPTT+c), 5e-6 (BPTT) and 5e-5 (RFLO, RFLO+c). (An optimizer-setup sketch follows the table.)
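
The "Research Type" row above describes comparing a local learning rule (RFLO) with and without feedback control ("+c"). Below is a minimal sketch, not the authors' code, of that kind of setup: a leaky-tanh RNN whose output error is optionally fed back into the recurrent dynamics, trained with the RFLO rule of Murray (2019). The network sizes, time constant, weight scales, and variable names are illustrative assumptions; only the general structure follows the paper's description.

```python
import numpy as np

# Sketch of RFLO(+c): a leaky RNN with optional error feedback into the
# dynamics, and a local eligibility-trace update for the recurrent weights.
rng = np.random.default_rng(0)
n_in, n_rec, n_out, dt, tau = 3, 100, 2, 0.01, 0.1   # assumed sizes and time constant
alpha = dt / tau

W_in  = rng.normal(0, 1 / np.sqrt(n_in),  (n_rec, n_in))
W_rec = rng.normal(0, 1 / np.sqrt(n_rec), (n_rec, n_rec))
W_out = rng.normal(0, 1 / np.sqrt(n_rec), (n_out, n_rec))
W_fb  = rng.normal(0, 1 / np.sqrt(n_out), (n_rec, n_out))  # feedback-control weights ("+c")
B     = rng.normal(0, 1 / np.sqrt(n_out), (n_rec, n_out))  # fixed random feedback for RFLO credit assignment

def run_trial(u, y_star, eta=5e-5, feedback_control=True):
    """One trial: u is (T, n_in) input, y_star is (T, n_out) target output.
    eta defaults to the RFLO fine-tuning rate listed in the table above."""
    x = np.zeros(n_rec)                 # membrane potentials
    r = np.tanh(x)                      # firing rates
    P = np.zeros((n_rec, n_rec))        # RFLO eligibility trace for W_rec
    dW_rec = np.zeros_like(W_rec)
    e = np.zeros(n_out)
    for t in range(len(u)):
        fb = W_fb @ e if feedback_control else 0.0     # error injected into dynamics ("+c")
        x = (1 - alpha) * x + alpha * (W_rec @ r + W_in @ u[t] + fb)
        r_new = np.tanh(x)
        # Eligibility trace: low-pass-filtered outer product of post-synaptic
        # derivative and pre-synaptic rate (local in space and time).
        P = (1 - alpha) * P + alpha * np.outer(1 - r_new ** 2, r)
        r = r_new
        y = W_out @ r
        e = y_star[t] - y                               # output error
        dW_rec += eta * (B @ e)[:, None] * P            # local RFLO update for W_rec
    return dW_rec
```

Comparing `dW_rec` against the gradient from online BPTT on the same trial, with `feedback_control` on and off, is the kind of alignment comparison the "Research Type" row refers to.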
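
The "Open Datasets" row explains that the data are synthetic: instructed-delay centre-out reach trials generated on the fly rather than loaded from a public dataset. The sketch below shows one plausible generator under stated assumptions; the number of targets, delay and movement durations, input encoding, and the bell-shaped (Gaussian) speed profile are my assumptions, not the paper's exact interpolation scheme.

```python
import numpy as np

def make_trial(rng, n_targets=8, t_delay=50, t_move=100, dt=0.01, reach_dist=1.0):
    """Sketch of one synthetic instructed-delay centre-out reach trial (assumed details).

    Returns (inputs, target_velocity): the input carries the cued target position
    plus a go signal; the target output is a 2-D velocity trace that is zero
    during the delay and bell-shaped during the movement.
    """
    angle = rng.integers(n_targets) * 2 * np.pi / n_targets
    target_xy = reach_dist * np.array([np.cos(angle), np.sin(angle)])

    T = t_delay + t_move
    t = np.arange(t_move) * dt
    # Bell-shaped speed profile, normalised so the reach covers reach_dist.
    speed = np.exp(-0.5 * ((t - t.mean()) / (0.15 * t_move * dt)) ** 2)
    speed *= reach_dist / (speed.sum() * dt)

    vel = np.zeros((T, 2))
    vel[t_delay:] = speed[:, None] * (target_xy / reach_dist)   # unit direction × speed

    inputs = np.zeros((T, 3))
    inputs[:, :2] = target_xy           # target cue shown throughout the trial
    inputs[t_delay:, 2] = 1.0           # go cue after the delay period
    return inputs, vel

# Trials can be drawn independently for every batch, matching the
# "5000 independently drawn data batches" with batch size 256 reported above.
rng = np.random.default_rng(0)
batch = [make_trial(rng) for _ in range(256)]
```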
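
The "Experiment Setup" row lists the pre-training hyperparameters. The PyTorch sketch below only shows how such a configuration could be wired together; the stand-in `torch.nn.RNN` model, readout, and loss are placeholders rather than the authors' network, while the numbers (lr 1e-3, betas (0.9, 0.999), batch 256, β = 1e-3, γ = 2e-3, grad-norm clip 0.2) come from the row above.

```python
import torch

# Hedged sketch of the pre-training configuration; the model and task loss
# are placeholders, not the authors' actual code.
model = torch.nn.RNN(input_size=3, hidden_size=100, batch_first=True)  # stand-in network
readout = torch.nn.Linear(100, 2)
params = list(model.parameters()) + list(readout.parameters())

optimizer = torch.optim.Adam(params, lr=1e-3, betas=(0.9, 0.999))  # η1 = 0.001
beta_l2, gamma_l2, clip_norm = 1e-3, 2e-3, 0.2                     # values from the table above

def training_step(inputs, targets):
    """inputs: (256, T, 3) batch of trials; targets: (256, T, 2) velocity traces."""
    optimizer.zero_grad()
    rates, _ = model(inputs)                     # (256, T, 100) hidden activity
    outputs = readout(rates)                     # (256, T, 2) predicted velocity
    task_loss = torch.mean((outputs - targets) ** 2)
    weight_reg = beta_l2 * sum(p.pow(2).sum() for p in params)   # L2 on parameters (β)
    rate_reg = gamma_l2 * rates.pow(2).mean()                    # L2 on activations (γ)
    loss = task_loss + weight_reg + rate_reg
    loss.backward()
    torch.nn.utils.clip_grad_norm_(params, clip_norm)            # clip total grad norm to 0.2
    optimizer.step()
    return loss.item()
```

For fine-tuning, the table reports per-algorithm learning rates (1e-4 for BPTT+c, 5e-6 for BPTT, 5e-5 for RFLO and RFLO+c), which would replace the pre-training learning rate here.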