Short-Term Plasticity Neurons Learning to Learn and Forget
Authors: Hector Garcia Rodriguez, Qinghai Guo, Timoleon Moraitis
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here we present a new type of recurrent neural unit, the STP Neuron (STPN), which indeed turns out strikingly powerful. Its key mechanism is that synapses have a state, propagated through time by a self-recurrent connection-within-the-synapse. This formulation enables training the plasticity with backpropagation through time, resulting in a form of learning to learn and forget in the short term. The STPN outperforms all tested alternatives, i.e. RNNs, LSTMs, other models with fast weights, and differentiable plasticity. We confirm this in both supervised and reinforcement learning (RL), and in tasks such as Associative Retrieval, Maze Exploration, Atari video games, and MuJoCo robotics. (A minimal sketch of this synaptic-state mechanism is given after the table.) |
| Researcher Affiliation | Collaboration | (1) Huawei Technologies Zurich Research Center, Switzerland; (2) University College London, United Kingdom; (3) Advanced Computing & Storage Lab, Huawei Technologies, Shenzhen, China. |
| Pseudocode | Yes | Algorithm 1 STPN learning to learn and forget in a supervised meta-learning setting |
| Open Source Code | Yes | Code is available at https://github.com/NeuromorphicComputing/stpn. |
| Open Datasets | Yes | Associative Retrieval Task (ART) (Ba et al., 2016); Maze Exploration: Maze or grid-like tasks have been commonly used in RL... (Miconi et al., 2018); Atari games and MuJoCo simulated robotics: ...Atari Pong and MuJoCo Inverted Pendulum. Pong is an Atari 2600 game implemented in the Arcade Learning Environment (ALE) (Bellemare et al., 2013); MuJoCo (Todorov et al., 2012) is a physics engine widely used for research in robotics and reinforcement learning. Inverted Pendulum is one of the simplest tasks within MuJoCo. |
| Dataset Splits | Yes | Fig. 2 shows that the STPN is more proficient, i.e. obtains larger validation accuracy and reward, than all other baselines.; For the dataset mode, this means accuracy on a validation set. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. It mentions 'hypothetical analog neuromorphic hardware' in the context of energy consumption measurement of the model, not the actual experimental setup. |
| Software Dependencies | No | The paper mentions using 'RLLib (Liang et al., 2018)', 'A2C... A3C (Mnih et al., 2016)', and 'Proximal Policy Optimization (PPO) algorithm (Schulman et al., 2017)', 'MuJoCo (Todorov et al., 2012)', but does not specify version numbers for these software components. |
| Experiment Setup | Yes | We tune rollout length (50), gradient clipping (40), discount factor (0.99) in shorter runs (which both models share in the displayed results); and additionally tune initial learning rate for the final longer runs (0.0007 and 0.0001 respectively), using a linear decay learning rate schedule finishing at 10^-11 at 200 million iterations. Models are trained from the experience collected by 64 parallel agents.; We only increase the batch size (number of agents acting in parallel) from 16 (in the code, not mentioned in the article) to 512 to maximize computational efficiency of gradient updates. (A hypothetical helper for this schedule is sketched after the table.) |
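The abstract quoted in the Research Type row describes the STPN's key mechanism as a synaptic state carried through time by a self-recurrent connection within each synapse, with the plasticity itself trained by backpropagation through time. The PyTorch sketch below illustrates one plausible reading of that description: the effective weight is a long-term weight plus a per-synapse short-term trace with a learned decay and a learned Hebbian gain. The class name `STPNSketch`, the exact parameterization, and the sigmoid squashing of the decay are illustrative assumptions, not the authors' implementation; see the linked repository for the actual code.

```python
import torch
import torch.nn as nn


class STPNSketch(nn.Module):
    """Minimal sketch of a short-term-plasticity recurrent unit (assumptions noted above)."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # Long-term weight, learned as usual.
        self.W = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        # Per-synapse plasticity parameters, learned with BPTT ("learning to learn and forget").
        self.lam = nn.Parameter(torch.full((out_features, in_features), 2.0))     # forget rate (pre-sigmoid)
        self.gamma = nn.Parameter(torch.full((out_features, in_features), 0.01))  # Hebbian gain

    def forward(self, x_seq: torch.Tensor) -> torch.Tensor:
        # x_seq: (seq_len, batch, in_features)
        batch = x_seq.shape[1]
        # Short-term synaptic state, one trace per synapse and per sample.
        F = x_seq.new_zeros(batch, *self.W.shape)
        outputs = []
        for x in x_seq:
            G = self.W + F                                     # effective (long- + short-term) weights
            h = torch.tanh(torch.einsum('boi,bi->bo', G, x))   # postsynaptic activity
            # Self-recurrent update of the synaptic state: learned decay plus Hebbian term.
            F = torch.sigmoid(self.lam) * F + self.gamma * torch.einsum('bo,bi->boi', h, x)
            outputs.append(h)
        return torch.stack(outputs)                            # (seq_len, batch, out_features)
```

Because `F` is carried across timesteps inside the forward pass, gradients with respect to `lam` and `gamma` flow through the whole sequence under standard BPTT, which is the property the abstract highlights.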
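The Experiment Setup row quotes a linear learning-rate decay from the tuned initial rate down to 10^-11 at 200 million iterations. The helper below is a hypothetical illustration of that schedule (the function name and defaults are ours, not taken from the repository); the 0.0007 initial rate is the value quoted for one of the two models.

```python
def linear_lr(step: int,
              total_steps: int = 200_000_000,
              lr_init: float = 0.0007,
              lr_final: float = 1e-11) -> float:
    """Linearly anneal the learning rate from lr_init to lr_final over total_steps."""
    frac = min(step, total_steps) / total_steps
    return lr_init + frac * (lr_final - lr_init)


# Example: the rate at the halfway point of training.
print(linear_lr(100_000_000))  # ~0.00035
```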