Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Continual Reinforcement Learning with Complex Synapses
Authors: Christos Kaplanis, Murray Shanahan, Claudia Clopath
ICML 2018 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we show that by equipping tabular and deep reinforcement learning agents with a synaptic model that incorporates this biological complexity (Benna & Fusi, 2016), catastrophic forgetting can be mitigated at multiple timescales. In particular, we find that as well as enabling continual learning across sequential training of two simple tasks, it can also be used to overcome within-task forgetting by reducing the need for an experience replay database. |
| Researcher Affiliation | Collaboration | 1 Department of Computing, Imperial College London; 2 Department of Bioengineering, Imperial College London; 3 Google DeepMind, London. |
| Pseudocode | No | No pseudocode or algorithm blocks are present in the paper. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the methodology is openly available. |
| Open Datasets | Yes | The version used was CartPole-v1 from the OpenAI Gym (Brockman et al., 2016) |
| Dataset Splits | No | The paper does not explicitly mention validation dataset splits. It describes criteria for deeming a task (re)learnt during training, but not specific data partitioning for validation. |
| Hardware Specification | No | The paper does not specify the exact hardware (e.g., CPU, GPU models) used for the experiments. |
| Software Dependencies | No | The paper mentions algorithms and methods (e.g., Euler method, Adam optimizer, soft Q-learning objective) and cites related work, but does not provide specific software dependencies with version numbers (e.g., Python, TensorFlow, PyTorch versions). |
| Experiment Setup | Yes | In our experiments, we simulated the Benna-Fusi ODEs using the Euler method for numerical integration... In a Benna-Fusi chain of length N, C1/g1,2 and CN/gN,N+1 determine the shortest and longest memory timescales... we set g1,2 to 10^-5 to correspond roughly to the minimum number of Q-learning updates per epoch, and the number of variables in each chain to 3, all of which were initialised to 0. The ODEs were numerically integrated after every Q-learning update with a time step of Δt = 1. A table of all parameters used for simulation is shown in Supp. Table 1. The control agent was essentially a DQN (Mnih et al., 2015) with two fully connected hidden layers of 400 and 200 ReLUs respectively... The network was trained with the soft Q-learning objective (Haarnoja et al., 2017)... The experience replay database had a size of 2000, from which 64 experiences were sampled for training with Adam (Kingma & Ba, 2014) at the end of every episode. Crucially, the database was cleared at the end of every epoch... The agent was ϵ-greedy with respect to the stochastic soft Q-learning policy and ϵ was decayed from 1 to almost 0 over the course of each epoch. Finally, soft target network updates were used as in (Lillicrap et al., 2015)... A full table of parameters used can be seen in Supp. Table 2. The Benna-Fusi agent was identical to the control agent, except that each network parameter was modelled as a Benna-Fusi synapse with 30 variables with g1,2 set to 0.001625... For this reason, the effective flow between u1 and u2 was 64 × 0.001625 ≈ 0.1... |
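
The Experiment Setup row quotes the paper's description of integrating the Benna-Fusi synaptic ODEs with the Euler method after every Q-learning update. The Python sketch below illustrates that update scheme under stated assumptions: the class and parameter names (`BennaFusiChain`, `n_vars`, `g12`, `c1`, `dt`), the geometric doubling/halving of the beaker capacities C_k and tube widths g_{k,k+1}, and the leak from the last variable follow Benna & Fusi (2016) rather than the paper's supplementary tables, so it is a minimal sketch, not the authors' implementation.

```python
import numpy as np

class BennaFusiChain:
    """Minimal sketch of a Benna-Fusi synaptic chain (Benna & Fusi, 2016),
    integrated with the Euler method after each Q-learning update.

    The scaling of C_k and g_{k,k+1} along the chain is an assumption taken
    from Benna & Fusi (2016); the values actually used in the paper are in
    its Supp. Tables 1 and 2.
    """

    def __init__(self, n_vars=3, g12=1e-5, c1=1.0, dt=1.0):
        self.u = np.zeros(n_vars)                  # chain variables u_1..u_N, initialised to 0
        self.C = c1 * 2.0 ** np.arange(n_vars)     # beaker capacities C_k (assumed to double)
        self.g = g12 * 0.5 ** np.arange(n_vars)    # tube widths g_{k,k+1} (assumed to halve)
        self.dt = dt

    def apply_update(self, delta):
        """Apply a Q-learning update directly to the visible variable u_1."""
        self.u[0] += delta

    def euler_step(self):
        """One Euler integration step of the flows between neighbouring beakers."""
        du = np.zeros_like(self.u)
        for k in range(len(self.u) - 1):
            flow = self.g[k] * (self.u[k] - self.u[k + 1])  # flow from beaker k to k+1
            du[k] -= flow / self.C[k]
            du[k + 1] += flow / self.C[k + 1]
        du[-1] -= self.g[-1] * self.u[-1] / self.C[-1]      # leak from the last beaker (assumption)
        self.u += self.dt * du

    @property
    def value(self):
        """u_1 is read out as the synaptic weight / tabular Q-value."""
        return self.u[0]


# Illustrative usage for a single tabular Q-value (the learning rate is hypothetical):
syn = BennaFusiChain(n_vars=3, g12=1e-5, dt=1.0)
td_error = 0.5
syn.apply_update(0.1 * td_error)   # Q-learning update applied to u_1
syn.euler_step()                   # ODEs integrated after every update, with Δt = 1
print(syn.value)
```

For the deep agent described in the same row, the paper states that every network parameter is treated as such a chain with 30 variables and g1,2 = 0.001625; since 64 experiences are replayed per training step, the quoted effective flow between u1 and u2 is 64 × 0.001625 ≈ 0.1.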