Reinforcement Learning with Fast and Forgetful Memory
Authors: Steven Morad, Ryan Kortvelesy, Stephan Liwicki, Amanda Prorok
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our approach constrains the model search space via strong structural priors inspired by computational psychology. It is a drop-in replacement for recurrent neural networks (RNNs) in recurrent RL algorithms, achieving greater reward than RNNs across various recurrent benchmarks and algorithms without changing any hyperparameters. Moreover, Fast and Forgetful Memory exhibits training speeds two orders of magnitude faster than RNNs, attributed to its logarithmic time and linear space complexity. Our implementation is available at https://github.com/proroklab/ffm. We evaluate FFM on the two largest POMDP benchmarks currently available: POPGym [Morad et al., 2023] and the POMDP tasks from [Ni et al., 2022], which we henceforth refer to as POMDP-Baselines. For POPGym, we train a shared-memory actor-critic model using recurrent Proximal Policy Optimization [Schulman et al., 2017]. For POMDP-Baselines, we train separate memory models for the actor and critic using recurrent Soft Actor Critic (SAC) [Haarnoja et al., 2018] and recurrent Twin Delayed DDPG (TD3) [Fujimoto et al., 2018]. (A hedged sketch of the drop-in interface follows the table.) |
| Researcher Affiliation | Collaboration | Steven Morad, University of Cambridge, Cambridge, UK, sm2558@cam.ac.uk; Ryan Kortvelesy, University of Cambridge, Cambridge, UK, rk627@cam.ac.uk; Stephan Liwicki, Toshiba Europe Ltd., Cambridge, UK, stephan.liwicki@toshiba.eu; Amanda Prorok, University of Cambridge, Cambridge, UK, asp45@cam.ac.uk |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our implementation is available at https://github.com/proroklab/ffm. |
| Open Datasets | Yes | We evaluate FFM on the two largest POMDP benchmarks currently available: POPGym [Morad et al., 2023] and the POMDP tasks from [Ni et al., 2022], which we henceforth refer to as POMDPBaselines. |
| Dataset Splits | No | The paper evaluates on existing benchmarks (POPGym and POMDP-Baselines) but does not explicitly state the train/validation/test splits used. |
| Hardware Specification | Yes | We trained on a server with a Xeon CPU running Torch version 1.13 with CUDA version 11.7, with consistent access to two 2080Ti GPUs. |
| Software Dependencies | Yes | We trained on a server with a Xeon CPU running Torch version 1.13 with CUDA version 11.7 |
| Experiment Setup | Yes | We replicate the experiments from the POPGym and POMDP-Baselines papers as-is without changing any hyperparameters. We use a single FFM configuration across all experiments, except for varying the hidden and recurrent sizes to match the RNNs, and initialize α, ω based on the max sequence length. See Appendix D for further details. For fairness, we let m = 32 and c = 4, which results in a complex recurrent state of size 128, which can be represented as a 256-dimensional real vector. We initialize α, ω following Appendix B for t_e = 1024, β = 0.01. We let c = h/32 and m = h/c, so that mc = h. ... We utilize separate memory modules for the actor and critic, as done in the paper. (The state-size arithmetic is worked through in the second sketch after the table.) |
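
The drop-in-replacement claim quoted in the Research Type row amounts to keeping the recurrent interface of the policy unchanged. The sketch below is illustrative only: it shows a baseline GRU policy with the (input, state) → (output, state) contract that the FFM module from https://github.com/proroklab/ffm would slot into; the FFM module's class name and exact signature are assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class GRUPolicy(nn.Module):
    """Baseline recurrent policy: observation -> memory module -> action logits."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden)
        # "Drop-in replacement" means only this line would change: swap nn.GRU
        # for the FFM memory module (hypothetical name/signature), keeping the
        # same (input, state) -> (output, state) call contract.
        self.memory = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, act_dim)

    def forward(self, obs, state=None):
        # obs: (batch, time, obs_dim); state carries memory across rollout chunks.
        z = torch.relu(self.encoder(obs))
        y, state = self.memory(z, state)
        return self.head(y), state

policy = GRUPolicy(obs_dim=8, act_dim=4)
obs = torch.randn(2, 16, 8)        # batch of 2 episodes, 16 timesteps each
logits, carry = policy(obs)        # carry is fed back in for the next chunk
print(logits.shape)                # torch.Size([2, 16, 4])
```

Under this reading, the surrounding recurrent PPO/SAC/TD3 training loop, including its hyperparameters, is left untouched when the memory module is swapped, which is consistent with the paper's "without changing any hyperparameters" claim.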
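
The recurrent-state bookkeeping quoted in the Experiment Setup row can be checked with a few lines of arithmetic. The size rules (c = h/32, m = h/c, with m = 32 and c = 4 giving a 128-entry complex state, i.e. a 256-dimensional real vector) come directly from the quote; the α, ω initialization shown is only an illustrative stand-in for the paper's Appendix B, assuming decay rates chosen so that retention over t_e = 1024 steps spans down to β = 0.01 and oscillation periods spread up to t_e.

```python
import math
import torch

def ffm_sizes(h: int):
    """Derive memory size m and context size c from hidden size h (c = h/32, m = h/c)."""
    c = h // 32
    m = h // c
    return m, c

h = 128
m, c = ffm_sizes(h)
assert (m, c) == (32, 4) and m * c == h    # 128 complex state entries
real_dim = 2 * m * c                       # = 256 real dimensions (Re + Im parts)

# Illustrative initialization (assumption, not the paper's exact Appendix B formula):
# decay rates alpha so that exp(-alpha * t_e) spans roughly [beta, 1), and
# oscillation frequencies omega with periods spread between 1 and t_e steps.
t_e, beta = 1024, 0.01
alpha = torch.linspace(0.0, -math.log(beta), m) / t_e
omega = 2 * math.pi / torch.linspace(1.0, float(t_e), c)
print(real_dim, alpha.shape, omega.shape)  # 256 torch.Size([32]) torch.Size([4])
```

Per the last sentence of the quote, the SAC/TD3 runs would instantiate two independent copies of such a memory module, one for the actor and one for the critic.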