Reinforcement Learning with Fast and Forgetful Memory
Authors: Steven Morad, Ryan Kortvelesy, Stephan Liwicki, Amanda Prorok
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our approach constrains the model search space via strong structural priors inspired by computational psychology. It is a drop-in replacement for recurrent neural networks (RNNs) in recurrent RL algorithms, achieving greater reward than RNNs across various recurrent benchmarks and algorithms without changing any hyperparameters. Moreover, Fast and Forgetful Memory exhibits training speeds two orders of magnitude faster than RNNs, attributed to its logarithmic time and linear space complexity. Our implementation is available at https://github.com/proroklab/ffm. We evaluate FFM on the two largest POMDP benchmarks currently available: POPGym [Morad et al., 2023] and the POMDP tasks from [Ni et al., 2022], which we henceforth refer to as POMDP-Baselines. For POPGym, we train a shared-memory actor-critic model using recurrent Proximal Policy Optimization [Schulman et al., 2017]. For POMDP-Baselines, we train separate memory models for the actor and critic using recurrent Soft Actor Critic (SAC) [Haarnoja et al., 2018] and recurrent Twin Delayed DDPG (TD3) [Fujimoto et al., 2018]. (A hedged sketch of the drop-in interface follows the table.) |
| Researcher Affiliation | Collaboration | Steven Morad, University of Cambridge, Cambridge, UK, sm2558@cam.ac.uk; Ryan Kortvelesy, University of Cambridge, Cambridge, UK, rk627@cam.ac.uk; Stephan Liwicki, Toshiba Europe Ltd., Cambridge, UK, stephan.liwicki@toshiba.eu; Amanda Prorok, University of Cambridge, Cambridge, UK, asp45@cam.ac.uk |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our implementation is available at https://github.com/proroklab/ffm. |
| Open Datasets | Yes | We evaluate FFM on the two largest POMDP benchmarks currently available: POPGym [Morad et al., 2023] and the POMDP tasks from [Ni et al., 2022], which we henceforth refer to as POMDPBaselines. |
| Dataset Splits | No | The paper evaluates on existing benchmarks (POPGym and POMDP-Baselines) but does not explicitly state the train/validation/test splits used. |
| Hardware Specification | Yes | We trained on a server with a Xeon CPU running Torch version 1.13 with CUDA version 11.7, with consistent access to two 2080Ti GPUs. |
| Software Dependencies | Yes | We trained on a server with a Xeon CPU running Torch version 1.13 with CUDA version 11.7 |
| Experiment Setup | Yes | We replicate the experiments from the POPGym and POMDP-Baselines papers as-is without changing any hyperparameters. We use a single FFM configuration across all experiments, except for varying the hidden and recurrent sizes to match the RNNs, and initialize α, ω based on the max sequence length. See Appendix D for further details. For fairness, we let m = 32 and c = 4, which results in a complex recurrent state of size 128, which can be represented as a 256-dimensional real vector. We initialize α, ω following Appendix B for t_e = 1024, β = 0.01. We let c = h/32 and m = h/c, so that mc = h. ... We utilize separate memory modules for the actor and critic, as done in the paper. (The state-size arithmetic is worked through in the second sketch after the table.) |
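
The drop-in-replacement claim quoted in the Research Type row amounts to keeping the recurrent interface of the policy unchanged. The sketch below is illustrative only: it shows a baseline GRU policy with the (input, state) → (output, state) contract that the FFM module from https://github.com/proroklab/ffm would slot into; the FFM module's class name and exact signature are assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class GRUPolicy(nn.Module):
    """Baseline recurrent policy: observation -> memory module -> action logits."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden)
        # "Drop-in replacement" means only this line would change: swap nn.GRU
        # for the FFM memory module (hypothetical name/signature), keeping the
        # same (input, state) -> (output, state) call contract.
        self.memory = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, act_dim)

    def forward(self, obs, state=None):
        # obs: (batch, time, obs_dim); state carries memory across rollout chunks.
        z = torch.relu(self.encoder(obs))
        y, state = self.memory(z, state)
        return self.head(y), state

policy = GRUPolicy(obs_dim=8, act_dim=4)
obs = torch.randn(2, 16, 8)        # batch of 2 episodes, 16 timesteps each
logits, carry = policy(obs)        # carry is fed back in for the next chunk
print(logits.shape)                # torch.Size([2, 16, 4])
```

Under this reading, the surrounding recurrent PPO/SAC/TD3 training loop, including its hyperparameters, is left untouched when the memory module is swapped, which is consistent with the paper's "without changing any hyperparameters" claim.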
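
The recurrent-state bookkeeping quoted in the Experiment Setup row can be checked with a few lines of arithmetic. The size rules (c = h/32, m = h/c, with m = 32 and c = 4 giving a 128-entry complex state, i.e. a 256-dimensional real vector) come directly from the quote; the α, ω initialization shown is only an illustrative stand-in for the paper's Appendix B, assuming decay rates chosen so that retention over t_e = 1024 steps spans down to β = 0.01 and oscillation periods spread up to t_e.

```python
import math
import torch

def ffm_sizes(h: int):
    """Derive memory size m and context size c from hidden size h (c = h/32, m = h/c)."""
    c = h // 32
    m = h // c
    return m, c

h = 128
m, c = ffm_sizes(h)
assert (m, c) == (32, 4) and m * c == h    # 128 complex state entries
real_dim = 2 * m * c                       # = 256 real dimensions (Re + Im parts)

# Illustrative initialization (assumption, not the paper's exact Appendix B formula):
# decay rates alpha so that exp(-alpha * t_e) spans roughly [beta, 1), and
# oscillation frequencies omega with periods spread between 1 and t_e steps.
t_e, beta = 1024, 0.01
alpha = torch.linspace(0.0, -math.log(beta), m) / t_e
omega = 2 * math.pi / torch.linspace(1.0, float(t_e), c)
print(real_dim, alpha.shape, omega.shape)  # 256 torch.Size([32]) torch.Size([4])
```

Per the last sentence of the quote, the SAC/TD3 runs would instantiate two independent copies of such a memory module, one for the actor and one for the critic.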