Recurrent Reinforcement Learning with Memoroids

Authors: Steven Morad, Chris Lu, Ryan Kortvelesy, Stephan Liwicki, Jakob Foerster, Amanda Prorok

NeurIPS 2024

Each entry below pairs a reproducibility variable with its assessed result and the LLM response supporting it. Illustrative code sketches follow the entries.
Research Type: Experimental
LLM response: "Our experiments utilize tasks from the POPGym benchmark (Morad et al., 2023a), and all TBB to SBB comparisons use identical hyperparameters and random seeds. We validate our findings across Simplified State Space Models (S5), Linear Recurrent Units (LRU), Fast and Forgetful Memory (FFM), and the Linear Transformer (Lin Attn) memoroids. We train our policies using Double Dueling DQN (Van Hasselt et al., 2016; Wang et al., 2016)." (A sketch of the memoroid scan these models share appears after the entries.)
Researcher Affiliation: Collaboration
LLM response: "Steven Morad (1,2), Chris Lu (3), Ryan Kortvelesy (2), Stephan Liwicki (4), Jakob Foerster (3), Amanda Prorok (2). Affiliations: (1) Faculty of Science and Technology, University of Macau, China; (2) Computer Science and Technology, University of Cambridge, UK; (3) Engineering Science, University of Oxford, UK; (4) Toshiba Europe, UK"
Pseudocode: Yes
LLM response: "Algorithm 1 Inserting transitions using TBB" (a hedged Python rendering of this routine appears after the entries)
Open Source Code: Yes
LLM response: "The code necessary to reproduce all of our experiments is available at https://github.com/proroklab/memory-monoids."
Open Datasets: Yes
LLM response: "Our experiments utilize tasks from the POPGym benchmark (Morad et al., 2023a)"
Dataset Splits: No
LLM response: The paper uses the POPGym benchmark and mentions training epochs and evaluation return, but it does not explicitly provide training/validation/test dataset splits with specific percentages or counts.
Hardware Specification: Yes
LLM response: "For both experiments, we evaluate ten random seeds on a RTX 2080Ti GPU."
Software Dependencies: No
LLM response: "Reformulating the discounted return and GAE targets as memoroids enables us to compute them in a GPU-efficient fashion, using a high-level framework like JAX (Bradbury et al., 2018)." JAX is named, but no versioned dependency list is provided. (A sketch of the return-as-scan computation appears after the entries.)
Experiment Setup: Yes
LLM response (excerpt from the paper's hyperparameter table):
  Task: Repeat First | Epochs (Rand, Train): 5,000, 5,000 | Polyak τ: 0.995 | Batch Size: 1,000 | LR: 0.0001 | Ratio: 1:1 | Clip: 0.01 | γ: 0.99
  ... (further rows for other tasks)
(This row is mirrored as a hypothetical config mapping after the entries.)
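
The Research Type entry compares TBB and SBB across several memoroid memory models. A memoroid packages a recurrent update as an associative operator with an identity element, so an entire trajectory can be processed with a parallel scan instead of a sequential loop. Below is a minimal sketch in JAX, assuming the linear-recurrence form h_t = a_t * h_{t-1} + b_t shared by models such as S5 and the LRU; `binary_op` and `memoroid_scan` are illustrative names, not the authors' implementation.

```python
import jax
import jax.numpy as jnp

def binary_op(left, right):
    # Monoid operator for h_t = a_t * h_{t-1} + b_t: composing (a1, b1)
    # with a later (a2, b2) yields (a2 * a1, a2 * b1 + b2). The identity
    # element is (1, 0). Associativity is what lets the scan parallelize.
    a1, b1 = left
    a2, b2 = right
    return a2 * a1, a2 * b1 + b2

def memoroid_scan(a, b):
    # a, b: [T, d] per-timestep decay and input terms (assumed shapes).
    _, h = jax.lax.associative_scan(binary_op, (a, b))
    return h  # h[t] matches the recurrence unrolled sequentially to step t

# Toy usage with a constant 0.9 decay (hypothetical values).
a = jnp.full((8, 4), 0.9)
b = jnp.ones((8, 4))
h = memoroid_scan(a, b)
```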
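
The referenced Algorithm 1 inserts transitions under tape-based batching (TBB), in which variable-length episodes are laid end to end on one flat tape rather than cut into zero-padded fixed-length segments (SBB). The sketch below is a hedged Python rendering of that idea; `TapeBuffer`, its fields, and `insert_episode` are illustrative assumptions, not the paper's exact data structure.

```python
from dataclasses import dataclass, field

@dataclass
class TapeBuffer:
    # One flat tape of transitions; episodes of any length sit end to end,
    # so no zero-padding into fixed-size segments is needed.
    obs: list = field(default_factory=list)
    action: list = field(default_factory=list)
    reward: list = field(default_factory=list)
    start: list = field(default_factory=list)  # True at episode boundaries

    def insert_episode(self, episode):
        # episode: iterable of (obs, action, reward) transitions.
        for t, (o, a, r) in enumerate(episode):
            self.obs.append(o)
            self.action.append(a)
            self.reward.append(r)
            self.start.append(t == 0)  # a memoroid resets its state here
```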
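
The Software Dependencies excerpt says discounted-return and GAE targets, reformulated as memoroids, can be computed GPU-efficiently in JAX. A minimal sketch of that idea for the discounted return G_t = r_t + γ·G_{t+1}: each timestep is an affine map encoded as a pair, and a reverse associative scan composes the maps in parallel. Function names and the episode-boundary masking via `dones` are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

def discounted_returns(rewards, dones, gamma=0.99):
    # Step t is the affine map G -> r_t + g_t * G, stored as (g_t, r_t);
    # g_t is zeroed at episode ends so returns do not leak across episodes.
    g = gamma * (1.0 - dones)

    def compose(later, earlier):
        # With reverse=True, the scan passes the aggregate of later
        # timesteps first; composing yields the map for the longer suffix.
        G, R = later
        g1, r1 = earlier
        return g1 * G, r1 + g1 * R

    _, returns = jax.lax.associative_scan(compose, (g, rewards), reverse=True)
    return returns

# Toy check (hypothetical values): two episodes of length two.
r = jnp.array([1.0, 0.0, 2.0, 1.0])
d = jnp.array([0.0, 1.0, 0.0, 1.0])
G = discounted_returns(r, d)  # [1.0, 0.0, 2.99, 1.0]
```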
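
Finally, the Experiment Setup row can be mirrored as a plain config mapping. The sketch below is hypothetical: only the values come from the reported table, while the key names and the readings of "Ratio" and "Clip" are assumptions.

```python
# Hypothetical configuration for the "Repeat First" row; key names and
# the interpretations in the comments are assumptions, values are reported.
repeat_first_config = {
    "task": "Repeat First",
    "random_epochs": 5_000,   # "Epochs Rand": assumed random-policy warmup
    "train_epochs": 5_000,    # "Epochs Train"
    "polyak_tau": 0.995,      # target-network Polyak averaging coefficient
    "batch_size": 1_000,
    "learning_rate": 1e-4,
    "ratio": "1:1",           # reported as-is; exact meaning assumed
    "clip": 0.01,             # reported as-is; likely a gradient clip
    "gamma": 0.99,            # discount factor
}
```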