Recurrent Reinforcement Learning with Memoroids

Authors: Steven Morad, Chris Lu, Ryan Kortvelesy, Stephan Liwicki, Jakob Foerster, Amanda Prorok

NeurIPS 2024

Each entry below pairs a reproducibility variable with its assessed result and the LLM response supporting it. Illustrative code sketches follow the entries.
Research Type: Experimental
LLM response: "Our experiments utilize tasks from the POPGym benchmark (Morad et al., 2023a), and all TBB to SBB comparisons use identical hyperparameters and random seeds. We validate our findings across Simplified State Space Models (S5), Linear Recurrent Units (LRU), Fast and Forgetful Memory (FFM), and the Linear Transformer (Lin Attn) memoroids. We train our policies using Double Dueling DQN (Van Hasselt et al., 2016; Wang et al., 2016)." (A sketch of the memoroid scan these models share appears after the entries.)
Researcher Affiliation: Collaboration
LLM response: "Steven Morad (1,2), Chris Lu (3), Ryan Kortvelesy (2), Stephan Liwicki (4), Jakob Foerster (3), Amanda Prorok (2). Affiliations: (1) Faculty of Science and Technology, University of Macau, China; (2) Computer Science and Technology, University of Cambridge, UK; (3) Engineering Science, University of Oxford, UK; (4) Toshiba Europe, UK"
Pseudocode: Yes
LLM response: "Algorithm 1 Inserting transitions using TBB" (a hedged Python rendering of this routine appears after the entries)
Open Source Code: Yes
LLM response: "The code necessary to reproduce all of our experiments is available at https://github.com/proroklab/memory-monoids."
Open Datasets: Yes
LLM response: "Our experiments utilize tasks from the POPGym benchmark (Morad et al., 2023a)"
Dataset Splits: No
LLM response: The paper uses the POPGym benchmark and mentions training epochs and evaluation return, but it does not explicitly provide training/validation/test dataset splits with specific percentages or counts.
Hardware Specification: Yes
LLM response: "For both experiments, we evaluate ten random seeds on a RTX 2080Ti GPU."
Software Dependencies: No
LLM response: "Reformulating the discounted return and GAE targets as memoroids enables us to compute them in a GPU-efficient fashion, using a high-level framework like JAX (Bradbury et al., 2018)." JAX is named, but no versioned dependency list is provided. (A sketch of the return-as-scan computation appears after the entries.)
Experiment Setup: Yes
LLM response (excerpt from the paper's hyperparameter table):
  Task: Repeat First | Epochs (Rand, Train): 5,000, 5,000 | Polyak τ: 0.995 | Batch Size: 1,000 | LR: 0.0001 | Ratio: 1:1 | Clip: 0.01 | γ: 0.99
  ... (further rows for other tasks)
(This row is mirrored as a hypothetical config mapping after the entries.)
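
The Research Type entry compares TBB and SBB across several memoroid memory models. A memoroid packages a recurrent update as an associative operator with an identity element, so an entire trajectory can be processed with a parallel scan instead of a sequential loop. Below is a minimal sketch in JAX, assuming the linear-recurrence form h_t = a_t * h_{t-1} + b_t shared by models such as S5 and the LRU; `binary_op` and `memoroid_scan` are illustrative names, not the authors' implementation.

```python
import jax
import jax.numpy as jnp

def binary_op(left, right):
    # Monoid operator for h_t = a_t * h_{t-1} + b_t: composing (a1, b1)
    # with a later (a2, b2) yields (a2 * a1, a2 * b1 + b2). The identity
    # element is (1, 0). Associativity is what lets the scan parallelize.
    a1, b1 = left
    a2, b2 = right
    return a2 * a1, a2 * b1 + b2

def memoroid_scan(a, b):
    # a, b: [T, d] per-timestep decay and input terms (assumed shapes).
    _, h = jax.lax.associative_scan(binary_op, (a, b))
    return h  # h[t] matches the recurrence unrolled sequentially to step t

# Toy usage with a constant 0.9 decay (hypothetical values).
a = jnp.full((8, 4), 0.9)
b = jnp.ones((8, 4))
h = memoroid_scan(a, b)
```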
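
The referenced Algorithm 1 inserts transitions under tape-based batching (TBB), in which variable-length episodes are laid end to end on one flat tape rather than cut into zero-padded fixed-length segments (SBB). The sketch below is a hedged Python rendering of that idea; `TapeBuffer`, its fields, and `insert_episode` are illustrative assumptions, not the paper's exact data structure.

```python
from dataclasses import dataclass, field

@dataclass
class TapeBuffer:
    # One flat tape of transitions; episodes of any length sit end to end,
    # so no zero-padding into fixed-size segments is needed.
    obs: list = field(default_factory=list)
    action: list = field(default_factory=list)
    reward: list = field(default_factory=list)
    start: list = field(default_factory=list)  # True at episode boundaries

    def insert_episode(self, episode):
        # episode: iterable of (obs, action, reward) transitions.
        for t, (o, a, r) in enumerate(episode):
            self.obs.append(o)
            self.action.append(a)
            self.reward.append(r)
            self.start.append(t == 0)  # a memoroid resets its state here
```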
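
The Software Dependencies excerpt says discounted-return and GAE targets, reformulated as memoroids, can be computed GPU-efficiently in JAX. A minimal sketch of that idea for the discounted return G_t = r_t + γ·G_{t+1}: each timestep is an affine map encoded as a pair, and a reverse associative scan composes the maps in parallel. Function names and the episode-boundary masking via `dones` are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

def discounted_returns(rewards, dones, gamma=0.99):
    # Step t is the affine map G -> r_t + g_t * G, stored as (g_t, r_t);
    # g_t is zeroed at episode ends so returns do not leak across episodes.
    g = gamma * (1.0 - dones)

    def compose(later, earlier):
        # With reverse=True, the scan passes the aggregate of later
        # timesteps first; composing yields the map for the longer suffix.
        G, R = later
        g1, r1 = earlier
        return g1 * G, r1 + g1 * R

    _, returns = jax.lax.associative_scan(compose, (g, rewards), reverse=True)
    return returns

# Toy check (hypothetical values): two episodes of length two.
r = jnp.array([1.0, 0.0, 2.0, 1.0])
d = jnp.array([0.0, 1.0, 0.0, 1.0])
G = discounted_returns(r, d)  # [1.0, 0.0, 2.99, 1.0]
```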
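
Finally, the Experiment Setup row can be mirrored as a plain config mapping. The sketch below is hypothetical: only the values come from the reported table, while the key names and the readings of "Ratio" and "Clip" are assumptions.

```python
# Hypothetical configuration for the "Repeat First" row; key names and
# the interpretations in the comments are assumptions, values are reported.
repeat_first_config = {
    "task": "Repeat First",
    "random_epochs": 5_000,   # "Epochs Rand": assumed random-policy warmup
    "train_epochs": 5_000,    # "Epochs Train"
    "polyak_tau": 0.995,      # target-network Polyak averaging coefficient
    "batch_size": 1_000,
    "learning_rate": 1e-4,
    "ratio": "1:1",           # reported as-is; exact meaning assumed
    "clip": 0.01,             # reported as-is; likely a gradient clip
    "gamma": 0.99,            # discount factor
}
```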