Modular Lifelong Reinforcement Learning via Neural Composition

Authors: Jorge A. Mendez, Harm van Seijen, Eric Eaton

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically, we demonstrate that neural composition indeed captures the underlying structure of this space of problems. We further propose a compositional lifelong RL method that leverages accumulated neural components to accelerate the learning of future tasks while retaining performance on previous tasks via off-line RL over replayed experiences."
Researcher Affiliation | Collaboration | "Jorge A. Mendez¹, Harm van Seijen², and Eric Eaton¹; ¹Department of Computer and Information Science, University of Pennsylvania ({mendezme,eeaton}@seas.upenn.edu); ²Microsoft Research (harm.vanseijen@microsoft.com)"
Pseudocode | Yes | "Algorithm 1 Lifelong Compositional RL" (see the training-loop sketch below the table)
Open Source Code | Yes | "source code is available at: github.com/Lifelong-ML/Mendez2022ModularLifelongRL"
Open Datasets | Yes | "Tasks were simulated on gym-minigrid (Chevalier-Boisvert et al., 2018). [...] All dynamics are simulated in robosuite (Zhu et al., 2020)." (see the simulator smoke test below the table)
Dataset Splits | No | The paper describes hyperparameter tuning and evaluation on tasks, but does not specify explicit train/validation/test dataset splits with percentages or counts for reproducibility.
Hardware Specification | Yes | "Our discrete 2-D experiments were carried out on two small development machines, each with two GeForce® GTX 1080 Ti GPUs, eight-core Intel® Core™ i7-7700K CPUs, and 64GB of RAM."
Software Dependencies | No | The paper mentions algorithms such as PPO and BCQ but does not provide specific version numbers for their implementations or for any other software dependencies.
Experiment Setup | Yes | "We tuned the STL hyper-parameters via grid-search over the learning rate (from {1e-6, 3e-6, 1e-5, 3e-5, 1e-4, 3e-4, 1e-3, 3e-3, 1e-2, 3e-2}) and the number of environment interactions per training step (from {256, 512, 1024, 2048, 4096, 8192}). [...] Table D.1 summarizes the obtained hyper-parameters." (see the grid-search sketch below)
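
The "Pseudocode" row above names the paper's Algorithm 1 without reproducing it. To make the control flow concrete, below is a toy, runnable Python skeleton of a lifelong compositional RL loop in the spirit of that algorithm. Every name (ComponentLibrary, online_rl_update, offline_rl_update) is a placeholder and the learning updates are stubbed; this is a sketch of the structure, not the authors' implementation.

    # Toy skeleton of a lifelong compositional RL loop, in the spirit of
    # Algorithm 1. All names and updates are placeholders.
    import random

    class ComponentLibrary:
        """Accumulated neural components shared across tasks."""
        def __init__(self):
            self.modules = []

        def assemble_policy(self, task_id):
            # Placeholder: reuse existing modules, adding a fresh one at random.
            if not self.modules or random.random() < 0.5:
                self.modules.append("module_%d" % len(self.modules))
            chosen = random.sample(self.modules, k=min(2, len(self.modules)))
            return {"task": task_id, "modules": chosen}

    def online_rl_update(policy, transition):
        pass  # stand-in for on-policy RL on the current task (the paper uses PPO)

    def offline_rl_update(library, buffer):
        pass  # stand-in for offline RL over replayed experiences (the paper uses BCQ)

    def lifelong_compositional_rl(num_tasks=3, steps_per_task=5):
        library = ComponentLibrary()
        replay = {}  # task_id -> stored transitions
        for task_id in range(num_tasks):
            policy = library.assemble_policy(task_id)
            replay[task_id] = []
            for _ in range(steps_per_task):
                transition = ("s", "a", 0.0, "s_next")  # dummy experience
                replay[task_id].append(transition)
                online_rl_update(policy, transition)
            # Update shared components on replayed data from all tasks seen
            # so far; this is how performance on earlier tasks is retained.
            for buffer in replay.values():
                offline_rl_update(library, buffer)
        return library

    if __name__ == "__main__":
        print("learned modules:", lifelong_compositional_rl().modules)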
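
The "Open Datasets" row names the two simulators. A quick smoke test that both can be instantiated is shown below; it assumes the legacy gym-minigrid package with the pre-0.26 gym step API, and the environment and robot IDs are illustrative, not necessarily the task variants used in the paper.

    # Smoke test for the simulators named in the paper (illustrative IDs).
    import gym
    import gym_minigrid  # noqa: F401  (importing registers the MiniGrid envs)
    import robosuite

    # Discrete 2-D grid-world environment (legacy 4-tuple gym step API).
    env = gym.make("MiniGrid-Empty-8x8-v0")
    obs = env.reset()
    obs, reward, done, info = env.step(env.action_space.sample())
    env.close()

    # Robot-arm dynamics in robosuite; "Lift" and "Panda" are illustrative.
    rs_env = robosuite.make(
        "Lift",
        robots="Panda",
        has_renderer=False,
        has_offscreen_renderer=False,
        use_camera_obs=False,
    )
    rs_obs = rs_env.reset()
    rs_env.close()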
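
Finally, the sweep quoted in the "Experiment Setup" row is a plain grid search over two hyper-parameters. A minimal sketch follows; train_and_evaluate is a hypothetical stand-in for training STL (single-task PPO) with a given configuration and returning its evaluation score.

    # Minimal grid-search sketch over the two swept hyper-parameters;
    # train_and_evaluate is a hypothetical stub, not the paper's code.
    import itertools

    learning_rates = [1e-6, 3e-6, 1e-5, 3e-5, 1e-4, 3e-4, 1e-3, 3e-3, 1e-2, 3e-2]
    interactions_per_step = [256, 512, 1024, 2048, 4096, 8192]

    def train_and_evaluate(lr, num_interactions):
        # Placeholder: train with (lr, num_interactions) and return the
        # average evaluation return; dummy score so the script runs.
        return -abs(lr - 3e-4) - abs(num_interactions - 2048) / 1e6

    best_lr, best_n = max(
        itertools.product(learning_rates, interactions_per_step),
        key=lambda cfg: train_and_evaluate(*cfg),
    )
    print("best learning rate:", best_lr, "| interactions per step:", best_n)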