Modular Lifelong Reinforcement Learning via Neural Composition

Authors: Jorge A. Mendez, Harm van Seijen, Eric Eaton

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically, we demonstrate that neural composition indeed captures the underlying structure of this space of problems. We further propose a compositional lifelong RL method that leverages accumulated neural components to accelerate the learning of future tasks while retaining performance on previous tasks via off-line RL over replayed experiences."
Researcher Affiliation | Collaboration | "Jorge A. Mendez¹, Harm van Seijen², and Eric Eaton¹; ¹Department of Computer and Information Science, University of Pennsylvania ({mendezme,eeaton}@seas.upenn.edu); ²Microsoft Research (harm.vanseijen@microsoft.com)"
Pseudocode | Yes | "Algorithm 1 Lifelong Compositional RL" (see the training-loop sketch below the table)
Open Source Code | Yes | "source code is available at: github.com/Lifelong-ML/Mendez2022ModularLifelongRL"
Open Datasets | Yes | "Tasks were simulated on gym-minigrid (Chevalier-Boisvert et al., 2018). [...] All dynamics are simulated in robosuite (Zhu et al., 2020)." (see the simulator smoke test below the table)
Dataset Splits | No | The paper describes hyperparameter tuning and evaluation on tasks, but does not specify explicit train/validation/test dataset splits with percentages or counts for reproducibility.
Hardware Specification | Yes | "Our discrete 2-D experiments were carried out on two small development machines, each with two GeForce® GTX 1080 Ti GPUs, eight-core Intel® Core™ i7-7700K CPUs, and 64GB of RAM."
Software Dependencies | No | The paper mentions algorithms such as PPO and BCQ but does not provide specific version numbers for their implementations or for any other software dependencies.
Experiment Setup | Yes | "We tuned the STL hyper-parameters via grid-search over the learning rate (from {1e-6, 3e-6, 1e-5, 3e-5, 1e-4, 3e-4, 1e-3, 3e-3, 1e-2, 3e-2}) and the number of environment interactions per training step (from {256, 512, 1024, 2048, 4096, 8192}). [...] Table D.1 summarizes the obtained hyper-parameters." (see the grid-search sketch below)
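
The "Pseudocode" row above names the paper's Algorithm 1 without reproducing it. To make the control flow concrete, below is a toy, runnable Python skeleton of a lifelong compositional RL loop in the spirit of that algorithm. Every name (ComponentLibrary, online_rl_update, offline_rl_update) is a placeholder and the learning updates are stubbed; this is a sketch of the structure, not the authors' implementation.

    # Toy skeleton of a lifelong compositional RL loop, in the spirit of
    # Algorithm 1. All names and updates are placeholders.
    import random

    class ComponentLibrary:
        """Accumulated neural components shared across tasks."""
        def __init__(self):
            self.modules = []

        def assemble_policy(self, task_id):
            # Placeholder: reuse existing modules, adding a fresh one at random.
            if not self.modules or random.random() < 0.5:
                self.modules.append("module_%d" % len(self.modules))
            chosen = random.sample(self.modules, k=min(2, len(self.modules)))
            return {"task": task_id, "modules": chosen}

    def online_rl_update(policy, transition):
        pass  # stand-in for on-policy RL on the current task (the paper uses PPO)

    def offline_rl_update(library, buffer):
        pass  # stand-in for offline RL over replayed experiences (the paper uses BCQ)

    def lifelong_compositional_rl(num_tasks=3, steps_per_task=5):
        library = ComponentLibrary()
        replay = {}  # task_id -> stored transitions
        for task_id in range(num_tasks):
            policy = library.assemble_policy(task_id)
            replay[task_id] = []
            for _ in range(steps_per_task):
                transition = ("s", "a", 0.0, "s_next")  # dummy experience
                replay[task_id].append(transition)
                online_rl_update(policy, transition)
            # Update shared components on replayed data from all tasks seen
            # so far; this is how performance on earlier tasks is retained.
            for buffer in replay.values():
                offline_rl_update(library, buffer)
        return library

    if __name__ == "__main__":
        print("learned modules:", lifelong_compositional_rl().modules)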
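
The "Open Datasets" row names the two simulators. A quick smoke test that both can be instantiated is shown below; it assumes the legacy gym-minigrid package with the pre-0.26 gym step API, and the environment and robot IDs are illustrative, not necessarily the task variants used in the paper.

    # Smoke test for the simulators named in the paper (illustrative IDs).
    import gym
    import gym_minigrid  # noqa: F401  (importing registers the MiniGrid envs)
    import robosuite

    # Discrete 2-D grid-world environment (legacy 4-tuple gym step API).
    env = gym.make("MiniGrid-Empty-8x8-v0")
    obs = env.reset()
    obs, reward, done, info = env.step(env.action_space.sample())
    env.close()

    # Robot-arm dynamics in robosuite; "Lift" and "Panda" are illustrative.
    rs_env = robosuite.make(
        "Lift",
        robots="Panda",
        has_renderer=False,
        has_offscreen_renderer=False,
        use_camera_obs=False,
    )
    rs_obs = rs_env.reset()
    rs_env.close()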
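
Finally, the sweep quoted in the "Experiment Setup" row is a plain grid search over two hyper-parameters. A minimal sketch follows; train_and_evaluate is a hypothetical stand-in for training STL (single-task PPO) with a given configuration and returning its evaluation score.

    # Minimal grid-search sketch over the two swept hyper-parameters;
    # train_and_evaluate is a hypothetical stub, not the paper's code.
    import itertools

    learning_rates = [1e-6, 3e-6, 1e-5, 3e-5, 1e-4, 3e-4, 1e-3, 3e-3, 1e-2, 3e-2]
    interactions_per_step = [256, 512, 1024, 2048, 4096, 8192]

    def train_and_evaluate(lr, num_interactions):
        # Placeholder: train with (lr, num_interactions) and return the
        # average evaluation return; dummy score so the script runs.
        return -abs(lr - 3e-4) - abs(num_interactions - 2048) / 1e6

    best_lr, best_n = max(
        itertools.product(learning_rates, interactions_per_step),
        key=lambda cfg: train_and_evaluate(*cfg),
    )
    print("best learning rate:", best_lr, "| interactions per step:", best_n)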