Modular Lifelong Reinforcement Learning via Neural Composition
Authors: Jorge A. Mendez, Harm van Seijen, Eric Eaton
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we demonstrate that neural composition indeed captures the underlying structure of this space of problems. We further propose a compositional lifelong RL method that leverages accumulated neural components to accelerate the learning of future tasks while retaining performance on previous tasks via off-line RL over replayed experiences. |
| Researcher Affiliation | Collaboration | Jorge A. Mendez¹, Harm van Seijen², and Eric Eaton¹. ¹Department of Computer and Information Science, University of Pennsylvania ({mendezme,eeaton}@seas.upenn.edu); ²Microsoft Research (harm.vanseijen@microsoft.com) |
| Pseudocode | Yes | Algorithm 1 Lifelong Compositional RL |
| Open Source Code | Yes | source code is available at: github.com/Lifelong-ML/Mendez2022ModularLifelongRL |
| Open Datasets | Yes | Tasks were simulated on gym-minigrid (Chevalier-Boisvert et al., 2018). [...] All dynamics are simulated in robosuite (Zhu et al., 2020). |
| Dataset Splits | No | The paper describes hyperparameter tuning and evaluation on tasks, but does not specify explicit train/validation/test dataset splits with percentages or counts for reproducibility. |
| Hardware Specification | Yes | Our discrete 2-D experiments were carried out on two small development machines, each with two GeForce® GTX 1080 Ti GPUs, eight-core Intel® Core™ i7-7700K CPUs, and 64GB of RAM. |
| Software Dependencies | No | The paper mentions software like PPO and BCQ but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We tuned the STL hyper-parameters via grid-search over the learning rate (from {1e-6, 3e-6, 1e-5, 3e-5, 1e-4, 3e-4, 1e-3, 3e-3, 1e-2, 3e-2}) and the number of environment interactions per training step (from {256, 512, 1024, 2048, 4096, 8192}). [...] Table D.1 summarizes the obtained hyper-parameters. |
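The grid search in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the paper's code: `train_and_evaluate` is a hypothetical stand-in for a full STL training run that returns an evaluation score, and only the two search grids are taken from the paper.

```python
from itertools import product

# Search grids quoted in the paper's experiment setup.
LEARNING_RATES = [1e-6, 3e-6, 1e-5, 3e-5, 1e-4, 3e-4, 1e-3, 3e-3, 1e-2, 3e-2]
STEPS_PER_UPDATE = [256, 512, 1024, 2048, 4096, 8192]

def grid_search(train_and_evaluate):
    """Return the (learning rate, steps) pair with the highest score.

    `train_and_evaluate(lr, steps_per_update)` is assumed to run one
    training configuration and return a scalar evaluation score.
    """
    best_score, best_config = float("-inf"), None
    for lr, steps in product(LEARNING_RATES, STEPS_PER_UPDATE):
        score = train_and_evaluate(lr=lr, steps_per_update=steps)
        if score > best_score:
            best_score, best_config = score, (lr, steps)
    return best_config, best_score
```

In practice each configuration would be a full RL training run, so the 60 combinations are typically distributed across machines rather than looped over serially as shown here.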