Decoupling Value and Policy for Generalization in Reinforcement Learning
Authors: Roberta Raileanu, Rob Fergus
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | IDAAC shows good generalization to unseen environments, achieving a new state-of-the-art on the Procgen benchmark and outperforming popular methods on DeepMind Control tasks with distractors. |
| Researcher Affiliation | Academia | Roberta Raileanu 1 Rob Fergus 1 1Department of Computer Science, New York University, New York, USA. Correspondence to: Roberta Raileanu <raileanu@cs.nyu.edu>. |
| Pseudocode | Yes | See Algorithm 1 from Appendix B for a more detailed description of DAAC. See Algorithm 2 from Appendix B for a more detailed description of IDAAC. |
| Open Source Code | Yes | Our implementation is available at https://github.com/rraileanu/idaac. |
| Open Datasets | Yes | In practice, we use the Procgen benchmark which contains 16 procedurally generated games. ... We use three tasks, namely Cartpole Balance, Cartpole Swingup, and Ball In Cup. |
| Dataset Splits | No | Following the setup from Cobbe et al. (2019), agents are trained on a fixed set of n = 200 levels (generated using seeds from 1 to 200) and tested on the full distribution of levels (generated using any computer integer seed). (No explicit mention of a validation split). |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper mentions the Adam optimizer and refers to 'PyTorch implementations of reinforcement learning algorithms', but does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | More details about our experimental setup and hyperparameters can be found in Appendix C. |
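The train/test protocol quoted in the Dataset Splits row can be sketched in plain Python: training uses a fixed set of 200 level seeds (1 to 200), while test levels may be generated from any integer seed. This is a minimal sketch of the seed-selection logic only, not the authors' code; `sample_test_seed` is an illustrative name, and the 32-bit seed bound is an assumption about "any computer integer seed".

```python
import random

# Fixed training levels: 200 seeds, 1 through 200 (Cobbe et al., 2019 setup).
TRAIN_SEEDS = list(range(1, 201))

def sample_test_seed(rng=random):
    # Test levels come from the full distribution: any integer seed.
    # The 32-bit upper bound here is an assumption, not from the paper.
    return rng.randint(0, 2**31 - 1)

assert len(TRAIN_SEEDS) == 200
```

Note the asymmetry this row flags: because the test distribution is all integer seeds, there is no held-out validation split distinct from the training levels, which is why the variable is marked "No".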