Decoupling Value and Policy for Generalization in Reinforcement Learning

Authors: Roberta Raileanu, Rob Fergus

ICML 2021

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | IDAAC shows good generalization to unseen environments, achieving a new state-of-the-art on the Procgen benchmark and outperforming popular methods on DeepMind Control tasks with distractors.
Researcher Affiliation | Academia | Roberta Raileanu, Rob Fergus. Department of Computer Science, New York University, New York, USA. Correspondence to: Roberta Raileanu <raileanu@cs.nyu.edu>.
Pseudocode | Yes | See Algorithm 1 from Appendix B for a more detailed description of DAAC. See Algorithm 2 from Appendix B for a more detailed description of IDAAC.
Open Source Code | Yes | Our implementation is available at https://github.com/rraileanu/idaac.
Open Datasets | Yes | In practice, we use the Procgen benchmark which contains 16 procedurally generated games. ... We use three tasks, namely Cartpole Balance, Cartpole Swingup, and Ball In Cup.
Dataset Splits | No | Following the setup from Cobbe et al. (2019), agents are trained on a fixed set of n = 200 levels (generated using seeds from 1 to 200) and tested on the full distribution of levels (generated using any integer seed). No explicit validation split is mentioned.
Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments.
Software Dependencies | No | The paper mentions software such as Adam and refers to "PyTorch implementations of reinforcement learning algorithms" but does not specify version numbers for any software dependencies.
Experiment Setup | Yes | More details about our experimental setup and hyperparameters can be found in Appendix C.
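As context for the pseudocode row above: DAAC builds on PPO and trains its policy network to predict advantage estimates, which in the standard PPO setup are computed with Generalized Advantage Estimation (GAE). The sketch below is a minimal NumPy illustration of GAE only, not the authors' implementation; the function name and the `gamma`/`lam` defaults are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.999, lam=0.95):
    """Generalized Advantage Estimation (Schulman et al., 2016).

    rewards: length-T sequence of rewards for one trajectory segment.
    values:  length-(T+1) sequence of value estimates, with the
             bootstrap value for the final state appended.
    Note: gamma/lam defaults here are hypothetical, not from the paper.
    """
    T = len(rewards)
    adv = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):
        # One-step TD error at time t.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * lam * running
        adv[t] = running
    return adv
```

In a DAAC-style setup, targets like these would supervise the policy network's advantage head, while a separate value network is trained on returns; the decoupling is the paper's contribution, and this snippet only shows the advantage target itself.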