Automatic Data Augmentation for Generalization in Reinforcement Learning

Authors: Roberta Raileanu, Maxwell Goldstein, Denis Yarats, Ilya Kostrikov, Rob Fergus

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our method achieves a new state-of-the-art on the Procgen benchmark and outperforms popular RL algorithms on DeepMind Control tasks with distractors. We evaluate these approaches on the Procgen generalization benchmark [12] which consists of 16 procedurally generated environments with visual observations. Table 1 shows train and test performance on Procgen."
Researcher Affiliation | Collaboration | Roberta Raileanu (New York University, raileanu@cs.nyu.edu); Max Goldstein (New York University, mag1038@nyu.edu); Denis Yarats (New York University and Facebook AI Research, denisyarats@cs.nyu.edu); Ilya Kostrikov (New York University, kostrikov@cs.nyu.edu); Rob Fergus (New York University, fergus@cs.nyu.edu)
Pseudocode | Yes | "Algorithm 1 DrAC: Data-regularized Actor-Critic applied to PPO. Black: unmodified actor-critic algorithm. Cyan: image transformation. Red: policy regularization. Blue: value function regularization." (A hedged PyTorch sketch of the two regularizers appears after this table.)
Open Source Code | Yes | "Our code is available at https://github.com/rraileanu/auto-drac."
Open Datasets | Yes | "In practice, we use the Procgen benchmark which contains 16 procedurally generated games. Each game corresponds to a distribution of POMDPs q(m), and each level of a game corresponds to a POMDP sampled from that game's distribution m ∼ q. The POMDP m is determined by the seed (i.e. integer) used to generate the corresponding level. Following the setup from Cobbe et al. [12], agents are trained on a fixed set of n = 200 levels (generated using seeds from 1 to 200) and tested on the full distribution of levels (generated by sampling seeds uniformly at random from all computer integers). We use four tasks, namely Cartpole Balance, Finger Spin, Walker Walk, and Cheetah Run, in three settings with different types of backgrounds, namely the default, simple distractors, and natural videos from the Kinetics dataset [35], as introduced in Zhang et al. [73]." (A hedged environment-setup sketch appears after this table.)
Dataset Splits | No | The paper specifies training on 200 levels and testing on the full distribution of levels for Procgen, but does not explicitly define a separate validation split.
Hardware Specification | No | The paper does not report hardware details such as GPU/CPU models, processor types, or memory amounts used for its experiments.
Software Dependencies | No | While the paper mentions specific algorithms and libraries such as PPO, SAC, A3C, IMPALA, RAD's implementation, and the higher library, it does not give version numbers for the general software stack (e.g., Python, PyTorch, CUDA) that exact replication would require.
Experiment Setup | Yes | "More details about our experimental setup and hyperparameters can be found in Appendix E."
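
The pseudocode row above names DrAC's two regularizers: a policy term that keeps π(a|s) close to π(a|f(s)) under an image transformation f, and a value term that keeps V(f(s)) close to V(s). Below is a minimal PyTorch sketch of those terms as stated in the paper (G_π = KL[π(a|s), π(a|f(s,ν))] and G_V = (V(f(s,ν)) − V(s))²); the `agent` and `augment` names and the weight `alpha_r` are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn.functional as F

def drac_regularizers(agent, augment, obs, alpha_r=0.1):
    """DrAC regularization terms (sketch).

    agent(obs) is assumed to return (action_logits, values) for a
    discrete-action actor-critic; augment(obs) applies the image
    transformation f to a batch of observations.
    """
    # Outputs on the original observations serve as fixed targets,
    # so gradients only flow through the augmented branch.
    with torch.no_grad():
        logits, values = agent(obs)

    aug_logits, aug_values = agent(augment(obs))

    # G_pi = KL[pi(.|s) || pi(.|f(s))]: F.kl_div takes log-probs as its
    # first argument and probs as its second, computing KL(target || input).
    g_pi = F.kl_div(
        F.log_softmax(aug_logits, dim=-1),
        F.softmax(logits, dim=-1),
        reduction="batchmean",
    )

    # G_V = (V(f(s)) - V(s))^2
    g_v = F.mse_loss(aug_values, values)

    return alpha_r * (g_pi + g_v)
```

In the paper these terms are subtracted from the PPO objective with weight α_r, so in a training loop the value returned above is simply added to the usual PPO loss before the backward pass.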
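The Open Datasets row quotes the standard Procgen protocol: train on a fixed set of n = 200 levels and evaluate on the full level distribution. A minimal sketch of that split with the public procgen package follows; the game name, `num_envs`, and `distribution_mode` are placeholder choices (the paper's actual values are in its Appendix E), and `num_levels=0` is the package's convention for sampling over all level seeds.

```python
from procgen import ProcgenEnv

# Training: a fixed set of 200 levels, generated from seeds 1..200
# (start_level and num_levels select the seed range).
train_envs = ProcgenEnv(
    num_envs=64,               # illustrative number of parallel envs
    env_name="coinrun",        # any of the 16 Procgen games
    num_levels=200,
    start_level=1,
    distribution_mode="easy",  # the mode paired with 200 levels in Cobbe et al. [12]
)

# Testing: the full distribution of levels. num_levels=0 tells Procgen
# to sample level seeds uniformly at random, matching the quoted setup.
test_envs = ProcgenEnv(
    num_envs=64,
    env_name="coinrun",
    num_levels=0,
    start_level=0,
    distribution_mode="easy",
)
```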