Automatic Data Augmentation for Generalization in Reinforcement Learning
Authors: Roberta Raileanu, Maxwell Goldstein, Denis Yarats, Ilya Kostrikov, Rob Fergus
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method achieves a new state-of-the-art on the Procgen benchmark and outperforms popular RL algorithms on DeepMind Control tasks with distractors. We evaluate these approaches on the Procgen generalization benchmark [12], which consists of 16 procedurally generated environments with visual observations. Table 1 shows train and test performance on Procgen. |
| Researcher Affiliation | Collaboration | Roberta Raileanu (New York University) raileanu@cs.nyu.edu; Max Goldstein (New York University) mag1038@nyu.edu; Denis Yarats (New York University, Facebook AI Research) denisyarats@cs.nyu.edu; Ilya Kostrikov (New York University) kostrikov@cs.nyu.edu; Rob Fergus (New York University) fergus@cs.nyu.edu |
| Pseudocode | Yes | Algorithm 1 (DrAC: Data-regularized Actor-Critic, applied to PPO). Black: unmodified actor-critic algorithm. Cyan: image transformation. Red: policy regularization. Blue: value function regularization. (A sketch of the added regularization terms follows the table.) |
| Open Source Code | Yes | Our code is available at https://github.com/rraileanu/auto-drac. |
| Open Datasets | Yes | In practice, we use the Procgen benchmark, which contains 16 procedurally generated games. Each game corresponds to a distribution of POMDPs q(m), and each level of a game corresponds to a POMDP sampled from that game's distribution, m ∼ q. The POMDP m is determined by the seed (i.e., integer) used to generate the corresponding level. Following the setup from Cobbe et al. [12], agents are trained on a fixed set of n = 200 levels (generated using seeds from 1 to 200) and tested on the full distribution of levels (generated by sampling seeds uniformly at random from all computer integers). We use four tasks, namely Cartpole Balance, Finger Spin, Walker Walk, and Cheetah Run, in three settings with different types of backgrounds, namely the default, simple distractors, and natural videos from the Kinetics dataset [35], as introduced in Zhang et al. [73]. (An environment-setup sketch follows the table.) |
| Dataset Splits | No | The paper specifies training on '200 levels' and testing on 'the full distribution of levels' for Procgen, but does not explicitly define a separate validation dataset split. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | While the paper mentions the use of specific algorithms and libraries like PPO, SAC, A3C, IMPALA, RAD's implementation, and the 'higher' library, it does not provide specific version numbers for general software dependencies (e.g., Python, PyTorch, CUDA versions) that would be needed for exact replication. |
| Experiment Setup | Yes | More details about our experimental setup and hyperparameters can be found in Appendix E. |
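To make the pseudocode row concrete: Algorithm 1 adds two regularization terms to the PPO objective, one keeping the policy and one keeping the value function invariant under an image transformation f. Below is a minimal PyTorch sketch, not the authors' code; the function name `drac_regularizers`, the callables `policy`, `value_fn`, and `augment`, and the weight `alpha_r` (the paper's α_r) are our naming, and the KL direction is our choice for illustration.

```python
import torch
import torch.nn.functional as F

def drac_regularizers(policy, value_fn, obs, augment, alpha_r=0.1):
    """Sketch of DrAC's two regularization terms (the paper's G_pi and G_V).

    Assumed interfaces (hypothetical, for this sketch):
      policy(obs)   -> a torch.distributions.Distribution over actions
      value_fn(obs) -> value estimates, shape [batch]
      augment(obs)  -> transformed observations f(obs), e.g. random crop
    """
    aug_obs = augment(obs)

    # G_pi: regularize the policy to be invariant to the augmentation,
    # via a KL between its outputs on original and augmented observations.
    with torch.no_grad():
        pi = policy(obs)  # target branch: no gradient through the original
    pi_aug = policy(aug_obs)
    g_pi = torch.distributions.kl_divergence(pi, pi_aug).mean()

    # G_V: regularize the value function to be invariant to the augmentation.
    with torch.no_grad():
        v = value_fn(obs)
    v_aug = value_fn(aug_obs)
    g_v = F.mse_loss(v_aug, v)

    # Added to the usual PPO loss: L = L_PPO + alpha_r * (g_pi + g_v)
    return alpha_r * (g_pi + g_v)
```

In the paper, the PPO policy and value losses themselves are computed only on unaugmented observations; the augmented branch enters solely through these two regularizers.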
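Likewise, the Procgen train/test protocol quoted in the Open Datasets row maps directly onto the benchmark's standard environment arguments. A minimal sketch, assuming the `procgen` pip package and its Gym registration; `coinrun` stands in for any of the 16 games, and `distribution_mode="easy"` is our assumption for the 200-level setup:

```python
import gym

# Train on a fixed set of 200 levels (seeds 1..200, per the paper's setup).
train_env = gym.make(
    "procgen:procgen-coinrun-v0",
    num_levels=200,
    start_level=1,
    distribution_mode="easy",
)

# Test on the full level distribution: num_levels=0 samples level seeds
# uniformly at random rather than restricting to a fixed set.
test_env = gym.make(
    "procgen:procgen-coinrun-v0",
    num_levels=0,
    start_level=0,
    distribution_mode="easy",
)
```

The gap between scores on `train_env` and `test_env` is the generalization gap that Table 1 of the paper reports.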