Reinforcement Learning with Augmented Data
Authors: Misha Laskin, Kimin Lee, Adam Stooke, Lerrel Pinto, Pieter Abbeel, Aravind Srinivas
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform the first extensive study of general data augmentations for RL on both pixel-based and state-based inputs, and introduce two new data augmentations: random translate and random amplitude scale. We show that augmentations such as random translate, crop, color jitter, patch cutout, random convolutions, and amplitude scale can enable simple RL algorithms to outperform complex state-of-the-art methods across common benchmarks. RAD sets a new state-of-the-art in terms of data-efficiency and final performance on the DeepMind Control Suite benchmark for pixel-based control as well as the OpenAI Gym benchmark for state-based control. (See the augmentation sketch after this table.) |
| Researcher Affiliation | Academia | Michael Laskin (UC Berkeley), Kimin Lee (UC Berkeley), Adam Stooke (UC Berkeley), Lerrel Pinto (New York University), Pieter Abbeel (UC Berkeley), Aravind Srinivas (UC Berkeley) |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Our RAD module and training code are available at https://www.github.com/MishaLaskin/rad. |
| Open Datasets | Yes | To this end, we utilize the DeepMind Control Suite (DMControl) [22]... For DMControl experiments, we evaluate the data-efficiency by measuring the performance of our method at 100k ... and 500k ... simulator or environment steps. ... For this reason, we focus on the OpenAI ProcGen benchmarks [24] to investigate the generalization capabilities of RAD. ... For OpenAI Gym experiments with proprioceptive inputs..., we compare to PETS [41]... |
| Dataset Splits | No | The paper specifies training budgets (e.g., 100k and 500k environment steps) and test environments, but does not describe distinct training/validation/test dataset splits in the conventional supervised-learning sense; this is typical of reinforcement learning, where agents interact with an environment rather than with a fixed held-out split. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models or CPU specifications. |
| Software Dependencies | No | The paper mentions algorithms and frameworks like SAC and PPO, but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | A full list of hyperparameters is provided in Table 4 of Appendix E. |
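
For context on the two augmentations the paper introduces, below is a minimal NumPy sketch of random translate (for pixel observations) and random amplitude scale (for state vectors). It is illustrative only: the canvas size (108) and the scaling range ([0.6, 1.2]) are assumptions for the example, not necessarily the paper's exact settings, and the released RAD code implements these operations differently (batched, on GPU).

```python
# Illustrative sketch of two RAD-style augmentations (not the authors' implementation).
import numpy as np


def random_translate(imgs: np.ndarray, out_size: int = 108) -> np.ndarray:
    """Place each image at a random offset inside a larger zero-padded canvas.

    imgs: (B, C, H, W) array with H <= out_size and W <= out_size.
    Returns a (B, C, out_size, out_size) array.
    """
    b, c, h, w = imgs.shape
    assert h <= out_size and w <= out_size
    out = np.zeros((b, c, out_size, out_size), dtype=imgs.dtype)
    for i in range(b):
        top = np.random.randint(0, out_size - h + 1)
        left = np.random.randint(0, out_size - w + 1)
        out[i, :, top:top + h, left:left + w] = imgs[i]
    return out


def random_amplitude_scale(states: np.ndarray, low: float = 0.6, high: float = 1.2) -> np.ndarray:
    """Multiply each state vector by a scalar drawn uniformly from [low, high].

    The [0.6, 1.2] range is an assumed example, not the paper's reported setting.
    """
    b = states.shape[0]
    scales = np.random.uniform(low, high, size=(b, 1)).astype(states.dtype)
    return states * scales


if __name__ == "__main__":
    # 84x84 pixel observations translated within a 108x108 canvas.
    pixel_batch = np.random.rand(4, 3, 84, 84).astype(np.float32)
    print(random_translate(pixel_batch).shape)  # (4, 3, 108, 108)

    # Proprioceptive state vectors rescaled in amplitude.
    state_batch = np.random.rand(4, 17).astype(np.float32)
    print(random_amplitude_scale(state_batch).shape)  # (4, 17)
```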