Mastering Visual Continuous Control: Improved Data-Augmented Reinforcement Learning

Authors: Denis Yarats, Rob Fergus, Alessandro Lazaric, Lerrel Pinto

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present DrQ-v2, a model-free reinforcement learning (RL) algorithm for visual continuous control. DrQ-v2 builds on DrQ, an off-policy actor-critic approach that uses data augmentation to learn directly from pixels. We introduce several improvements that yield state-of-the-art results on the DeepMind Control Suite. Notably, DrQ-v2 is able to solve complex humanoid locomotion tasks directly from pixel observations, previously unattained by model-free RL.
Researcher Affiliation | Collaboration | Anonymous authors. Paper under double-blind review.
Pseudocode | No | The paper describes the algorithm and equations but does not provide structured pseudocode or an explicit algorithm block labeled as such.
Open Source Code | Yes | DrQ-v2's implementation is available at https://anonymous.4open.science/r/drqv2.
Open Datasets | Yes | We consider a set of MuJoCo tasks (Todorov et al., 2012) provided by DMC (Tassa et al., 2018), a widely used benchmark for continuous control. (A minimal loading sketch is given below the table.)
Dataset Splits | No | The paper describes evaluation periodicity and the averaging of episode returns, but it does not specify explicit train/validation/test splits in the traditional sense of a static dataset. For RL environments, validation usually refers to periodic performance checks in the environment rather than a separate dataset split.
Hardware Specification | Yes | To facilitate fair wall-clock time comparison all algorithms are trained on the same hardware (i.e., a single NVIDIA V100 GPU machine) and evaluated with the same periodicity of 20000 environment steps.
Software Dependencies | No | The paper mentions PyTorch for image sampling ("PyTorch (i.e., grid_sample)") but does not provide specific version numbers for PyTorch or other key software dependencies required for replication. (A sketch of a grid_sample-based augmentation is given below the table.)
Experiment Setup | Yes | Key Hyper-Parameters: We conduct an extensive hyper-parameter search and identify several hyper-parameter changes compared to DrQ. The three most important hyper-parameters are: (i) the size of the replay buffer, (ii) mini-batch size, and (iii) learning rate. Specifically, we use a 10 times larger replay buffer than DrQ. We also use a smaller mini-batch size of 256 without any noticeable performance degradation... Finally, we find that using a smaller learning rate of 1 × 10⁻⁴... A full list of hyper-parameters can be found in Appendix E. (A configuration sketch collecting these values is given below the table.)
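For the "Open Datasets" row: the DMC benchmark cited there is distributed through the open-source dm_control package. The sketch below shows one way to load and step a task; the specific domain/task pair is illustrative only and is not a claim about the paper's exact task set.

```python
# Minimal sketch: loading and stepping a DeepMind Control Suite (DMC) task
# via the open-source dm_control package. The walker-walk pair below is
# illustrative; the paper evaluates on a broader set of DMC tasks.
import numpy as np
from dm_control import suite

env = suite.load(domain_name="walker", task_name="walk")
action_spec = env.action_spec()

time_step = env.reset()
while not time_step.last():
    # Uniform random policy, just to demonstrate the interaction loop.
    action = np.random.uniform(action_spec.minimum,
                               action_spec.maximum,
                               size=action_spec.shape)
    time_step = env.step(action)
```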
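For the "Software Dependencies" row: the only named dependency is PyTorch's grid_sample, used for image sampling. Below is a hedged sketch of the kind of random-shift augmentation commonly built on grid_sample in DrQ-style methods; the padding size, sampling settings, and function name are assumptions, not a verbatim reproduction of the released code.

```python
# Sketch of a random-shift image augmentation built on
# torch.nn.functional.grid_sample, in the spirit of the augmentation the
# paper mentions. Padding size and sampling settings are assumptions.
import torch
import torch.nn.functional as F

def random_shift(imgs: torch.Tensor, pad: int = 4) -> torch.Tensor:
    """Randomly shift a batch of square images (N, C, H, W) by up to `pad` pixels."""
    n, _, h, w = imgs.shape
    assert h == w, "sketch assumes square images"
    padded = F.pad(imgs, (pad, pad, pad, pad), mode="replicate")

    # Normalized sampling grid covering an HxW window of the padded image.
    eps = 1.0 / (h + 2 * pad)
    coords = torch.linspace(-1.0 + eps, 1.0 - eps, h + 2 * pad,
                            device=imgs.device, dtype=imgs.dtype)[:h]
    coords = coords.unsqueeze(0).repeat(h, 1).unsqueeze(2)
    base_grid = torch.cat([coords, coords.transpose(1, 0)], dim=2)
    base_grid = base_grid.unsqueeze(0).repeat(n, 1, 1, 1)

    # Random per-image shift, expressed in normalized grid coordinates.
    shift = torch.randint(0, 2 * pad + 1, size=(n, 1, 1, 2),
                          device=imgs.device, dtype=imgs.dtype)
    shift *= 2.0 / (h + 2 * pad)

    return F.grid_sample(padded, base_grid + shift,
                         padding_mode="zeros", align_corners=False)
```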
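For the "Experiment Setup" row: the hyper-parameters named in the quoted excerpt can be collected into a small configuration sketch. Only the mini-batch size and learning rate are stated numerically; the replay-buffer capacity is described only relative to DrQ, so it is left as a placeholder here, and the complete list is in the paper's Appendix E.

```python
# Configuration sketch limited to the hyper-parameters named in the excerpt.
# The replay-buffer capacity is only described as "10 times larger" than
# DrQ's, so it is left unresolved; all remaining settings are in Appendix E.
drqv2_hparams = {
    "replay_buffer_capacity": None,  # 10x DrQ's buffer; exact value in Appendix E
    "mini_batch_size": 256,
    "learning_rate": 1e-4,
}
```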