Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Mastering Visual Continuous Control: Improved Data-Augmented Reinforcement Learning

Authors: Denis Yarats, Rob Fergus, Alessandro Lazaric, Lerrel Pinto

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present DrQ-v2, a model-free reinforcement learning (RL) algorithm for visual continuous control. DrQ-v2 builds on DrQ, an off-policy actor-critic approach that uses data augmentation to learn directly from pixels. We introduce several improvements that yield state-of-the-art results on the DeepMind Control Suite. Notably, DrQ-v2 is able to solve complex humanoid locomotion tasks directly from pixel observations, previously unattained by model-free RL.
Researcher Affiliation | Collaboration | Anonymous authors. Paper under double-blind review.
Pseudocode | No | The paper describes the algorithm and equations but does not provide structured pseudocode or an explicit algorithm block labeled as such.
Open Source Code | Yes | DrQ-v2's implementation is available at https://anonymous.4open.science/r/drqv2.
Open Datasets | Yes | We consider a set of MuJoCo tasks (Todorov et al., 2012) provided by DMC (Tassa et al., 2018), a widely used benchmark for continuous control.
Dataset Splits | No | The paper describes the evaluation periodicity and the averaging of episode returns, but it does not specify explicit train/validation/test splits of the kind used for a static dataset. In RL environments, validation usually means periodic performance checks in the environment rather than a separate dataset split.
Hardware Specification | Yes | To facilitate fair wall-clock time comparison all algorithms are trained on the same hardware (i.e., a single NVIDIA V100 GPU machine) and evaluated with the same periodicity of 20000 environment steps.
Software Dependencies | No | The paper mentions PyTorch for image sampling ("PyTorch (i.e., grid_sample)") but does not provide specific version numbers for PyTorch or other key software dependencies required for replication.
Experiment Setup | Yes | Key Hyper-Parameters: We conduct an extensive hyper-parameter search and identify several hyper-parameter changes compared to DrQ. The three most important hyper-parameters are: (i) the size of the replay buffer, (ii) mini-batch size, and (iii) learning rate. Specifically, we use a 10 times larger replay buffer than DrQ. We also use a smaller mini-batch size of 256 without any noticeable performance degradation... Finally, we find that using smaller learning rate of 1 × 10^-4... A full list of hyper-parameters can be found in Appendix E.
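
The settings quoted in the Hardware Specification and Experiment Setup rows can be gathered into a small configuration sketch. This is only an illustration, not the authors' code: the class, field, and function names are hypothetical. The table gives the replay buffer only relative to the earlier method (10 times larger), so the baseline size is left as an input rather than filled in.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class QuotedSettings:
    """Values quoted in the rows above; all field names are hypothetical."""
    batch_size: int = 256                # "smaller mini-batch size of 256"
    learning_rate: float = 1e-4          # "smaller learning rate of 1 x 10^-4"
    replay_buffer_scale: int = 10        # "10 times larger replay buffer"
    eval_every_env_steps: int = 20_000   # "periodicity of 20000 environment steps"


def replay_buffer_size(baseline_size: int, settings: QuotedSettings) -> int:
    """Scale a baseline buffer size. The baseline is an assumption supplied
    by the caller; it is not stated in the rows above."""
    return baseline_size * settings.replay_buffer_scale


def eval_steps(total_env_steps: int, settings: QuotedSettings) -> List[int]:
    """Environment steps at which a periodic evaluation would be triggered."""
    every = settings.eval_every_env_steps
    return list(range(every, total_env_steps + 1, every))
```

For example, `eval_steps(60_000, QuotedSettings())` lists three evaluation points, one every 20000 environment steps. The remaining hyper-parameters are in the paper's Appendix E and are not reproduced here.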