Improving Sample Efficiency in Model-Free Reinforcement Learning from Images

Authors: Denis Yarats, Amy Zhang, Ilya Kostrikov, Brandon Amos, Joelle Pineau, Rob Fergus

AAAI 2021, pp. 10674-10681 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate all agents on six challenging control tasks (Figure 1). For brevity, on occasion, results for three tasks are shown, with the remainder presented in the appendix. An image observation is represented as a stack of three consecutive 84 × 84 RGB renderings (Mnih et al. 2013) to infer temporal statistics, such as velocity and acceleration.
Researcher Affiliation | Collaboration | 1 New York University, 2 Facebook AI Research, 3 McGill University, 4 MILA
Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found in the paper.
Open Source Code | Yes | Code, results, and videos are anonymously available at https://sites.google.com/view/sac-ae/home.
Open Datasets | Yes | We evaluate all agents on six challenging control tasks (Figure 1) from DMC (Tassa et al. 2018) and compare against several state-of-the-art model-free and model-based RL algorithms for learning from pixels: D4PG (Barth-Maron et al. 2018), an off-policy actor-critic algorithm; PlaNet (Hafner et al. 2018), a model-based method that learns a dynamics model with deterministic and stochastic latent variables and employs cross-entropy planning for control; and SLAC (Lee et al. 2019), which combines a purely stochastic latent model with a model-free soft actor-critic.
Dataset Splits | No | The paper describes evaluation frequency (every 10K training observations) but does not provide specific dataset splits (percentages or counts) for training, validation, and testing as one would for a static dataset.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or specific cloud instances) used for running experiments were provided in the paper.
Software Dependencies | No | The paper mentions a 'PyTorch implementation' but does not specify software dependencies with version numbers.
Experiment Setup | No | The paper states, 'For simplicity, we keep the hyperparameters fixed across all the tasks, except for action repeat (see Appendix B.3)', but does not provide specific hyperparameter values in the main text.
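
The "Research Type" row above notes that observations are formed by stacking three consecutive 84 × 84 RGB renderings so that temporal statistics such as velocity and acceleration can be inferred. Below is a minimal, illustrative frame-stacking sketch in Python; it is not the authors' implementation, and the environment interface (reset/step returning a single RGB frame) is an assumption made for the example.

# Illustrative sketch only (not the paper's code): stack the k most recent
# 84x84 RGB frames along the channel axis, as described in the quoted text.
# The wrapped env's reset()/step() interface is an assumption for the example.
from collections import deque
import numpy as np


class FrameStack:
    """Keeps the k most recent frames and concatenates them along channels."""

    def __init__(self, env, k=3):
        self.env = env
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self):
        obs = self.env.reset()  # assumed to return an (84, 84, 3) RGB array
        for _ in range(self.k):
            self.frames.append(obs)
        return self._stacked()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)  # assumed gym-style API
        self.frames.append(obs)
        return self._stacked(), reward, done, info

    def _stacked(self):
        # Resulting shape is (84, 84, 3 * k): three consecutive RGB renderings
        # stacked on the channel dimension.
        return np.concatenate(list(self.frames), axis=-1)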