Improving Sample Efficiency in Model-Free Reinforcement Learning from Images

Authors: Denis Yarats, Amy Zhang, Ilya Kostrikov, Brandon Amos, Joelle Pineau, Rob Fergus

AAAI 2021, pp. 10674-10681 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate all agents on six challenging control tasks (Figure 1). For brevity, on occasion, results for three tasks are shown, with the remainder presented in the appendix. An image observation is represented as a stack of three consecutive 84 × 84 RGB renderings (Mnih et al. 2013) to infer temporal statistics, such as velocity and acceleration.
Researcher Affiliation | Collaboration | 1 New York University, 2 Facebook AI Research, 3 McGill University, 4 MILA
Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found in the paper.
Open Source Code | Yes | Code, results, and videos are anonymously available at https://sites.google.com/view/sac-ae/home.
Open Datasets | Yes | We evaluate all agents on six challenging control tasks (Figure 1) from DMC (Tassa et al. 2018) and compare against several state-of-the-art model-free and model-based RL algorithms for learning from pixels: D4PG (Barth-Maron et al. 2018), an off-policy actor-critic algorithm; PlaNet (Hafner et al. 2018), a model-based method that learns a dynamics model with deterministic and stochastic latent variables and employs cross-entropy planning for control; and SLAC (Lee et al. 2019), which combines a purely stochastic latent model with a model-free soft actor-critic.
Dataset Splits | No | The paper describes evaluation frequency (every 10K training observations) but does not provide specific dataset splits (percentages or counts) for training, validation, and testing as one would for a static dataset.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or specific cloud instances) used for running experiments were provided in the paper.
Software Dependencies | No | The paper mentions a 'PyTorch implementation' but does not specify software dependencies with version numbers.
Experiment Setup | No | The paper states, 'For simplicity, we keep the hyperparameters fixed across all the tasks, except for action repeat (see Appendix B.3)', but does not provide specific hyperparameter values in the main text.
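
The "Research Type" row above notes that observations are formed by stacking three consecutive 84 × 84 RGB renderings so that temporal statistics such as velocity and acceleration can be inferred. Below is a minimal, illustrative frame-stacking sketch in Python; it is not the authors' implementation, and the environment interface (reset/step returning a single RGB frame) is an assumption made for the example.

# Illustrative sketch only (not the paper's code): stack the k most recent
# 84x84 RGB frames along the channel axis, as described in the quoted text.
# The wrapped env's reset()/step() interface is an assumption for the example.
from collections import deque
import numpy as np


class FrameStack:
    """Keeps the k most recent frames and concatenates them along channels."""

    def __init__(self, env, k=3):
        self.env = env
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self):
        obs = self.env.reset()  # assumed to return an (84, 84, 3) RGB array
        for _ in range(self.k):
            self.frames.append(obs)
        return self._stacked()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)  # assumed gym-style API
        self.frames.append(obs)
        return self._stacked(), reward, done, info

    def _stacked(self):
        # Resulting shape is (84, 84, 3 * k): three consecutive RGB renderings
        # stacked on the channel dimension.
        return np.concatenate(list(self.frames), axis=-1)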