Continuous Deep Q-Learning with Model-based Acceleration
Authors: Shixiang Gu, Timothy Lillicrap, Ilya Sutskever, Sergey Levine
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our paper provides three main contributions: first, we derive and evaluate a Q-function representation that allows for effective Q-learning in continuous domains. Second, we evaluate several naïve options for incorporating learned models into model-free Q-learning, and we show that they are minimally effective on our continuous control tasks. Third, we propose to combine locally linear models with local on-policy imagination rollouts to accelerate model-free continuous Q-learning, and show that this produces a large improvement in sample complexity. We evaluate our method on a series of simulated robotic tasks and compare to prior methods. (A hedged sketch of the NAF Q-function representation appears below the table.) |
| Researcher Affiliation | Collaboration | ¹University of Cambridge, ²Max Planck Institute for Intelligent Systems, ³Google Brain, ⁴Google DeepMind |
| Pseudocode | Yes | Algorithm 1 (Continuous Q-Learning with NAF) and Algorithm 2 (Imagination Rollouts with Fitted Dynamics and Optional iLQG Exploration) |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code availability for the described methodology. |
| Open Datasets | No | The paper mentions using 'simulated robotic tasks using the MuJoCo simulator (Todorov et al., 2012)' and 'benchmarks described by Lillicrap et al. (2016)', but does not provide concrete access information (e.g., a specific link, DOI, or formal citation for a publicly available dataset used for training). |
| Dataset Splits | No | The paper mentions using a 'replay buffer' and 'simulated robotic tasks', but does not specify exact train/validation/test dataset splits (e.g., percentages or sample counts) needed for reproduction. |
| Hardware Specification | No | The paper mentions running experiments on 'simulated robotic tasks' but does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for the experiments. |
| Software Dependencies | No | The paper mentions using the 'MuJoCo simulator' and 'ADAM' optimizer, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | For both our method and the prior DDPG (Lillicrap et al., 2016) algorithm in the comparisons, we used neural networks with two layers of 200 rectified linear units (ReLU)... Since Q-learning was done with a replay buffer, we applied the Q-learning update 5 times per each step of experience to accelerate learning (I = 5)... We found the most sensitive hyperparameters to be presence or absence of batch normalization, base learning rate for ADAM (Kingma & Ba, 2014) {1e-4, 1e-3, 1e-2}, and exploration noise scale {0.1, 0.3, 1.0}. (A minimal configuration sketch follows the table.) |
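
The Q-function representation summarized in the Research Type row is the paper's normalized advantage function (NAF): Q(x, u) = V(x) + A(x, u), where A(x, u) = -1/2 (u - mu(x))^T P(x) (u - mu(x)) and P(x) = L(x) L(x)^T with L(x) lower triangular. The sketch below shows one plausible way to recombine the three network heads into a Q-value; the function name, input shapes, and the exponentiated-diagonal parameterization of L are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def naf_q_value(v, mu, l_entries, u, action_dim):
    """Recombine NAF network heads into a Q-value (illustrative sketch).

    v         : scalar state-value head V(x)
    mu        : (action_dim,) greedy-action head mu(x)
    l_entries : (action_dim * (action_dim + 1) // 2,) head parameterizing
                the lower-triangular factor L(x)
    u         : (action_dim,) action to evaluate
    """
    # Build lower-triangular L(x); exponentiating the diagonal keeps
    # P(x) = L(x) L(x)^T positive definite.
    L = np.zeros((action_dim, action_dim))
    L[np.tril_indices(action_dim)] = l_entries
    L[np.diag_indices(action_dim)] = np.exp(np.diag(L))
    P = L @ L.T

    # Quadratic advantage, maximized (at zero) when u = mu(x).
    diff = u - mu
    advantage = -0.5 * diff @ P @ diff
    return v + advantage  # Q(x, u) = V(x) + A(x, u)
```

Because A(x, u) is non-positive and equals zero at u = mu(x), the greedy action is available in closed form, which is what makes Q-learning tractable in continuous action spaces.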
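
The Experiment Setup row quotes the architecture (two hidden layers of 200 ReLU units), the ADAM learning-rate sweep, and the I = 5 Q-learning updates per environment step. The PyTorch sketch below wires those reported numbers into one possible module layout; the class name, task dimensions, tanh squashing of mu, and head structure are assumptions made for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

# Values quoted from the paper's setup; dimensions below are hypothetical.
STATE_DIM, ACTION_DIM = 17, 6          # hypothetical task dimensions
HIDDEN = 200                           # two hidden layers of 200 ReLU units
LEARNING_RATES = [1e-4, 1e-3, 1e-2]    # ADAM base learning rates swept
UPDATES_PER_STEP = 5                   # I = 5 Q-learning updates per env step

class NAFHeads(nn.Module):
    """Shared trunk emitting the V, mu, and L heads used by a NAF-style critic."""

    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(STATE_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
        )
        self.v = nn.Linear(HIDDEN, 1)
        self.mu = nn.Linear(HIDDEN, ACTION_DIM)
        self.l = nn.Linear(HIDDEN, ACTION_DIM * (ACTION_DIM + 1) // 2)

    def forward(self, x):
        h = self.trunk(x)
        return self.v(h), torch.tanh(self.mu(h)), self.l(h)

net = NAFHeads()
optimizer = torch.optim.Adam(net.parameters(), lr=LEARNING_RATES[1])
```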