Shallow Updates for Deep Reinforcement Learning

Authors: Nir Levine, Tom Zahavy, Daniel J. Mankowitz, Aviv Tamar, Shie Mannor

NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We tested LS-DQN on five Atari games and demonstrate significant improvement over vanilla DQN and Double-DQN. We also investigated the reasons for the superior performance of our method. |
| Researcher Affiliation | Academia | Nir Levine, Dept. of Electrical Engineering, The Technion - Israel Institute of Technology, Haifa 3200003, Israel (levin.nir1@gmail.com); Tom Zahavy, Dept. of Electrical Engineering, The Technion - Israel Institute of Technology, Haifa 3200003, Israel (tomzahavy@campus.technion.ac.il); Daniel J. Mankowitz, Dept. of Electrical Engineering, The Technion - Israel Institute of Technology, Haifa 3200003, Israel (danielm@tx.technion.ac.il); Aviv Tamar, Dept. of Electrical Engineering and Computer Sciences, UC Berkeley, Berkeley, CA 94720 (avivt@berkeley.edu); Shie Mannor, Dept. of Electrical Engineering, The Technion - Israel Institute of Technology, Haifa 3200003, Israel (shie@ee.technion.ac.il) |
| Pseudocode | Yes | Algorithm 1: LS-DQN Algorithm |
| Open Source Code | Yes | Code is available online at https://github.com/Shallow-Updates-for-Deep-RL |
| Open Datasets | Yes | We trained DQN agents on two games from the Arcade Learning Environment (ALE, Bellemare et al.); namely, Breakout and Qbert, using the vanilla DQN implementation (Mnih et al., 2015). |
| Dataset Splits | No | The paper mentions 'periodic evaluation' and '20 roll-outs' but does not specify explicit train/validation/test splits with percentages, sample counts, or citations to predefined splits. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for the experiments. |
| Software Dependencies | No | The paper does not list version numbers for software dependencies (e.g., Python, or machine learning frameworks such as TensorFlow or PyTorch). |
| Experiment Setup | Yes | We chose to run a LS-update every N_DRL = 500k steps, for a total of 50M steps (SRL_iters = 100). We used the current ER buffer as the generated data in the LS-UPDATE function (line 7 in Alg. 1, N_SRL = 1M), and a regularization coefficient λ = 1 for the Bayesian prior solution (both for FQI and LSTD-Q)... For both ADAM and FQI, we first collected 80k data samples from the ER at each epoch. For ADAM, we performed 20 iterations over the data, where each iteration consisted of randomly permuting the data, dividing it into mini-batches and optimizing using ADAM over the mini-batches... We compared these approaches for different mini-batch sizes of 32, 512, and 4096 data points, and used λ = 1 for all experiments. |
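For readers skimming the Pseudocode and Experiment Setup rows, the block below is a minimal, self-contained sketch (NumPy, synthetic data) of the LS-UPDATE step that Algorithm 1 interleaves with standard DQN training. It assumes the Bayesian-prior regularization takes the usual ridge-toward-prior form w = (Φ^T Φ + λI)^{-1}(Φ^T y + λ w_DQN), up to scaling conventions; all array names and dimensions are illustrative, not the authors' code.

```python
import numpy as np

# Sketch of the LS-UPDATE step: refit the DQN's last layer with a
# Bayesian-regularized least-squares (FQI-style) solve, using features
# produced by the frozen lower layers. Only lambda = 1 and the idea of
# drawing ~1M transitions from the ER buffer come from the paper; the
# data here is synthetic and the sizes are reduced for a quick run.

rng = np.random.default_rng(0)

N_SRL = 10_000   # paper reports N_SRL = 1M transitions; reduced here
d = 512          # assumed penultimate-layer feature size
lam = 1.0        # regularization coefficient lambda = 1 (prior strength)
gamma = 0.99     # discount factor

phi = rng.normal(size=(N_SRL, d))     # features phi(s, a) from the frozen layers
rewards = rng.normal(size=N_SRL)      # observed rewards
next_q_max = rng.normal(size=N_SRL)   # max_a' Q_target(s', a') from the target net
w_dqn = rng.normal(size=d)            # current last-layer weights (prior mean)

# FQI regression targets: y = r + gamma * max_a' Q_target(s', a')
y = rewards + gamma * next_q_max

# Regularized least-squares solve with a Gaussian prior centered on w_dqn:
#   w_ls = argmin_w ||phi w - y||^2 + lam * ||w - w_dqn||^2
#        = (phi^T phi + lam I)^{-1} (phi^T y + lam w_dqn)
A = phi.T @ phi + lam * np.eye(d)
b = phi.T @ y + lam * w_dqn
w_ls = np.linalg.solve(A, b)

# In LS-DQN this solve replaces the last-layer weights every N_DRL = 500k
# DQN steps, after which gradient-based DQN training resumes.
print("||w_ls - w_dqn|| =", np.linalg.norm(w_ls - w_dqn))
```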
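The Experiment Setup row also quotes an ADAM-versus-FQI comparison (80k ER samples per epoch, 20 passes, mini-batch sizes 32/512/4096). The sketch below fits the same last-layer regression with a hand-written mini-batch Adam loop so the two procedures can be contrasted. The learning rate, the scaling of the prior term, and the reduced sample count are assumptions for illustration, not the paper's exact settings.

```python
import numpy as np

# Mini-batch Adam fit of the regularized last-layer regression, mirroring the
# procedure quoted above: permute the data each pass, split into mini-batches,
# and take Adam steps. The Adam update is written out by hand.

rng = np.random.default_rng(1)

N, d = 8_000, 512                     # paper uses 80k ER samples per epoch; reduced here
lam, lr = 1.0, 1e-3                   # lambda = 1 as in the paper; lr is illustrative
beta1, beta2, eps = 0.9, 0.999, 1e-8  # standard Adam hyperparameters

phi = rng.normal(size=(N, d))         # penultimate-layer features phi(s, a)
y = rng.normal(size=N)                # FQI targets r + gamma * max_a' Q(s', a')
w_dqn = rng.normal(size=d)            # current last-layer weights (prior mean)

def fit_adam(batch_size, epochs=20):
    """Minimize ||phi w - y||^2 / N + lam * ||w - w_dqn||^2 / N with mini-batch Adam."""
    w = w_dqn.copy()
    m, v, t = np.zeros(d), np.zeros(d), 0
    for _ in range(epochs):                        # 20 iterations over the data
        perm = rng.permutation(N)                  # randomly permute, then batch
        for start in range(0, N, batch_size):
            idx = perm[start:start + batch_size]
            phi_b, y_b = phi[idx], y[idx]
            # gradient of the mini-batch squared error plus the (scaled) prior term
            grad = 2.0 * phi_b.T @ (phi_b @ w - y_b) / len(idx) \
                 + 2.0 * lam * (w - w_dqn) / N
            t += 1
            m = beta1 * m + (1 - beta1) * grad
            v = beta2 * v + (1 - beta2) * grad ** 2
            m_hat = m / (1 - beta1 ** t)
            v_hat = v / (1 - beta2 ** t)
            w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

for batch_size in (32, 512, 4096):                 # the three mini-batch sizes compared
    w = fit_adam(batch_size)
    mse = np.mean((phi @ w - y) ** 2)
    print(f"batch={batch_size:5d}  final MSE={mse:.4f}")
```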