Shallow Updates for Deep Reinforcement Learning

Authors: Nir Levine, Tom Zahavy, Daniel J. Mankowitz, Aviv Tamar, Shie Mannor

NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We tested LS-DQN on five Atari games and demonstrate significant improvement over vanilla DQN and Double-DQN. We also investigated the reasons for the superior performance of our method. |
| Researcher Affiliation | Academia | Nir Levine, Dept. of Electrical Engineering, The Technion - Israel Institute of Technology, Haifa 3200003, Israel (levin.nir1@gmail.com); Tom Zahavy, Dept. of Electrical Engineering, The Technion - Israel Institute of Technology, Haifa 3200003, Israel (tomzahavy@campus.technion.ac.il); Daniel J. Mankowitz, Dept. of Electrical Engineering, The Technion - Israel Institute of Technology, Haifa 3200003, Israel (danielm@tx.technion.ac.il); Aviv Tamar, Dept. of Electrical Engineering and Computer Sciences, UC Berkeley, Berkeley, CA 94720 (avivt@berkeley.edu); Shie Mannor, Dept. of Electrical Engineering, The Technion - Israel Institute of Technology, Haifa 3200003, Israel (shie@ee.technion.ac.il) |
| Pseudocode | Yes | Algorithm 1: LS-DQN Algorithm |
| Open Source Code | Yes | Code is available online at https://github.com/Shallow-Updates-for-Deep-RL |
| Open Datasets | Yes | We trained DQN agents on two games from the Arcade Learning Environment (ALE, Bellemare et al.); namely, Breakout and Qbert, using the vanilla DQN implementation (Mnih et al., 2015). |
| Dataset Splits | No | The paper mentions 'periodic evaluation' and '20 roll-outs' but does not specify explicit train/validation/test splits with percentages, sample counts, or citations to predefined splits. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for the experiments. |
| Software Dependencies | No | The paper does not list version numbers for software dependencies (e.g., Python, or machine learning frameworks such as TensorFlow or PyTorch). |
| Experiment Setup | Yes | We chose to run a LS-update every N_DRL = 500k steps, for a total of 50M steps (SRL_iters = 100). We used the current ER buffer as the generated data in the LS-UPDATE function (line 7 in Alg. 1, N_SRL = 1M), and a regularization coefficient λ = 1 for the Bayesian prior solution (both for FQI and LSTD-Q)... For both ADAM and FQI, we first collected 80k data samples from the ER at each epoch. For ADAM, we performed 20 iterations over the data, where each iteration consisted of randomly permuting the data, dividing it into mini-batches and optimizing using ADAM over the mini-batches... We compared these approaches for different mini-batch sizes of 32, 512, and 4096 data points, and used λ = 1 for all experiments. |
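For readers skimming the Pseudocode and Experiment Setup rows, the block below is a minimal, self-contained sketch (NumPy, synthetic data) of the LS-UPDATE step that Algorithm 1 interleaves with standard DQN training. It assumes the Bayesian-prior regularization takes the usual ridge-toward-prior form w = (Φ^T Φ + λI)^{-1}(Φ^T y + λ w_DQN), up to scaling conventions; all array names and dimensions are illustrative, not the authors' code.

```python
import numpy as np

# Sketch of the LS-UPDATE step: refit the DQN's last layer with a
# Bayesian-regularized least-squares (FQI-style) solve, using features
# produced by the frozen lower layers. Only lambda = 1 and the idea of
# drawing ~1M transitions from the ER buffer come from the paper; the
# data here is synthetic and the sizes are reduced for a quick run.

rng = np.random.default_rng(0)

N_SRL = 10_000   # paper reports N_SRL = 1M transitions; reduced here
d = 512          # assumed penultimate-layer feature size
lam = 1.0        # regularization coefficient lambda = 1 (prior strength)
gamma = 0.99     # discount factor

phi = rng.normal(size=(N_SRL, d))     # features phi(s, a) from the frozen layers
rewards = rng.normal(size=N_SRL)      # observed rewards
next_q_max = rng.normal(size=N_SRL)   # max_a' Q_target(s', a') from the target net
w_dqn = rng.normal(size=d)            # current last-layer weights (prior mean)

# FQI regression targets: y = r + gamma * max_a' Q_target(s', a')
y = rewards + gamma * next_q_max

# Regularized least-squares solve with a Gaussian prior centered on w_dqn:
#   w_ls = argmin_w ||phi w - y||^2 + lam * ||w - w_dqn||^2
#        = (phi^T phi + lam I)^{-1} (phi^T y + lam w_dqn)
A = phi.T @ phi + lam * np.eye(d)
b = phi.T @ y + lam * w_dqn
w_ls = np.linalg.solve(A, b)

# In LS-DQN this solve replaces the last-layer weights every N_DRL = 500k
# DQN steps, after which gradient-based DQN training resumes.
print("||w_ls - w_dqn|| =", np.linalg.norm(w_ls - w_dqn))
```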
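The Experiment Setup row also quotes an ADAM-versus-FQI comparison (80k ER samples per epoch, 20 passes, mini-batch sizes 32/512/4096). The sketch below fits the same last-layer regression with a hand-written mini-batch Adam loop so the two procedures can be contrasted. The learning rate, the scaling of the prior term, and the reduced sample count are assumptions for illustration, not the paper's exact settings.

```python
import numpy as np

# Mini-batch Adam fit of the regularized last-layer regression, mirroring the
# procedure quoted above: permute the data each pass, split into mini-batches,
# and take Adam steps. The Adam update is written out by hand.

rng = np.random.default_rng(1)

N, d = 8_000, 512                     # paper uses 80k ER samples per epoch; reduced here
lam, lr = 1.0, 1e-3                   # lambda = 1 as in the paper; lr is illustrative
beta1, beta2, eps = 0.9, 0.999, 1e-8  # standard Adam hyperparameters

phi = rng.normal(size=(N, d))         # penultimate-layer features phi(s, a)
y = rng.normal(size=N)                # FQI targets r + gamma * max_a' Q(s', a')
w_dqn = rng.normal(size=d)            # current last-layer weights (prior mean)

def fit_adam(batch_size, epochs=20):
    """Minimize ||phi w - y||^2 / N + lam * ||w - w_dqn||^2 / N with mini-batch Adam."""
    w = w_dqn.copy()
    m, v, t = np.zeros(d), np.zeros(d), 0
    for _ in range(epochs):                        # 20 iterations over the data
        perm = rng.permutation(N)                  # randomly permute, then batch
        for start in range(0, N, batch_size):
            idx = perm[start:start + batch_size]
            phi_b, y_b = phi[idx], y[idx]
            # gradient of the mini-batch squared error plus the (scaled) prior term
            grad = 2.0 * phi_b.T @ (phi_b @ w - y_b) / len(idx) \
                 + 2.0 * lam * (w - w_dqn) / N
            t += 1
            m = beta1 * m + (1 - beta1) * grad
            v = beta2 * v + (1 - beta2) * grad ** 2
            m_hat = m / (1 - beta1 ** t)
            v_hat = v / (1 - beta2 ** t)
            w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

for batch_size in (32, 512, 4096):                 # the three mini-batch sizes compared
    w = fit_adam(batch_size)
    mse = np.mean((phi @ w - y) ** 2)
    print(f"batch={batch_size:5d}  final MSE={mse:.4f}")
```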