Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning

Authors: Oron Anschel, Nir Baram, Nahum Shimkin

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We further present experiments on the Arcade Learning Environment benchmark that demonstrate significantly improved stability and performance due to the proposed extension.
Researcher Affiliation | Academia | Department of Electrical Engineering, Haifa 32000, Israel. Correspondence to: Oron Anschel <oronanschel@campus.technion.ac.il>, Nir Baram <nirb@campus.technion.ac.il>, Nahum Shimkin <shimkin@ee.technion.ac.il>.
Pseudocode | Yes | Algorithm 1 (DQN); Algorithm 2 (Averaged DQN); Algorithm 3 (Ensemble DQN) (see the Averaged-DQN target sketch after the table).
Open Source Code | Yes | DQN code was taken from the McGill University RLLAB and is available online (together with the Averaged-DQN implementation). McGill University RLLAB DQN Atari code: https://bitbucket.org/rllabmcgill/atari_release; Averaged-DQN code: https://bitbucket.org/oronanschel/atari_release_averaged_dqn
Open Datasets | Yes | We further present experiments on the Arcade Learning Environment benchmark that demonstrate significantly improved stability and performance due to the proposed extension. To evaluate Averaged-DQN, we adopt the typical RL methodology where agent performance is measured at the end of training. We have evaluated the Averaged-DQN algorithm on three Atari games from the Arcade Learning Environment (Bellemare et al., 2013) (see the ALE interaction sketch after the table).
Dataset Splits | No | The paper describes training procedures and evaluation metrics (e.g., "Every 1M frames, a performance test using ε-greedy policy with ε = 0.05 for 500000 frames was conducted"), but it does not specify explicit train/validation/test dataset splits with percentages or counts, as is common in supervised learning contexts. The RL environment is used for continuous interaction and evaluation (see the evaluation-protocol sketch after the table).
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions the ADAM optimizer (Kingma & Ba, 2014) but does not provide specific version numbers for it or any other software libraries, frameworks, or programming languages used.
Experiment Setup | Yes | The hyperparameters used were taken from Mnih et al. (2015). Every 1M frames, a performance test using an ε-greedy policy with ε = 0.05 for 500,000 frames was conducted. For minimization of the DQN loss, the ADAM optimizer (Kingma & Ba, 2014) was used on 100 mini-batches of 32 samples per target-network parameter update in the first experiment, and 300 mini-batches in the second. The network architecture used consisted of a small fully connected neural network with one hidden layer of 80 neurons (see the network and optimizer sketch after the table).
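
As a companion to the pseudocode noted in the table, the following is a minimal sketch of the Averaged-DQN target computation (Algorithm 2): the regression target averages the action-value estimates of the K previously learned Q-networks before taking the max. The function and variable names (averaged_dqn_targets, past_q_networks) are illustrative and not taken from the released code.

```python
import numpy as np

def averaged_dqn_targets(rewards, next_states, terminals, past_q_networks, gamma=0.99):
    """Regression targets built by averaging the K previously learned Q estimates."""
    # Each callable maps a batch of next states to Q-values of shape (batch, num_actions);
    # averaging over the K networks is the core of Averaged DQN.
    q_avg = np.mean([q(next_states) for q in past_q_networks], axis=0)
    # Standard DQN bootstrap, but the max is taken over the averaged estimate.
    max_q = q_avg.max(axis=1)
    # `terminals` is assumed to be a 0/1 float array marking episode ends.
    return rewards + gamma * (1.0 - terminals) * max_q
```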
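
The Arcade Learning Environment games referenced in the Open Datasets row can be reached through several interfaces; the sketch below uses the Gymnasium Atari bindings purely as an assumption (the linked RLLAB codebase uses its own ALE wrapper), with Breakout chosen only as a representative game.

```python
# Requires gymnasium and ale-py (an assumption; not the dependencies of the paper's code).
import gymnasium as gym

env = gym.make("ALE/Breakout-v5")  # a representative ALE game
obs, info = env.reset(seed=0)
for _ in range(1000):
    action = env.action_space.sample()  # random policy placeholder
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```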
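
For the evaluation protocol quoted in the Dataset Splits and Experiment Setup rows (an ε-greedy policy with ε = 0.05 run for 500,000 frames every 1M training frames), a hedged sketch is given below; select_greedy_action and the Gymnasium-style env interface are placeholders, not names from the paper's code.

```python
import random

EVAL_EPSILON = 0.05      # exploration rate used during performance tests
EVAL_FRAMES = 500_000    # length of each performance test, as quoted above

def evaluate(env, select_greedy_action, num_actions):
    """Run an epsilon-greedy evaluation and return the per-episode scores."""
    scores, episode_return = [], 0.0
    obs, _ = env.reset()
    for _ in range(EVAL_FRAMES):
        if random.random() < EVAL_EPSILON:
            action = random.randrange(num_actions)   # explore uniformly
        else:
            action = select_greedy_action(obs)       # argmax_a Q(obs, a)
        obs, reward, terminated, truncated, _ = env.step(action)
        episode_return += reward
        if terminated or truncated:
            scores.append(episode_return)
            episode_return = 0.0
            obs, _ = env.reset()
    return scores
```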
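
Finally, the small network described in the Experiment Setup row (one hidden layer of 80 units, trained with ADAM on mini-batches of 32) could look like the sketch below; PyTorch and the state/action dimensions are assumptions, since the paper does not name a framework.

```python
import torch
import torch.nn as nn

class SmallQNetwork(nn.Module):
    """Fully connected Q-network with a single hidden layer of 80 units."""
    def __init__(self, state_dim, num_actions, hidden=80):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, x):
        return self.net(x)

# State and action dimensions are illustrative placeholders.
q_net = SmallQNetwork(state_dim=25, num_actions=4)
optimizer = torch.optim.Adam(q_net.parameters())  # ADAM, as in the quoted setup
BATCH_SIZE = 32  # 100 (or 300) mini-batches of this size per target-network update
```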