Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning
Authors: Oron Anschel, Nir Baram, Nahum Shimkin
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We further present experiments on the Arcade Learning Environment benchmark that demonstrate significantly improved stability and performance due to the proposed extension. |
| Researcher Affiliation | Academia | Department of Electrical Engineering, Haifa 32000, Israel. Correspondence to: Oron Anschel <oronanschel@campus.technion.ac.il>, Nir Baram <nirb@campus.technion.ac.il>, Nahum Shimkin <shimkin@ee.technion.ac.il>. |
| Pseudocode | Yes | Algorithm 1 DQN; Algorithm 2 Averaged DQN; Algorithm 3 Ensemble DQN (a hedged sketch of the Averaged-DQN target computation appears after this table) |
| Open Source Code | Yes | DQN code was taken from McGill University RLLAB, and is available online (together with the Averaged-DQN implementation). McGill University RLLAB DQN Atari code: https://bitbucket.org/rllabmcgill/atari_release. Averaged-DQN code: https://bitbucket.org/oronanschel/atari_release_averaged_dqn |
| Open Datasets | Yes | We further present experiments on the Arcade Learning Environment benchmark that demonstrate significantly improved stability and performance due to the proposed extension. To evaluate Averaged-DQN, we adopt the typical RL methodology where agent performance is measured at the end of training. We have evaluated the Averaged-DQN algorithm on three Atari games from the Arcade Learning Environment (Bellemare et al., 2013). |
| Dataset Splits | No | The paper describes training procedures and evaluation metrics (e.g., "Every 1M frames, a performance test using ε-greedy policy with ε = 0.05 for 500000 frames was conducted"), but it does not specify explicit train/validation/test dataset splits with percentages or counts, as is common in supervised learning contexts. The RL environment is used for continuous interaction and evaluation. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions the ADAM optimizer (Kingma & Ba, 2014) but does not provide specific version numbers for it or any other software libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | The hyperparameters used were taken from Mnih et al. (2015). Every 1M frames, a performance test using an ε-greedy policy with ε = 0.05 for 500000 frames was conducted. For minimization of the DQN loss, the ADAM optimizer (Kingma & Ba, 2014) was used on 100 mini-batches of 32 samples per target network parameters update in the first experiment, and 300 mini-batches in the second. The network architecture used was a small fully connected neural network with one hidden layer of 80 neurons. |
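
The pseudocode row above names Algorithm 2 (Averaged-DQN), whose target bootstraps from the average of the K previously learned Q-value estimates. The following is a minimal NumPy sketch of that target computation, not code from the authors' repository; the function name, the `q_snapshots` list of K previous network snapshots, and the discount value are illustrative assumptions.

```python
import numpy as np

def averaged_dqn_target(reward, next_state, done, q_snapshots, gamma=0.99):
    """Sketch of the Averaged-DQN target for a single transition.

    q_snapshots: list of K callables, each mapping a state to a vector of
    Q-values, standing in for the K previously learned Q-network snapshots
    kept by Algorithm 2 in the paper.
    """
    if done:
        # Terminal transitions bootstrap nothing beyond the reward.
        return reward
    # Average the Q-value estimates of the K previous networks ...
    q_avg = np.mean([q(next_state) for q in q_snapshots], axis=0)
    # ... and bootstrap from the maximizing action of the averaged estimate.
    return reward + gamma * np.max(q_avg)
```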
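
To make the quoted experiment setup concrete, here is a hedged PyTorch sketch of a network and optimizer matching the description (one fully connected hidden layer of 80 neurons, ADAM, 32-sample mini-batches, 100 mini-batches per target network update). PyTorch itself, the ReLU activation, the input/output dimensions, and the learning rate are assumptions not stated in the quoted text.

```python
import torch.nn as nn
import torch.optim as optim

# Placeholder dimensions: the quoted passage does not give the state or
# action sizes, so these values are assumed for illustration only.
STATE_DIM, N_ACTIONS = 4, 2

# Small fully connected network with a single hidden layer of 80 neurons,
# as described in the setup; the ReLU activation is an assumption.
q_net = nn.Sequential(
    nn.Linear(STATE_DIM, 80),
    nn.ReLU(),
    nn.Linear(80, N_ACTIONS),
)

# ADAM optimizer as stated (Kingma & Ba, 2014); the learning rate is assumed.
optimizer = optim.Adam(q_net.parameters(), lr=1e-3)

BATCH_SIZE = 32                       # 32 samples per mini-batch
MINIBATCHES_PER_TARGET_UPDATE = 100   # 300 in the paper's second experiment
```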