Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning
Authors: Oron Anschel, Nir Baram, Nahum Shimkin
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We further present experiments on the Arcade Learning Environment benchmark that demonstrate significantly improved stability and performance due to the proposed extension. |
| Researcher Affiliation | Academia | Department of Electrical Engineering, Haifa 32000, Israel. Correspondence to: Oron Anschel <oronanschel@campus.technion.ac.il>, Nir Baram <nirb@campus.technion.ac.il>, Nahum Shimkin <shimkin@ee.technion.ac.il>. |
| Pseudocode | Yes | Algorithm 1 DQN; Algorithm 2 Averaged DQN; Algorithm 3 Ensemble DQN (a hedged sketch of the Averaged-DQN target computation appears after this table) |
| Open Source Code | Yes | DQN code was taken from McGill University RLLAB, and is available online (together with the Averaged-DQN implementation). McGill University RLLAB DQN Atari code: https://bitbucket.org/rllabmcgill/atari_release. Averaged-DQN code: https://bitbucket.org/oronanschel/atari_release_averaged_dqn |
| Open Datasets | Yes | We further present experiments on the Arcade Learning Environment benchmark that demonstrate significantly improved stability and performance due to the proposed extension. To evaluate Averaged-DQN, we adopt the typical RL methodology where agent performance is measured at the end of training. We have evaluated the Averaged-DQN algorithm on three Atari games from the Arcade Learning Environment (Bellemare et al., 2013). |
| Dataset Splits | No | The paper describes training procedures and evaluation metrics (e.g., "Every 1M frames, a performance test using ε-greedy policy with ε = 0.05 for 500000 frames was conducted"), but it does not specify explicit train/validation/test dataset splits with percentages or counts, as is common in supervised learning contexts. The RL environment is used for continuous interaction and evaluation. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions the ADAM optimizer (Kingma & Ba, 2014) but does not provide specific version numbers for it or any other software libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | The hyperparameters used were taken from Mnih et al. (2015). Every 1M frames, a performance test using an ε-greedy policy with ε = 0.05 for 500000 frames was conducted. For minimization of the DQN loss, the ADAM optimizer (Kingma & Ba, 2014) was used on 100 mini-batches of 32 samples per target network parameters update in the first experiment, and 300 mini-batches in the second. The network architecture used was a small fully connected neural network with one hidden layer of 80 neurons. |
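
The pseudocode row above names Algorithm 2 (Averaged-DQN), whose target bootstraps from the average of the K previously learned Q-value estimates. The following is a minimal NumPy sketch of that target computation, not code from the authors' repository; the function name, the `q_snapshots` list of K previous network snapshots, and the discount value are illustrative assumptions.

```python
import numpy as np

def averaged_dqn_target(reward, next_state, done, q_snapshots, gamma=0.99):
    """Sketch of the Averaged-DQN target for a single transition.

    q_snapshots: list of K callables, each mapping a state to a vector of
    Q-values, standing in for the K previously learned Q-network snapshots
    kept by Algorithm 2 in the paper.
    """
    if done:
        # Terminal transitions bootstrap nothing beyond the reward.
        return reward
    # Average the Q-value estimates of the K previous networks ...
    q_avg = np.mean([q(next_state) for q in q_snapshots], axis=0)
    # ... and bootstrap from the maximizing action of the averaged estimate.
    return reward + gamma * np.max(q_avg)
```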
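
To make the quoted experiment setup concrete, here is a hedged PyTorch sketch of a network and optimizer matching the description (one fully connected hidden layer of 80 neurons, ADAM, 32-sample mini-batches, 100 mini-batches per target network update). PyTorch itself, the ReLU activation, the input/output dimensions, and the learning rate are assumptions not stated in the quoted text.

```python
import torch.nn as nn
import torch.optim as optim

# Placeholder dimensions: the quoted passage does not give the state or
# action sizes, so these values are assumed for illustration only.
STATE_DIM, N_ACTIONS = 4, 2

# Small fully connected network with a single hidden layer of 80 neurons,
# as described in the setup; the ReLU activation is an assumption.
q_net = nn.Sequential(
    nn.Linear(STATE_DIM, 80),
    nn.ReLU(),
    nn.Linear(80, N_ACTIONS),
)

# ADAM optimizer as stated (Kingma & Ba, 2014); the learning rate is assumed.
optimizer = optim.Adam(q_net.parameters(), lr=1e-3)

BATCH_SIZE = 32                       # 32 samples per mini-batch
MINIBATCHES_PER_TARGET_UPDATE = 100   # 300 in the paper's second experiment
```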