Prioritized Experience Replay
Authors: Tom Schaul, John Quan, Ioannis Antonoglou, David Silver
ICLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We use prioritized experience replay in Deep Q-Networks (DQN), a reinforcement learning algorithm that achieved human-level performance across many Atari games. DQN with prioritized experience replay achieves a new state-of-the-art, outperforming DQN with uniform replay on 41 out of 49 games. |
| Researcher Affiliation | Industry | Tom Schaul, John Quan, Ioannis Antonoglou and David Silver Google DeepMind {schaul,johnquan,ioannisa,davidsilver}@google.com |
| Pseudocode | Yes | Algorithm 1 Double DQN with proportional prioritization |
| Open Source Code | No | The paper describes the algorithms and provides implementation details but does not include any explicit statement about releasing the source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | For this, we chose the collection of Atari benchmarks (Bellemare et al., 2012) with their end-to-end RL from vision setup, because they are popular and contain diverse sets of challenges, including delayed credit assignment, partial observability, and difficult function approximation (Mnih et al., 2015; van Hasselt et al., 2016). |
| Dataset Splits | No | The paper describes evaluation methods such as 'human starts evaluation' and 'test evaluation', but it does not specify explicit train/validation/test dataset splits with percentages or sample counts. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments. |
| Software Dependencies | Yes | The complete architecture is shown in Figure 6, and is implemented using Torch7 (Collobert et al., 2011). |
| Experiment Setup | Yes | Only a single hyperparameter adjustment was necessary compared to the baseline: Given that prioritized replay picks high-error transitions more often, the typical gradient magnitudes are larger, so we reduced the step-size η by a factor 4 compared to the (Double) DQN setup. For the α and β0 hyperparameters that are introduced by prioritization, we did a coarse grid search (evaluated on a subset of 8 games), and found the sweet spot to be α = 0.7, β0 = 0.5 for the rank-based variant and α = 0.6, β0 = 0.4 for the proportional variant. |
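Building on the Pseudocode and Experiment Setup rows above: the paper's Algorithm 1 (Double DQN with proportional prioritization) samples transition i with probability P(i) = p_i^α / Σ_k p_k^α, where p_i = |TD error| + ε, and corrects the bias with importance-sampling weights w_i = (N · P(i))^(−β) normalized by their maximum. The sketch below is only a rough illustration under those formulas; the class name, the O(N) list-based storage (the paper uses a sum-tree for efficiency), and the default constants are assumptions, not the authors' implementation.

```python
import numpy as np

class ProportionalReplay:
    """Minimal proportional prioritized replay sketch (list-based, not a sum-tree)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha        # how strongly the TD error shapes sampling
        self.eps = eps            # keeps every priority strictly positive
        self.data = []            # stored transitions
        self.priorities = []      # p_i = |TD error| + eps
        self.pos = 0

    def add(self, transition):
        # New transitions get the current maximum priority so they are
        # replayed at least once before being re-prioritized.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
            self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        p = np.asarray(self.priorities) ** self.alpha
        probs = p / p.sum()                        # P(i) = p_i^alpha / sum_k p_k^alpha
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        weights = (len(self.data) * probs[idx]) ** (-beta)   # IS correction
        weights /= weights.max()                   # normalize for stability
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors):
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(err) + self.eps
```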
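The α and β0 values quoted in the Experiment Setup row go with an importance-sampling exponent β that the paper anneals linearly from β0 to 1 over training. A minimal schedule sketch, assuming a hypothetical total step count; the function name and example numbers are illustrative, not from the paper:

```python
def beta_schedule(step, total_steps, beta_0=0.4):
    """Linearly anneal the importance-sampling exponent from beta_0 to 1."""
    frac = min(step / total_steps, 1.0)
    return beta_0 + (1.0 - beta_0) * frac

# Example: with the proportional variant's beta_0 = 0.4, beta is roughly 0.7
# halfway through a (hypothetical) 50M-step run.
print(beta_schedule(25_000_000, 50_000_000))
```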