Prioritized Experience Replay

Authors: Tom Schaul, John Quan, Ioannis Antonoglou, David Silver

ICLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We use prioritized experience replay in Deep Q-Networks (DQN), a reinforcement learning algorithm that achieved human-level performance across many Atari games. DQN with prioritized experience replay achieves a new state-of-the-art, outperforming DQN with uniform replay on 41 out of 49 games.
Researcher Affiliation | Industry | Tom Schaul, John Quan, Ioannis Antonoglou and David Silver, Google DeepMind, {schaul,johnquan,ioannisa,davidsilver}@google.com
Pseudocode | Yes | Algorithm 1: Double DQN with proportional prioritization (a minimal sketch of this sampling scheme is given after this table).
Open Source Code | No | The paper describes the algorithms and provides implementation details but does not include any explicit statement about releasing the source code or a link to a code repository for the described methodology.
Open Datasets | Yes | For this, we chose the collection of Atari benchmarks (Bellemare et al., 2012) with their end-to-end RL from vision setup, because they are popular and contain diverse sets of challenges, including delayed credit assignment, partial observability, and difficult function approximation (Mnih et al., 2015; van Hasselt et al., 2016).
Dataset Splits | No | The paper describes evaluation methods such as 'human starts evaluation' and 'test evaluation', but it does not specify explicit train/validation/test dataset splits with percentages or sample counts.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments.
Software Dependencies | Yes | The complete architecture is shown in Figure 6, and is implemented using Torch7 (Collobert et al., 2011).
Experiment Setup | Yes | Only a single hyperparameter adjustment was necessary compared to the baseline: Given that prioritized replay picks high-error transitions more often, the typical gradient magnitudes are larger, so we reduced the step-size η by a factor 4 compared to the (Double) DQN setup. For the α and β0 hyperparameters that are introduced by prioritization, we did a coarse grid search (evaluated on a subset of 8 games), and found the sweet spot to be α = 0.7, β0 = 0.5 for the rank-based variant and α = 0.6, β0 = 0.4 for the proportional variant. (A sketch of how α and β enter the sampling and weighting appears after this table.)
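
For reference, the proportional prioritization named in the Pseudocode row can be sketched as follows. This is a minimal illustration, not the authors' released implementation: the class name ProportionalReplay, the `eps` constant, and the flat-array sampling are assumptions, and the paper's Algorithm 1 relies on a sum-tree data structure rather than the O(N) pass shown here.

```python
# Minimal sketch of proportional prioritized sampling: p_i = |delta_i| + eps,
# P(i) = p_i^alpha / sum_k p_k^alpha, with importance-sampling weights
# w_i = (N * P(i))^(-beta) normalized by their maximum.
# The class name, `capacity`, and `eps` are illustrative, not from the paper.
import numpy as np


class ProportionalReplay:
    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha                    # priority exponent (0.6 for the proportional variant)
        self.eps = eps                        # keeps priorities strictly positive
        self.data = []                        # stored transitions
        self.priorities = np.zeros(capacity)  # one priority per slot
        self.pos = 0                          # next write position (circular buffer)

    def add(self, transition):
        # New transitions get maximal priority so they are replayed at least once.
        max_prio = self.priorities.max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        # Sample indices with probability proportional to p_i^alpha.
        prios = self.priorities[:len(self.data)] ** self.alpha
        probs = prios / prios.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights, normalized so they only scale updates down.
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors):
        # Proportional variant: priority is the absolute TD error plus a small constant.
        self.priorities[idx] = np.abs(td_errors) + self.eps
```

A production implementation would replace the O(N) normalization in `sample` with the sum-tree the paper describes, so that sampling and priority updates cost O(log N) per transition.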
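
The Experiment Setup row quotes the grid-searched α and β0 values; in the paper, the importance-sampling exponent β is annealed from β0 towards 1 over training, and the step size is reduced by a factor of 4 relative to the (Double) DQN baseline. The helper below is a hedged sketch of that schedule; `total_steps` and `BASELINE_LR` are illustrative placeholders, not values reported in this section.

```python
# Sketch of a linear beta schedule and the step-size adjustment described in
# the Experiment Setup row. `total_steps` and BASELINE_LR are illustrative
# placeholders, not values taken from this section.
def annealed_beta(step: int, total_steps: int, beta0: float = 0.4) -> float:
    """Anneal beta linearly from beta0 at step 0 to 1.0 at total_steps."""
    frac = min(step / total_steps, 1.0)
    return beta0 + frac * (1.0 - beta0)


# Proportional variant from the grid search: alpha = 0.6, beta0 = 0.4.
ALPHA, BETA0 = 0.6, 0.4
BASELINE_LR = 2.5e-4                 # illustrative (Double) DQN step size, not from this section
prioritized_lr = BASELINE_LR / 4     # "reduced the step-size η by a factor 4"
```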