Prioritized Experience Replay
Authors: Tom Schaul, John Quan, Ioannis Antonoglou, David Silver
ICLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We use prioritized experience replay in Deep Q-Networks (DQN), a reinforcement learning algorithm that achieved human-level performance across many Atari games. DQN with prioritized experience replay achieves a new state-of-the-art, outperforming DQN with uniform replay on 41 out of 49 games. |
| Researcher Affiliation | Industry | Tom Schaul, John Quan, Ioannis Antonoglou and David Silver Google DeepMind {schaul,johnquan,ioannisa,davidsilver}@google.com |
| Pseudocode | Yes | Algorithm 1 Double DQN with proportional prioritization |
| Open Source Code | No | The paper describes the algorithms and provides implementation details but does not include any explicit statement about releasing the source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | For this, we chose the collection of Atari benchmarks (Bellemare et al., 2012) with their end-to-end RL from vision setup, because they are popular and contain diverse sets of challenges, including delayed credit assignment, partial observability, and difficult function approximation (Mnih et al., 2015; van Hasselt et al., 2016). |
| Dataset Splits | No | The paper describes evaluation methods such as 'human starts evaluation' and 'test evaluation', but it does not specify explicit train/validation/test dataset splits with percentages or sample counts. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments. |
| Software Dependencies | Yes | The complete architecture is shown in Figure 6, and is implemented using Torch7 (Collobert et al., 2011). |
| Experiment Setup | Yes | Only a single hyperparameter adjustment was necessary compared to the baseline: Given that prioritized replay picks high-error transitions more often, the typical gradient magnitudes are larger, so we reduced the step-size η by a factor 4 compared to the (Double) DQN setup. For the α and β0 hyperparameters that are introduced by prioritization, we did a coarse grid search (evaluated on a subset of 8 games), and found the sweet spot to be α = 0.7, β0 = 0.5 for the rank-based variant and α = 0.6, β0 = 0.4 for the proportional variant. |
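Building on the Pseudocode and Experiment Setup rows above: the paper's Algorithm 1 (Double DQN with proportional prioritization) samples transition i with probability P(i) = p_i^α / Σ_k p_k^α, where p_i = |TD error| + ε, and corrects the bias with importance-sampling weights w_i = (N · P(i))^(−β) normalized by their maximum. The sketch below is only a rough illustration under those formulas; the class name, the O(N) list-based storage (the paper uses a sum-tree for efficiency), and the default constants are assumptions, not the authors' implementation.

```python
import numpy as np

class ProportionalReplay:
    """Minimal proportional prioritized replay sketch (list-based, not a sum-tree)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha        # how strongly the TD error shapes sampling
        self.eps = eps            # keeps every priority strictly positive
        self.data = []            # stored transitions
        self.priorities = []      # p_i = |TD error| + eps
        self.pos = 0

    def add(self, transition):
        # New transitions get the current maximum priority so they are
        # replayed at least once before being re-prioritized.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
            self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        p = np.asarray(self.priorities) ** self.alpha
        probs = p / p.sum()                        # P(i) = p_i^alpha / sum_k p_k^alpha
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        weights = (len(self.data) * probs[idx]) ** (-beta)   # IS correction
        weights /= weights.max()                   # normalize for stability
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors):
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(err) + self.eps
```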
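The α and β0 values quoted in the Experiment Setup row go with an importance-sampling exponent β that the paper anneals linearly from β0 to 1 over training. A minimal schedule sketch, assuming a hypothetical total step count; the function name and example numbers are illustrative, not from the paper:

```python
def beta_schedule(step, total_steps, beta_0=0.4):
    """Linearly anneal the importance-sampling exponent from beta_0 to 1."""
    frac = min(step / total_steps, 1.0)
    return beta_0 + (1.0 - beta_0) * frac

# Example: with the proportional variant's beta_0 = 0.4, beta is roughly 0.7
# halfway through a (hypothetical) 50M-step run.
print(beta_schedule(25_000_000, 50_000_000))
```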