Revisiting Fundamentals of Experience Replay
Authors: William Fedus, Prajit Ramachandran, Rishabh Agarwal, Yoshua Bengio, Hugo Larochelle, Mark Rowland, Will Dabney
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We therefore present a systematic and extensive analysis of experience replay in Q-learning methods, focusing on two fundamental properties: the replay capacity and the replay ratio. Our additive and ablative studies upend conventional wisdom around experience replay—greater capacity is found to substantially increase the performance of certain algorithms, while leaving others unaffected. |
| Researcher Affiliation | Collaboration | 1 Google Brain 2 MILA, Université de Montréal 3 CIFAR Director 4 CIFAR Fellow 5 DeepMind. |
| Pseudocode | No | The paper describes algorithmic components and experimental procedures but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions that an "open-source implementation based on this agent is available in Dopamine (Castro et al., 2018)", but this refers to a framework they used, not code for the specific methodology or contributions presented in this paper. |
| Open Datasets | Yes | We conduct experiments on the commonly-used Atari Arcade Learning Environment (Bellemare et al., 2013) with sticky actions (Machado et al., 2018). |
| Dataset Splits | No | The paper does not explicitly provide specific train/test/validation dataset splits (e.g., percentages, sample counts, or citations to predefined splits), as would be needed to reproduce the data partitioning common in supervised learning contexts. Experiments are instead conducted in the Atari Arcade Learning Environment, where agents learn from transitions gathered by interacting with the environment rather than from a fixed dataset. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'Dopamine' as an implementation framework, but it does not specify a version number for Dopamine or any other software dependencies with their respective versions. |
| Experiment Setup | Yes | Rainbow uses a replay capacity of 1M and an oldest policy of 250k, corresponding to a replay ratio of 0.25. We assess the cross product of 5 settings of the replay capacity (from 0.1M to 10M) and 4 settings of the oldest policy (from 25k to 25M)... In these experiments, we fix the total number of gradient updates and the batch size per gradient update to the settings used by Rainbow... |
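
The relationship between the replay capacity, oldest policy, and replay ratio settings quoted in the Experiment Setup row can be made concrete with a short sketch. This is an illustrative calculation only, not code from the paper or from Dopamine; the function and variable names below are assumptions.

```python
# Minimal sketch (illustrative, not the paper's code): how replay capacity,
# replay ratio, and oldest-policy age relate for a FIFO replay buffer,
# assuming one transition is added to the buffer per environment step.

def oldest_policy_age(replay_capacity: int, replay_ratio: float) -> float:
    """Age, in gradient updates, of the policy that collected the oldest
    transition still held in a full FIFO buffer.

    The oldest transition was collected `replay_capacity` environment steps
    ago; at `replay_ratio` gradient updates per environment step, that is
    capacity * ratio gradient updates ago.
    """
    return replay_capacity * replay_ratio


if __name__ == "__main__":
    # Rainbow defaults quoted above: 1M replay capacity, replay ratio 0.25.
    capacity = 1_000_000
    ratio = 0.25
    print(oldest_policy_age(capacity, ratio))  # 250000.0, matching the 250k oldest policy
```

Under this reading, fixing the replay ratio ties the oldest-policy age directly to the replay capacity, which is why the paper sweeps the two quantities (capacity and oldest policy) as its fundamental axes.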