Revisiting Fundamentals of Experience Replay

Authors: William Fedus, Prajit Ramachandran, Rishabh Agarwal, Yoshua Bengio, Hugo Larochelle, Mark Rowland, Will Dabney

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We therefore present a systematic and extensive analysis of experience replay in Q-learning methods, focusing on two fundamental properties: the replay capacity and the replay ratio. Our additive and ablative studies upend conventional wisdom around experience replay—greater capacity is found to substantially increase the performance of certain algorithms, while leaving others unaffected.
Researcher Affiliation | Collaboration | 1 Google Brain, 2 MILA, Université de Montréal, 3 CIFAR Director, 4 CIFAR Fellow, 5 DeepMind.
Pseudocode | No | The paper describes algorithmic components and experimental procedures but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions that an "open-source implementation based on this agent is available in Dopamine (Castro et al., 2018)", but this refers to the framework the agent is built on, not a code release for the specific methodology or contributions of this paper.
Open Datasets | Yes | We conduct experiments on the commonly-used Atari Arcade Learning Environment (Bellemare et al., 2013) with sticky actions (Machado et al., 2018).
Dataset Splits | No | The paper does not provide explicit train/test/validation splits (e.g., percentages, sample counts, or citations to predefined splits) of the kind needed to reproduce a data partitioning in supervised learning. Experiments are instead run in the Atari Arcade Learning Environment, where agents generate their own data by interacting with the environment.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies | No | The paper mentions using Dopamine as an implementation framework, but it does not specify a version for Dopamine or list any other software dependencies with their versions.
Experiment Setup | Yes | Rainbow uses a replay capacity of 1M and an oldest policy of 250k, corresponding to a replay ratio of 0.25. We assess the cross product of 5 settings of the replay capacity (from 0.1M to 10M) and 4 settings of the oldest policy (from 25k to 25M)... In these experiments, we fix the total number of gradient updates and the batch size per gradient update to the settings used by Rainbow...
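
The two quantities named in the Research Type and Experiment Setup rows, replay capacity and replay ratio, can be made concrete with a short sketch. The Python below is a minimal illustration, not the paper's Dopamine-based Rainbow agent: ReplayBuffer, env, and agent are hypothetical interfaces, and only the 1M capacity and 0.25 replay ratio defaults come from the settings quoted above; the batch size and warm-up threshold are illustrative.

import collections
import random


class ReplayBuffer:
    """FIFO buffer: once capacity transitions are stored, the oldest is evicted."""

    def __init__(self, capacity):
        self.storage = collections.deque(maxlen=capacity)

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size):
        return random.sample(self.storage, batch_size)


def train(env, agent, total_env_steps,
          replay_capacity=1_000_000,  # Rainbow default quoted in the table above
          replay_ratio=0.25,          # one gradient update per 4 environment steps
          batch_size=32,              # illustrative value, not taken from the paper
          min_history=20_000):        # illustrative warm-up before learning starts
    """Collect experience and perform replay_ratio gradient updates per env step on average."""
    buffer = ReplayBuffer(replay_capacity)
    obs, updates_owed = env.reset(), 0.0
    for step in range(total_env_steps):
        action = agent.act(obs)
        next_obs, reward, done, _ = env.step(action)
        buffer.add((obs, action, reward, next_obs, done))
        obs = env.reset() if done else next_obs
        if step >= min_history:
            # Accumulate fractional updates so the ratio of gradient updates
            # to environment steps converges to replay_ratio.
            updates_owed += replay_ratio
            while updates_owed >= 1.0:
                agent.update(buffer.sample(batch_size))
                updates_owed -= 1.0

Growing the replay capacity only changes how many transitions the buffer retains, while changing the replay ratio only changes how often agent.update is called relative to env.step, which is why the two quantities can be varied independently in the sweep sketched next.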
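
The Experiment Setup row also quotes a cross product of 5 replay capacities and 4 oldest-policy settings. Assuming a full buffer and fixed rates of data collection and updating, the replay ratio implied by a setting is the age of the oldest policy (in gradient updates) divided by the replay capacity (in transitions); Rainbow's 250k / 1M recovers 0.25. The sketch below enumerates such a grid. Only the endpoints (0.1M to 10M and 25k to 25M) and the Rainbow defaults appear in the quote, so the intermediate, logarithmically spaced values here are assumptions for illustration, not the paper's exact grid.

# Endpoints from the quoted ranges; intermediate values are assumed (log-spaced).
replay_capacities = [100_000, 316_000, 1_000_000, 3_160_000, 10_000_000]  # 0.1M .. 10M
oldest_policy_ages = [25_000, 250_000, 2_500_000, 25_000_000]             # 25k .. 25M

for capacity in replay_capacities:
    for age in oldest_policy_ages:
        # Implied number of gradient updates per environment step for this setting.
        replay_ratio = age / capacity
        print(f"capacity={capacity:>10,d}  oldest_policy={age:>10,d}  replay_ratio={replay_ratio:g}")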
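
The Open Datasets row notes that evaluation uses the Arcade Learning Environment with sticky actions (Machado et al., 2018), under which the emulator executes the agent's previous action instead of the newly chosen one with probability 0.25. The wrapper below is a generic sketch of that behavior for a gym-style environment, offered only to illustrate the protocol; in practice the ALE exposes this directly through its repeat_action_probability setting, which frameworks such as Dopamine use rather than a wrapper, and the env interface here is a hypothetical placeholder.

import random


class StickyActionEnv:
    """With probability stickiness, repeat the previous action instead of the chosen one."""

    def __init__(self, env, stickiness=0.25):
        self.env = env
        self.stickiness = stickiness
        self.prev_action = 0  # NOOP at the start of each episode

    def reset(self):
        self.prev_action = 0
        return self.env.reset()

    def step(self, action):
        if random.random() < self.stickiness:
            action = self.prev_action
        self.prev_action = action
        return self.env.step(action)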