Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Reconciling λ-Returns with Experience Replay

Authors: Brett Daley, Christopher Amato

NeurIPS 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In order to characterize the performance of DQN(λ), we conducted numerous experiments on six Atari 2600 games.
Researcher Affiliation	Academia	Brett Daley Khoury College of Computer Sciences Northeastern University Boston, MA 02115 EMAIL Christopher Amato Khoury College of Computer Sciences Northeastern University Boston, MA 02115 EMAIL
Pseudocode	Yes	We refer to this particular instantiation of our methods as DQN(λ); the pseudocode is provided in Appendix B.
Open Source Code	No	The paper does not provide a direct link or explicit statement about the availability of the source code for the described methodology.
Open Datasets	Yes	We used the Open AI Gym [4] to provide an interface to the Arcade Learning Environment [2], where observations consisted of the raw frame pixels.
Dataset Splits	No	The paper does not provide specific percentages or counts for training, validation, and test dataset splits, as it operates in an online reinforcement learning setting rather than a fixed dataset split.
Hardware Specification	No	The paper mentions 'NVIDIA Corporation for its GPU donation' but does not specify any particular GPU model, CPU, or other hardware components used for experiments.
Software Dependencies	No	The paper mentions using 'Open AI Gym' and 'Arcade Learning Environment' and 'Adam' for training, but does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup	Yes	We matched the hyperparameters and procedures in [25], except we trained the neural networks with Adam [14]. For all experiments in this paper, agents were trained for 10 million timesteps. An agent's performance at a given time was evaluated by averaging the earned scores of its past 100 completed episodes. Each experiment was averaged over 10 random seeds with the standard error of the mean indicated. Our complete experimental setup is discussed in Appendix A.
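
The evaluation protocol quoted in the Experiment Setup row (mean score over the past 100 completed episodes, then mean and standard error across seeds) can be sketched roughly as below. This is an illustrative sketch only, not the authors' code: the function names `rolling_score` and `mean_and_sem` are hypothetical, and the use of the sample standard deviation (n - 1 denominator) for the standard error is an assumption, since the quoted text does not say which estimator was used.

```python
from collections import deque
from math import sqrt
from statistics import mean, stdev


def rolling_score(episode_scores, window=100):
    """Mean score of the most recent `window` completed episodes.

    If fewer than `window` episodes have finished, all of them are averaged.
    Raises statistics.StatisticsError if no episode has completed yet.
    """
    recent = deque(episode_scores, maxlen=window)  # keeps only the last `window` items
    return mean(recent)


def mean_and_sem(per_seed_values):
    """Mean across seeds and the standard error of that mean.

    Assumption: SEM = sample standard deviation / sqrt(number of seeds).
    Requires at least two seeds.
    """
    n = len(per_seed_values)
    return mean(per_seed_values), stdev(per_seed_values) / sqrt(n)
```

In use, `rolling_score` would be applied to each seed's episode history at a given timestep, and `mean_and_sem` to the resulting list of per-seed values (10 values in the setup quoted above).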