Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Faster Deep Reinforcement Learning with Slower Online Network
Authors: Kavosh Asadi, Rasool Fakoor, Omer Gottesman, Taesup Kim, Michael Littman, Alexander J. Smola
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we empirically investigate the effectiveness of proximal updates in planning and reinforcement-learning algorithms. We begin by conducting experiments with PMPI in the context of approximate planning, and then move to large-scale RL experiments in Atari. |
| Researcher Affiliation | Collaboration | Kavosh Asadi Amazon Web Services, Rasool Fakoor Amazon Web Services, Omer Gottesman Brown University, Taesup Kim Seoul National University, Michael L. Littman Brown University, Alexander J. Smola Amazon Web Services |
| Pseudocode | Yes | The pseudo-code for DQN is presented in the Appendix. |
| Open Source Code | Yes | The code for our paper is available here: Github.com/amazon-research/fast-rl-with-slow-updates. |
| Open Datasets | Yes | We used 55 Atari games (Bellemare et al., 2013) to conduct our experimental evaluations. For this experiment, we chose the toy 8 × 8 Frozen Lake environment from Open AI Gym (Brockman et al., 2016). |
| Dataset Splits | No | The paper mentions following the Dopamine baseline's training and evaluation protocols and hyper-parameter settings, but does not explicitly provide specific training/validation/test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | No | The paper does not specify the hardware used (e.g., specific GPU/CPU models, memory, or cloud instances) for running the experiments. |
| Software Dependencies | No | The paper mentions using the Dopamine baseline and Open AI Gym but does not provide specific version numbers for any software components or libraries required for reproduction. |
| Experiment Setup | Yes | Our Pro agents have a single additional hyper-parameter c. We did a minimal random search on 6 games to tune c. Figure 2 visualizes the performance of Pro agents as a function of c. In light of this result, we set c = 0.2 for DQN Pro and c = 0.05 for Rainbow Pro. We used these values of c for all 55 games, and note that we performed no further hyper-parameter tuning at all. Our training and evaluation protocols and the hyper-parameter settings follow those of the Dopamine baseline (Castro et al., 2018). |
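
The settings quoted in the Experiment Setup row can be recorded as a small configuration sketch. This is only an illustration: the class `ProSetup`, the dictionary `REPORTED_SETUPS`, the keys `dqn_pro` and `rainbow_pro`, and the helper `reported_c` are hypothetical names and are not taken from the authors' repository or from Dopamine. Only the values of c, the count of 55 games, and the reference to the Dopamine baseline come from the quoted text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProSetup:
    """Hypothetical container for the settings quoted in the table."""
    agent: str
    c: float                    # the single additional hyper-parameter of Pro agents
    num_games: int = 55         # Atari games used in the evaluation
    baseline: str = "Dopamine"  # all other settings follow this baseline


# Values as reported in the quoted text; the keys are illustrative only.
REPORTED_SETUPS = {
    "dqn_pro": ProSetup(agent="DQN Pro", c=0.2),
    "rainbow_pro": ProSetup(agent="Rainbow Pro", c=0.05),
}


def reported_c(agent_key: str) -> float:
    """Return the reported value of c for a given (hypothetical) agent key."""
    if agent_key not in REPORTED_SETUPS:
        raise ValueError(f"no reported setup for {agent_key!r}")
    return REPORTED_SETUPS[agent_key].c
```

A lookup such as `reported_c("dqn_pro")` returns the value quoted for DQN Pro. Everything not listed here (learning rate, replay settings, evaluation protocol) would have to be taken from the Dopamine baseline, whose version the paper does not state.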