Faster Deep Reinforcement Learning with Slower Online Network

Authors: Kavosh Asadi, Rasool Fakoor, Omer Gottesman, Taesup Kim, Michael Littman, Alexander J. Smola

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we empirically investigate the effectiveness of proximal updates in planning and reinforcement-learning algorithms. We begin by conducting experiments with PMPI in the context of approximate planning, and then move to large-scale RL experiments in Atari.
Researcher Affiliation | Collaboration | Kavosh Asadi (Amazon Web Services), Rasool Fakoor (Amazon Web Services), Omer Gottesman (Brown University), Taesup Kim (Seoul National University), Michael L. Littman (Brown University), Alexander J. Smola (Amazon Web Services)
Pseudocode | Yes | The pseudo-code for DQN is presented in the Appendix.
Open Source Code | Yes | The code for our paper is available here: Github.com/amazon-research/fast-rl-with-slow-updates.
Open Datasets | Yes | We used 55 Atari games (Bellemare et al., 2013) to conduct our experimental evaluations. For this experiment, we chose the toy 8×8 Frozen Lake environment from OpenAI Gym (Brockman et al., 2016). (A minimal loading example for this environment appears below the table.)
Dataset Splits | No | The paper mentions following the Dopamine baseline's training and evaluation protocols and hyper-parameter settings, but does not explicitly provide training/validation/test dataset splits (e.g., percentages or sample counts) for reproducibility.
Hardware Specification | No | The paper does not specify the hardware used to run the experiments (e.g., specific GPU/CPU models, memory, or cloud instances).
Software Dependencies | No | The paper mentions using the Dopamine baseline and OpenAI Gym but does not provide version numbers for the software components or libraries required for reproduction.
Experiment Setup | Yes | Our Pro agents have a single additional hyper-parameter c. We did a minimal random search on 6 games to tune c. Figure 2 visualizes the performance of Pro agents as a function of c. In light of this result, we set c = 0.2 for DQN Pro and c = 0.05 for Rainbow Pro. We used these values of c for all 55 games, and note that we performed no further hyper-parameter tuning at all. Our training and evaluation protocols and the hyper-parameter settings follow those of the Dopamine baseline (Castro et al., 2018). (A hedged sketch of the corresponding proximal update appears below the table.)
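To make the Experiment Setup row more concrete, below is a minimal, hedged sketch of a DQN-style loss augmented with a proximal term that keeps the online network close to the slower target network, which is the idea the Pro agents build on. The function name, network interfaces, Huber TD loss, discount factor, and the exact way c enters the penalty are assumptions for illustration, not the paper's precise formulation; the actual DQN Pro / Rainbow Pro update and pseudo-code are given in the paper's Appendix and released code.

```python
# Hedged sketch: a DQN-style TD loss plus a proximal penalty that pulls the
# online network toward the (slower) target network. Names and the exact role
# of c are illustrative assumptions, not the paper's exact update rule.
import torch
import torch.nn.functional as F


def proximal_dqn_loss(online_net, target_net, batch, gamma=0.99, c=0.2):
    """TD loss plus a proximal term of strength c toward the target network."""
    obs, actions, rewards, next_obs, dones = batch

    # Standard DQN targets computed with the target network.
    q = online_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q = target_net(next_obs).max(dim=1).values
        td_target = rewards + gamma * (1.0 - dones) * next_q

    td_loss = F.smooth_l1_loss(q, td_target)

    # Proximal term: squared distance between online and target parameters.
    prox = sum(((p - p_targ.detach()) ** 2).sum()
               for p, p_targ in zip(online_net.parameters(),
                                    target_net.parameters()))
    return td_loss + 0.5 * c * prox
```

With c = 0 this reduces to an ordinary DQN loss. Note that c here is simply the weight on the penalty in this sketch; the values reported above (c = 0.2 for DQN Pro, c = 0.05 for Rainbow Pro) refer to the paper's own parameterization of c and need not coincide with this one.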
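For the planning experiment mentioned in the Open Datasets row, the 8×8 Frozen Lake environment can be loaded from OpenAI Gym roughly as follows. The environment id's version suffix and the reset/step return signatures depend on the installed Gym release, so treat this as a sketch rather than the paper's exact setup.

```python
# Hedged sketch: loading the toy 8x8 Frozen Lake environment from OpenAI Gym.
# The "-v1" suffix and the 4-tuple step return match older Gym releases; newer
# Gym/Gymnasium versions return (obs, info) from reset and a 5-tuple from step.
import gym

env = gym.make("FrozenLake8x8-v1")
state = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # uniformly random action, for illustration only
    state, reward, done, info = env.step(action)
env.close()
```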