Faster Deep Reinforcement Learning with Slower Online Network
Authors: Kavosh Asadi, Rasool Fakoor, Omer Gottesman, Taesup Kim, Michael Littman, Alexander J. Smola
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we empirically investigate the effectiveness of proximal updates in planning and reinforcement-learning algorithms. We begin by conducting experiments with PMPI in the context of approximate planning, and then move to large-scale RL experiments in Atari. |
| Researcher Affiliation | Collaboration | Kavosh Asadi Amazon Web Services, Rasool Fakoor Amazon Web Services, Omer Gottesman Brown University, Taesup Kim Seoul National University, Michael L. Littman Brown University, Alexander J. Smola Amazon Web Services |
| Pseudocode | Yes | The pseudo-code for DQN is presented in the Appendix. |
| Open Source Code | Yes | The code for our paper is available here: Github.com/amazon-research/fast-rl-with-slow-updates. |
| Open Datasets | Yes | We used 55 Atari games (Bellemare et al., 2013) to conduct our experimental evaluations. For this experiment, we chose the toy 8×8 Frozen Lake environment from Open AI Gym (Brockman et al., 2016). |
| Dataset Splits | No | The paper mentions following the Dopamine baseline's training and evaluation protocols and hyper-parameter settings, but does not explicitly provide specific training/validation/test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | No | The paper does not specify the hardware used (e.g., specific GPU/CPU models, memory, or cloud instances) for running the experiments. |
| Software Dependencies | No | The paper mentions using the Dopamine baseline and Open AI Gym but does not provide specific version numbers for any software components or libraries required for reproduction. |
| Experiment Setup | Yes | Our Pro agents have a single additional hyper-parameter c. We did a minimal random search on 6 games to tune c. Figure 2 visualizes the performance of Pro agents as a function of c. In light of this result, we set c = 0.2 for DQN Pro and c = 0.05 for Rainbow Pro. We used these values of c for all 55 games, and note that we performed no further hyper-parameter tuning at all. Our training and evaluation protocols and the hyper-parameter settings follow those of the Dopamine baseline (Castro et al., 2018). |
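
The hyper-parameter c quoted above controls the strength of the proximal term that keeps the online network close to its slowly-updated target network. The snippet below is a minimal PyTorch-style sketch of how such a proximal step could be applied after an ordinary gradient update; it assumes plain SGD, a post-hoc parameter interpolation, and illustrative names (`proximal_step`, `online_net`, `target_net`), and how the paper's reported c maps onto this coefficient is also an assumption. It is not the authors' Dopamine-based implementation, which is available in their repository linked above.

```python
import torch

def proximal_step(online_net, target_net, lr, c):
    """Sketch of a proximal update: after the usual gradient step on the
    online network, pull its parameters toward the target network.

    For plain SGD this realizes the closed form of
        argmin_w  <grad, w> + ||w - theta||^2 / (2 * lr) + (c / 2) * ||w - w_target||^2,
    i.e. an interpolation between the freshly updated online weights and the
    target-network weights. How this composes with Adam/Dopamine in the
    original code may differ (assumption).
    """
    mix = lr * c / (1.0 + lr * c)  # interpolation weight toward the target network
    with torch.no_grad():
        for p, p_targ in zip(online_net.parameters(), target_net.parameters()):
            p.mul_(1.0 - mix).add_(p_targ, alpha=mix)

# Hypothetical usage inside a training loop:
#   loss.backward(); optimizer.step()
#   proximal_step(online_net, target_net,
#                 lr=optimizer.param_groups[0]["lr"], c=0.2)
# The paper reports c = 0.2 for DQN Pro and c = 0.05 for Rainbow Pro,
# used unchanged across all 55 games.
```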