Adam on Local Time: Addressing Nonstationarity in RL with Relative Adam Timesteps

Authors: Benjamin Ellis, Matthew T Jackson, Andrei Lupu, Alexander D. Goldie, Mattie Fellows, Shimon Whiteson, Jakob Foerster

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Evaluating Adam-Rel in both on-policy and off-policy RL, we demonstrate improved performance in both Atari and Craftax.
Researcher Affiliation | Academia | Benjamin Ellis (University of Oxford), Matthew T. Jackson (University of Oxford), Andrei Lupu (University of Oxford), Alexander D. Goldie (University of Oxford), Mattie Fellows (University of Oxford), Shimon Whiteson (University of Oxford), Jakob N. Foerster (University of Oxford)
Pseudocode | Yes | Algorithm 1: Pseudocode for PPO with Adam, Adam-Rel, and Adam-MR. (An illustrative sketch of the relative-timestep mechanism follows the table.)
Open Source Code | Yes | For the Atari experiments (both DQN and PPO), we based our implementation on CleanRL [19]. This code is available here. For the Craftax experiments, we based our implementation on PureJaxRL [20]. This code is available here.
Open Datasets | Yes | To do so, we first train DQN [18, 19] agents with Adam-Rel on the Atari-10 benchmark for 40M frames... extensively evaluate our method's impact on PPO [4, 19, 20], training agents on Craftax-Classic-1B [12] ... and the Atari-57 suite [13] for 40 million frames.
Dataset Splits | No | The paper mentions using standard benchmarks like Atari, but it does not explicitly describe dataset splits (e.g., percentages or sample counts for training, validation, and testing) within its text.
Hardware Specification | Yes | Experiments were performed on an internal cluster of NVIDIA V100 GPUs.
Software Dependencies | No | The paper mentions using "CleanRL" and "PureJaxRL" as base implementations but does not specify version numbers for these or any other software libraries or frameworks.
Experiment Setup | Yes | We provide details of our hyperparameter settings in Appendix F, as well as detailing our experimental setup in Section 5.
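
Below is a minimal sketch of the relative-timestep idea named in the paper's title and in Algorithm 1, written in plain Python/NumPy. It assumes that Adam's timestep counter is reset at each nonstationarity boundary (for example, at the start of each PPO update phase or after a DQN target-network update) while the moment estimates are carried over. The class name AdamRelSketch, the reset() method, and the default hyperparameters are illustrative assumptions, not the authors' released code; the CleanRL- and PureJaxRL-based repositories referenced above contain the actual implementations.

```python
import numpy as np


class AdamRelSketch:
    """Minimal sketch of Adam with a relative (resettable) timestep.

    Illustrative reconstruction of the relative-timestep idea, not the
    authors' released implementation; names and defaults are assumptions.
    """

    def __init__(self, lr=3e-4, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = None  # first-moment (momentum) estimate
        self.v = None  # second-moment estimate
        self.t = 0     # timestep used for bias correction

    def reset(self):
        # Relative timestep: set t back to zero at each nonstationarity
        # boundary (e.g., a new PPO update phase or a DQN target update),
        # so bias correction runs on "local" rather than global time.
        # The moment estimates are kept here; whether to also reset them
        # is a separate design choice.
        self.t = 0

    def step(self, params, grads):
        """One standard Adam update on NumPy arrays; only the meaning of
        self.t differs from vanilla Adam because it may have been reset."""
        if self.m is None:
            self.m = np.zeros_like(params)
            self.v = np.zeros_like(params)
        self.t += 1
        self.m = self.beta1 * self.m + (1.0 - self.beta1) * grads
        self.v = self.beta2 * self.v + (1.0 - self.beta2) * grads ** 2
        m_hat = self.m / (1.0 - self.beta1 ** self.t)  # bias correction with local t
        v_hat = self.v / (1.0 - self.beta2 ** self.t)
        return params - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
```

In an on-policy loop, reset() would be called once before each update phase, so the bias-correction terms 1 - beta1^t and 1 - beta2^t restart from t = 1 whenever the optimisation objective changes.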