Resetting the Optimizer in Deep RL: An Empirical Study
Authors: Kavosh Asadi, Rasool Fakoor, Shoham Sabach
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically investigate this resetting idea by employing various optimizers in conjunction with the Rainbow algorithm. We demonstrate that this simple modification significantly improves the performance of deep RL on the Atari benchmark. We chose the standard Atari benchmark [10] to perform our study, and also chose the popular Rainbow agent [5], which fruitfully combined a couple of important techniques in learning the value function. Here our desire is to investigate the effect of resetting the optimizer on the behavior of the Rainbow agent with its default Adam optimizer. For all of our ablation studies, including this experiment, we worked with the following 12 games: Amidar, Asterix, Beam Rider, Breakout, Crazy Climber, Demon Attack, Gopher, Hero, Kangaroo, Phoenix, Seaquest, and Zaxxon. Note that we will present comprehensive results on the full set of 55 Atari games later. Limiting our experiments to these 12 games allowed us to run multiple seeds per agent-environment pair, and therefore, obtain statistically significant results. We also conducted experiments in continuous-action environments using the MuJoCo physics simulator [22]. |
| Researcher Affiliation | Collaboration | Kavosh Asadi (Amazon), Rasool Fakoor (Amazon), Shoham Sabach (Amazon & Technion) |
| Pseudocode | Yes | Algorithm 1: Pseudocode for DQN with (resetting) Adam. (An illustrative sketch of this loop appears after the table.) |
| Open Source Code | No | The paper mentions using and checking implementations of Rainbow and DQN in the Dopamine framework and on GitHub, but it does not state that the authors themselves are providing open-source code for the specific methodology or modifications described in this paper. It refers to third-party or existing implementations. |
| Open Datasets | Yes | We chose the standard Atari benchmark [10] to perform our study, and also chose the popular Rainbow agent [5]. |
| Dataset Splits | No | The paper mentions using the 'standard Atari benchmark' and performing experiments on a subset of 12 games and then the full 55 games, with results averaged over 10 random seeds. However, it does not explicitly provide details about specific training, validation, or test dataset splits (e.g., percentages or sample counts) for reproducibility, beyond implying standard benchmark usage. |
| Hardware Specification | No | The paper mentions running experiments on the 'Atari benchmark' and with the 'MuJoCo physics simulator', but it does not specify any hardware details such as GPU models, CPU types, or cloud computing instance specifications used for these experiments. |
| Software Dependencies | No | The paper mentions using the 'Dopamine framework [1]' and specific optimizers like 'Adam', 'RMSProp', and 'Rectified Adam'. However, it does not provide specific version numbers for these software libraries, programming languages (e.g., Python), or any other ancillary software dependencies, which would be necessary for full reproducibility. |
| Experiment Setup | Yes | Table 2: Hyper-parameters used in our experiments. Rainbow-Adam hyper-parameters (shared): Replay buffer size 200000; Target update period variable (default 8000); Max steps per episode 27000; Batch size 64; Update period 4; Number of frame skip 4; ϵ-greedy (training time) 0.01; ϵ-greedy (evaluation time) 0.001; ϵ-greedy decay period 250000; Burn-in period / Min replay size 20000; Discount factor (γ) 0.99; Adam learning rate 6.25 × 10⁻⁵; Adam ϵ 0.00015; Adam β1 0.9; Adam β2 0.999. (These values are restated as a Python config sketch below the table.) |
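
The Pseudocode row above refers to the paper's Algorithm 1, in which Adam's internal statistics are discarded whenever the target network is refreshed. Below is a minimal PyTorch sketch of that idea, not the authors' code: a plain DQN loop in which the optimizer is rebuilt at every target-network synchronization. The tiny MLP, the Gym-style environment API returning `(obs, reward, done, info)`, and the replay-buffer handling are assumptions made purely for illustration; the learning rate, Adam ϵ, batch size, and update periods follow the Table 2 values quoted above.

```python
# Minimal sketch of "DQN with (resetting) Adam": re-initialize Adam's internal
# statistics (first/second moments) every time the target network is synced.
# The tiny MLP, Gym-style env API, and buffer handling are illustrative
# assumptions, not the authors' implementation.
import copy
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn


def make_q_network(obs_dim: int, n_actions: int) -> nn.Module:
    # Small MLP stand-in for the Atari CNN used by DQN/Rainbow.
    return nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, n_actions))


def train(env, obs_dim, n_actions, total_steps=100_000,
          target_update_period=8_000,   # Table 2 default
          update_period=4, batch_size=64, gamma=0.99,
          lr=6.25e-5, adam_eps=0.00015, epsilon=0.01):
    q_net = make_q_network(obs_dim, n_actions)
    target_net = copy.deepcopy(q_net)
    optimizer = torch.optim.Adam(q_net.parameters(), lr=lr, eps=adam_eps)
    buffer = deque(maxlen=200_000)

    state = env.reset()
    for step in range(1, total_steps + 1):
        # Epsilon-greedy acting (the decay schedule from Table 2 is omitted).
        if random.random() < epsilon:
            action = random.randrange(n_actions)
        else:
            with torch.no_grad():
                q_values = q_net(torch.as_tensor(state, dtype=torch.float32))
                action = int(q_values.argmax())
        next_state, reward, done, _ = env.step(action)   # classic Gym API assumed
        buffer.append((state, action, reward, next_state, float(done)))
        state = env.reset() if done else next_state

        if step % update_period == 0 and len(buffer) >= batch_size:
            s, a, r, s2, d = (np.array(x) for x in zip(*random.sample(buffer, batch_size)))
            s, r, s2, d = (torch.as_tensor(x, dtype=torch.float32) for x in (s, r, s2, d))
            a = torch.as_tensor(a, dtype=torch.int64)
            q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
            with torch.no_grad():
                target = r + gamma * (1.0 - d) * target_net(s2).max(dim=1).values
            loss = nn.functional.smooth_l1_loss(q, target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        if step % target_update_period == 0:
            target_net.load_state_dict(q_net.state_dict())
            # The modification studied in the paper: throw away Adam's moment
            # estimates by constructing a fresh optimizer at every target update.
            optimizer = torch.optim.Adam(q_net.parameters(), lr=lr, eps=adam_eps)
```

Re-creating `torch.optim.Adam` is just one way to zero the moment estimates; clearing the optimizer's per-parameter `state` dictionary in place would have the same effect without reallocating the object.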
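
The Experiment Setup row quotes Table 2 of the paper; for convenience, the same values are restated below as a plain config dictionary. The dictionary and its key names are illustrative, not something the paper provides; only the numerical values come from the quoted table.

```python
# Hyper-parameters from Table 2 ("Rainbow-Adam hyper-parameters (shared)"),
# restated as a config dict. Key names are illustrative; values are from the paper.
RAINBOW_ADAM_HPARAMS = {
    "replay_buffer_size": 200_000,
    "target_update_period": 8_000,      # "variable" in the ablations; 8000 is the default
    "max_steps_per_episode": 27_000,
    "batch_size": 64,
    "update_period": 4,
    "frame_skip": 4,
    "epsilon_greedy_train": 0.01,
    "epsilon_greedy_eval": 0.001,
    "epsilon_decay_period": 250_000,
    "min_replay_size": 20_000,          # burn-in period
    "discount_factor": 0.99,
    "adam_learning_rate": 6.25e-5,
    "adam_epsilon": 0.00015,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
}
```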