Munchausen Reinforcement Learning
Authors: Nino Vieillard, Olivier Pietquin, Matthieu Geist
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that slightly modifying Deep Q-Network (DQN) in that way provides an agent that is competitive with distributional methods on Atari games, without making use of distributional RL, n-step returns or prioritized replay. To demonstrate the versatility of this idea, we also use it together with an Implicit Quantile Network (IQN). The resulting agent outperforms Rainbow on Atari, installing a new State of the Art with very little modifications to the original algorithm. To add to this empirical study, we provide strong theoretical insights on what happens under the hood: implicit Kullback-Leibler regularization and increase of the action-gap. |
| Researcher Affiliation | Collaboration | Nino Vieillard (Google Research, Brain Team; Université de Lorraine, CNRS, Inria, IECL, F-54000 Nancy, France) vieillard@google.com; Olivier Pietquin (Google Research, Brain Team) pietquin@google.com; Matthieu Geist (Google Research, Brain Team) mfgeist@google.com |
| Pseudocode | No | All details of the resulting algorithm are provided in Appx. B.1. (Appendix B.1 describes the algorithm using text and mathematical equations, but it does not present it in a formal pseudocode block or algorithm box.) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its source code or a link to a code repository. |
| Open Datasets | Yes | On the Arcade Learning Environment (ALE) [6] |
| Dataset Splits | No | The paper states that training runs for "200M frames" and evaluations are done on "the full set of 60 Atari games of ALE". However, it does not specify explicit dataset splits (e.g., percentages or counts for training, validation, or test sets) as is typical in supervised learning; RL experiments instead involve continuous interaction with an environment. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions using "Dopamine [10]" and "TensorFlow [1]" but does not provide specific version numbers for these software dependencies or any other libraries. |
| Experiment Setup | Yes | After some tuning on a few Atari games, we found a working zone for these parameters to be α = 0.9, τ = 0.03 and l0 = −1, used for all experiments, in M-DQN and M-IQN. (A hedged sketch of how these parameters enter the Munchausen-DQN target follows the table.) |
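
To make the "slight modification" of DQN concrete, below is a minimal NumPy sketch of the Munchausen-DQN regression target using the hyperparameters quoted above (α = 0.9, τ = 0.03, l0 = −1), with a standard discount γ = 0.99 assumed. The function and variable names are illustrative only and are not taken from the authors' Dopamine/TensorFlow implementation.

```python
import numpy as np

# Hyperparameters reported in the paper for both M-DQN and M-IQN
# (gamma = 0.99 is an assumed standard discount, not quoted above).
ALPHA, TAU, L0, GAMMA = 0.9, 0.03, -1.0, 0.99

def log_softmax(q, tau=TAU):
    # Stable log of the soft-max policy: log pi = q/tau - logsumexp(q/tau).
    logits = q / tau
    logits -= logits.max(axis=-1, keepdims=True)
    return logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))

def m_dqn_target(reward, done, action, q_tgt_s, q_tgt_next):
    """Munchausen-DQN regression target for a batch of transitions.

    reward, done, action: shape [batch]
    q_tgt_s, q_tgt_next:  target-network Q-values at s_t / s_{t+1}, [batch, actions]
    """
    batch = np.arange(len(action))

    # Munchausen bonus: alpha * (clipped) tau * log pi(a_t | s_t),
    # clipped to [L0, 0] as in the paper.
    log_pi_s = log_softmax(q_tgt_s)
    bonus = ALPHA * np.clip(TAU * log_pi_s[batch, action], L0, 0.0)

    # Entropy-regularized (soft) bootstrap at s_{t+1}:
    # sum_a' pi(a'|s') * (Q(s', a') - tau * log pi(a'|s')).
    log_pi_next = log_softmax(q_tgt_next)
    pi_next = np.exp(log_pi_next)
    soft_value = (pi_next * (q_tgt_next - TAU * log_pi_next)).sum(axis=-1)

    return reward + bonus + GAMMA * (1.0 - done) * soft_value

# Example usage with random Q-values for 2 transitions and 4 actions:
# rng = np.random.default_rng(0)
# target = m_dqn_target(reward=np.array([1.0, 0.0]),
#                       done=np.array([0.0, 1.0]),
#                       action=np.array([2, 0]),
#                       q_tgt_s=rng.normal(size=(2, 4)),
#                       q_tgt_next=rng.normal(size=(2, 4)))
```

The two added terms, the α-scaled clipped log-policy bonus on the reward and the entropy-regularized bootstrap, are what the paper credits for the implicit Kullback-Leibler regularization and the increase of the action gap mentioned in the abstract.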