Munchausen Reinforcement Learning
Authors: Nino Vieillard, Olivier Pietquin, Matthieu Geist
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that slightly modifying Deep Q-Network (DQN) in that way provides an agent that is competitive with distributional methods on Atari games, without making use of distributional RL, n-step returns or prioritized replay. To demonstrate the versatility of this idea, we also use it together with an Implicit Quantile Network (IQN). The resulting agent outperforms Rainbow on Atari, installing a new State of the Art with very little modifications to the original algorithm. To add to this empirical study, we provide strong theoretical insights on what happens under the hood: implicit Kullback-Leibler regularization and increase of the action-gap. |
| Researcher Affiliation | Collaboration | Nino Vieillard (Google Research, Brain Team; Université de Lorraine, CNRS, Inria, IECL, F-54000 Nancy, France) vieillard@google.com; Olivier Pietquin (Google Research, Brain Team) pietquin@google.com; Matthieu Geist (Google Research, Brain Team) mfgeist@google.com |
| Pseudocode | No | All details of the resulting algorithm are provided in Appx. B.1. (Appendix B.1 describes the algorithm using text and mathematical equations, but it does not present it in a formal pseudocode block or algorithm box.) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its source code or a link to a code repository. |
| Open Datasets | Yes | On the Arcade Learning Environment (ALE) [6] |
| Dataset Splits | No | The paper states that training runs for "200M frames" and evaluations are done on "the full set of 60 Atari games of ALE". However, it does not specify explicit dataset splits (e.g., percentages or counts for training, validation, or test sets) as is typical in supervised learning; RL experiments instead involve continuous interaction with an environment. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions using "Dopamine [10]" and "TensorFlow [1]" but does not provide specific version numbers for these software dependencies or any other libraries. |
| Experiment Setup | Yes | After some tuning on a few Atari games, we found a working zone for these parameters to be α = 0.9, τ = 0.03 and l0 = −1, used for all experiments, in M-DQN and M-IQN. (A hedged sketch of how these parameters enter the Munchausen-DQN target follows the table.) |
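
To make the "slight modification" of DQN concrete, below is a minimal NumPy sketch of the Munchausen-DQN regression target using the hyperparameters quoted above (α = 0.9, τ = 0.03, l0 = −1), with a standard discount γ = 0.99 assumed. The function and variable names are illustrative only and are not taken from the authors' Dopamine/TensorFlow implementation.

```python
import numpy as np

# Hyperparameters reported in the paper for both M-DQN and M-IQN
# (gamma = 0.99 is an assumed standard discount, not quoted above).
ALPHA, TAU, L0, GAMMA = 0.9, 0.03, -1.0, 0.99

def log_softmax(q, tau=TAU):
    # Stable log of the soft-max policy: log pi = q/tau - logsumexp(q/tau).
    logits = q / tau
    logits -= logits.max(axis=-1, keepdims=True)
    return logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))

def m_dqn_target(reward, done, action, q_tgt_s, q_tgt_next):
    """Munchausen-DQN regression target for a batch of transitions.

    reward, done, action: shape [batch]
    q_tgt_s, q_tgt_next:  target-network Q-values at s_t / s_{t+1}, [batch, actions]
    """
    batch = np.arange(len(action))

    # Munchausen bonus: alpha * (clipped) tau * log pi(a_t | s_t),
    # clipped to [L0, 0] as in the paper.
    log_pi_s = log_softmax(q_tgt_s)
    bonus = ALPHA * np.clip(TAU * log_pi_s[batch, action], L0, 0.0)

    # Entropy-regularized (soft) bootstrap at s_{t+1}:
    # sum_a' pi(a'|s') * (Q(s', a') - tau * log pi(a'|s')).
    log_pi_next = log_softmax(q_tgt_next)
    pi_next = np.exp(log_pi_next)
    soft_value = (pi_next * (q_tgt_next - TAU * log_pi_next)).sum(axis=-1)

    return reward + bonus + GAMMA * (1.0 - done) * soft_value

# Example usage with random Q-values for 2 transitions and 4 actions:
# rng = np.random.default_rng(0)
# target = m_dqn_target(reward=np.array([1.0, 0.0]),
#                       done=np.array([0.0, 1.0]),
#                       action=np.array([2, 0]),
#                       q_tgt_s=rng.normal(size=(2, 4)),
#                       q_tgt_next=rng.normal(size=(2, 4)))
```

The two added terms, the α-scaled clipped log-policy bonus on the reward and the entropy-regularized bootstrap, are what the paper credits for the implicit Kullback-Leibler regularization and the increase of the action gap mentioned in the abstract.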