Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Munchausen Reinforcement Learning
Authors: Nino Vieillard, Olivier Pietquin, Matthieu Geist
NeurIPS 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that slightly modifying Deep Q-Network (DQN) in that way provides an agent that is competitive with distributional methods on Atari games, without making use of distributional RL, n-step returns or prioritized replay. To demonstrate the versatility of this idea, we also use it together with an Implicit Quantile Network (IQN). The resulting agent outperforms Rainbow on Atari, installing a new State of the Art with very little modifications to the original algorithm. To add to this empirical study, we provide strong theoretical insights on what happens under the hood: implicit Kullback-Leibler regularization and increase of the action-gap. |
| Researcher Affiliation | Collaboration | Nino Vieillard (Google Research, Brain Team; Université de Lorraine, CNRS, Inria, IECL, F-54000 Nancy, France); Olivier Pietquin (Google Research, Brain Team); Matthieu Geist (Google Research, Brain Team) |
| Pseudocode | No | All details of the resulting algorithm are provided in Appx. B.1. (Appendix B.1 describes the algorithm using text and mathematical equations, but it does not present it in a formal pseudocode block or algorithm box.) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its source code or a link to a code repository. |
| Open Datasets | Yes | On the Arcade Learning Environment (ALE) [6] |
| Dataset Splits | No | The paper states that training runs for "200M frames" and evaluations are done on "the full set of 60 Atari games of ALE". However, it does not specify explicit dataset splits (e.g., percentages or counts for training, validation, or test sets) in the typical supervised learning manner, as RL experiments typically involve continuous interaction with an environment. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions using "Dopamine [10]" and "TensorFlow [1]" but does not provide specific version numbers for these software dependencies or any other libraries. |
| Experiment Setup | Yes | After some tuning on a few Atari games, we found a working zone for these parameters to be α = 0.9, τ = 0.03 and l0 = −1, used for all experiments, in M-DQN and M-IQN. |
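The quoted hyperparameters plug directly into the Munchausen-DQN regression target: the reward is augmented with the scaled log-policy of the target network, clipped to [l0, 0] (which implies a negative l0, −1 in the paper), and the bootstrap uses the soft (entropy-regularized) value at the next state. A minimal NumPy sketch for a single transition, with illustrative function and variable names, not the paper's implementation:

```python
import numpy as np

def tau_log_softmax(q, tau):
    """Return tau * log softmax(q / tau), computed stably."""
    z = q / tau
    z = z - z.max()
    return tau * (z - np.log(np.exp(z).sum()))

def m_dqn_target(q_s, q_next, action, reward, done,
                 alpha=0.9, tau=0.03, l0=-1.0, gamma=0.99):
    """One-step M-DQN target for a single transition (sketch).

    q_s, q_next: target-network Q-values at s and s' (1-D arrays over
    actions). The Munchausen bonus is the clipped, scaled log-policy at
    (s, a); the bootstrap is the soft expectation over actions at s'.
    """
    log_pi_s = tau_log_softmax(q_s, tau)        # tau * log pi(.|s)
    log_pi_next = tau_log_softmax(q_next, tau)  # tau * log pi(.|s')
    pi_next = np.exp(log_pi_next / tau)         # pi(.|s') = softmax(q/tau)
    munchausen = alpha * np.clip(log_pi_s[action], l0, 0.0)
    soft_bootstrap = np.sum(pi_next * (q_next - log_pi_next))
    return reward + munchausen + (1.0 - done) * gamma * soft_bootstrap
```

With a small τ the policy is nearly greedy, so the soft bootstrap approaches max(q_next) and the bonus vanishes when the taken action is the greedy one; setting α = 0 recovers the standard soft-DQN target.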