When to use parametric models in reinforcement learning?
Authors: Hado P. van Hasselt, Matteo Hessel, John Aslanides
NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validated this hypothesis on Atari 2600 video games. The replay-based algorithm attained state-of-the-art data efficiency, improving over prior results with parametric models. |
| Researcher Affiliation | Industry | Hado van Hasselt, DeepMind, London, UK (hado@google.com); Matteo Hessel, DeepMind, London, UK (mtthss@google.com); John Aslanides, DeepMind, London, UK (jaslanides@google.com) |
| Pseudocode | Yes | Algorithm 1: Model-based reinforcement learning. (A hedged sketch of this loop is given below the table.) |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code for the methodology described, nor does it include a link to a code repository. |
| Open Datasets | Yes | SimPLe [Kaiser et al., 2019] showed data-efficient learning is possible in Atari 2600 video games from the Arcade Learning Environment [Bellemare et al., 2013] with a purely model-based approach: only updating the policy with data sampled from a learnt parametric model m. |
| Dataset Splits | No | The paper does not provide specific details on dataset splits (e.g., train/validation/test percentages or counts) for the Atari games. It describes the total number of interactions and episodes but not a formal data partitioning for validation. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory used for running the experiments. It refers to computational requirements only in general terms, without concrete hardware specifications. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., library names like PyTorch or TensorFlow with their versions) that would be needed for reproducibility. |
| Experiment Setup | Yes | In our experiments below, we trained Rainbow DQN for a total number of real interactions comparable to that of SimPLe, by setting K = 100,000, M = 1 and P = 32. The other changes we made to make Rainbow DQN more data efficient were to increase the number of steps in the multi-step returns from 3 to 20, and to reduce the number of steps before we start sampling from replay from 20,000 to 1600. (A configuration sketch of these settings follows the table.) |
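The pseudocode reported above (Algorithm 1, "Model-based reinforcement learning") describes a generic loop that covers both a learnt parametric model and experience replay treated as a non-parametric model. The following is a minimal sketch of such a Dyna-style loop, not the authors' exact pseudocode; the `ReplayModel`, `env`, and `agent` interfaces are hypothetical placeholders introduced here for illustration.

```python
# Minimal sketch of a Dyna-style model-based RL loop, in the spirit of the
# paper's Algorithm 1. All interfaces (env, agent, ReplayModel) are
# hypothetical placeholders, not the authors' implementation. A replay buffer
# is treated as a non-parametric model: "sampling" from it replays stored
# transitions, which is the paper's central framing.

import random
from collections import deque


class ReplayModel:
    """Experience replay viewed as a non-parametric model of the environment."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def update(self, transition):
        # For a parametric model this would be a gradient step; for replay,
        # "learning the model" is simply storing the observed transition.
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Generate (hypothetical) experience by sampling stored transitions.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))


def train(env, agent, model, num_iterations, real_steps_per_iter, planning_steps):
    """Generic loop: act, update the model, then plan/update from the model."""
    state = env.reset()
    for _ in range(num_iterations):
        # 1) Collect real experience and feed it to the model.
        for _ in range(real_steps_per_iter):
            action = agent.act(state)
            next_state, reward, done = env.step(action)
            model.update((state, action, reward, next_state, done))
            state = env.reset() if done else next_state

        # 2) Planning: update value/policy from model-generated experience.
        for _ in range(planning_steps):
            batch = model.sample(batch_size=32)
            if batch:
                agent.learn(batch)
```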
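The experiment-setup excerpt fixes only a handful of hyperparameters. The sketch below collects just those stated values; the roles assigned to K, M, and P (outer-loop iterations, real interactions per iteration, updates per iteration) are an assumption based on the generic loop above and should be checked against the paper's Algorithm 1.

```python
# Hyperparameters explicitly stated in the experiment-setup excerpt.
# Only the numeric values are taken from the text; the interpretation of
# K, M, P is an assumption, and no other settings are invented here.
DATA_EFFICIENT_RAINBOW_OVERRIDES = {
    "K": 100_000,              # outer-loop iterations (assumed role)
    "M": 1,                    # real interactions per iteration (assumed role)
    "P": 32,                   # replay updates per iteration (assumed role)
    "multi_step_n": 20,        # multi-step return length, increased from 3
    "min_replay_size": 1_600,  # steps before sampling from replay, reduced from 20,000
}
```

With these values, the total number of real interactions is K × M = 100,000, which is what the authors describe as comparable to SimPLe.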