When to use parametric models in reinforcement learning?
Authors: Hado P. van Hasselt, Matteo Hessel, John Aslanides
NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validated this hypothesis on Atari 2600 video games. The replay-based algorithm attained state-of-the-art data efficiency, improving over prior results with parametric models. |
| Researcher Affiliation | Industry | Hado van Hasselt, DeepMind, London, UK (hado@google.com); Matteo Hessel, DeepMind, London, UK (mtthss@google.com); John Aslanides, DeepMind, London, UK (jaslanides@google.com) |
| Pseudocode | Yes | Algorithm 1: Model-based reinforcement learning. (A hedged sketch of this loop is given below the table.) |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code for the methodology described, nor does it include a link to a code repository. |
| Open Datasets | Yes | SimPLe [Kaiser et al., 2019] showed data-efficient learning is possible in Atari 2600 video games from the Arcade Learning Environment [Bellemare et al., 2013] with a purely model-based approach: only updating the policy with data sampled from a learnt parametric model m. |
| Dataset Splits | No | The paper does not provide specific details on dataset splits (e.g., train/validation/test percentages or counts) for the Atari games. It describes the total number of interactions and episodes but not a formal data partitioning for validation. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory used for running the experiments. It refers to computational requirements only in general terms, without concrete hardware specifications. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., library names like PyTorch or TensorFlow with their versions) that would be needed for reproducibility. |
| Experiment Setup | Yes | In our experiments below, we trained Rainbow DQN for a total number of real interactions comparable to that of SimPLe, by setting K = 100,000, M = 1 and P = 32. The other changes we made to make Rainbow DQN more data efficient were to increase the number of steps in the multi-step returns from 3 to 20, and to reduce the number of steps before we start sampling from replay from 20,000 to 1600. (A configuration sketch of these settings follows the table.) |
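The pseudocode reported above (Algorithm 1, "Model-based reinforcement learning") describes a generic loop that covers both a learnt parametric model and experience replay treated as a non-parametric model. The following is a minimal sketch of such a Dyna-style loop, not the authors' exact pseudocode; the `ReplayModel`, `env`, and `agent` interfaces are hypothetical placeholders introduced here for illustration.

```python
# Minimal sketch of a Dyna-style model-based RL loop, in the spirit of the
# paper's Algorithm 1. All interfaces (env, agent, ReplayModel) are
# hypothetical placeholders, not the authors' implementation. A replay buffer
# is treated as a non-parametric model: "sampling" from it replays stored
# transitions, which is the paper's central framing.

import random
from collections import deque


class ReplayModel:
    """Experience replay viewed as a non-parametric model of the environment."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def update(self, transition):
        # For a parametric model this would be a gradient step; for replay,
        # "learning the model" is simply storing the observed transition.
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Generate (hypothetical) experience by sampling stored transitions.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))


def train(env, agent, model, num_iterations, real_steps_per_iter, planning_steps):
    """Generic loop: act, update the model, then plan/update from the model."""
    state = env.reset()
    for _ in range(num_iterations):
        # 1) Collect real experience and feed it to the model.
        for _ in range(real_steps_per_iter):
            action = agent.act(state)
            next_state, reward, done = env.step(action)
            model.update((state, action, reward, next_state, done))
            state = env.reset() if done else next_state

        # 2) Planning: update value/policy from model-generated experience.
        for _ in range(planning_steps):
            batch = model.sample(batch_size=32)
            if batch:
                agent.learn(batch)
```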
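The experiment-setup excerpt fixes only a handful of hyperparameters. The sketch below collects just those stated values; the roles assigned to K, M, and P (outer-loop iterations, real interactions per iteration, updates per iteration) are an assumption based on the generic loop above and should be checked against the paper's Algorithm 1.

```python
# Hyperparameters explicitly stated in the experiment-setup excerpt.
# Only the numeric values are taken from the text; the interpretation of
# K, M, P is an assumption, and no other settings are invented here.
DATA_EFFICIENT_RAINBOW_OVERRIDES = {
    "K": 100_000,              # outer-loop iterations (assumed role)
    "M": 1,                    # real interactions per iteration (assumed role)
    "P": 32,                   # replay updates per iteration (assumed role)
    "multi_step_n": 20,        # multi-step return length, increased from 3
    "min_replay_size": 1_600,  # steps before sampling from replay, reduced from 20,000
}
```

With these values, the total number of real interactions is K × M = 100,000, which is what the authors describe as comparable to SimPLe.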