Imagination-Augmented Agents for Deep Reinforcement Learning
Authors: Sébastien Racanière, Théophane Weber, David Reichert, Lars Buesing, Arthur Guez, Danilo Jimenez Rezende, Adrià Puigdomènech Badia, Oriol Vinyals, Nicolas Heess, Yujia Li, Razvan Pascanu, Peter Battaglia, Demis Hassabis, David Silver, Daan Wierstra
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that our approach performs better than model-free baselines in various domains including Sokoban. It achieves better performance with less data, even with imperfect models, a significant step towards delivering the promises of model-based RL. (Introduction, page 2). Sokoban experiments (Section 4, page 3). Learning one model for many tasks in Mini Pacman (Section 5, page 5). |
| Researcher Affiliation | Industry | Sébastien Racanière, Théophane Weber, David P. Reichert, Lars Buesing, Arthur Guez, Danilo Rezende, Adrià Puigdomènech Badia, Oriol Vinyals, Nicolas Heess, Yujia Li, Razvan Pascanu, Peter Battaglia, Demis Hassabis, David Silver, Daan Wierstra, DeepMind (Author list, page 1). Equal contribution, corresponding authors: {sracaniere, theophane, reichert}@google.com. (Footnote, page 1). |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a specific repository link, an explicit statement of code release, or mention of code in supplementary materials for the methodology described. It only provides a link to videos of agents playing Sokoban. |
| Open Datasets | No | Our implementation of Sokoban procedurally generates a new level each episode (see Appendix D.4 for details, Fig. 3 for examples). (Section 4, page 3). We designed a simple, light-weight domain called Mini Pacman, which allows us to easily define multiple tasks in an environment with shared state transitions and which enables us to do rapid experimentation. (Section 5, page 5). The paper describes environments that are procedurally generated or custom designed, but does not provide concrete public access (a link, DOI, repository, or citation to a public source) to these environments or to the generators used to create them. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or citations to predefined splits) for training, validation, or testing data. It mentions "training data" but without partitioning details. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. It only mentions 'asynchronous training over 32 to 64 workers' without further specification. |
| Software Dependencies | No | The paper does not provide specific version numbers for ancillary software dependencies (e.g., library or solver names with versions). It mentions algorithms like A3C and RMSprop, but not the software implementations or their versions. |
| Experiment Setup | Yes | We report results after an initial round of hyperparameter exploration (details in Appendix A). (Section 3.3, page 3). We used an RMSProp optimizer with learning rate 7e-5 for Mini Pacman and 2.5e-5 for Sokoban. The discount factor γ was set to 0.99... The maximum number of training steps was 500 million for Mini Pacman and 1 billion for Sokoban. The value function loss coefficient was set to 0.5. The entropy cost coefficient was set to 0.01... We used a batch size of 1. Gradient clipping was applied for gradients whose L2 norm exceeded 40. (Appendix A, page 10). |
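
For orientation, the hyperparameters quoted in the Experiment Setup row can be collected into a small configuration and loss sketch. This is a minimal illustration assuming a standard A3C-style objective (policy-gradient term plus value and entropy terms) written in PyTorch; the placeholder model, dummy tensors, and training step below are assumptions for demonstration only, not the authors' implementation.

```python
import torch

# Values quoted from Appendix A of the paper; everything else in this sketch
# is an illustrative assumption.
CONFIG = {
    "lr": {"mini_pacman": 7e-5, "sokoban": 2.5e-5},
    "gamma": 0.99,               # discount factor
    "value_loss_coef": 0.5,
    "entropy_coef": 0.01,
    "batch_size": 1,
    "max_grad_norm": 40.0,       # L2 gradient-clipping threshold
    "max_steps": {"mini_pacman": 500_000_000, "sokoban": 1_000_000_000},
}

def a3c_style_loss(log_probs, values, returns, entropy, cfg=CONFIG):
    """Combine policy-gradient, value, and entropy terms with the reported
    coefficients; this mirrors a generic A3C objective, not the exact I2A code."""
    advantages = (returns - values).detach()
    policy_loss = -(log_probs * advantages).mean()
    value_loss = (returns - values).pow(2).mean()
    return (policy_loss
            + cfg["value_loss_coef"] * value_loss
            - cfg["entropy_coef"] * entropy.mean())

# Toy usage with a placeholder model; a real agent would compute these tensors
# from rollouts in Sokoban or Mini Pacman.
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.RMSprop(model.parameters(), lr=CONFIG["lr"]["sokoban"])
log_probs = torch.randn(5, requires_grad=True)
values = model(torch.randn(5, 4)).squeeze(-1)
returns = torch.randn(5)
entropy = torch.rand(5)

loss = a3c_style_loss(log_probs, values, returns, entropy)
optimizer.zero_grad()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), CONFIG["max_grad_norm"])
optimizer.step()
```

The sketch only shows how the reported coefficients combine the loss terms and where the gradient-clipping threshold and RMSprop learning rates would enter; the paper's asynchronous training over 32 to 64 workers and its imagination-augmented architecture are not reproduced here.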