Imagination-Augmented Agents for Deep Reinforcement Learning
Authors: Sébastien Racanière, Théophane Weber, David Reichert, Lars Buesing, Arthur Guez, Danilo Jimenez Rezende, Adrià Puigdomènech Badia, Oriol Vinyals, Nicolas Heess, Yujia Li, Razvan Pascanu, Peter Battaglia, Demis Hassabis, David Silver, Daan Wierstra
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that our approach performs better than model-free baselines in various domains including Sokoban. It achieves better performance with less data, even with imperfect models, a significant step towards delivering the promises of model-based RL. (Introduction, page 2). Sokoban experiments (Section 4, page 3). Learning one model for many tasks in Mini Pacman (Section 5, page 5). |
| Researcher Affiliation | Industry | Sébastien Racanière, Théophane Weber, David P. Reichert, Lars Buesing, Arthur Guez, Danilo Rezende, Adrià Puigdomènech Badia, Oriol Vinyals, Nicolas Heess, Yujia Li, Razvan Pascanu, Peter Battaglia, Demis Hassabis, David Silver, Daan Wierstra, DeepMind (Author list, page 1). Equal contribution, corresponding authors: {sracaniere, theophane, reichert}@google.com. (Footnote, page 1). |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a specific repository link, an explicit statement of code release, or mention of code in supplementary materials for the methodology described. It only provides a link to videos of agents playing Sokoban. |
| Open Datasets | No | Our implementation of Sokoban procedurally generates a new level each episode (see Appendix D.4 for details, Fig. 3 for examples). (Section 4, page 3). We designed a simple, light-weight domain called Mini Pacman, which allows us to easily define multiple tasks in an environment with shared state transitions and which enables us to do rapid experimentation. (Section 5, page 5). The paper describes environments that are procedurally generated or custom designed, but does not provide concrete public access (a link, DOI, repository, or citation to a public source) to these environments or to the generators used to create them. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or citations to predefined splits) for training, validation, or testing data. It mentions "training data" but without partitioning details. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. It only mentions 'asynchronous training over 32 to 64 workers' without further specification. |
| Software Dependencies | No | The paper does not provide specific version numbers for ancillary software dependencies (e.g., library or solver names with versions). It mentions algorithms like A3C and RMSprop, but not the software implementations or their versions. |
| Experiment Setup | Yes | We report results after an initial round of hyperparameter exploration (details in Appendix A). (Section 3.3, page 3). We used an RMSProp optimizer with learning rate 7e-5 for Mini Pacman and 2.5e-5 for Sokoban. The discount factor γ was set to 0.99... The maximum number of training steps was 500 million for Mini Pacman and 1 billion for Sokoban. The value function loss coefficient was set to 0.5. The entropy cost coefficient was set to 0.01... We used a batch size of 1. Gradient clipping was applied for gradients whose L2 norm exceeded 40. (Appendix A, page 10). |
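
For orientation, the hyperparameters quoted in the Experiment Setup row can be collected into a small configuration and loss sketch. This is a minimal illustration assuming a standard A3C-style objective (policy-gradient term plus value and entropy terms) written in PyTorch; the placeholder model, dummy tensors, and training step below are assumptions for demonstration only, not the authors' implementation.

```python
import torch

# Values quoted from Appendix A of the paper; everything else in this sketch
# is an illustrative assumption.
CONFIG = {
    "lr": {"mini_pacman": 7e-5, "sokoban": 2.5e-5},
    "gamma": 0.99,               # discount factor
    "value_loss_coef": 0.5,
    "entropy_coef": 0.01,
    "batch_size": 1,
    "max_grad_norm": 40.0,       # L2 gradient-clipping threshold
    "max_steps": {"mini_pacman": 500_000_000, "sokoban": 1_000_000_000},
}

def a3c_style_loss(log_probs, values, returns, entropy, cfg=CONFIG):
    """Combine policy-gradient, value, and entropy terms with the reported
    coefficients; this mirrors a generic A3C objective, not the exact I2A code."""
    advantages = (returns - values).detach()
    policy_loss = -(log_probs * advantages).mean()
    value_loss = (returns - values).pow(2).mean()
    return (policy_loss
            + cfg["value_loss_coef"] * value_loss
            - cfg["entropy_coef"] * entropy.mean())

# Toy usage with a placeholder model; a real agent would compute these tensors
# from rollouts in Sokoban or Mini Pacman.
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.RMSprop(model.parameters(), lr=CONFIG["lr"]["sokoban"])
log_probs = torch.randn(5, requires_grad=True)
values = model(torch.randn(5, 4)).squeeze(-1)
returns = torch.randn(5)
entropy = torch.rand(5)

loss = a3c_style_loss(log_probs, values, returns, entropy)
optimizer.zero_grad()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), CONFIG["max_grad_norm"])
optimizer.step()
```

The sketch only shows how the reported coefficients combine the loss terms and where the gradient-clipping threshold and RMSprop learning rates would enter; the paper's asynchronous training over 32 to 64 workers and its imagination-augmented architecture are not reproduced here.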