Learning Compositional Neural Programs with Recursive Tree Search and Planning
Authors: Thomas Pierrot, Guillaume Ligner, Scott E. Reed, Olivier Sigaud, Nicolas Perrin, Alexandre Laterre, David Kas, Karim Beguir, Nando de Freitas
NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiments show that AlphaNPI can sort as well as previous strongly supervised NPI variants. The AlphaNPI agent is also trained on a Tower of Hanoi puzzle with two disks and is shown to generalize to puzzles with an arbitrary number of disks. The experiments also show that, when deploying the learned neural network policies, it is advantageous to plan with guided Monte Carlo tree search. |
| Researcher Affiliation | Collaboration | Thomas Pierrot, InstaDeep, t.pierrot@instadeep.com; Guillaume Ligner, InstaDeep, g.ligner@instadeep.com; Scott Reed, DeepMind, reedscot@google.com; Olivier Sigaud, Sorbonne Université, olivier.sigaud@upmc.fr; Nicolas Perrin, CNRS, Sorbonne Université, perrin@isir.upmc.fr; Alexandre Laterre, InstaDeep, a.laterre@instadeep.com; David Kas, InstaDeep, d.kas@instadeep.com; Karim Beguir, InstaDeep, kb@instadeep.com; Nando de Freitas, DeepMind, nandodefreitas@google.com |
| Pseudocode | Yes | The search approach is depicted in Figure 3 for a Tower of Hanoi example; see also the corresponding Figure 2 of Silver et al. [2017]. A detailed description of the search process, including pseudo-code, appears in Appendix A. (An illustrative sketch of the AlphaZero-style selection rule follows this table.) |
| Open Source Code | Yes | The code is available at https://github.com/instadeepai/AlphaNPI |
| Open Datasets | No | The paper discusses experiments on "sorting tasks" (Bubble Sort) and "Tower of Hanoi puzzle" using instances of varying lengths/disks (e.g., "lists of length 2 to 7", "problem instances with 2 disks"). However, it does not provide concrete access information (links, DOIs, formal citations) to specific publicly available datasets used for these problems. |
| Dataset Splits | No | The paper states "We validated on lists of length 7" and "After each Adam update, we perform validation on all tasks for n_val episodes," and mentions using "randomly generated lists" for testing. However, it does not provide specific dataset split information (e.g., exact percentages or sample counts) for fixed training, validation, and test sets, because the data is generated procedurally during training and validation episodes (see the data-generation sketch after this table). |
| Hardware Specification | No | The paper discusses training models but does not provide specific hardware details such as GPU/CPU models, memory, or processor types used for running its experiments. |
| Software Dependencies | No | The paper mentions using "Adam optimizer" and "LSTM" but does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | During a training iteration, the agent selects a program i to learn. It plays n_ep episodes (see Appendix E for specific values) using the tree search in exploration mode with a large budget of simulations. ... The agent is trained with the Adam optimizer on this data... We trained AlphaNPI to learn the sorting library of programs on lists of length 2 to 7. ... We validated on lists of length 7 and stopped when the minimum averaged validation reward, among all programs, reached Δ_curr. (A sketch of this training iteration follows the table.) |
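The Pseudocode row above refers to an AlphaZero-style guided Monte Carlo tree search. The snippet below is a minimal sketch of the PUCT action-selection rule such a search typically uses; the `Node` class layout, field names, and the exploration constant `c_puct` are illustrative assumptions, not the paper's actual implementation (which is given in its Appendix A and repository).

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    """One node of the search tree (illustrative, not the paper's code)."""
    prior: float                      # P(s, a) from the policy network
    visit_count: int = 0              # N(s, a)
    value_sum: float = 0.0            # sum of values backed up through this node
    children: dict = field(default_factory=dict)

    def q_value(self) -> float:
        # Mean action value Q(s, a); zero for unvisited nodes.
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def select_child(node: Node, c_puct: float = 1.0):
    """PUCT rule used by AlphaZero-style searches: pick the child maximizing
    Q(s, a) + c_puct * P(s, a) * sqrt(N(s)) / (1 + N(s, a)).
    Assumes `node` has already been expanded (non-empty children)."""
    total_visits = sum(child.visit_count for child in node.children.values())
    return max(
        node.children.items(),
        key=lambda item: item[1].q_value()
        + c_puct * item[1].prior * math.sqrt(total_visits) / (1 + item[1].visit_count),
    )
```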
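The dataset rows reflect that AlphaNPI's environments generate problem instances on the fly rather than reading a fixed corpus, which is why no conventional splits exist. Below is a minimal sketch of what such on-the-fly generation could look like for the sorting task; the function name and the element range are assumptions, while the length range (training on lengths 2 to 7, validating on length 7) comes from the paper.

```python
import random

def sample_sorting_instance(min_len: int = 2, max_len: int = 7) -> list:
    """Sample a random list for one Bubble Sort episode.

    Training uses lists of length 2 to 7 and validation uses length 7
    (per the paper); the uniform element range here is an assumption.
    """
    length = random.randint(min_len, max_len)
    return [random.randint(0, 9) for _ in range(length)]

# Training instance: any length in [2, 7]; validation instance: length 7.
train_list = sample_sorting_instance()
val_list = sample_sorting_instance(min_len=7, max_len=7)
```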
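Finally, the Experiment Setup row describes the paper's outer training loop. The sketch below restates that loop in code under assumed names: the methods `run_episode`, `update`, and `validate` on `agent` are hypothetical, and only the loop structure (n_ep exploration episodes per program, an Adam update, validation for n_val episodes, and the Δ_curr stopping criterion) follows the paper.

```python
def training_iteration(agent, program, n_ep: int, n_val: int, delta_curr: float) -> bool:
    """One AlphaNPI-style training iteration (hedged sketch; the method
    names on `agent` are hypothetical, not the repository's real API).

    Returns True once the curriculum criterion is met: the minimum
    averaged validation reward among all programs reaches delta_curr.
    """
    # 1. Play n_ep episodes with tree search in exploration mode,
    #    using a large budget of MCTS simulations.
    experience = [agent.run_episode(program, mode="exploration") for _ in range(n_ep)]

    # 2. Fit the policy/value network to the collected search data.
    agent.update(experience)  # assumed to perform an Adam step

    # 3. After the update, validate every task for n_val episodes.
    avg_rewards = [agent.validate(task, episodes=n_val) for task in agent.programs]

    # 4. Stop (or advance the curriculum) when the worst task clears delta_curr.
    return min(avg_rewards) >= delta_curr
```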