Learning Compositional Neural Programs with Recursive Tree Search and Planning

Authors: Thomas Pierrot, Guillaume Ligner, Scott E. Reed, Olivier Sigaud, Nicolas Perrin, Alexandre Laterre, David Kas, Karim Beguir, Nando de Freitas

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experiments show that AlphaNPI can sort as well as previous strongly supervised NPI variants. The AlphaNPI agent is also trained on a Tower of Hanoi puzzle with two disks and is shown to generalize to puzzles with an arbitrary number of disks. The experiments further show that, when deploying the learned neural network policies, it is advantageous to plan with guided Monte Carlo tree search.
Researcher Affiliation | Collaboration | Thomas Pierrot (InstaDeep, t.pierrot@instadeep.com); Guillaume Ligner (InstaDeep, g.ligner@instadeep.com); Scott Reed (DeepMind, reedscot@google.com); Olivier Sigaud (Sorbonne Université, olivier.sigaud@upmc.fr); Nicolas Perrin (CNRS, Sorbonne Université, perrin@isir.upmc.fr); Alexandre Laterre (InstaDeep, a.laterre@instadeep.com); David Kas (InstaDeep, d.kas@instadeep.com); Karim Beguir (InstaDeep, kb@instadeep.com); Nando de Freitas (DeepMind, nandodefreitas@google.com)
Pseudocode | Yes | The search approach is depicted in Figure 3 for a Tower of Hanoi example (see also the corresponding Figure 2 of Silver et al. [2017]). A detailed description of the search process, including pseudocode, appears in Appendix A.
Open Source Code | Yes | The code is available at https://github.com/instadeepai/AlphaNPI
Open Datasets | No | The paper discusses experiments on sorting tasks (Bubble Sort) and the Tower of Hanoi puzzle using instances of varying sizes (e.g., "lists of length 2 to 7", "problem instances with 2 disks"). However, it does not provide concrete access information (links, DOIs, or formal citations) for specific publicly available datasets used for these problems.
Dataset Splits | No | The paper states "We validated on lists of length 7" and "After each Adam update, we perform validation on all tasks for n_val episodes", and mentions using "randomly generated lists" for testing. However, it does not provide specific split information (e.g., exact percentages or sample counts) for fixed training, validation, and test sets; the data is instead generated on the fly during training and validation episodes.
Hardware Specification | No | The paper describes model training but does not provide specific hardware details, such as GPU/CPU models, memory, or processor types, used for running its experiments.
Software Dependencies | No | The paper mentions using the Adam optimizer and an LSTM but does not provide version numbers for any software dependencies, libraries, or frameworks used in the experiments.
Experiment Setup | Yes | During a training iteration, the agent selects a program i to learn. It plays n_ep episodes (see Appendix E for specific values) using the tree search in exploration mode with a large budget of simulations. ... The agent is trained with the Adam optimizer on this data. ... We trained AlphaNPI to learn the sorting library of programs on lists of length 2 to 7. ... We validated on lists of length 7 and stopped when the minimum averaged validation reward, among all programs, reached the curriculum threshold curr.
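The Experiment Setup row quoted above outlines a self-play curriculum: pick a program, play n_ep exploration episodes with guided tree search, fit the network with Adam on the resulting search data, and validate every program until the worst-case validation reward clears a curriculum threshold. A minimal sketch of that loop, assuming a hypothetical agent interface; `play_episode`, `update`, `validate`, and `r_curr` are illustrative placeholders, not the authors' actual API:

```python
import random

def train(agent, programs, n_iterations, n_ep, n_val, r_curr=0.9):
    """Rough sketch of the curriculum training loop described above.

    `agent` is assumed to expose play_episode(program, explore, list_len),
    update(batch), and validate(program, n_val) -> mean reward.
    All names and the value of r_curr are illustrative.
    """
    for _ in range(n_iterations):
        program = random.choice(programs)      # select a program i to learn
        batch = []
        for _ in range(n_ep):                  # episodes with guided tree search
            length = random.randint(2, 7)      # lists of length 2 to 7
            batch += agent.play_episode(program, explore=True, list_len=length)
        agent.update(batch)                    # Adam update on the search data
        # validate all programs; stop once the worst one clears the threshold
        if min(agent.validate(p, n_val) for p in programs) >= r_curr:
            break
```

The stopping criterion deliberately uses the *minimum* validation reward across programs, matching the quoted "minimum averaged validation reward, among all programs".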
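The Pseudocode row notes that the search mirrors Figure 2 of Silver et al. [2017], i.e., an AlphaZero-style search in which network priors guide node selection. As a point of reference, here is a minimal sketch of the PUCT action-selection rule at the heart of such guided Monte Carlo tree search; the function name and the constant `c_puct` are illustrative, and the paper's exact formula may differ:

```python
import math

def puct_select(visit_counts, q_values, priors, c_puct=1.0):
    """Pick the action maximizing Q + U (AlphaZero-style PUCT).

    visit_counts, q_values, priors: parallel lists indexed by action.
    sqrt(total + 1) keeps exploration nonzero before any visits
    (a common variant of the original sqrt(total) term).
    """
    total = sum(visit_counts)
    best_action, best_score = None, -math.inf
    for action, (n, q, p) in enumerate(zip(visit_counts, q_values, priors)):
        u = c_puct * p * math.sqrt(total + 1) / (1 + n)
        if q + u > best_score:
            best_action, best_score = action, q + u
    return best_action

# With equal Q-values and priors, the unvisited action wins on its U bonus.
print(puct_select([10, 0], [0.5, 0.5], [0.5, 0.5]))  # -> 1
```

The Q term exploits actions that returned high reward in earlier simulations, while the U term (scaled by the network prior p) drives exploration toward actions the policy network considers promising but the search has visited rarely.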