Learning Compositional Neural Programs with Recursive Tree Search and Planning
Authors: Thomas Pierrot, Guillaume Ligner, Scott E. Reed, Olivier Sigaud, Nicolas Perrin, Alexandre Laterre, David Kas, Karim Beguir, Nando de Freitas
NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiments show that AlphaNPI can sort as well as previous strongly supervised NPI variants. The AlphaNPI agent is also trained on a Tower of Hanoi puzzle with two disks and is shown to generalize to puzzles with an arbitrary number of disks. The experiments also show that, when deploying the learned neural network policies, it is advantageous to plan with guided Monte Carlo tree search. |
| Researcher Affiliation | Collaboration | Thomas Pierrot, InstaDeep, t.pierrot@instadeep.com; Guillaume Ligner, InstaDeep, g.ligner@instadeep.com; Scott Reed, DeepMind, reedscot@google.com; Olivier Sigaud, Sorbonne Université, olivier.sigaud@upmc.fr; Nicolas Perrin, CNRS, Sorbonne Université, perrin@isir.upmc.fr; Alexandre Laterre, InstaDeep, a.laterre@instadeep.com; David Kas, InstaDeep, d.kas@instadeep.com; Karim Beguir, InstaDeep, kb@instadeep.com; Nando de Freitas, DeepMind, nandodefreitas@google.com |
| Pseudocode | Yes | The search approach is depicted in Figure 3 for a Tower of Hanoi example; see also the corresponding Figure 2 of Silver et al. [2017]. A detailed description of the search process, including pseudo-code, appears in Appendix A. (An illustrative sketch of the AlphaZero-style selection rule follows this table.) |
| Open Source Code | Yes | The code is available at https://github.com/instadeepai/AlphaNPI |
| Open Datasets | No | The paper discusses experiments on "sorting tasks" (Bubble Sort) and "Tower of Hanoi puzzle" using instances of varying lengths/disks (e.g., "lists of length 2 to 7", "problem instances with 2 disks"). However, it does not provide concrete access information (links, DOIs, formal citations) to specific publicly available datasets used for these problems. |
| Dataset Splits | No | The paper states "We validated on lists of length 7" and "After each Adam update, we perform validation on all tasks for n_val episodes," and mentions using "randomly generated lists" for testing. However, it does not provide specific dataset split information (e.g., exact percentages or sample counts) for fixed training, validation, and test sets, because the data is generated procedurally during training and validation episodes (see the data-generation sketch after this table). |
| Hardware Specification | No | The paper discusses training models but does not provide specific hardware details such as GPU/CPU models, memory, or processor types used for running its experiments. |
| Software Dependencies | No | The paper mentions using "Adam optimizer" and "LSTM" but does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | During a training iteration, the agent selects a program i to learn. It plays n_ep episodes (see Appendix E for specific values) using the tree search in exploration mode with a large budget of simulations. ... The agent is trained with the Adam optimizer on this data... We trained AlphaNPI to learn the sorting library of programs on lists of length 2 to 7. ... We validated on lists of length 7 and stopped when the minimum averaged validation reward, among all programs, reached Δ_curr. (A sketch of this training iteration follows the table.) |
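The Pseudocode row above refers to an AlphaZero-style guided Monte Carlo tree search. The snippet below is a minimal sketch of the PUCT action-selection rule such a search typically uses; the `Node` class layout, field names, and the exploration constant `c_puct` are illustrative assumptions, not the paper's actual implementation (which is given in its Appendix A and repository).

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    """One node of the search tree (illustrative, not the paper's code)."""
    prior: float                      # P(s, a) from the policy network
    visit_count: int = 0              # N(s, a)
    value_sum: float = 0.0            # sum of values backed up through this node
    children: dict = field(default_factory=dict)

    def q_value(self) -> float:
        # Mean action value Q(s, a); zero for unvisited nodes.
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def select_child(node: Node, c_puct: float = 1.0):
    """PUCT rule used by AlphaZero-style searches: pick the child maximizing
    Q(s, a) + c_puct * P(s, a) * sqrt(N(s)) / (1 + N(s, a)).
    Assumes `node` has already been expanded (non-empty children)."""
    total_visits = sum(child.visit_count for child in node.children.values())
    return max(
        node.children.items(),
        key=lambda item: item[1].q_value()
        + c_puct * item[1].prior * math.sqrt(total_visits) / (1 + item[1].visit_count),
    )
```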
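The dataset rows reflect that AlphaNPI's environments generate problem instances on the fly rather than reading a fixed corpus, which is why no conventional splits exist. Below is a minimal sketch of what such on-the-fly generation could look like for the sorting task; the function name and the element range are assumptions, while the length range (training on lengths 2 to 7, validating on length 7) comes from the paper.

```python
import random

def sample_sorting_instance(min_len: int = 2, max_len: int = 7) -> list:
    """Sample a random list for one Bubble Sort episode.

    Training uses lists of length 2 to 7 and validation uses length 7
    (per the paper); the uniform element range here is an assumption.
    """
    length = random.randint(min_len, max_len)
    return [random.randint(0, 9) for _ in range(length)]

# Training instance: any length in [2, 7]; validation instance: length 7.
train_list = sample_sorting_instance()
val_list = sample_sorting_instance(min_len=7, max_len=7)
```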
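Finally, the Experiment Setup row describes the paper's outer training loop. The sketch below restates that loop in code under assumed names: the methods `run_episode`, `update`, and `validate` on `agent` are hypothetical, and only the loop structure (n_ep exploration episodes per program, an Adam update, validation for n_val episodes, and the Δ_curr stopping criterion) follows the paper.

```python
def training_iteration(agent, program, n_ep: int, n_val: int, delta_curr: float) -> bool:
    """One AlphaNPI-style training iteration (hedged sketch; the method
    names on `agent` are hypothetical, not the repository's real API).

    Returns True once the curriculum criterion is met: the minimum
    averaged validation reward among all programs reaches delta_curr.
    """
    # 1. Play n_ep episodes with tree search in exploration mode,
    #    using a large budget of MCTS simulations.
    experience = [agent.run_episode(program, mode="exploration") for _ in range(n_ep)]

    # 2. Fit the policy/value network to the collected search data.
    agent.update(experience)  # assumed to perform an Adam step

    # 3. After the update, validate every task for n_val episodes.
    avg_rewards = [agent.validate(task, episodes=n_val) for task in agent.programs]

    # 4. Stop (or advance the curriculum) when the worst task clears delta_curr.
    return min(avg_rewards) >= delta_curr
```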