Making Neural Programming Architectures Generalize via Recursion
Authors: Jonathon Cai, Richard Shin, Dawn Song
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, neural networks that attempt to learn programs from data have exhibited poor generalizability. [...] As an application, we implement recursion in the Neural Programmer-Interpreter framework on four tasks: grade-school addition, bubble sort, topological sort, and quicksort. We demonstrate superior generalizability and interpretability with small amounts of training data. |
| Researcher Affiliation | Academia | Jonathon Cai, Richard Shin, Dawn Song; Department of Computer Science, University of California, Berkeley; Berkeley, CA 94720, USA; {jonathon,ricshin,dawnsong}@cs.berkeley.edu |
| Pseudocode | Yes (see the inference sketch below the table) | Algorithm 1: Neural programming inference |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for its methodology is publicly available. |
| Open Datasets | No | The paper describes generating its own training data (e.g., 'The training set for addition contains 200 traces.') and does not provide concrete access information or citations to publicly available datasets used for training. |
| Dataset Splits | No | The paper mentions training and testing, but does not explicitly provide details about training/validation/test splits, specific proportions, or how data was partitioned for validation. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions software such as Keras and the Adam optimizer, but does not provide version numbers for these or other software dependencies. |
| Experiment Setup | Yes (see the configuration sketch below the table) | The training set for addition contains 200 traces. The maximum problem length in this training set is 3. [...] We train using the Adam optimizer and use a 2-layer LSTM and task-specific state encoders for the external environments, as described in Reed & de Freitas (2016). In all experiments, α is set to 0.5. |
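The Pseudocode row refers to Algorithm 1 (neural programming inference). The Python sketch below loosely mirrors that control flow to show where recursion enters: each subprogram call starts a fresh LSTM state and returns once the core's stop signal exceeds the threshold α. The function names (`f_enc`, `f_lstm`, `f_end`, `f_prog`, `f_arg`, `f_env`) follow common NPI notation, but the stubs, the hard step cap, and the simplified termination check are illustrative assumptions, not the authors' implementation.

```python
ACT = 0       # identifier of the primitive "act on the environment" program
ALPHA = 0.5   # stop threshold; the paper reports alpha = 0.5

def run(program_id, args, env, nets, max_steps=100):
    """Execute one (possibly recursive) program until the core signals 'stop'."""
    h = nets["init_hidden"]()                       # fresh LSTM state per call frame
    for _ in range(max_steps):                      # hard cap instead of an unbounded loop
        s = nets["f_enc"](env, args)                # encode observation and arguments
        h = nets["f_lstm"](s, program_id, h)        # advance the core LSTM
        stop_prob = nets["f_end"](h)                # probability of returning from this call
        next_prog = nets["f_prog"](h)               # which subprogram to invoke next
        next_args = nets["f_arg"](h)                # its arguments
        if stop_prob > ALPHA:
            return env                              # pop this call frame
        if next_prog == ACT:
            env = nets["f_env"](env, next_args)     # primitive step on the environment
        else:
            env = run(next_prog, next_args, env, nets)  # recursive subprogram call
    return env

if __name__ == "__main__":
    # Dummy stand-ins so the sketch runs end to end; a trained core LSTM and
    # task-specific encoders would replace these in a real system.
    dummy = {
        "init_hidden": lambda: None,
        "f_enc": lambda env, args: None,
        "f_lstm": lambda s, p, h: None,
        "f_end": lambda h: 1.0,      # always choose to stop immediately
        "f_prog": lambda h: ACT,
        "f_arg": lambda h: (),
        "f_env": lambda env, args: env,
    }
    print(run(program_id=1, args=(), env="initial environment", nets=dummy))
```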
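The Experiment Setup row quotes a 2-layer LSTM core trained with Adam, and the Software Dependencies row notes that Keras is mentioned without versions. The sketch below is therefore a minimal, assumed Keras/TensorFlow configuration of such a core with a next-program head and a stop head; the hidden sizes, program vocabulary, trace length, and synthetic batch are assumptions, and the task-specific state encoders and argument decoder are omitted. It is not the authors' implementation.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# All dimensions below are assumptions for illustration; the paper does not report them.
STATE_DIM = 64        # output size of a task-specific state encoder (assumed)
HIDDEN_DIM = 256      # LSTM hidden size (assumed)
NUM_PROGRAMS = 16     # program vocabulary size (assumed)
MAX_TRACE_LEN = 32    # padded execution-trace length (assumed)

# Two-layer LSTM core over encoded execution-trace steps.
inputs = keras.Input(shape=(MAX_TRACE_LEN, STATE_DIM))
x = layers.LSTM(HIDDEN_DIM, return_sequences=True)(inputs)   # layer 1
x = layers.LSTM(HIDDEN_DIM, return_sequences=True)(x)        # layer 2

# Per-step heads: which subprogram to call next, and whether to stop (threshold alpha = 0.5).
next_program = layers.Dense(NUM_PROGRAMS, activation="softmax", name="next_program")(x)
stop = layers.Dense(1, activation="sigmoid", name="stop")(x)

model = keras.Model(inputs, [next_program, stop])
model.compile(
    optimizer=keras.optimizers.Adam(),   # the paper trains with the Adam optimizer
    loss={"next_program": "categorical_crossentropy", "stop": "binary_crossentropy"},
)

# Tiny synthetic batch standing in for the ~200 supervised execution traces.
x_dummy = np.zeros((8, MAX_TRACE_LEN, STATE_DIM), dtype="float32")
y_prog = np.zeros((8, MAX_TRACE_LEN, NUM_PROGRAMS), dtype="float32")
y_prog[..., 0] = 1.0
y_stop = np.zeros((8, MAX_TRACE_LEN, 1), dtype="float32")
model.fit(x_dummy, {"next_program": y_prog, "stop": y_stop}, epochs=1, verbose=0)
```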