Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Learning Compositional Rules via Neural Program Synthesis
Authors: Maxwell Nye, Armando Solar-Lezama, Josh Tenenbaum, Brenden M. Lake
NeurIPS 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 Experiments; Table 1: Accuracy on SCAN splits; Table 2: Accuracy on few-shot number-word learning, using a maximum timeout of 45 seconds. |
| Researcher Affiliation | Collaboration | Maxwell I. Nye (MIT), Armando Solar-Lezama (MIT), Joshua B. Tenenbaum (MIT), Brenden M. Lake (NYU, Facebook AI) |
| Pseudocode | No | The paper does not contain explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Correspondence to EMAIL. Code can be found here: github.com/mtensor/rulesynthesis |
| Open Datasets | Yes | Our first experimental domain is the paradigm introduced in [13], informally dubbed Mini SCAN. Our next experiments concern the SCAN dataset [4, 5]. |
| Dataset Splits | No | The paper describes training and test sets and support sets for various experiments, but does not explicitly mention or specify details for a 'validation' dataset split. |
| Hardware Specification | No | The paper mentions 'compute details in supplemental Section A.1' but does not provide specific hardware details in the main text. |
| Software Dependencies | No | The paper mentions 'pyprob probabilistic programming library' but does not specify its version number or any other software dependencies with version numbers. |
| Experiment Setup | Yes | In our experiments, the meta-grammar randomly sampled grammars with 3-4 primitive rules and 2-4 higher-order rules... For each grammar, we trained with a support set of 10-20 randomly sampled examples. Our synthesis methods were tested by sampling from the network for the best grammar, or until a candidate grammar was found which was consistent with all of the support examples, using a timeout of 30 sec (on one GPU; compute details in supplemental Section A.1). If no satisfying grammar is found within a set timeout of 20 seconds, we resample another 100 support examples and retry searching for a grammar. |
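
The test-time procedure quoted in the Experiment Setup row (sample candidate grammars until one is consistent with every support example, or return the best one found when the timeout expires) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation, which lives in the repository linked above. The names `search_consistent_grammar`, `propose`, and `apply_grammar` are hypothetical, a grammar is modeled as a plain word-to-output dictionary, and the resample-and-retry step for support examples is left out.

```python
import time
from typing import Callable, Optional, Sequence, Tuple, TypeVar

G = TypeVar("G")            # hypothetical grammar type
Example = Tuple[str, str]   # (input string, expected output string)


def search_consistent_grammar(
    propose: Callable[[Sequence[Example]], G],
    apply_grammar: Callable[[G, str], Optional[str]],
    support: Sequence[Example],
    timeout_s: float,
) -> Tuple[Optional[G], int]:
    """Sample candidates until one satisfies all support examples or time runs out.

    Returns the best candidate seen and the number of support examples it
    satisfies. Returns (None, -1) if no candidate was sampled at all.
    """
    deadline = time.monotonic() + timeout_s
    best, best_hits = None, -1
    while time.monotonic() < deadline:
        candidate = propose(support)  # stands in for sampling from the network
        hits = sum(apply_grammar(candidate, x) == y for x, y in support)
        if hits > best_hits:
            best, best_hits = candidate, hits
        if hits == len(support):      # consistent with every support example
            break
    return best, best_hits


# Toy usage: a fixed stream of candidates replaces the neural proposer.
support = [("dax", "RED"), ("wif", "GREEN")]
candidates = iter([
    {"dax": "BLUE", "wif": "GREEN"},  # satisfies one example
    {"dax": "RED", "wif": "GREEN"},   # satisfies both examples
])
grammar, hits = search_consistent_grammar(
    propose=lambda s: next(candidates),
    apply_grammar=lambda g, x: g.get(x),
    support=support,
    timeout_s=5.0,
)
```

In this toy run the loop stops at the second candidate because it reproduces both support examples. With a real proposer, `timeout_s` would take the values quoted in the table row.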