Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Neurally-Guided Structure Inference
Authors: Sidi Lu, Jiayuan Mao, Joshua Tenenbaum, Jiajun Wu
ICML 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our algorithm on two representative structure inference tasks: probabilistic matrix decomposition and symbolic program parsing. It outperforms data-driven and search-based alternatives on both tasks. ... To evaluate the accuracy of our approach, we replicate the experiments in Grosse et al. (2012), including one synthetically generated dataset and two real-world datasets: motion capture and image patches. |
| Researcher Affiliation | Academia | 1Shanghai Jiao Tong University, 2MIT CSAIL, 3IIIS, Tsinghua University, 4Department of Brain and Cognitive Sciences, MIT, 5Center for Brains, Minds and Machines (CBMM), MIT. |
| Pseudocode | Yes | Algorithm 1: Neurally-Guided Structure Inference. Function Infer(D, Type): rule ← SelectRule(D, Type); for each non-terminal symbol s in rule do: C_s ← DecomposeData(D, rule, s); replace s in rule with Infer(C_s, s); return rule |
| Open Source Code | Yes | Project Page: http://ngsi.csail.mit.edu. |
| Open Datasets | Yes | Looking at hand-written digits from the MNIST dataset (LeCun et al., 1998)... The dataset of human motion capture (Hsu et al., 2005; Taylor et al., 2007)... The natural image patches dataset contains samples from the Sparsenet dataset proposed in Olshausen & Field (1996). |
| Dataset Splits | No | The paper mentions generating synthetic data for training and testing generalizability on programs of different lengths/depths, but it does not specify explicit training/validation/test split percentages or sample counts. |
| Hardware Specification | Yes | We ran all experiments on a machine with an Intel Xeon E5645 CPU and a GTX 1080 Ti GPU. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and CNN-GRU models but does not specify version numbers for any software libraries or dependencies (e.g., Python, TensorFlow, PyTorch versions). |
| Experiment Setup | Yes | We train the model with the Adam optimizer (Kingma & Ba, 2015). The hyperparameters for the optimizer are set to be β1 = 0.9, β2 = 0.9, α = 10⁻⁴. The model is trained for 100,000 iterations, with a batch size of 100. ... We adopt a unidirectional GRU with a hidden dimension of 256 as the code string encoder for production rule selection. We train the model using the Adam optimizer, with hyperparameters β1 = 0.9, β2 = 0.9, α = 10⁻⁴. The batch size is set to 64. |
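
The recursive control flow quoted in the Pseudocode row can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: in the paper the rule selector is a learned model, whereas here `toy_select_rule`, `toy_decompose_data`, `toy_is_nonterminal`, and the one-symbol `LIST` grammar are hypothetical stand-ins chosen only to make the recursion runnable.

```python
from typing import Any, Callable, List

Rule = List[str]


def infer(
    data: Any,
    symbol: str,
    select_rule: Callable[[Any, str], Rule],
    decompose_data: Callable[[Any, Rule, str], Any],
    is_nonterminal: Callable[[str], bool],
) -> list:
    """Recursively expand `symbol`, following the shape of Algorithm 1."""
    # Pick a production rule for this symbol given the data.
    rule = select_rule(data, symbol)
    expanded = []
    for s in rule:
        if is_nonterminal(s):
            # Extract the part of the data that s must explain,
            # then replace s with its recursively inferred structure.
            sub_data = decompose_data(data, rule, s)
            expanded.append(
                infer(sub_data, s, select_rule, decompose_data, is_nonterminal)
            )
        else:
            expanded.append(s)
    return expanded


# --- Hypothetical toy grammar (an assumption, not from the paper) ---
# LIST -> "cons" LIST   when more than one element remains
# LIST -> "nil"         otherwise

def toy_is_nonterminal(s: str) -> bool:
    return s == "LIST"


def toy_select_rule(data: list, symbol: str) -> Rule:
    # Hand-written stand-in for the learned rule selector.
    return ["cons", "LIST"] if len(data) > 1 else ["nil"]


def toy_decompose_data(data: list, rule: Rule, s: str) -> list:
    # The nested LIST explains everything after the first element.
    return data[1:]


structure = infer(
    [1, 2, 3], "LIST", toy_select_rule, toy_decompose_data, toy_is_nonterminal
)
print(structure)  # ['cons', ['cons', ['nil']]]
```

Passing the selector and decomposer as callables keeps the recursion independent of any particular grammar or model, so a trained network could in principle be substituted for `toy_select_rule` without changing `infer`.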