Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Grammar-Based Grounded Lexicon Learning
Authors: Jiayuan Mao, Freda Shi, Jiajun Wu, Roger Levy, Josh Tenenbaum
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | “We evaluate G2L2 on two domains: visual reasoning and language-driven navigation. Results show that G2L2 can generalize from small amounts of data to novel compositions of words.” and “We evaluate G2L2 on two domains: visual reasoning in CLEVR [21] and language-driven navigation in SCAN [25]. Beyond the grounding accuracy, we also evaluate the compositional generalizability and data efficiency, comparing G2L2 with end-to-end neural models and modular neural networks.” |
| Researcher Affiliation | Academia | Jiayuan Mao (MIT), Haoyue Shi (TTIC), Jiajun Wu (Stanford University), Roger P. Levy (MIT), Joshua B. Tenenbaum (MIT) |
| Pseudocode | Yes | Algorithm 1 The CKY-E2 algorithm. |
| Open Source Code | Yes | “Project page: http://g2l2.csail.mit.edu.” and “Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]” |
| Open Datasets | Yes | We evaluate G2L2 on two domains: visual reasoning in CLEVR [21] and language-driven navigation in SCAN [25]. |
| Dataset Splits | Yes | Since CLEVR does not provide test set annotations, for all models, we held out 10% of the training data for model development and test them on the CLEVR validation split. |
| Hardware Specification | No | The main paper does not contain specific hardware details for running experiments. The checklist states, “Details can be found in the supplementary material.” |
| Software Dependencies | No | The main paper does not provide specific ancillary software details with version numbers. The checklist indicates that more detailed information might be in the supplementary material. |
| Experiment Setup | Yes | “We train different models with either 10% or 100% of the training data and evaluate them on the validation set.” and “Instead of using manually defined heuristics for curriculum learning or self-paced learning as in previous works [28, 26], we employ a curriculum learning setup that is simply based on sentence length: we gradually add longer sentences into the training set.” and “We tuned the hidden size (i.e., the dimension of intermediate token representations) within {100, 200, 400}, as well as the number of layers (for both the encoder and the decoder) from {2, 4, 8}.” |
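
The Dataset Splits and Experiment Setup rows quote three procedures: holding out 10% of the training data for development, a curriculum that gradually admits longer sentences, and a grid over hidden size and number of layers. The following is a minimal sketch of what those steps might look like, not the authors' implementation. The function names, the random seed, and the length thresholds are hypothetical; only the 10% fraction and the two grid value sets come from the quoted text.

```python
import itertools
import random


def holdout_split(examples, dev_fraction=0.1, seed=0):
    """Shuffle and hold out a fraction of the training data for development.

    The 10% default mirrors the quoted split; the seed is an assumption.
    """
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_dev = int(len(shuffled) * dev_fraction)
    return shuffled[n_dev:], shuffled[:n_dev]


def length_curriculum(sentences, length_thresholds):
    """Yield (max_len, subset) stages, admitting longer sentences each stage.

    Sentence length is measured in whitespace-separated tokens; the
    thresholds are hypothetical, since the quoted text does not state them.
    """
    for max_len in length_thresholds:
        yield max_len, [s for s in sentences if len(s.split()) <= max_len]


def hyperparameter_grid():
    """Enumerate the (hidden_size, num_layers) pairs quoted in the table."""
    hidden_sizes = [100, 200, 400]
    num_layers = [2, 4, 8]
    return list(itertools.product(hidden_sizes, num_layers))
```

For example, `length_curriculum(["jump", "jump twice", "walk left and jump twice"], [1, 2, 5])` would yield three stages of growing size, and `hyperparameter_grid()` enumerates nine configurations.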