Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
What's Left? Concept Grounding with Logic-Enhanced Foundation Models
Authors: Joy Hsu, Jiayuan Mao, Josh Tenenbaum, Jiajun Wu
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate LEFT on four domains and seven tasks, and show its effectiveness in multiple settings. |
| Researcher Affiliation | Academia | Joy Hsu (Stanford University), Jiayuan Mao (MIT), Joshua B. Tenenbaum (MIT), Jiajun Wu (Stanford University) |
| Pseudocode | No | The paper describes the model's components and execution strategy in prose, but does not include formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | We publicly release our code. See our project website for additional details. |
| Open Datasets | Yes | We use the official data splits released for each dataset, CLEVR [Johnson et al., 2017a], ReferIt3D [Achlioptas et al., 2020], HumanMotionQA [Endo et al., 2023], and CLIPort [Shridhar et al., 2022]. |
| Dataset Splits | Yes | We use the official data splits released for each dataset, CLEVR [Johnson et al., 2017a], ReferIt3D [Achlioptas et al., 2020], HumanMotionQA [Endo et al., 2023], and CLIPort [Shridhar et al., 2022]. |
| Hardware Specification | Yes | We trained with 1 NVIDIA Titan RTX per experiment for all datasets, from an internal cluster. |
| Software Dependencies | No | The paper mentions using GPT-3.5 and LLAMA but does not provide specific version numbers for other key software dependencies or libraries (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | The core hyperparameters were set as 128 for concept embedding dimensions, and learning rate taken from prior neuro-symbolic concept learning repositories that we use as baselines. |