reproducibilityindex.ai

Learning Grounded Action Abstractions from Language

Authors: Lionel Wong, Jiayuan Mao, Pratyusha Sharma, Zachary S Siegel, Jiahai Feng, Noa Korneev, Joshua B. Tenenbaum, Jacob Andreas

ICLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate Ada (Fig. 1) on two benchmarks, Mini Minecraft and ALFRED (Shridhar et al., 2020). We compare this approach against three baselines that leverage LMs for sequential decision-making, offering more accurate plans and better generalization to complex tasks.
Researcher Affiliation	Collaboration	¹MIT, ²Princeton University, ³UC Berkeley, ⁴Microsoft
Pseudocode	Yes	Algorithm 1 Action Abstraction Learning from Language
Open Source Code	Yes	Code for this paper will be released at: https://github.com/CatherineWong/llm-operators
Open Datasets	Yes	We evaluate our approach on two language-specified planning benchmarks: Mini Minecraft and ALFRED (Shridhar et al., 2020). Mini Minecraft (Fig. 5, top) is a procedurally-generated Minecraft-like benchmark (Chen et al., 2021; Luo et al., 2023).
Dataset Splits	No	The paper does not explicitly provide training/validation/test splits with percentages or sample counts for reproduction, beyond mentioning a random subset of tasks for evaluation.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies	No	The paper mentions using "GPT-3.5" and "Fast Downward (Helmert, 2006)", but does not specify version numbers for other programming languages or libraries used in the implementation for reproducibility.
Experiment Setup	Yes	For each task, at each iteration, we sample n=4 initial goal proposals and n=4 initial task decompositions, and n=3 operator definition proposals for each operator name. ... For Minecraft, we set the motion planning budget for each subgoal to 1000 nodes. For ALFRED, which requires a slow Unity simulation, we set it to 50 nodes. Additional temperature and sampling details are in the Appendix.
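
The settings quoted in the Experiment Setup row can be collected into a small configuration object. The following is a minimal sketch only: the class name `AdaSetup`, the helpers `make_setup` and `proposals_per_task_iteration`, and the benchmark keys are hypothetical and are not taken from the paper or its code release. Only the numeric values (4, 4, 3, 1000, 50) come from the quoted text; temperature and other sampling details are left out because the row defers them to the Appendix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdaSetup:
    """Hypothetical container for the values quoted in the Experiment Setup row."""
    benchmark: str
    motion_planning_budget_nodes: int      # per-subgoal budget, in nodes
    n_goal_proposals: int = 4              # initial goal proposals per task, per iteration
    n_task_decompositions: int = 4         # initial task decompositions per task, per iteration
    n_operator_definitions: int = 3        # definition proposals per operator name


# Per-subgoal motion planning budgets as stated in the quoted text.
# The dictionary keys are illustrative names, not identifiers from the paper.
MOTION_PLANNING_BUDGET = {"minecraft": 1000, "alfred": 50}


def make_setup(benchmark: str) -> AdaSetup:
    """Build the setup for one benchmark; unknown names raise ValueError."""
    key = benchmark.lower()
    if key not in MOTION_PLANNING_BUDGET:
        raise ValueError(f"unknown benchmark: {benchmark!r}")
    return AdaSetup(benchmark=key,
                    motion_planning_budget_nodes=MOTION_PLANNING_BUDGET[key])


def proposals_per_task_iteration(setup: AdaSetup, n_operator_names: int) -> int:
    """Count proposals sampled for one task in one iteration.

    Simple arithmetic on the quoted numbers: goal proposals, plus task
    decompositions, plus definition proposals for each operator name.
    """
    return (setup.n_goal_proposals
            + setup.n_task_decompositions
            + setup.n_operator_definitions * n_operator_names)
```

For example, `make_setup("alfred")` carries the 50-node budget, and a task that mentions five operator names would account for 4 + 4 + 3 × 5 proposals in one iteration under these assumptions.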