Learning Grounded Action Abstractions from Language

Authors: Lionel Wong, Jiayuan Mao, Pratyusha Sharma, Zachary S Siegel, Jiahai Feng, Noa Korneev, Joshua B. Tenenbaum, Jacob Andreas

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate Ada (Fig. 1) on two benchmarks, Mini Minecraft and ALFRED (Shridhar et al., 2020). We compare this approach against three baselines that leverage LMs for sequential decision-making; it offers more accurate plans and better generalization to complex tasks.
Researcher Affiliation | Collaboration | (1) MIT, (2) Princeton University, (3) UC Berkeley, (4) Microsoft
Pseudocode | Yes | Algorithm 1: Action Abstraction Learning from Language
Open Source Code | Yes | Code for this paper will be released at: https://github.com/CatherineWong/llm-operators
Open Datasets | Yes | We evaluate our approach on two language-specified planning benchmarks: Mini Minecraft and ALFRED (Shridhar et al., 2020). Mini Minecraft (Fig. 5, top) is a procedurally generated Minecraft-like benchmark (Chen et al., 2021; Luo et al., 2023).
Dataset Splits | No | The paper does not explicitly provide training/validation/test splits with percentages or sample counts for reproduction; it mentions only that a random subset of tasks is used for evaluation.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using "GPT-3.5" and "Fast Downward (Helmert, 2006)", but does not specify version numbers for the programming languages or libraries used in the implementation.
Experiment Setup | Yes | For each task, at each iteration, we sample n=4 initial goal proposals and n=4 initial task decompositions, and n=3 operator definition proposals for each operator name. ... For Minecraft, we set the motion planning budget for each subgoal to 1000 nodes. For ALFRED, which requires a slow Unity simulation, we set it to 50 nodes. Additional temperature and sampling details are in the Appendix.
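
The sampling counts and planning budgets quoted in the Experiment Setup row could be collected in a small configuration object, as in the minimal sketch below. The class name, field names, and the helper function are hypothetical and are not taken from the released code; only the numeric values (4, 4, 3, 1000, 50) come from the quoted text. The helper assumes that each proposal corresponds to one LM sample, which the quoted text does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingConfig:
    """Hypothetical container for the reported settings (names are illustrative)."""
    n_goal_proposals: int = 4            # initial goal proposals per task, per iteration
    n_task_decompositions: int = 4       # initial task decompositions per task, per iteration
    n_operator_definitions: int = 3      # definition proposals per operator name
    motion_planning_budget: int = 1000   # nodes allowed per subgoal


# Values as reported: 1000 nodes for Minecraft, 50 for ALFRED (slow Unity simulation).
MINECRAFT = SamplingConfig(motion_planning_budget=1000)
ALFRED = SamplingConfig(motion_planning_budget=50)


def proposals_per_task(cfg: SamplingConfig, n_operator_names: int) -> int:
    """Count proposals for one task in one iteration.

    Assumption (not from the paper): every proposal is a separate sample,
    so the total is goals + decompositions + definitions per operator name.
    """
    return (
        cfg.n_goal_proposals
        + cfg.n_task_decompositions
        + cfg.n_operator_definitions * n_operator_names
    )
```

For example, under these assumptions a task that introduces two new operator names would involve proposals_per_task(ALFRED, 2) proposals in a single iteration.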