Learning Grounded Action Abstractions from Language
Authors: Lionel Wong, Jiayuan Mao, Pratyusha Sharma, Zachary S Siegel, Jiahai Feng, Noa Korneev, Joshua B. Tenenbaum, Jacob Andreas
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate Ada (Fig. 1) on two benchmarks, Mini Minecraft and ALFRED (Shridhar et al., 2020). We compare this approach against three baselines that leverage LMs for sequential decision-making, offering more accurate plans and better generalization to complex tasks. |
| Researcher Affiliation | Collaboration | ¹MIT, ²Princeton University, ³UC Berkeley, ⁴Microsoft |
| Pseudocode | Yes | Algorithm 1 Action Abstraction Learning from Language |
| Open Source Code | Yes | Code for this paper will be released at: https://github.com/CatherineWong/llm-operators |
| Open Datasets | Yes | We evaluate our approach on two language-specified planning benchmarks: Mini Minecraft and ALFRED (Shridhar et al., 2020). Mini Minecraft (Fig. 5, top) is a procedurally-generated Minecraft-like benchmark (Chen et al., 2021; Luo et al., 2023). |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test splits with percentages or sample counts for reproduction, beyond mentioning a random subset of tasks for evaluation. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using "GPT-3.5" and "Fast Downward (Helmert, 2006)", but does not specify version numbers for other programming languages or libraries used in the implementation for reproducibility. |
| Experiment Setup | Yes | For each task, at each iteration, we sample n=4 initial goal proposals and n=4 initial task decompositions, and n=3 operator definition proposals for each operator name. ... For Minecraft, we set the motion planning budget for each subgoal to 1000 nodes. For ALFRED, which requires a slow Unity simulation, we set it to 50 nodes. Additional temperature and sampling details are in the Appendix. |
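
The sampling and planning settings quoted in the Experiment Setup row can be collected into a small configuration object. The following is only a sketch: the class name `AdaSamplingConfig` and all field names are hypothetical and are not taken from the paper or its released code. Only the numeric values (4, 4, 3, 1000, 50) come from the row above, and the temperature settings that the paper defers to its Appendix are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdaSamplingConfig:
    """Hypothetical container for the reported settings (names are illustrative)."""

    # LM samples drawn for each task at each iteration, as reported in the table.
    n_goal_proposals: int = 4
    n_task_decompositions: int = 4
    # LM samples drawn for each proposed operator name.
    n_operator_definitions_per_name: int = 3
    # Motion planning budget per subgoal, measured in search nodes.
    motion_planning_budget_nodes: int = 1000


# Mini Minecraft uses the larger budget of 1000 nodes per subgoal.
MINECRAFT = AdaSamplingConfig(motion_planning_budget_nodes=1000)

# ALFRED runs on a slow Unity simulation, so the reported budget is 50 nodes.
ALFRED = AdaSamplingConfig(motion_planning_budget_nodes=50)

if __name__ == "__main__":
    for name, cfg in (("Mini Minecraft", MINECRAFT), ("ALFRED", ALFRED)):
        print(name, cfg)
```

Keeping the two benchmarks as separate frozen instances makes the single reported difference between them, the per-subgoal planning budget, explicit.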