Embodied CoT Distillation From LLM To Off-the-shelf Agents
Authors: Wonje Choi, Woo Kyung Kim, Minjong Yoo, Honguk Woo
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments with the ALFRED benchmark demonstrate that DEDER surpasses leading language planning and distillation approaches, indicating the applicability and efficiency of sLM-based embodied policies derived through DEDER. |
| Researcher Affiliation | Academia | 1Department of Computer Science and Engineering, Sungkyunkwan University, Suwon, Republic of Korea. Correspondence to: Honguk Woo <hwoo@skku.edu>. |
| Pseudocode | Yes | Algorithm 1 Policy Distillation Rationale Dataset DRtn and Algorithm 2 Rationale Dataset Construction |
| Open Source Code | No | The paper refers to open-source projects for baselines and related work, but there is no explicit statement or link indicating that the code for DEDER (the work described in this paper) is open-source or publicly available. |
| Open Datasets | Yes | For evaluation, we use the ALFRED (Shridhar et al., 2020) environment. |
| Dataset Splits | No | The paper defines different evaluation task categories (Train, Seen, Unseen Spatial, Unseen Environment) and mentions using 312 trajectories for the expert dataset, which is used for training. However, it does not explicitly state the specific percentages or counts for a separate validation split of the training data. |
| Hardware Specification | Yes | Our framework is implemented using Python v3.9 and PyTorch v2.0.1, trained on a system with an Intel(R) Core(TM) i9-10980XE processor and an NVIDIA RTX A6000 GPU. We measured inference times on several off-the-shelf devices such as RTX 3090, 3050, and 2080 Ti GPUs. |
| Software Dependencies | Yes | Our framework is implemented using Python v3.9 and PyTorch v2.0.1. We utilize various LMs such as PaLM, LLaMA, and GPT2-large; paraphrase-MiniLM-L6-v2 (LLM-Planner); stsb-roberta-large (ZSP). |
| Experiment Setup | Yes | We collect 312 expert trajectories in a variety of tasks varying the starting positions of the agent and objects as well as the indoor scenes. Train epochs 100, Batch size 1, Optimizer SGD, Learning rate 5e-5, Temperature 0.1, Beam Size 3, Top k 5, Top p 0.3, Maximum New Tokens 40, Scaling factor α 0.5 |
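The hyperparameters quoted in the Experiment Setup row can be consolidated into a config sketch. This is a hypothetical illustration of the reported values; the key names (`train_config`, `decode_config`, etc.) are ours, not the authors' actual configuration schema.

```python
# Hypothetical grouping of DEDER's reported hyperparameters.
# Training-side settings (distillation into the sLM policy):
train_config = {
    "epochs": 100,
    "batch_size": 1,
    "optimizer": "SGD",
    "learning_rate": 5e-5,
    "temperature": 0.1,           # distillation temperature
    "scaling_factor_alpha": 0.5,  # scaling factor α
}

# Decoding-side settings (rationale/text generation):
decode_config = {
    "beam_size": 3,
    "top_k": 5,
    "top_p": 0.3,
    "max_new_tokens": 40,
}

print(train_config["learning_rate"], decode_config["beam_size"])
```

Grouping the values this way separates what governs the distillation objective (temperature, α, optimizer) from what governs inference-time generation (beam size, top-k/top-p sampling, token budget).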