Embodied CoT Distillation From LLM To Off-the-shelf Agents

Authors: Wonje Choi, Woo Kyung Kim, Minjong Yoo, Honguk Woo

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experiments with the ALFRED benchmark demonstrate that DEDER surpasses leading language planning and distillation approaches, indicating the applicability and efficiency of sLM-based embodied policies derived through DEDER." (Section 5, Evaluation)
Researcher Affiliation | Academia | "Department of Computer Science and Engineering, Sungkyunkwan University, Suwon, Republic of Korea. Correspondence to: Honguk Woo <hwoo@skku.edu>."
Pseudocode | Yes | "Algorithm 1 Policy Distillation Rationale Dataset DRtn" and "Algorithm 2 Rationale Dataset Construction"
Open Source Code | No | The paper refers to open-source projects for baselines and related work, but there is no explicit statement or link indicating that the code for DEDER (the work described in this paper) is open source or publicly available.
Open Datasets | Yes | "For evaluation, we use the ALFRED (Shridhar et al., 2020) environment."
Dataset Splits | No | The paper defines evaluation task categories (Train, Seen, Unseen Spatial, Unseen Environment) and mentions 312 expert trajectories used for training, but it does not state counts or percentages for a separate validation split of the training data.
Hardware Specification | Yes | "Our framework is implemented using Python v3.9 and PyTorch v2.0.1, trained on a system of an Intel(R) Core(TM) i9-10980XE processor and an NVIDIA RTX A6000 GPU. We measured inference times on several off-the-shelf devices such as RTX 3090, 3050, and 2080 Ti GPUs."
Software Dependencies | Yes | "Our framework is implemented using Python v3.9 and PyTorch v2.0.1." The paper also uses various LMs such as PaLM, LLaMA, and GPT2-large, along with the sentence-embedding models paraphrase-MiniLM-L6-v2 (LLM-Planner) and stsb-roberta-large (ZSP); see the loading sketch below the table.
Experiment Setup | Yes | "We collect 312 expert trajectories in a variety of tasks varying the starting positions of the agent and objects as well as the indoor scenes." Hyperparameters: train epochs 100, batch size 1, optimizer SGD, learning rate 5e-5, temperature 0.1, beam size 3, top-k 5, top-p 0.3, maximum new tokens 40, scaling factor α 0.5 (see the training and decoding sketches below the table).
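
For the Software Dependencies row, a minimal sketch of how the two named sentence encoders can be loaded with the sentence-transformers library. The model IDs are the ones quoted above; the retrieval-style usage shown is an illustrative assumption, not the paper's actual pipeline.

```python
# Sketch: load the sentence encoders cited for the baselines.
# The model IDs come from the Software Dependencies row; how the
# baselines use them internally is not shown in this report.
from sentence_transformers import SentenceTransformer, util

# paraphrase-MiniLM-L6-v2 is the encoder attributed to LLM-Planner.
llm_planner_encoder = SentenceTransformer("paraphrase-MiniLM-L6-v2")
# stsb-roberta-large is the encoder attributed to ZSP.
zsp_encoder = SentenceTransformer("stsb-roberta-large")

# Hypothetical example: score an instruction against candidate skill
# descriptions, the typical retrieval step such encoders serve.
instruction = "put a clean mug on the coffee table"
skills = ["pick up mug", "wash mug", "place mug on table"]
emb_i = llm_planner_encoder.encode(instruction, convert_to_tensor=True)
emb_s = llm_planner_encoder.encode(skills, convert_to_tensor=True)
print(util.cos_sim(emb_i, emb_s))
```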
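
For the Experiment Setup row, a minimal PyTorch training-loop sketch that wires in the reported optimization values (100 epochs, batch size 1, SGD, learning rate 5e-5). The model and data below are placeholders; only the numeric settings and the 312-trajectory count come from the paper.

```python
# Sketch: a training loop using the reported hyperparameters.
# DistilledPolicy stand-in and feature shapes are hypothetical.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

EPOCHS = 100          # train epochs 100
BATCH_SIZE = 1        # batch size 1
LEARNING_RATE = 5e-5  # learning rate 5e-5

model = nn.Linear(512, 32)               # hypothetical policy head
inputs = torch.randn(312, 512)           # 312 expert trajectories (shape assumed)
targets = torch.randint(0, 32, (312,))
loader = DataLoader(TensorDataset(inputs, targets), batch_size=BATCH_SIZE)

optimizer = torch.optim.SGD(model.parameters(), lr=LEARNING_RATE)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(EPOCHS):
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
```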
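
The remaining values in that row (temperature 0.1, beam size 3, top-k 5, top-p 0.3, maximum new tokens 40) read as decoding hyperparameters; the sketch below maps them onto HuggingFace generate() arguments using GPT2-large, one of the LMs the paper names. The call itself is an assumption for illustration, not the paper's code.

```python
# Sketch: the reported decoding hyperparameters expressed as
# HuggingFace `generate()` arguments; the prompt is hypothetical.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2-large")
model = AutoModelForCausalLM.from_pretrained("gpt2-large")

prompt = "Task: put a clean mug on the coffee table. Plan:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,      # required for temperature/top-k/top-p to apply
    temperature=0.1,
    num_beams=3,         # beam size 3
    top_k=5,
    top_p=0.3,
    max_new_tokens=40,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```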