Building Cooperative Embodied Agents Modularly with Large Language Models

Authors: Hongxin Zhang, Weihua Du, Jiaming Shan, Qinhong Zhou, Yilun Du, Joshua B. Tenenbaum, Tianmin Shu, Chuang Gan

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on C-WAH and TDW-MAT demonstrate that CoELA driven by GPT-4 can surpass strong planning-based methods and exhibit emergent effective communication.
Researcher Affiliation | Collaboration | (1) University of Massachusetts Amherst, (2) Institute for Interdisciplinary Information Sciences, Tsinghua University, (3) Shanghai Jiao Tong University, (4) MIT, (5) MIT-IBM Watson AI Lab
Pseudocode | No | The paper describes processes in narrative text and refers to modules, but does not present any formally structured pseudocode or algorithm blocks (e.g., a labeled 'Algorithm 1').
Open Source Code | No | The paper mentions a project website for videos ('Videos can be found on the project website https://vis-www.cs.umass.edu/Co-LLM-Agents/') but does not explicitly state that the source code for the methodology described in this paper is available.
Open Datasets | Yes | ThreeDWorld Multi-Agent Transport (TDW-MAT) is a multi-agent embodied task extended from the ThreeDWorld Transport Challenge (Gan et al., 2022) with more types of objects and containers, more realistic object placements, and communication between agents supported, built on top of the TDW platform (Gan et al., 2021)... Communicative Watch-And-Help (C-WAH) is extended from the Watch-And-Help Challenge (Puig et al., 2021)... We train a Mask-RCNN on the training set for the Perception Module and instantiate CoELA with the most powerful LLM GPT-4 from the OpenAI API... we fine-tune the Mask-RCNN model pre-trained on the MS COCO dataset in training scenes.
Dataset Splits | No | The paper specifies 'training set' and 'test set' splits for the TDW-MAT and C-WAH datasets; for example, for TDW-MAT it states 'making a test set of 24 episodes'. However, it does not explicitly mention or detail a separate 'validation' split for the main experiments.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. It mentions using 'GPT-4 from the OpenAI API', which implies external computational resources without specifying the underlying hardware.
Software Dependencies | No | The paper mentions specific models and APIs such as 'GPT-4 from the OpenAI API' and 'LLAMA-2-13b-chat (Touvron et al., 2023)', and methods such as 'LoRA (Hu et al., 2021)', but it does not provide version numbers for general software dependencies (e.g., Python, PyTorch, or other common libraries).
Experiment Setup | Yes | We train a Mask-RCNN on the training set for the Perception Module and instantiate CoELA with the most powerful LLM GPT-4 from the OpenAI API with the default parameters of temperature 0.7, top-p 1, and max tokens 256 unless otherwise stated... We use LoRA to fine-tune the LLAMA-2-13b-chat with a batch size of 384, a maximal sequence length of 2048, and a max learning rate of 4e-4 for 30 epochs (approximately 60 steps).
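
To make the reported experiment setup concrete, here is a minimal sketch assuming the OpenAI Python client and Hugging Face transformers/peft. It is not the authors' code: the prompt text, LoRA rank/alpha, scheduler, and per-device batch split are illustrative placeholders, while the sampling parameters and training hyperparameters are the ones quoted above.

# Minimal sketch (not the authors' released code) of the reported settings.
# Assumes the OpenAI Python client (>= 1.0) plus Hugging Face transformers/peft;
# the prompt text, LoRA rank/alpha, scheduler, and per-device batch split are
# placeholders, not values reported in the paper.
from openai import OpenAI
from peft import LoraConfig
from transformers import TrainingArguments

# GPT-4 driving the agent, with the sampling defaults reported in the paper.
client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "<CoELA planning prompt goes here>"}],
    temperature=0.7,  # reported default
    top_p=1,          # reported default
    max_tokens=256,   # reported default
)
print(response.choices[0].message.content)

# LoRA fine-tuning of LLAMA-2-13b-chat with the reported hyperparameters;
# the 2048-token maximal sequence length would be enforced at tokenization time.
lora_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=16,           # rank not reported; placeholder
    lora_alpha=32,  # not reported; placeholder
)
training_args = TrainingArguments(
    output_dir="llama2-13b-chat-lora",
    per_device_train_batch_size=8,   # paper reports an overall batch size of 384
    gradient_accumulation_steps=48,  # 8 * 48 = 384; the device split is an assumption
    learning_rate=4e-4,              # reported max learning rate
    num_train_epochs=30,             # reported
    lr_scheduler_type="cosine",      # scheduler not reported; placeholder
)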