Building Cooperative Embodied Agents Modularly with Large Language Models

Authors: Hongxin Zhang, Weihua Du, Jiaming Shan, Qinhong Zhou, Yilun Du, Joshua B. Tenenbaum, Tianmin Shu, Chuang Gan

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on C-WAH and TDW-MAT demonstrate that CoELA driven by GPT-4 can surpass strong planning-based methods and exhibit emergent effective communication.
Researcher Affiliation | Collaboration | (1) University of Massachusetts Amherst, (2) Institute for Interdisciplinary Information Sciences, Tsinghua University, (3) Shanghai Jiao Tong University, (4) MIT, (5) MIT-IBM Watson AI Lab
Pseudocode | No | The paper describes processes in narrative text and refers to modules, but does not present any formally structured pseudocode or algorithm blocks (e.g., a labeled 'Algorithm 1').
Open Source Code | No | The paper mentions a project website for videos ('Videos can be found on the project website https://vis-www.cs.umass.edu/Co-LLM-Agents/') but does not explicitly state that the source code for the methodology described in this paper is available.
Open Datasets | Yes | ThreeDWorld Multi-Agent Transport (TDW-MAT) is a multi-agent embodied task extended from the ThreeDWorld Transport Challenge (Gan et al., 2022) with more types of objects and containers, more realistic object placements, and communication between agents supported, built on top of the TDW platform (Gan et al., 2021)... Communicative Watch-And-Help (C-WAH) is extended from the Watch-And-Help Challenge (Puig et al., 2021)... We train a Mask-RCNN on the training set for the Perception Module and instantiate CoELA with the most powerful LLM GPT-4 from the OpenAI API... we fine-tune the Mask-RCNN model pre-trained on the MS COCO dataset in training scenes.
Dataset Splits | No | The paper specifies 'training set' and 'test set' splits for the TDW-MAT and C-WAH datasets; for example, for TDW-MAT it states 'making a test set of 24 episodes'. However, it does not explicitly mention or detail a separate 'validation' split for the main experiments.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. It mentions using 'GPT-4 from the OpenAI API', which implies external computational resources without specifying the underlying hardware.
Software Dependencies | No | The paper mentions specific models and APIs such as 'GPT-4 from the OpenAI API' and 'LLAMA-2-13b-chat (Touvron et al., 2023)', and methods such as 'LoRA (Hu et al., 2021)', but it does not provide version numbers for general software dependencies (e.g., Python, PyTorch, or other common libraries).
Experiment Setup | Yes | We train a Mask-RCNN on the training set for the Perception Module and instantiate CoELA with the most powerful LLM GPT-4 from the OpenAI API with the default parameters of temperature 0.7, top-p 1, and max tokens 256 unless otherwise stated... We use LoRA to fine-tune the LLAMA-2-13b-chat with a batch size of 384, a maximal sequence length of 2048, and a max learning rate of 4e-4 for 30 epochs (approximately 60 steps).
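
To make the reported experiment setup concrete, here is a minimal sketch assuming the OpenAI Python client and Hugging Face transformers/peft. It is not the authors' code: the prompt text, LoRA rank/alpha, scheduler, and per-device batch split are illustrative placeholders, while the sampling parameters and training hyperparameters are the ones quoted above.

# Minimal sketch (not the authors' released code) of the reported settings.
# Assumes the OpenAI Python client (>= 1.0) plus Hugging Face transformers/peft;
# the prompt text, LoRA rank/alpha, scheduler, and per-device batch split are
# placeholders, not values reported in the paper.
from openai import OpenAI
from peft import LoraConfig
from transformers import TrainingArguments

# GPT-4 driving the agent, with the sampling defaults reported in the paper.
client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "<CoELA planning prompt goes here>"}],
    temperature=0.7,  # reported default
    top_p=1,          # reported default
    max_tokens=256,   # reported default
)
print(response.choices[0].message.content)

# LoRA fine-tuning of LLAMA-2-13b-chat with the reported hyperparameters;
# the 2048-token maximal sequence length would be enforced at tokenization time.
lora_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=16,           # rank not reported; placeholder
    lora_alpha=32,  # not reported; placeholder
)
training_args = TrainingArguments(
    output_dir="llama2-13b-chat-lora",
    per_device_train_batch_size=8,   # paper reports an overall batch size of 384
    gradient_accumulation_steps=48,  # 8 * 48 = 384; the device split is an assumption
    learning_rate=4e-4,              # reported max learning rate
    num_train_epochs=30,             # reported
    lr_scheduler_type="cosine",      # scheduler not reported; placeholder
)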