Language Grounded Multi-agent Reinforcement Learning with Human-interpretable Communication
Authors: Huao Li, Hossein Nourkhiz Mahjoub, Behdad Chalaki, Vaishnav Tadiparthi, Kwonjoon Lee, Ehsan Moradi-Pari, Michael Lewis, Katia Sycara
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results demonstrate that introducing language grounding not only maintains task performance but also accelerates the emergence of communication. Furthermore, the learned communication protocols exhibit zero-shot generalization capabilities in ad-hoc teamwork scenarios with unseen teammates and novel task states. |
| Researcher Affiliation | Collaboration | Huao Li (University of Pittsburgh) hul52@pitt.edu; Hossein Nourkhiz Mahjoub (Honda Research Institute USA, Inc.) hossein_nourkhizmahjoub@honda-ri.com; Behdad Chalaki (Honda Research Institute USA, Inc.) behdad_chalaki@honda-ri.com; Vaishnav Tadiparthi (Honda Research Institute USA, Inc.) vaishnav_tadiparthi@honda-ri.com; Kwonjoon Lee (Honda Research Institute USA, Inc.) kwonjoon_lee@honda-ri.com; Ehsan Moradi-Pari (Honda Research Institute USA, Inc.) emoradipari@honda-ri.com; Michael Lewis (University of Pittsburgh) ml@sis.pitt.edu; Katia Sycara (Carnegie Mellon University) sycara@andrew.cmu.edu |
| Pseudocode | No | The paper describes the computational pipeline and methods in prose, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code available at https://romanlee6.github.io/langground_web/. |
| Open Datasets | No | In order to construct dataset D, we collected expert trajectories from embodied LLM agents powered by GPT-4 in interactive task scenarios. |
| Dataset Splits | Yes | The y-axis is task performance measured by the episode length until task completion (lower is better). The x-axis is the number of training timesteps. Shaded areas are standard errors over three random seeds. |
| Hardware Specification | Yes | All experiments were conducted on a machine with a 14-core Intel(R) Core(TM) i9-12900H CPU and 64GB memory. |
| Software Dependencies | Yes | We use OpenAI's API to call gpt-4-0125-preview as the backbone pre-trained model and set the temperature parameter to 0 to ensure consistent outputs. (See the illustrative API-call sketch after this table.) |
| Experiment Setup | Yes | The batch size is 500, and the number of update iterations in an epoch is 10. Training on ppv0 and USAR takes 2000 epochs and 1e7 timesteps (about 4 hours); training on ppv1 takes 500 epochs and 2.5e6 timesteps (about 1.5 hours). We use a learning rate of 0.0001 for USAR and 0.001 for pp. The MARL agent's action policy is an LSTM with a hidden layer of size 256. Communication vectors are exchanged one round at each timestep. The supervised learning weight λ is 1 in pp and 10 in USAR. (See the configuration sketch after this table.) |
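The software-dependency and open-dataset rows report that expert trajectories were collected from embodied LLM agents by calling OpenAI's gpt-4-0125-preview at temperature 0. A minimal sketch of such a call, assuming the current `openai` Python client; the function name `query_expert_agent` and the prompt handling are illustrative and not taken from the released code:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def query_expert_agent(observation_prompt: str) -> str:
    """Ask the backbone LLM for an expert response to one task observation,
    using the model name and temperature reported in the paper."""
    response = client.chat.completions.create(
        model="gpt-4-0125-preview",  # backbone model named in the paper
        temperature=0,               # temperature 0 for consistent outputs
        messages=[{"role": "user", "content": observation_prompt}],
    )
    return response.choices[0].message.content
```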
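The experiment-setup row can likewise be summarized as a single configuration sketch. All values below are as reported; the dictionary structure and key names are illustrative and do not mirror the released codebase:

```python
# Reported hyperparameters, grouped per environment.
# "lambda_sl" is the supervised learning weight λ; "pp" covers ppv0/ppv1.
CONFIG = {
    "batch_size": 500,
    "updates_per_epoch": 10,
    "policy": {"type": "LSTM", "hidden_size": 256},
    "comm_rounds_per_timestep": 1,
    "envs": {
        "ppv0": {"epochs": 2000, "timesteps": 1e7,   "lr": 1e-3, "lambda_sl": 1},
        "ppv1": {"epochs": 500,  "timesteps": 2.5e6, "lr": 1e-3, "lambda_sl": 1},
        "USAR": {"epochs": 2000, "timesteps": 1e7,   "lr": 1e-4, "lambda_sl": 10},
    },
}
```

Reported wall-clock times on the listed hardware (14-core i9-12900H, 64GB memory): roughly 4 hours for ppv0/USAR and 1.5 hours for ppv1.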