Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Long-Horizon Planning for Multi-Agent Robots in Partially Observable Environments
Authors: Sid Nayak, Adelmo Morrison Orozco, Marina Have, Jackson Zhang, Vittal Thirumalai, Darren Chen, Aditya Kapoor, Eric Robinson, Karthik Gopalakrishnan, James Harrison, Anuj Mahajan, Brian Ichter, Hamsa Balakrishnan
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that LLaMAR achieves a 30% higher success rate than other state-of-the-art LM-based multi-agent planners in MAP-THOR and Search & Rescue tasks. Code can be found at https://github.com/nsidn98/LLaMAR |
| Researcher Affiliation | Collaboration | ¹MIT, ²TCS, ³USAF-MIT AI Accelerator, ⁴Stanford, ⁵Google DeepMind, ⁶Apple |
| Pseudocode | Yes | The pseudocode for our approach is in Appendix E. |
| Open Source Code | Yes | Code can be found at https://github.com/nsidn98/LLaMAR |
| Open Datasets | Yes | Additionally, we present MAP-THOR, a comprehensive test suite encompassing household tasks of varying complexity within the AI2-THOR environment. More information about the MAP-THOR and SAR environments can be found in Appendix B and D respectively. |
| Dataset Splits | No | The paper specifies training and testing, but does not explicitly mention a validation dataset split or a methodology for it. |
| Hardware Specification | No | The paper mentions 'OpenAI credits for GPT-4 access' and 'one (1) Apple M1 core' for SentenceBERT fine-tuning, but lacks specific hardware details for the main experiments with LLaMAR. |
| Software Dependencies | Yes | We use the clip-vit-large-patch14-336 model for the CLIP weights which we download from https://huggingface.co/openai/clip-vit-large-patch14-336. We finetuned a pre-trained BERT model to function as a semantic mapper between free-form natural language output and the robot's admissible actions in the environment. The pre-trained weights were obtained from https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2. |
| Experiment Setup | Yes | Hyperparameters and additional details of the sentence transformer fine-tuning are provided in Appendix J. Epochs: 10; Max gradient norm: 1; Learning rate: 2 × 10⁻⁵; Batch size: 64; Encoding dimension: 384; Optimizer: AdamW; Scheduler: Warm-up linear; Warm-up steps: 45; Weight decay: 0.01; Loss scale: 20; Loss type: Multiple negatives ranking loss; Similarity function: Cosine similarity |
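
The fine-tuning hyperparameters in the Experiment Setup row can be made concrete with a small sketch. The block below is an illustration in plain Python, not the authors' training code: the names `FineTuneConfig`, `warmup_linear_lr`, `cosine`, and `multiple_negatives_ranking_loss` are hypothetical, and `total_steps` is an assumed parameter because the table does not report the total number of training steps. It shows how a warm-up linear schedule and a multiple negatives ranking loss with scaled cosine similarity are commonly defined.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    # Values copied from the Experiment Setup row; field names are hypothetical.
    epochs: int = 10
    max_grad_norm: float = 1.0
    learning_rate: float = 2e-5
    batch_size: int = 64
    encoding_dim: int = 384
    optimizer: str = "AdamW"
    scheduler: str = "warmup-linear"
    warmup_steps: int = 45
    weight_decay: float = 0.01
    loss_scale: float = 20.0


def warmup_linear_lr(step, total_steps, cfg):
    """Learning rate at `step`: linear ramp-up, then linear decay to zero.

    `total_steps` is an assumption; the source does not state it.
    """
    if step < cfg.warmup_steps:
        factor = step / cfg.warmup_steps
    else:
        remaining = total_steps - step
        factor = max(0.0, remaining / max(1, total_steps - cfg.warmup_steps))
    return cfg.learning_rate * factor


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def multiple_negatives_ranking_loss(anchors, positives, scale):
    """Mean cross-entropy over scaled cosine similarities.

    positives[i] is the target for anchors[i]; every other positive in the
    batch acts as an in-batch negative.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_z - logits[i]
    return total / len(anchors)


cfg = FineTuneConfig()
# Hypothetical toy batch: two orthogonal 2-d embeddings paired with themselves.
toy = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = multiple_negatives_ranking_loss(toy, toy, cfg.loss_scale)
swapped_loss = multiple_negatives_ranking_loss(toy, toy[::-1], cfg.loss_scale)
```

With correctly paired embeddings the loss is close to zero, and with the pairs swapped it is close to the loss scale, which is why a larger scale sharpens the penalty for ranking an in-batch negative above the true match.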