Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Multi-Agent Intention Progression with Black-Box Agents

Authors: Michael Dann, Yuan Yao, Brian Logan, John Thangarajah

IJCAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our pGPT-based approach in cooperative, selfish and adversarial multi-agent settings, and show that it outperforms MCTS-based scheduling where agents assume that other agents have the same program as themselves.
Researcher Affiliation | Academia | RMIT University; Zhejiang University of Technology; Utrecht University
Pseudocode | No | Pseudocode for the computation of candidate steps can be found in the extended version of the paper.
Open Source Code | Yes | Full source code and an extended version of this paper containing further experimental details is available at: https://github.com/mchldann/pGPT IJCAI.
Open Datasets | No | The paper generates synthetic pGPTs for its experiments using a generator based on the Intention Progression Competition's GPT generator. It does not use or provide concrete access information for a publicly available, pre-existing dataset.
Dataset Splits | No | The paper mentions generating synthetic pGPTs for evaluation and describes how they are structured (e.g., "depth of 5", "two corresponding plans"). It also states "scores averaged over 500 randomly generated pGPT forests." However, it does not specify explicit training/validation/test splits, instead focusing on performance across randomized trials.
Hardware Specification | No | The paper does not provide any specific details regarding the hardware used to run the experiments (e.g., GPU/CPU models, memory, or cloud instance types).
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python version, library versions, or specific solver versions).
Experiment Setup | Yes | The generated pGPTs had a depth of 5. Each subgoal had two corresponding plans, with each plan containing three actions and one subgoal (except the lowest-level plans, which contained three actions only). For each trial, 12 GPTs were generated, with 6 assigned to each agent. The full set of pGPTs contained 80 environment variables... scores averaged over 500 randomly generated pGPT forests.
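To make the tree shape in the Experiment Setup row concrete, the sketch below builds one goal-plan tree with the stated branching (two plans per goal, three actions plus one subgoal per plan, three actions only at the lowest level) and splits a 12-tree forest evenly between two agents. It is a minimal illustration, not the authors' generator: the class and function names (Goal, Plan, build_gpt, count_nodes, build_forest) are hypothetical, "depth of 5" is assumed to mean five levels of goals (the paper may count levels differently), and the 80 environment variables, action pre/postconditions, and the randomness of the Intention Progression Competition generator are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Plan:
    actions: List[str]
    # ASSUMPTION: at most one subgoal per plan; None for lowest-level plans.
    subgoal: Optional["Goal"] = None


@dataclass
class Goal:
    name: str
    plans: List[Plan] = field(default_factory=list)


def build_gpt(name: str, depth: int, plans_per_goal: int = 2,
              actions_per_plan: int = 3) -> Goal:
    """Hypothetical builder; `depth` is ASSUMED to count levels of goals."""
    goal = Goal(name)
    for p in range(plans_per_goal):
        plan_name = f"{name}.P{p}"
        actions = [f"{plan_name}.a{a}" for a in range(actions_per_plan)]
        # Non-leaf plans get one subgoal; lowest-level plans hold actions only.
        subgoal = (build_gpt(f"{plan_name}.G", depth - 1, plans_per_goal,
                             actions_per_plan) if depth > 1 else None)
        goal.plans.append(Plan(actions, subgoal))
    return goal


def count_nodes(goal: Goal) -> Tuple[int, int, int]:
    """Return (goals, plans, actions) contained in the tree rooted at `goal`."""
    goals, plans, actions = 1, 0, 0
    for plan in goal.plans:
        plans += 1
        actions += len(plan.actions)
        if plan.subgoal is not None:
            g, p, a = count_nodes(plan.subgoal)
            goals, plans, actions = goals + g, plans + p, actions + a
    return goals, plans, actions


def build_forest(num_gpts: int = 12, num_agents: int = 2,
                 depth: int = 5) -> Dict[int, List[Goal]]:
    """Build `num_gpts` trees and assign an equal share to each agent."""
    gpts = [build_gpt(f"T{i}", depth) for i in range(num_gpts)]
    share = num_gpts // num_agents
    return {agent: gpts[agent * share:(agent + 1) * share]
            for agent in range(num_agents)}


if __name__ == "__main__":
    print(count_nodes(build_gpt("T0", depth=5)))  # (31, 62, 186)
    forest = build_forest()
    print({agent: len(trees) for agent, trees in forest.items()})  # {0: 6, 1: 6}
```

Under the five-goal-level reading, each tree holds 31 goals, 62 plans and 186 actions; a different depth convention would change these counts but not the construction.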