Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Multi-Agent Intention Progression with Black-Box Agents
Authors: Michael Dann, Yuan Yao, Brian Logan, John Thangarajah
IJCAI 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our pGPT-based approach in cooperative, selfish and adversarial multi-agent settings, and show that it out-performs MCTS-based scheduling where agents assume that other agents have the same program as themselves. |
| Researcher Affiliation | Academia | ¹RMIT University, ²Zhejiang University of Technology, ³Utrecht University; EMAIL, EMAIL, EMAIL |
| Pseudocode | No | Pseudocode for the computation of candidate steps can be found in the extended version of the paper. |
| Open Source Code | Yes | Full source code and an extended version of this paper containing further experimental details is available at: https://github.com/mchldann/pGPT IJCAI. |
| Open Datasets | No | The paper generates synthetic pGPTs for its experiments using a generator based on the Intention Progression Competition's GPT generator. It does not use or provide concrete access information for a publicly available, pre-existing dataset. |
| Dataset Splits | No | The paper mentions generating synthetic pGPTs for evaluation and describes how they are structured (e.g., "depth of 5", "two corresponding plans"). It also states "scores averaged over 500 randomly generated pGPT forests." However, it does not specify explicit training/validation/test splits, instead focusing on performance across randomized trials. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware used to run the experiments (e.g., GPU/CPU models, memory, or cloud instance types). |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python version, library versions, or specific solver versions). |
| Experiment Setup | Yes | The generated pGPTs had a depth of 5. Each subgoal had two corresponding plans, with each plan containing three actions and one subgoal (except the lowest-level plans, which contained three actions only). For each trial, 12 GPTs were generated, with 6 assigned to each agent. The full set of pGPTs contained 80 environment variables... scores averaged over 500 randomly generated pGPT forests. |
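
To make the tree shape in the Experiment Setup row concrete, the sketch below builds a goal-plan tree with two plans per goal and three actions plus one subgoal per plan, then counts its nodes. It is an illustration only, not the authors' generator (which the paper bases on the Intention Progression Competition's GPT generator). All names (`Goal`, `Plan`, `build_goal`, `count_nodes`, `build_forest`, `split_between_agents`) are hypothetical. The sketch assumes "depth of 5" means five levels of plans, and it omits preconditions, effects, the 80 environment variables, and any randomness.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Plan:
    """A plan: a list of action names plus at most one subgoal."""
    actions: List[str]
    subgoal: Optional["Goal"] = None


@dataclass
class Goal:
    """A goal that can be achieved by any one of its plans."""
    name: str
    plans: List[Plan] = field(default_factory=list)


def build_goal(name: str, depth: int,
               plans_per_goal: int = 2,
               actions_per_plan: int = 3) -> Goal:
    """Build a goal with `depth` levels of plans beneath it.

    Assumption: depth counts plan levels, so plans at the lowest
    level (depth == 1) contain actions only and no subgoal.
    """
    goal = Goal(name)
    for p in range(plans_per_goal):
        plan_name = f"{name}.p{p}"
        actions = [f"{plan_name}.a{a}" for a in range(actions_per_plan)]
        subgoal = None
        if depth > 1:
            subgoal = build_goal(f"{plan_name}.g", depth - 1,
                                 plans_per_goal, actions_per_plan)
        goal.plans.append(Plan(actions, subgoal))
    return goal


def count_nodes(goal: Goal) -> Tuple[int, int, int]:
    """Return (goals, plans, actions) in the tree rooted at `goal`."""
    goals, plans, actions = 1, 0, 0
    for plan in goal.plans:
        plans += 1
        actions += len(plan.actions)
        if plan.subgoal is not None:
            g, p, a = count_nodes(plan.subgoal)
            goals, plans, actions = goals + g, plans + p, actions + a
    return goals, plans, actions


def build_forest(num_trees: int = 12, depth: int = 5) -> List[Goal]:
    """Build one trial's worth of trees (12 per trial in the setup)."""
    return [build_goal(f"t{i}", depth) for i in range(num_trees)]


def split_between_agents(forest: List[Goal],
                         num_agents: int = 2) -> List[List[Goal]]:
    """Deal the trees out round-robin, giving 6 each for 12 trees."""
    return [forest[i::num_agents] for i in range(num_agents)]


if __name__ == "__main__":
    print(count_nodes(build_goal("g", 5)))  # (31, 62, 186)
    print([len(trees) for trees in split_between_agents(build_forest())])
```

Under this reading of depth, a single tree has 31 goals, 62 plans, and 186 actions. A different reading of "depth" (for example, counting goal and plan levels together) would give different totals.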