Minimum Coverage Sets for Training Robust Ad Hoc Teamwork Agents

Authors: Muhammad Rahman, Jiaxun Cui, Peter Stone

AAAI 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We empirically demonstrate that L-BRDiv produces more robust AHT agents than state-of-the-art methods in a broader range of two-player cooperative problems without the need for extensive hyperparameter tuning for its objectives. Our study shows that L-BRDiv outperforms the baseline methods by prioritizing discovering distinct members of the MCS instead of repeatedly finding redundant policies.
Researcher Affiliation	Collaboration	Muhammad Rahman¹, Jiaxun Cui¹, Peter Stone¹,²; ¹Department of Computer Science, The University of Texas at Austin; ²Sony AI
Pseudocode	Yes	Algorithm 1: Lagrangian Best Response Diversity
Open Source Code	Yes	Implementation of L-BRDiv is available at https://github.com/raharrasy/L-BRDiv.
Open Datasets	No	The paper describes environments used for experiments (e.g., 'repeated matrix game', 'Cooperative Reaching', 'Level-based Foraging (LBF)'), but does not provide concrete access information (link, DOI, repository, or formal citation with authors/year for a publicly available or open dataset) for any specific dataset used for training.
Dataset Splits	No	The paper does not specify exact split percentages or sample counts for training, validation, or test sets.
Hardware Specification	No	The paper does not explicitly describe the specific hardware used for its experiments (e.g., specific GPU/CPU models, memory amounts).
Software Dependencies	No	The paper mentions 'MAPPO (Yu et al. 2022)' and 'RL2 algorithm (Duan et al. 2016)' but does not provide specific version numbers for these or other software dependencies.
Experiment Setup	No	The paper generally describes the experiment setup, such as using the RL2 algorithm and repeating experiments under four seeds, but it does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed training configurations.