Minimum Coverage Sets for Training Robust Ad Hoc Teamwork Agents

Authors: Muhammad Rahman, Jiaxun Cui, Peter Stone

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically demonstrate that L-BRDiv produces more robust AHT agents than state-of-the-art methods in a broader range of two-player cooperative problems without the need for extensive hyperparameter tuning for its objectives. Our study shows that L-BRDiv outperforms the baseline methods by prioritizing discovering distinct members of the MCS instead of repeatedly finding redundant policies.
Researcher Affiliation | Collaboration | Muhammad Rahman¹, Jiaxun Cui¹, Peter Stone¹,²; ¹Department of Computer Science, The University of Texas at Austin; ²Sony AI
Pseudocode | Yes | Algorithm 1: Lagrangian Best Response Diversity (an illustrative optimization sketch follows this table).
Open Source Code | Yes | Implementation of L-BRDiv is available at https://github.com/raharrasy/L-BRDiv.
Open Datasets | No | The paper describes the environments used in its experiments (e.g., a repeated matrix game, Cooperative Reaching, and Level-based Foraging (LBF)) but does not provide concrete access information (link, DOI, repository, or formal citation with authors and year) for any publicly available dataset used for training.
Dataset Splits | No | The paper does not specify exact split percentages or sample counts for training, validation, or test sets.
Hardware Specification | No | The paper does not describe the specific hardware used for its experiments (e.g., GPU/CPU models or memory amounts).
Software Dependencies | No | The paper mentions MAPPO (Yu et al. 2022) and the RL2 algorithm (Duan et al. 2016) but does not provide version numbers for these or any other software dependencies.
Experiment Setup | No | The paper describes the experiment setup in general terms, such as using the RL2 algorithm and repeating experiments over four seeds, but it does not report specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed training configurations.
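
Illustrative note on the Pseudocode entry: the paper's Algorithm 1 is not reproduced here. The sketch below only shows the general pattern that the name "Lagrangian Best Response Diversity" suggests, namely constrained optimization via Lagrangian dual ascent, applied to a toy scalar problem. The function lagrangian_dual_ascent, the toy objective, the constraint, and all step sizes are hypothetical placeholders, not taken from the paper or its repository.

```python
# Minimal sketch, assuming only a generic Lagrangian dual-ascent scheme.
# This is NOT the paper's Algorithm 1; it illustrates the optimization
# pattern the algorithm's name suggests on a toy scalar problem.

def lagrangian_dual_ascent(objective_grad, constraint, constraint_grad,
                           theta0=0.0, primal_lr=0.05, dual_lr=0.1, steps=500):
    """Maximize f(theta) subject to g(theta) <= 0 by alternating a primal
    gradient step on the Lagrangian L = f - lam * g with a dual ascent
    step on the multiplier lam (projected to stay non-negative)."""
    theta, lam = theta0, 0.0
    for _ in range(steps):
        # Primal step: move theta uphill on L(theta, lam).
        theta += primal_lr * (objective_grad(theta) - lam * constraint_grad(theta))
        # Dual step: raise lam while the constraint g(theta) <= 0 is violated.
        lam = max(0.0, lam + dual_lr * constraint(theta))
    return theta, lam


# Toy usage: maximize -(x - 2)^2 subject to x <= 1; the constrained optimum is x = 1.
theta, lam = lagrangian_dual_ascent(
    objective_grad=lambda x: -2.0 * (x - 2.0),
    constraint=lambda x: x - 1.0,
    constraint_grad=lambda x: 1.0,
)
print(round(theta, 3), round(lam, 3))  # theta converges to ~1.0
```

In the paper's setting the primal variables would correspond to policy parameters and the constraints to relations between self-play and cross-play returns; the repository linked above is the authoritative reference for the actual objective and update rules.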