Minimum Coverage Sets for Training Robust Ad Hoc Teamwork Agents
Authors: Muhammad Rahman, Jiaxun Cui, Peter Stone
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate that L-BRDiv produces more robust AHT agents than state-of-the-art methods in a broader range of two-player cooperative problems without the need for extensive hyperparameter tuning for its objectives. Our study shows that L-BRDiv outperforms the baseline methods by prioritizing discovering distinct members of the MCS instead of repeatedly finding redundant policies. |
| Researcher Affiliation | Collaboration | Muhammad Rahman¹, Jiaxun Cui¹, Peter Stone¹,²; ¹Department of Computer Science, The University of Texas at Austin; ²Sony AI |
| Pseudocode | Yes | Algorithm 1: Lagrangian Best Response Diversity |
| Open Source Code | Yes | Implementation of L-BRDiv is available at https://github.com/raharrasy/L-BRDiv. |
| Open Datasets | No | The paper describes environments used for experiments (e.g., 'repeated matrix game', 'Cooperative Reaching', 'Level-based Foraging (LBF)'), but does not provide concrete access information (link, DOI, repository, or formal citation with authors/year for a publicly available or open dataset) for any specific dataset used for training. |
| Dataset Splits | No | The paper does not specify exact split percentages or sample counts for training, validation, or test sets. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used for its experiments (e.g., specific GPU/CPU models, memory amounts). |
| Software Dependencies | No | The paper mentions 'MAPPO (Yu et al. 2022)' and 'RL2 algorithm (Duan et al. 2016)' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | No | The paper generally describes the experiment setup, such as using the RL2 algorithm and repeating experiments under four seeds, but it does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed training configurations. |