Learning Diverse Risk Preferences in Population-Based Self-Play
Authors: Yuhua Jiang, Qihan Liu, Xiaoteng Ma, Chenghao Li, Yiqin Yang, Jun Yang, Bin Liang, Qianchuan Zhao
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results demonstrate that our method achieves comparable or superior performance in competitive games and, importantly, leads to the emergence of diverse behavioral modes. |
| Researcher Affiliation | Academia | Department of Automation, Tsinghua University {jiangyh22, lqh20, ma-xt17, lch18, yangyiqi19}@mails.tsinghua.edu.cn {yangjun603, bliang, zhaoqc}@tsinghua.edu.cn |
| Pseudocode | Yes | The algorithm is shown in Algorithm 1. |
| Open Source Code | Yes | Code is available at https://github.com/Jackory/RPBT. |
| Open Datasets | Yes | We consider two competitive multi-agent benchmarks: Slimevolley (Ha 2020) and Sumoants (Al-Shedivat et al. 2018). |
| Dataset Splits | No | The paper refers to training and testing but does not specify dataset splits (e.g., percentages or counts) or describe a validation-set methodology. |
| Hardware Specification | Yes | All the experiments are conducted with one 64-core CPU and one GeForce RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions the use of PPO and related frameworks but does not provide version numbers for its software dependencies or libraries. |
| Experiment Setup | Yes | We trained RPBT with population size 5 and set initial risk levels to {0.1, 0.4, 0.5, 0.6, 0.9} for all the experiments. For each method, we trained 3 runs using different random seeds and selected the one with the highest ELO score for evaluation. |
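
The experiment-setup row above is concrete enough to sketch in code. The following is a minimal Python sketch of the reported protocol, not the authors' implementation: the population size (5), the initial risk levels {0.1, 0.4, 0.5, 0.6, 0.9}, the 3 random seeds, and selection of the highest-ELO run come from the paper, while `train_rpbt`, `evaluate_elo`, and the specific seed values are hypothetical placeholders.

```python
# Sketch of the reported RPBT experiment protocol (assumptions noted below).

POPULATION_SIZE = 5
INITIAL_RISK_LEVELS = [0.1, 0.4, 0.5, 0.6, 0.9]  # one risk level per agent, as reported
SEEDS = [0, 1, 2]  # "3 runs using different random seeds"; exact values are assumed


def train_rpbt(risk_levels: list[float], seed: int):
    """Placeholder: train a population of agents with population-based
    self-play, one agent per initial risk level. Stands in for the code at
    https://github.com/Jackory/RPBT."""
    raise NotImplementedError


def evaluate_elo(population) -> float:
    """Placeholder: return the ELO score of a trained population."""
    raise NotImplementedError


assert len(INITIAL_RISK_LEVELS) == POPULATION_SIZE

# Train one population per seed, then keep the run with the highest ELO
# score for evaluation, matching the paper's selection rule.
runs = []
for seed in SEEDS:
    population = train_rpbt(INITIAL_RISK_LEVELS, seed)
    runs.append((evaluate_elo(population), population))

best_elo, best_population = max(runs, key=lambda run: run[0])
```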