Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Learning Diverse Risk Preferences in Population-Based Self-Play
Authors: Yuhua Jiang, Qihan Liu, Xiaoteng Ma, Chenghao Li, Yiqin Yang, Jun Yang, Bin Liang, Qianchuan Zhao
AAAI 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results demonstrate that our method achieves comparable or superior performance in competitive games and, importantly, leads to the emergence of diverse behavioral modes. |
| Researcher Affiliation | Academia | Department of Automation, Tsinghua University |
| Pseudocode | Yes | The algorithm is shown in Algorithm 1. |
| Open Source Code | Yes | Code is available at https://github.com/Jackory/RPBT. |
| Open Datasets | Yes | We consider two competitive multi-agent benchmarks: Slimevolley (Ha 2020) and Sumoants (Al-Shedivat et al. 2018). |
| Dataset Splits | No | The paper mentions training and testing but does not explicitly specify validation dataset splits (e.g., percentages or counts) or a clear methodology for a validation set. |
| Hardware Specification | Yes | All the experiments are conducted with one 64-core CPU and one GeForce RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions the use of PPO and other frameworks but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | We trained RPBT with population size 5 and set initial risk levels to {0.1, 0.4, 0.5, 0.6, 0.9} for all the experiments. For each method, we trained 3 runs using different random seeds and selected the one with the highest ELO score for evaluation. |
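
The experiment setup quoted in the table can be sketched as a small configuration plus a run-selection step. This is only an illustrative sketch, not the authors' code: the names `RPBTSetup` and `select_best_run` are hypothetical, and the ELO values are placeholders rather than results from the paper. Only the population size, the initial risk levels, and the three-run, highest-ELO selection rule come from the quoted text.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class RPBTSetup:
    """Hypothetical container for the settings quoted in the table."""
    population_size: int = 5
    initial_risk_levels: Tuple[float, ...] = (0.1, 0.4, 0.5, 0.6, 0.9)
    num_runs: int = 3  # one run per random seed

    def __post_init__(self) -> None:
        # Each population member needs exactly one initial risk level.
        if len(self.initial_risk_levels) != self.population_size:
            raise ValueError("expected one risk level per population member")


def select_best_run(elo_by_seed: Dict[int, float]) -> int:
    """Return the seed whose run reached the highest ELO score."""
    return max(elo_by_seed, key=elo_by_seed.get)


setup = RPBTSetup()

# Placeholder ELO scores (NOT from the paper), one per seed.
placeholder_elo = {0: 1000.0, 1: 1050.0, 2: 990.0}
best_seed = select_best_run(placeholder_elo)
```

Selecting the single best of three runs reports a best-case result, which is worth keeping in mind when comparing against methods evaluated by averaging over seeds.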