Population-size-Aware Policy Optimization for Mean-Field Games
Authors: Pengdeng Li, Xinrun Wang, Shuxin Li, Hau Chan, Bo An
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, extensive experiments on multiple environments show the significant superiority of PAPO over baselines, and the analysis of the evolution of the generated policies further deepens our understanding of the two fields of finite-agent and infinite-agent games. |
| Researcher Affiliation | Academia | ¹Nanyang Technological University, Singapore; ²University of Nebraska, Lincoln, USA. {pengdeng.li,xinrun.wang,shuxin.li,boan}@ntu.edu.sg, hchan3@unl.edu |
| Pseudocode | Yes | Algorithm 1: Training Procedure |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the methodology is openly available. |
| Open Datasets | Yes | Environments. We consider the following environments which have been widely used in previous works: Exploration (Laurière et al., 2022), Taxi Matching (Nguyen et al., 2018), and Crowd in Circle (Perrin et al., 2020). |
| Dataset Splits | No | The paper mentions training on a 'target set G' and evaluating on 'unseen games' but does not specify explicit training, validation, and test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | Yes | Moreover, all experiments are run on a machine with 20 Intel i9-9820X CPUs and 4 NVIDIA RTX2080 Ti GPUs, and averaged over 3 random seeds. |
| Software Dependencies | No | Table 3 lists 'optimizer Adam' but the paper does not specify version numbers for key software libraries or frameworks (e.g., Python, PyTorch, or TensorFlow), nor detailed versions for any solvers. |
| Experiment Setup | Yes | Table 3: Hyperparameters — optimizer: Adam; length of an episode T: 20; minimum number of agents N̲: 2; maximum number of agents N̄: 200; maximum number of policy training episodes: 2 × 10^7; maximum number of BR training episodes: 1 × 10^6; actor learning rate: 3 × 10^-5; critic learning rate: 3 × 10^-4; update every E episodes: 5; optimize K epochs at each update: 5; critic loss coefficient c1: 0.5; entropy loss coefficient c2: 0.01; batch size m for computing CKA: 1000; dimension of binary encoding k: 12 |
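
As a reading aid, the sketch below collects the Table 3 hyperparameters into a single configuration object and shows one plausible way a sampled population size N could be mapped to a k-bit binary vector (the paper reports k = 12). This is a minimal illustration under stated assumptions, not the authors' implementation: the names `PAPOConfig` and `encode_population_size` are hypothetical, and only the numeric values are taken from the quoted table.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class PAPOConfig:
    """Hyperparameter values quoted from Table 3 of the paper."""
    episode_length_T: int = 20
    min_agents: int = 2              # minimum number of agents
    max_agents: int = 200            # maximum number of agents
    max_policy_episodes: int = int(2e7)
    max_br_episodes: int = int(1e6)  # best-response training episodes
    actor_lr: float = 3e-5
    critic_lr: float = 3e-4
    update_every_E: int = 5
    epochs_K: int = 5
    critic_loss_coef_c1: float = 0.5
    entropy_loss_coef_c2: float = 0.01
    cka_batch_size_m: int = 1000
    binary_encoding_dim_k: int = 12


def encode_population_size(n: int, k: int = 12) -> np.ndarray:
    """Hypothetical helper (assumption, not the authors' code): map a
    population size n to a k-bit binary vector that a population-size-aware
    policy could consume alongside its observation."""
    bits = [(n >> i) & 1 for i in range(k)]  # least-significant bit first
    return np.array(bits, dtype=np.float32)


if __name__ == "__main__":
    cfg = PAPOConfig()
    # Sample a population size from [min_agents, max_agents] and encode it.
    n = np.random.randint(cfg.min_agents, cfg.max_agents + 1)
    print(n, encode_population_size(n, cfg.binary_encoding_dim_k))
```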