Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Population-size-Aware Policy Optimization for Mean-Field Games
Authors: Pengdeng Li, Xinrun Wang, Shuxin Li, Hau Chan, Bo An
ICLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, extensive experiments on multiple environments show the significant superiority of PAPO over baselines, and the analysis of the evolution of the generated policies further deepens our understanding of the two fields of finite-agent and infinite-agent games. |
| Researcher Affiliation | Academia | 1Nanyang Technological University, Singapore 2University of Nebraska, Lincoln, USA |
| Pseudocode | Yes | Algorithm 1: Training Procedure |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the methodology is openly available. |
| Open Datasets | Yes | Environments. We consider the following environments which have been widely used in previous works: Exploration (Laurière et al., 2022), Taxi Matching (Nguyen et al., 2018), and Crowd in Circle (Perrin et al., 2020). |
| Dataset Splits | No | The paper mentions training on a 'target set G' and evaluating on 'unseen games' but does not specify explicit training, validation, and test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | Yes | Moreover, all experiments are run on a machine with 20 Intel i9-9820X CPUs and 4 NVIDIA RTX2080 Ti GPUs, and averaged over 3 random seeds. |
| Software Dependencies | No | Table 3 lists 'optimizer Adam' but does not specify version numbers for key software libraries or frameworks like Python, PyTorch, or TensorFlow, nor detailed versions for solvers. |
| Experiment Setup | Yes | Table 3: Hyperparameters. optimizer: Adam; length of an episode T: 20; minimum number of agents N: 2; maximum number of agents N: 200; maximum number of policy training episodes: 2×10^7; maximum number of BR training episodes: 1×10^6; actor learning rate: 3×10^-5; critic learning rate: 3×10^-4; update every E = 5 episodes; optimize K = 5 epochs at each update; critic loss coefficient c1: 0.5; entropy loss coefficient c2: 0.01; batch size m for computing CKA: 1000; dimension of binary encoding k: 12 |
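For readers reimplementing the setup, the Table 3 values can be collected into a single configuration object. This is a minimal sketch, not code from the paper: the key names are hypothetical, and the scientific-notation exponents (e.g. 2×10^7) are inferred from the flattened PDF text of Table 3.

```python
# Hypothetical config dict mirroring the hyperparameters reported in Table 3.
# Key names are illustrative; exponents are inferred from the extracted text.
PAPO_HYPERPARAMS = {
    "optimizer": "Adam",
    "episode_length_T": 20,
    "min_num_agents": 2,
    "max_num_agents": 200,
    "max_policy_training_episodes": int(2e7),
    "max_br_training_episodes": int(1e6),
    "actor_learning_rate": 3e-5,
    "critic_learning_rate": 3e-4,
    "update_every_E_episodes": 5,
    "optimize_K_epochs_per_update": 5,
    "critic_loss_coef_c1": 0.5,
    "entropy_loss_coef_c2": 0.01,
    "cka_batch_size_m": 1000,
    "binary_encoding_dim_k": 12,
}
```

Keeping the values in one dict makes it easy to log the full configuration alongside each of the 3 reported random seeds.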