Population-size-Aware Policy Optimization for Mean-Field Games
Authors: Pengdeng Li, Xinrun Wang, Shuxin Li, Hau Chan, Bo An
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, extensive experiments on multiple environments show the significant superiority of PAPO over baselines, and the analysis of the evolution of the generated policies further deepens our understanding of the two fields of finite-agent and infinite-agent games. |
| Researcher Affiliation | Academia | ¹Nanyang Technological University, Singapore; ²University of Nebraska, Lincoln, USA. {pengdeng.li,xinrun.wang,shuxin.li,boan}@ntu.edu.sg, hchan3@unl.edu |
| Pseudocode | Yes | Algorithm 1: Training Procedure |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the methodology is openly available. |
| Open Datasets | Yes | Environments. We consider the following environments which have been widely used in previous works: Exploration (Laurière et al., 2022), Taxi Matching (Nguyen et al., 2018), and Crowd in Circle (Perrin et al., 2020). |
| Dataset Splits | No | The paper mentions training on a 'target set G' and evaluating on 'unseen games' but does not specify explicit training, validation, and test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | Yes | Moreover, all experiments are run on a machine with 20 Intel i9-9820X CPUs and 4 NVIDIA RTX2080 Ti GPUs, and averaged over 3 random seeds. |
| Software Dependencies | No | Table 3 lists 'optimizer Adam' but the paper does not specify version numbers for key software libraries or frameworks (e.g., Python, PyTorch, or TensorFlow), nor detailed versions for any solvers. |
| Experiment Setup | Yes | Table 3: Hyperparameters — optimizer: Adam; length of an episode T: 20; minimum number of agents N̲: 2; maximum number of agents N̄: 200; maximum number of policy training episodes: 2 × 10^7; maximum number of BR training episodes: 1 × 10^6; actor learning rate: 3 × 10^-5; critic learning rate: 3 × 10^-4; update every E episodes: 5; optimize K epochs at each update: 5; critic loss coefficient c1: 0.5; entropy loss coefficient c2: 0.01; batch size m for computing CKA: 1000; dimension of binary encoding k: 12 |
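
As a reading aid, the sketch below collects the Table 3 hyperparameters into a single configuration object and shows one plausible way a sampled population size N could be mapped to a k-bit binary vector (the paper reports k = 12). This is a minimal illustration under stated assumptions, not the authors' implementation: the names `PAPOConfig` and `encode_population_size` are hypothetical, and only the numeric values are taken from the quoted table.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class PAPOConfig:
    """Hyperparameter values quoted from Table 3 of the paper."""
    episode_length_T: int = 20
    min_agents: int = 2              # minimum number of agents
    max_agents: int = 200            # maximum number of agents
    max_policy_episodes: int = int(2e7)
    max_br_episodes: int = int(1e6)  # best-response training episodes
    actor_lr: float = 3e-5
    critic_lr: float = 3e-4
    update_every_E: int = 5
    epochs_K: int = 5
    critic_loss_coef_c1: float = 0.5
    entropy_loss_coef_c2: float = 0.01
    cka_batch_size_m: int = 1000
    binary_encoding_dim_k: int = 12


def encode_population_size(n: int, k: int = 12) -> np.ndarray:
    """Hypothetical helper (assumption, not the authors' code): map a
    population size n to a k-bit binary vector that a population-size-aware
    policy could consume alongside its observation."""
    bits = [(n >> i) & 1 for i in range(k)]  # least-significant bit first
    return np.array(bits, dtype=np.float32)


if __name__ == "__main__":
    cfg = PAPOConfig()
    # Sample a population size from [min_agents, max_agents] and encode it.
    n = np.random.randint(cfg.min_agents, cfg.max_agents + 1)
    print(n, encode_population_size(n, cfg.binary_encoding_dim_k))
```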