Transition-Informed Reinforcement Learning for Large-Scale Stackelberg Mean-Field Games

Authors: Pengdeng Li, Runsheng Yu, Xinrun Wang, Bo An

AAAI 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Extensive experiments on fleet management and food gathering demonstrate that our approach can scale up to 100,000 followers and significantly outperform existing baselines." (Abstract) / "We evaluate our approach on two scenarios: the e-hailing driver re-positioning (EDRP) and multiple-type food gathering (MTFG)." (Section 5.1) |
| Researcher Affiliation | Academia | Pengdeng Li¹, Runsheng Yu², Xinrun Wang¹*, Bo An¹ (¹School of Computer Science and Engineering, Nanyang Technological University, Singapore; ²Hong Kong University of Science and Technology, Hong Kong, China); {pengdeng.li, xinrun.wang, boan}@ntu.edu.sg, runshengyu@gmail.com |
| Pseudocode | No | The paper does not include a figure, block, or section explicitly labeled as pseudocode or an algorithm. |
| Open Source Code | Yes | "Code is available at https://github.com/IpadLi/SMFG." |
| Open Datasets | Yes | "The EDRP environment is adapted from (Lin et al. 2018). In this scenario, the leader aims to improve the order response rate (ORR) of the whole city, while the followers maximize their own returns. ... we extract order information from a public dataset of taxi trips in Manhattan, which contains for each day the time and location of all the pickups and drop-offs executed by each of 13,000 active taxis." |
| Dataset Splits | No | The paper mentions the datasets used but does not specify train/validation/test splits. |
| Hardware Specification | Yes | "All experiments are run on a 64-bit workstation with 125 GB RAM, 20 Intel i9-9820X CPU @3.30GHz processors, and 4 NVIDIA RTX2080 Ti GPUs." |
| Software Dependencies | No | The paper describes the algorithmic frameworks used but does not name specific software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions). |
| Experiment Setup | No | The paper provides general experimental setup information such as baselines, environments, and the number of random seeds used for runs. However, it does not detail specific hyperparameters (e.g., learning rate, batch size, number of epochs) or other system-level training configurations. |