Transition-Informed Reinforcement Learning for Large-Scale Stackelberg Mean-Field Games
Authors: Pengdeng Li, Runsheng Yu, Xinrun Wang, Bo An
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on fleet management and food gathering demonstrate that our approach can scale up to 100,000 followers and significantly outperform existing baselines. (Abstract) / We evaluate our approach on two scenarios: the e-hailing driver re-positioning (EDRP) and multiple-type food gathering (MTFG). (Section 5.1) |
| Researcher Affiliation | Academia | Pengdeng Li1, Runsheng Yu2, Xinrun Wang1*, Bo An1 1School of Computer Science and Engineering, Nanyang Technological University, Singapore 2Hong Kong University of Science and Technology, Hong Kong, China {pengdeng.li, xinrun.wang, boan}@ntu.edu.sg, runshengyu@gmail.com |
| Pseudocode | No | The paper does not include a figure, block, or section explicitly labeled as pseudocode or an algorithm. |
| Open Source Code | Yes | Code is available at https://github.com/IpadLi/SMFG. |
| Open Datasets | Yes | The EDRP environment is adapted from (Lin et al. 2018). In this scenario, the leader aims to improve the order response rate (ORR) of the whole city, while the followers maximize their own returns. ... we extract order information from a public dataset of taxi trips in Manhattan, which contains for each day the time and location of all the pickups and drop-offs executed by each of 13,000 active taxis. |
| Dataset Splits | No | The paper mentions the datasets used but does not specify train/validation/test splits. |
| Hardware Specification | Yes | All experiments are run on a 64-bit workstation with 125 GB RAM, 20 Intel i9-9820X CPU @3.30GHz processors, and 4 NVIDIA RTX2080 Ti GPUs. |
| Software Dependencies | No | The paper describes the algorithmic frameworks used but does not provide specific software names with version numbers for dependencies (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | No | The paper provides general experimental setup information such as baselines, environments, and the number of seeds used for runs. However, it does not explicitly detail specific hyperparameters (e.g., learning rate, batch size, number of epochs) or other system-level training configurations for its models. |