MetaLight: Value-Based Meta-Reinforcement Learning for Traffic Signal Control
Authors: Xinshi Zang, Huaxiu Yao, Guanjie Zheng, Nan Xu, Kai Xu, Zhenhui Li
AAAI 2020, pp. 1153–1160
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiments on four real-world datasets show that our proposed MetaLight not only adapts more quickly and stably in new traffic scenarios, but also achieves better performance. |
| Researcher Affiliation | Collaboration | Xinshi Zang,1 Huaxiu Yao,2 Guanjie Zheng,2 Nan Xu,1 Kai Xu,3 Zhenhui Li2 1Shanghai Jiao Tong University, 2Pennsylvania State University, 3Shanghai Tianrang Intelligent Technology Co., Ltd |
| Pseudocode | Yes | Algorithm 1: Meta-training process of MetaLight |
| Open Source Code | Yes | Codes are provided at https://traffic-signal-control.github.io/ |
| Open Datasets | Yes | We use four real-world datasets from two cities in China: Jinan (JN) and Hangzhou (HZ), and two cities in the United States: Atlanta (AT) and Los Angeles (LA). ... The other raw data from American cities is composed of the full vehicle trajectories which are collected by several video cameras along the streets (https://ops.fhwa.dot.gov/trafficanalysistools/ngsim.htm). |
| Dataset Splits | No | The paper describes training and testing sets but does not provide explicit details about a separate validation dataset or its split percentages. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper mentions using a simulation platform called CityFlow but does not provide its specific version number, nor does it list versions for other software dependencies. |
| Experiment Setup | Yes | In MetaLight, the base model, FRAP++, shares a similar network structure with FRAP (Zheng et al. 2019a), except for the average operation in the embedding layers. The learning rates of the learner and meta-learner are set to 0.001 for MetaLight and MAML in both meta-training and meta-testing. The episode length for all scenarios is 3600 seconds and the interval of each interaction between the simulator and the RL agent is 10 seconds. For MetaLight, the learner updates the model after each interaction using 30 samples and only one epoch of training. The meta-learner updates itself after every ten learner updates. For MAML, the learner first performs one centralized update at the end of each episode with 1000 samples and 100 epochs of training. Then, the meta-learner updates itself using new episodes each time. |
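The update schedule in the Experiment Setup row can be sketched as a toy loop. This is an illustrative reconstruction, not the authors' code: the `Learner` and `MetaLearner` classes and their `update` methods are hypothetical placeholders; only the numeric constants (3600-second episodes, 10-second interaction interval, 30-sample batches, meta-update every ten learner updates, learning rate 0.001) come from the paper.

```python
# Hypothetical sketch of MetaLight's meta-training update schedule.
# Learner/MetaLearner are placeholders standing in for the FRAP++ model
# and the meta-model; no actual gradient computation is performed.

EPISODE_LENGTH = 3600   # seconds per episode (from the paper)
INTERVAL = 10           # seconds between simulator-agent interactions
BATCH_SIZE = 30         # samples per learner update
META_EVERY = 10         # meta-learner updates once per 10 learner updates
LR = 0.001              # learning rate for both learner and meta-learner

class Learner:
    def __init__(self):
        self.updates = 0
    def update(self, batch_size, epochs=1):
        # one training pass over `batch_size` recent samples (placeholder)
        self.updates += 1

class MetaLearner:
    def __init__(self):
        self.updates = 0
    def update(self, learner):
        # fold the learner's adapted parameters into the meta-model (placeholder)
        self.updates += 1

learner, meta = Learner(), MetaLearner()
for step in range(EPISODE_LENGTH // INTERVAL):  # 360 interactions per episode
    learner.update(BATCH_SIZE, epochs=1)        # learner updates after each interaction
    if learner.updates % META_EVERY == 0:
        meta.update(learner)                    # meta-update every 10th learner update

print(learner.updates, meta.updates)  # 360 36
```

With these constants, one episode yields 360 learner updates and 36 meta-learner updates, matching the tenfold ratio the paper describes.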