Multi-agent Reinforcement Learning for Networked System Control

Authors: Tianshu Chu, Sandeep Chinchali, Sachin Katti

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | NUMERICAL EXPERIMENTS: There are several benchmark MARL environments such as cooperative navigation and predator-prey, but few of them represent NSC. Here we design two NSC environments: adaptive traffic signal control (ATSC) and cooperative adaptive cruise control (CACC). Both ATSC and CACC are extensively studied in intelligent transportation systems, and they hold assumptions of a spatiotemporal MDP.
Researcher Affiliation | Collaboration | Tianshu Chu (Uhana Inc., Palo Alto, CA 94304, USA; cts198859@hotmail.com); Sandeep Chinchali & Sachin Katti (Stanford University, Stanford, CA 94305, USA; {csandeep,skatti}@stanford.edu)
Pseudocode | Yes | Algorithm 1: Multi-agent A2C with NeurComm (Training). Parameters: α, β, γ, T, |B|, η_ω, η_θ. Result: {λ_i, ν_i, ω_i, θ_i}_{i∈V}. Algorithm 2: Multi-agent A2C with NeurComm (Execution). Parameters: {λ_i, ν_i, ω_i, θ_i}_{i∈V}, t_comm, t_control. (A hedged code sketch of the per-agent communication step is given after this table.)
Open Source Code | Yes | Code link: https://github.com/cts198859/deeprl_network
Open Datasets | No | The paper describes using the SUMO simulator to create custom traffic scenarios (a 5x5 synthetic traffic grid and the Monaco traffic network) and custom CACC scenarios. While SUMO itself is public software, the specific datasets (traffic flow configurations, vehicle dynamics) generated within these environments for the experiments are not provided with concrete access information such as a link, DOI, or formal citation for a specific dataset file.
Dataset Splits | No | The paper describes the training process, including total steps and episode length, but does not specify any explicit training/validation/test dataset splits, percentages, or sample counts for reproduction.
Hardware Specification | Yes | Each training takes about 30 hours on a 32GB memory, Intel Xeon CPU machine.
Software Dependencies | No | The paper mentions using the 'standard microscopic traffic simulator SUMO (Krajzewicz et al., 2012)', but it does not provide a specific version number for SUMO or for any other key software components used in the experiments. (A version-logging sketch follows the table.)
Experiment Setup | Yes | All algorithms use the same DNN hidden layers: one fully-connected layer for message encoding e_λ, and one LSTM layer for message extracting g_ν. All hidden layers have 64 units. ... We train each model over 1M steps, with γ = 0.99, actor learning rate 5×10⁻⁴, and critic learning rate 2.5×10⁻⁴. Also, each training episode has a different seed for generalization purposes. In ATSC, β = 0.01, |B| = 120, while in CACC, β = 0.05, |B| = 60, to encourage the exploration of collision-free policies. (These values are collected into the config sketch below.)
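The Pseudocode row summarizes Algorithms 1 and 2 (training and execution of multi-agent A2C with NeurComm). As a rough illustration of the per-agent communication step, below is a minimal PyTorch-style sketch; the class name NeurCommAgent, the mean aggregation of neighbor messages, and all dimensions are assumptions for illustration, not the authors' implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn


class NeurCommAgent(nn.Module):
    """Illustrative per-agent module: FC message encoder (e_lambda), LSTM
    message extractor (g_nu), and actor/critic heads. Shapes and the mean
    aggregation over neighbor messages are assumptions, not the paper's exact design."""

    def __init__(self, obs_dim, msg_dim=64, hidden_dim=64, n_actions=5):
        super().__init__()
        self.encode = nn.Linear(obs_dim, msg_dim)        # e_lambda: message encoding
        self.extract = nn.LSTMCell(msg_dim, hidden_dim)  # g_nu: message extraction
        self.actor = nn.Linear(hidden_dim, n_actions)    # policy head (theta)
        self.critic = nn.Linear(hidden_dim, 1)           # value head (omega)

    def step(self, obs, neighbor_msgs, hx, cx):
        # Encode the local observation into the outgoing message.
        own_msg = torch.relu(self.encode(obs))
        # Aggregate incoming neighbor messages with the agent's own message
        # (mean aggregation is a simplifying assumption).
        agg = torch.stack(neighbor_msgs + [own_msg]).mean(dim=0)
        # Update the recurrent hidden state from the aggregated messages.
        hx, cx = self.extract(agg.unsqueeze(0), (hx, cx))
        return own_msg, self.actor(hx), self.critic(hx), (hx, cx)


# Toy usage: one agent, no neighbors yet, zero-initialized LSTM state.
agent = NeurCommAgent(obs_dim=12)
hx = torch.zeros(1, 64)
cx = torch.zeros(1, 64)
msg, logits, value, (hx, cx) = agent.step(torch.zeros(12), [], hx, cx)
```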
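Since the Software Dependencies row notes that no SUMO version is pinned, a reproduction could at least record the installed version alongside its outputs. A minimal sketch, assuming the `sumo` binary is on PATH (`--version` is a standard SUMO flag):

```python
import subprocess

# Log the installed SUMO version, since the paper does not pin one.
result = subprocess.run(["sumo", "--version"], capture_output=True, text=True, check=True)
print(result.stdout.splitlines()[0])
```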
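The Experiment Setup row reports concrete hyperparameters; gathering them in one place, the following dict is an illustrative summary only (key names are assumptions and do not mirror the repository's configuration files).

```python
# Hyperparameters quoted in the Experiment Setup row, collected into an
# illustrative Python dict. Key names are assumptions, not the repo's schema.
TRAIN_CONFIG = {
    "total_steps": 1_000_000,   # "We train each model over 1M steps"
    "gamma": 0.99,
    "actor_lr": 5e-4,
    "critic_lr": 2.5e-4,
    "hidden_units": 64,         # FC message encoder and LSTM extractor widths
    "atsc": {"beta": 0.01, "batch_size": 120},  # |B| = 120
    "cacc": {"beta": 0.05, "batch_size": 60},   # |B| = 60
}
```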