Multi-agent Reinforcement Learning for Networked System Control

Authors: Tianshu Chu, Sandeep Chinchali, Sachin Katti

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | NUMERICAL EXPERIMENTS: There are several benchmark MARL environments such as cooperative navigation and predator-prey, but few of them represent NSC. Here we design two NSC environments: adaptive traffic signal control (ATSC) and cooperative adaptive cruise control (CACC). Both ATSC and CACC are extensively studied in intelligent transportation systems, and they hold assumptions of a spatiotemporal MDP.
Researcher Affiliation | Collaboration | Tianshu Chu (Uhana Inc., Palo Alto, CA 94304, USA; cts198859@hotmail.com); Sandeep Chinchali & Sachin Katti (Stanford University, Stanford, CA 94305, USA; {csandeep,skatti}@stanford.edu)
Pseudocode | Yes | Algorithm 1: Multi-agent A2C with NeurComm (Training). Parameters: α, β, γ, T, |B|, η_ω, η_θ. Result: {λ_i, ν_i, ω_i, θ_i}_{i∈V}. Algorithm 2: Multi-agent A2C with NeurComm (Execution). Parameters: {λ_i, ν_i, ω_i, θ_i}_{i∈V}, t_comm, t_control. (A hedged code sketch of the per-agent communication step is given after this table.)
Open Source Code | Yes | Code link: https://github.com/cts198859/deeprl_network
Open Datasets | No | The paper describes using the SUMO simulator to create custom traffic scenarios (a 5x5 synthetic traffic grid and the Monaco traffic network) and custom CACC scenarios. While SUMO itself is public software, the specific datasets (traffic flow configurations, vehicle dynamics) generated within these environments for the experiments are not provided with concrete access information such as a link, DOI, or formal citation for a specific dataset file.
Dataset Splits | No | The paper describes the training process, including total steps and episode length, but does not specify any explicit training/validation/test dataset splits, percentages, or sample counts for reproduction.
Hardware Specification | Yes | Each training takes about 30 hours on a 32GB memory, Intel Xeon CPU machine.
Software Dependencies | No | The paper mentions using the 'standard microscopic traffic simulator SUMO (Krajzewicz et al., 2012)', but it does not provide a specific version number for SUMO or for any other key software components used in the experiments. (A version-logging sketch follows the table.)
Experiment Setup | Yes | All algorithms use the same DNN hidden layers: one fully-connected layer for message encoding e_λ, and one LSTM layer for message extracting g_ν. All hidden layers have 64 units. ... We train each model over 1M steps, with γ = 0.99, actor learning rate 5×10⁻⁴, and critic learning rate 2.5×10⁻⁴. Also, each training episode has a different seed for generalization purposes. In ATSC, β = 0.01, |B| = 120, while in CACC, β = 0.05, |B| = 60, to encourage the exploration of collision-free policies. (These values are collected into the config sketch below.)
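The Pseudocode row summarizes Algorithms 1 and 2 (training and execution of multi-agent A2C with NeurComm). As a rough illustration of the per-agent communication step, below is a minimal PyTorch-style sketch; the class name NeurCommAgent, the mean aggregation of neighbor messages, and all dimensions are assumptions for illustration, not the authors' implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn


class NeurCommAgent(nn.Module):
    """Illustrative per-agent module: FC message encoder (e_lambda), LSTM
    message extractor (g_nu), and actor/critic heads. Shapes and the mean
    aggregation over neighbor messages are assumptions, not the paper's exact design."""

    def __init__(self, obs_dim, msg_dim=64, hidden_dim=64, n_actions=5):
        super().__init__()
        self.encode = nn.Linear(obs_dim, msg_dim)        # e_lambda: message encoding
        self.extract = nn.LSTMCell(msg_dim, hidden_dim)  # g_nu: message extraction
        self.actor = nn.Linear(hidden_dim, n_actions)    # policy head (theta)
        self.critic = nn.Linear(hidden_dim, 1)           # value head (omega)

    def step(self, obs, neighbor_msgs, hx, cx):
        # Encode the local observation into the outgoing message.
        own_msg = torch.relu(self.encode(obs))
        # Aggregate incoming neighbor messages with the agent's own message
        # (mean aggregation is a simplifying assumption).
        agg = torch.stack(neighbor_msgs + [own_msg]).mean(dim=0)
        # Update the recurrent hidden state from the aggregated messages.
        hx, cx = self.extract(agg.unsqueeze(0), (hx, cx))
        return own_msg, self.actor(hx), self.critic(hx), (hx, cx)


# Toy usage: one agent, no neighbors yet, zero-initialized LSTM state.
agent = NeurCommAgent(obs_dim=12)
hx = torch.zeros(1, 64)
cx = torch.zeros(1, 64)
msg, logits, value, (hx, cx) = agent.step(torch.zeros(12), [], hx, cx)
```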
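Since the Software Dependencies row notes that no SUMO version is pinned, a reproduction could at least record the installed version alongside its outputs. A minimal sketch, assuming the `sumo` binary is on PATH (`--version` is a standard SUMO flag):

```python
import subprocess

# Log the installed SUMO version, since the paper does not pin one.
result = subprocess.run(["sumo", "--version"], capture_output=True, text=True, check=True)
print(result.stdout.splitlines()[0])
```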
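The Experiment Setup row reports concrete hyperparameters; gathering them in one place, the following dict is an illustrative summary only (key names are assumptions and do not mirror the repository's configuration files).

```python
# Hyperparameters quoted in the Experiment Setup row, collected into an
# illustrative Python dict. Key names are assumptions, not the repo's schema.
TRAIN_CONFIG = {
    "total_steps": 1_000_000,   # "We train each model over 1M steps"
    "gamma": 0.99,
    "actor_lr": 5e-4,
    "critic_lr": 2.5e-4,
    "hidden_units": 64,         # FC message encoder and LSTM extractor widths
    "atsc": {"beta": 0.01, "batch_size": 120},  # |B| = 120
    "cacc": {"beta": 0.05, "batch_size": 60},   # |B| = 60
}
```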