Safe Multi-agent Learning via Trapping Regions

Authors: Aleksander Czechowski, Frans A. Oliehoek

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the applications to a regularized version of Dirac Generative Adversarial Network, a four-intersection traffic control scenario run in a state of the art open-source microscopic traffic simulator SUMO, and a mathematical model of economic competition.
Researcher Affiliation | Academia | Aleksander Czechowski, Frans A. Oliehoek; Delft University of Technology
Pseudocode | Yes | Algorithm 1: Rigorous trapping region verification via binary space partitioning. Inputs: learning dynamics F; $T = [x_{11}^-, x_{11}^+] \times \dots \times [x_{nk_n}^-, x_{nk_n}^+]$, a candidate for the trapping region; L, an upper bound for the Lipschitz constant of F over T. Returns: Is T a trapping region? Start: 1: for agent i in 1:n in parallel do... Algorithm 2: Non-rigorous trapping region verification via sampling. Inputs: learning dynamics F, which can only be sampled (e.g. from a simulator); $T = [x_{11}^-, x_{11}^+] \times \dots \times [x_{nk_n}^-, x_{nk_n}^+]$, a candidate for the trapping region; M, the sample size per face. Returns: Is T a trapping region? Start: 1: for agent i in 1:n in parallel do...
Open Source Code | No | The paper mentions using an 'open-source microscopic traffic simulator SUMO', but it does not explicitly state that the code for *their* proposed methodology is open-source or provide a link to it.
Open Datasets | No | For the traffic management example, the paper describes a procedure for generating vehicle streams within the SUMO simulator ('One vehicle would be spawned each ten seconds...'), but it does not refer to a pre-existing, publicly available dataset or provide access to any generated data.
Dataset Splits | No | The paper describes simulation experiments for multi-agent learning scenarios, which do not typically involve traditional train/validation/test dataset splits in the same way as supervised learning. No specific dataset split information (percentages, counts, or citations to predefined splits) is provided.
Hardware Specification | Yes | The experiment used AMD 7452 and AMD 7502P CPUs, 2.35 and 2.5 GHz respectively.
Software Dependencies | Yes | The simulation is controlled by an open-source microscopic traffic simulator SUMO [Lopez et al., 2018], version 1.8.0.
Experiment Setup | Yes | The cycle time, i.e. the total time for serving the horizontal and, subsequently, the vertical movement, is set to 60 seconds. ... Each intersection controller estimates the gradient of its own reward by difference quotients: $\delta\,\partial_{x_i} R_i(x) \approx R_i(x_i, x_{-i}) - R_i(x_i + \delta, x_{-i})$ for some small δ (in our experiments δ = 0.1). The adaptation rate γ is set to $10^{-6}$. ... $T = [20, 40]^4$. From the nature of the problem, we expect the reward function to be continuous, but we do not have an analytical formula for it. Therefore, we apply Algorithm 2 and sample faces of T with a uniform rectangular grid of five points in each direction (M = 125).
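The sampling-based check quoted in the Pseudocode and Experiment Setup rows can be illustrated with a minimal sketch. This is not the authors' code. It assumes that a box counts as a trapping region when, at every sampled point on every face, the component of the dynamics normal to that face points strictly into the box; the strict inequality is part of that assumption. The names `face_grid`, `passes_sampled_check`, `toy_dynamics`, and `outward_dynamics` are hypothetical, and `toy_dynamics` is a stand-in for the simulator-driven learning dynamics, which the summary does not specify. For the box [20, 40]^4 with five grid points per free direction, each of the 8 faces is three-dimensional and receives 5^3 = 125 samples, matching M = 125.

```python
from itertools import product


def face_grid(lower, upper, fixed_axis, fixed_value, points_per_axis):
    """Return uniform grid points on the face where x[fixed_axis] == fixed_value."""
    axes = []
    for j in range(len(lower)):
        if j == fixed_axis:
            # This coordinate is pinned to the face.
            axes.append([fixed_value])
        else:
            # Evenly spaced values from lower[j] to upper[j], endpoints included.
            step = (upper[j] - lower[j]) / (points_per_axis - 1)
            axes.append([lower[j] + k * step for k in range(points_per_axis)])
    return list(product(*axes))


def passes_sampled_check(dynamics, lower, upper, points_per_axis=5):
    """Non-rigorous check: does the sampled vector field point inward on every face?"""
    for i in range(len(lower)):
        # On the lower face we need F_i > 0, on the upper face F_i < 0 (assumed criterion).
        for bound, inward_sign in ((lower[i], 1.0), (upper[i], -1.0)):
            for x in face_grid(lower, upper, i, bound, points_per_axis):
                if inward_sign * dynamics(x)[i] <= 0.0:
                    return False
    return True


def toy_dynamics(x):
    """Hypothetical stand-in: every coordinate is pulled toward 30."""
    return [30.0 - xi for xi in x]


def outward_dynamics(x):
    """Hypothetical counterexample: every coordinate is pushed away from 30."""
    return [xi - 30.0 for xi in x]


lower, upper = [20.0] * 4, [40.0] * 4
samples_per_face = len(face_grid(lower, upper, 0, lower[0], 5))
inward_ok = passes_sampled_check(toy_dynamics, lower, upper)
outward_ok = passes_sampled_check(outward_dynamics, lower, upper)
```

Because only finitely many points are inspected, a passing result is evidence rather than proof, which is why the summary labels Algorithm 2 as non-rigorous and contrasts it with the Lipschitz-based Algorithm 1.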