Emergent Road Rules In Multi-Agent Driving Environments

Authors: Avik Pal, Jonah Philion, Yuan-Hong Liao, Sanja Fidler

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we provide empirical evidence that suggests that, instead of hard-coding road rules into self-driving algorithms, a scalable alternative may be to design multi-agent environments in which road rules emerge as optimal solutions to the problem of maximizing traffic flow. We analyze what ingredients in driving environments cause the emergence of these road rules and find that two crucial factors are noisy perception and agents' spatial density. We provide qualitative and quantitative evidence of the emergence of seven social driving behaviors.
Researcher Affiliation | Collaboration | Avik Pal (IIT Kanpur), Jonah Philion (University of Toronto, NVIDIA, Vector Institute), Yuan-Hong Liao (University of Toronto, Vector Institute), Sanja Fidler (University of Toronto, NVIDIA, Vector Institute)
Pseudocode | Yes | Algorithm 1: Alternating Optimization for Spline and Acceleration Control (a hedged sketch of such an alternating scheme is given after this table).
Open Source Code | Yes | Code and Documentation for all experiments presented in this paper can be found in our Project Page. ... https://github.com/fidler-lab/social-driving
Open Datasets | Yes | We use multiple environments: a four-way intersection, highway tracks, and real-world road patches from nuScenes (Caesar et al., 2019).
Dataset Splits | No | The paper mentions using the 'nuScenes trainval split' in Section 6.1, which indicates the dataset has predefined splits. However, it does not explicitly state the percentages or sample counts of the training, validation, and test splits used in the experiments, nor does it describe how any custom splits were constructed. (The official nuScenes scene splits are illustrated in the devkit snippet after this table.)
Hardware Specification | No | The paper does not provide specific details on the hardware (e.g., GPU models, CPU types, memory) used to run the experiments or train the models.
Software Dependencies | No | The paper mentions using 'Proximal Policy Optimization (PPO)' and 'Generalized Advantage Estimation (GAE)', and states that 'Training is performed using a custom adaptation of Spinning Up (Achiam, 2018) for MARL and Horovod (Sergeev and Balso, 2018)'. However, it does not specify version numbers for these software components or for other dependencies such as Python or the deep learning framework.
Experiment Setup | No | The paper describes the reward function, observation types, and policy parameterizations. It mentions hyperparameters {ϵ, c1, c2} for the PPO objective (see the sketch after this table) and parameters N, K1, and K2 for the alternating optimization algorithm. However, the specific numerical values for these hyperparameters (e.g., learning rate, batch size, number of epochs, and the values of ϵ, c1, and c2) are not explicitly stated in the main text.
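
For context on the {ϵ, c1, c2} hyperparameters flagged in the last row: in the standard PPO formulation, ϵ is the clip range of the surrogate objective, while c1 and c2 weight the value-function loss and the entropy bonus. The sketch below is a minimal PPO-with-GAE loss in PyTorch over generic rollout tensors; it is not the authors' Spinning Up adaptation, the tensor shapes are assumptions, and the default values shown are common PPO choices rather than the ones used in the paper.

```python
import torch

def gae_advantages(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout.

    `rewards` and `dones` have length T; `values` has length T + 1
    (the extra entry is the bootstrap value). Shapes are assumptions.
    """
    T = rewards.shape[0]
    adv = torch.zeros(T)
    last = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        last = delta + gamma * lam * nonterminal * last
        adv[t] = last
    return adv

def ppo_loss(logp_new, logp_old, adv, value_pred, value_target, entropy,
             eps=0.2, c1=0.5, c2=0.01):
    """Loss to minimize: the negative of L_CLIP - c1 * L_VF + c2 * entropy bonus.

    eps is the clip range; c1 and c2 weight the value loss and the
    entropy bonus -- the three hyperparameters referenced in the table.
    """
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
    policy_loss = -torch.min(ratio * adv, clipped * adv).mean()
    value_loss = (value_pred - value_target).pow(2).mean()
    return policy_loss + c1 * value_loss - c2 * entropy.mean()
```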
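
Algorithm 1 is only named in the audit above, so the following is a heavily hedged outline of what an alternating scheme between a spline (path) policy and an acceleration policy might look like. The function names, the caller-supplied callables `collect_rollouts` and `ppo_update`, and the reading of N, K1, K2 as outer rounds and per-phase PPO iterations are all assumptions, not taken from the paper's pseudocode.

```python
def alternate_spline_and_accel(spline_policy, accel_policy, env,
                               collect_rollouts, ppo_update,
                               N=10, K1=5, K2=5):
    """Hypothetical alternating-optimization loop (not the paper's Algorithm 1).

    `collect_rollouts(env, spline_policy, accel_policy)` and
    `ppo_update(policy, rollouts)` are caller-supplied; only the
    control flow of the alternation is sketched here.
    """
    for _ in range(N):  # assumed: N outer rounds
        # Phase 1: freeze the acceleration controller and update the
        # spline (path-selection) policy with PPO for K1 iterations.
        for _ in range(K1):
            rollouts = collect_rollouts(env, spline_policy, accel_policy)
            ppo_update(spline_policy, rollouts)

        # Phase 2: freeze the spline policy and update the acceleration
        # (longitudinal-control) policy for K2 iterations.
        for _ in range(K2):
            rollouts = collect_rollouts(env, spline_policy, accel_policy)
            ppo_update(accel_policy, rollouts)
    return spline_policy, accel_policy
```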
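
Regarding the 'nuScenes trainval split' noted in the Dataset Splits row: the nuScenes release defines its own scene-level splits (700 train and 150 val scenes in trainval), which can be listed with the public devkit as below. This only illustrates the predefined splits; it says nothing about how the paper sampled road patches from them.

```python
# Requires the public nuScenes devkit: pip install nuscenes-devkit
from nuscenes.utils.splits import create_splits_scenes

splits = create_splits_scenes()                   # official scene-name splits
print(len(splits["train"]), len(splits["val"]))   # expected: 700 150
```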