Emergent Road Rules In Multi-Agent Driving Environments
Authors: Avik Pal, Jonah Philion, Yuan-Hong Liao, Sanja Fidler
ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we provide empirical evidence that suggests that instead of hard-coding road rules into self-driving algorithms, a scalable alternative may be to design multi-agent environments in which road rules emerge as optimal solutions to the problem of maximizing traffic flow. We analyze what ingredients in driving environments cause the emergence of these road rules and find that two crucial factors are noisy perception and agents' spatial density. We provide qualitative and quantitative evidence of the emergence of seven social driving behaviors... |
| Researcher Affiliation | Collaboration | Avik Pal (IIT Kanpur), Jonah Philion (University of Toronto, NVIDIA, Vector Institute), Yuan-Hong Liao (University of Toronto, Vector Institute), Sanja Fidler (University of Toronto, NVIDIA, Vector Institute) |
| Pseudocode | Yes | Algorithm 1: Alternating Optimization for Spline and Acceleration Control (a hedged sketch of such an alternating loop appears after the table) |
| Open Source Code | Yes | Code and Documentation for all experiments presented in this paper can be found in our Project Page. ... https://github.com/fidler-lab/social-driving |
| Open Datasets | Yes | We use multiple environments: four-way intersection, highway tracks, and real-world road patches from nuScenes (Caesar et al., 2019) |
| Dataset Splits | No | The paper mentions using the 'nuScenes trainval split' in Section 6.1, indicating the dataset has predefined splits. However, it does not specify the exact percentages or sample counts of the training, validation, and test splits used in the experiments, nor how any custom splits were constructed. |
| Hardware Specification | No | The paper does not provide specific details on the hardware (e.g., GPU models, CPU types, memory) used to run the experiments or train the models. |
| Software Dependencies | No | The paper mentions using 'Proximal Policy Optimization (PPO)' and 'Generalized Advantage Estimation (GAE)', and that 'Training is performed using a custom adaptation of Spinning Up (Achiam, 2018) for MARL and Horovod (Sergeev and Balso, 2018)'. However, it does not specify version numbers for these software components or other dependencies like Python or deep learning frameworks. |
| Experiment Setup | No | The paper describes the reward function, observation types, and policy parameterizations. It mentions the hyperparameters {ϵ, c1, c2} of the PPO objective and parameters such as N, K1, K2 for the optimization algorithm, but the specific numerical values (e.g., learning rate, batch size, number of epochs, or the values of ϵ, c1, c2) are not stated in the main text. The sketches after the table illustrate where these quantities enter. |
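
The Pseudocode and Experiment Setup rows name Algorithm 1 (Alternating Optimization for Spline and Acceleration Control) and its parameters N, K1, K2 without reporting their values. The following is a minimal sketch of what such an alternating loop could look like, assuming hypothetical `collect_rollouts` and `ppo_update` callables and placeholder defaults for N, K1, K2; it is not the authors' implementation.

```python
# Hypothetical sketch of an alternating optimization loop in the spirit of
# Algorithm 1. `collect_rollouts` and `ppo_update` are placeholder callables,
# not functions from the fidler-lab/social-driving repository, and the
# defaults for N, K1, K2 are arbitrary.

def alternating_optimization(collect_rollouts, ppo_update,
                             spline_policy, accel_policy,
                             N=10, K1=5, K2=5):
    """Alternate PPO updates between the spline (navigation) policy and the
    acceleration (control) policy, holding the other one fixed each phase."""
    for _ in range(N):
        # Phase 1: update the spline policy while the acceleration policy is frozen.
        for _ in range(K1):
            rollouts = collect_rollouts(spline_policy, accel_policy)
            ppo_update(spline_policy, rollouts)
        # Phase 2: update the acceleration policy while the spline policy is frozen.
        for _ in range(K2):
            rollouts = collect_rollouts(spline_policy, accel_policy)
            ppo_update(accel_policy, rollouts)
    return spline_policy, accel_policy
```

Here the two policies are treated as opaque objects passed to the update routine; in the paper's setting they correspond to the agent's spline (navigation) prediction and its acceleration control, respectively.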
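The Software Dependencies and Experiment Setup rows also reference PPO with GAE and the coefficients {ϵ, c1, c2}. The sketch below shows the standard form of these quantities in PyTorch so it is clear which values a reproduction would need; the defaults (ϵ=0.2, c1=0.5, c2=0.01, γ=0.99, λ=0.95) are generic placeholders, not the settings used in the paper.

```python
# Illustrative only: the standard PPO clipped surrogate with Generalized
# Advantage Estimation, showing where eps, c1, c2 and the GAE parameters
# (gamma, lam) enter. All numeric defaults are generic placeholders.
import torch


def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """GAE over one trajectory. `values` has one more entry than `rewards`
    (the final entry is the bootstrap value of the last state)."""
    advantages = torch.zeros_like(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages


def ppo_loss(logp_new, logp_old, advantages, value_pred, value_target,
             entropy, eps=0.2, c1=0.5, c2=0.01):
    """Clipped surrogate loss + c1 * value loss - c2 * entropy bonus."""
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
    policy_loss = -torch.min(ratio * advantages, clipped * advantages).mean()
    value_loss = (value_pred - value_target).pow(2).mean()
    return policy_loss + c1 * value_loss - c2 * entropy.mean()
```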