Emergent Road Rules In Multi-Agent Driving Environments

Authors: Avik Pal, Jonah Philion, Yuan-Hong Liao, Sanja Fidler

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we provide empirical evidence that suggests that, instead of hard-coding road rules into self-driving algorithms, a scalable alternative may be to design multi-agent environments in which road rules emerge as optimal solutions to the problem of maximizing traffic flow. We analyze what ingredients in driving environments cause the emergence of these road rules and find that two crucial factors are noisy perception and agents' spatial density. We provide qualitative and quantitative evidence of the emergence of seven social driving behaviors.
Researcher Affiliation | Collaboration | Avik Pal (IIT Kanpur), Jonah Philion (University of Toronto, NVIDIA, Vector Institute), Yuan-Hong Liao (University of Toronto, Vector Institute), Sanja Fidler (University of Toronto, NVIDIA, Vector Institute)
Pseudocode | Yes | Algorithm 1: Alternating Optimization for Spline and Acceleration Control (a hedged sketch of such an alternating scheme is given after this table).
Open Source Code | Yes | Code and Documentation for all experiments presented in this paper can be found in our Project Page. ... https://github.com/fidler-lab/social-driving
Open Datasets | Yes | We use multiple environments: a four-way intersection, highway tracks, and real-world road patches from nuScenes (Caesar et al., 2019).
Dataset Splits | No | The paper mentions using the 'nuScenes trainval split' in Section 6.1, which indicates the dataset has predefined splits. However, it does not explicitly state the percentages or sample counts of the training, validation, and test splits used in the experiments, nor does it describe how any custom splits were constructed. (The official nuScenes scene splits are illustrated in the devkit snippet after this table.)
Hardware Specification | No | The paper does not provide specific details on the hardware (e.g., GPU models, CPU types, memory) used to run the experiments or train the models.
Software Dependencies | No | The paper mentions using 'Proximal Policy Optimization (PPO)' and 'Generalized Advantage Estimation (GAE)', and states that 'Training is performed using a custom adaptation of Spinning Up (Achiam, 2018) for MARL and Horovod (Sergeev and Balso, 2018)'. However, it does not specify version numbers for these software components or for other dependencies such as Python or the deep learning framework.
Experiment Setup | No | The paper describes the reward function, observation types, and policy parameterizations. It mentions hyperparameters {ϵ, c1, c2} for the PPO objective (see the sketch after this table) and parameters N, K1, and K2 for the alternating optimization algorithm. However, the specific numerical values for these hyperparameters (e.g., learning rate, batch size, number of epochs, and the values of ϵ, c1, and c2) are not explicitly stated in the main text.
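
For context on the {ϵ, c1, c2} hyperparameters flagged in the last row: in the standard PPO formulation, ϵ is the clip range of the surrogate objective, while c1 and c2 weight the value-function loss and the entropy bonus. The sketch below is a minimal PPO-with-GAE loss in PyTorch over generic rollout tensors; it is not the authors' Spinning Up adaptation, the tensor shapes are assumptions, and the default values shown are common PPO choices rather than the ones used in the paper.

```python
import torch

def gae_advantages(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout.

    `rewards` and `dones` have length T; `values` has length T + 1
    (the extra entry is the bootstrap value). Shapes are assumptions.
    """
    T = rewards.shape[0]
    adv = torch.zeros(T)
    last = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        last = delta + gamma * lam * nonterminal * last
        adv[t] = last
    return adv

def ppo_loss(logp_new, logp_old, adv, value_pred, value_target, entropy,
             eps=0.2, c1=0.5, c2=0.01):
    """Loss to minimize: the negative of L_CLIP - c1 * L_VF + c2 * entropy bonus.

    eps is the clip range; c1 and c2 weight the value loss and the
    entropy bonus -- the three hyperparameters referenced in the table.
    """
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
    policy_loss = -torch.min(ratio * adv, clipped * adv).mean()
    value_loss = (value_pred - value_target).pow(2).mean()
    return policy_loss + c1 * value_loss - c2 * entropy.mean()
```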
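
Algorithm 1 is only named in the audit above, so the following is a heavily hedged outline of what an alternating scheme between a spline (path) policy and an acceleration policy might look like. The function names, the caller-supplied callables `collect_rollouts` and `ppo_update`, and the reading of N, K1, K2 as outer rounds and per-phase PPO iterations are all assumptions, not taken from the paper's pseudocode.

```python
def alternate_spline_and_accel(spline_policy, accel_policy, env,
                               collect_rollouts, ppo_update,
                               N=10, K1=5, K2=5):
    """Hypothetical alternating-optimization loop (not the paper's Algorithm 1).

    `collect_rollouts(env, spline_policy, accel_policy)` and
    `ppo_update(policy, rollouts)` are caller-supplied; only the
    control flow of the alternation is sketched here.
    """
    for _ in range(N):  # assumed: N outer rounds
        # Phase 1: freeze the acceleration controller and update the
        # spline (path-selection) policy with PPO for K1 iterations.
        for _ in range(K1):
            rollouts = collect_rollouts(env, spline_policy, accel_policy)
            ppo_update(spline_policy, rollouts)

        # Phase 2: freeze the spline policy and update the acceleration
        # (longitudinal-control) policy for K2 iterations.
        for _ in range(K2):
            rollouts = collect_rollouts(env, spline_policy, accel_policy)
            ppo_update(accel_policy, rollouts)
    return spline_policy, accel_policy
```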
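
Regarding the 'nuScenes trainval split' noted in the Dataset Splits row: the nuScenes release defines its own scene-level splits (700 train and 150 val scenes in trainval), which can be listed with the public devkit as below. This only illustrates the predefined splits; it says nothing about how the paper sampled road patches from them.

```python
# Requires the public nuScenes devkit: pip install nuscenes-devkit
from nuscenes.utils.splits import create_splits_scenes

splits = create_splits_scenes()                   # official scene-name splits
print(len(splits["train"]), len(splits["val"]))   # expected: 700 150
```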