Model-Predictive Policy Learning with Uncertainty Regularization for Driving in Dense Traffic

Authors: Mikael Henaff, Alfredo Canziani, Yann LeCun

Venue: ICLR 2019

Reproducibility assessment, with the result for each variable followed by the LLM's supporting response:

Research Type: Experimental
LLM response: "We evaluate our approach using a large-scale observational dataset of driving behavior recorded from traffic cameras, and show that we are able to learn effective driving policies from purely observational data, with no environment interaction."

Researcher Affiliation: Collaboration
LLM response:
- Mikael Henaff: Courant Institute, New York University; Microsoft Research, NYC (mbh305@nyu.edu)
- Alfredo Canziani: Courant Institute, New York University (canziani@nyu.edu)
- Yann LeCun: Courant Institute, New York University; Facebook AI Research (yann@cs.nyu.edu)

Pseudocode: No
LLM response: The paper describes its algorithms and training steps in prose and diagrams (Figures 2, 3, and 10), but contains no formal pseudocode blocks or explicitly labeled algorithms.

Open Source Code: Yes
LLM response: "Code and additional video results for the model predictions and learned policies can be found at the following URL: https://sites.google.com/view/model-predictive-driving/home."

Open Datasets: Yes
LLM response: "The Next Generation Simulation program's Interstate 80 (NGSIM I-80) dataset (Halkias & Colyar, 2006) consists of 45 minutes of recordings from traffic cameras mounted over a stretch of highway."

Dataset Splits: Yes
LLM response: "This yields a total of 5596 car trajectories, which we split into training (80%), validation (10%) and testing (10%) sets."

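For reference, the quoted 80/10/10 split is straightforward to reproduce in code. The sketch below assumes the 5596 trajectories can be indexed by ID; the shuffle, seed, and variable names are illustrative and not taken from the paper.

```python
import random

# Hypothetical list of the 5596 NGSIM I-80 trajectory IDs; the paper does not
# say how trajectories are indexed or whether the split was seeded.
trajectory_ids = list(range(5596))

random.seed(0)  # illustrative seed, not from the paper
random.shuffle(trajectory_ids)

n = len(trajectory_ids)
n_train = int(0.8 * n)  # 4476 trajectories
n_valid = int(0.1 * n)  # 559 trajectories

train_ids = trajectory_ids[:n_train]
valid_ids = trajectory_ids[n_train:n_train + n_valid]
test_ids = trajectory_ids[n_train + n_valid:]  # remaining ~10%

print(len(train_ids), len(valid_ids), len(test_ids))  # 4476 559 561
```
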
Hardware Specification: No
LLM response: No specific hardware details (e.g., CPU or GPU model, memory) are given in the paper.

Software Dependencies: No
LLM response: The paper mentions OpenAI Gym (Brockman et al., 2016), Adam (Kingma & Ba, 2014), Proximal Policy Optimization (PPO) (Schulman et al., 2017), and OpenAI Baselines, but it provides no version numbers for any of these, nor for Python or PyTorch.

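Because neither hardware nor library versions are reported, a reproduction attempt would need to pin and record its own environment. A minimal sketch of logging this information at run time, assuming the PyTorch stack implied by the response above:

```python
import platform

import torch  # assumed framework; the paper does not pin a version

# Record the software versions and hardware an experiment actually ran on,
# since the paper itself reports neither.
print("python :", platform.python_version())
print("torch  :", torch.__version__)
print("cuda   :", torch.version.cuda)
if torch.cuda.is_available():
    print("gpu    :", torch.cuda.get_device_name(0))
```
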
Experiment Setup: Yes
LLM response: "Our model was trained using Adam (Kingma & Ba, 2014) with learning rate 0.0001 and minibatches of size 64, unrolled for 20 time steps, and with dropout (p_dropout = 0.1) at every layer, which was necessary for computing the epistemic uncertainty cost when training the policy network."

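The quoted hyperparameters translate directly into an optimizer and model configuration. Below is a minimal sketch of such a setup in PyTorch, with the epistemic uncertainty cost approximated as the variance across stochastic forward passes with dropout left active (MC dropout). The architecture, loss, data, and sample count are placeholders, not the paper's model.

```python
import torch
import torch.nn as nn

# Placeholder forward model; the paper's architecture is far more elaborate.
model = nn.Sequential(
    nn.Linear(64, 256), nn.Dropout(p=0.1), nn.ReLU(),  # dropout at every layer, p = 0.1
    nn.Linear(256, 64), nn.Dropout(p=0.1),
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # Adam, learning rate 0.0001

def epistemic_uncertainty(x, n_samples=10):
    """Variance across dropout-perturbed forward passes (MC dropout).

    The number of samples is an assumption; the quoted setup does not state it.
    """
    model.train()  # keep dropout active while sampling
    preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.var(dim=0).mean()

# One illustrative training step.
x = torch.randn(64, 64)  # placeholder minibatch of size 64
loss = torch.zeros(())
for t in range(20):  # unrolled for 20 time steps
    x = model(x)
    loss = loss + x.pow(2).mean()  # placeholder prediction loss
loss = loss + epistemic_uncertainty(x)  # epistemic uncertainty cost as regularizer
optimizer.zero_grad()
loss.backward()
optimizer.step()
```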