Model-Predictive Policy Learning with Uncertainty Regularization for Driving in Dense Traffic

Authors: Mikael Henaff, Alfredo Canziani, Yann LeCun

Venue: ICLR 2019

Reproducibility assessment, with the result for each variable followed by the LLM's supporting response:

Research Type: Experimental
LLM response: "We evaluate our approach using a large-scale observational dataset of driving behavior recorded from traffic cameras, and show that we are able to learn effective driving policies from purely observational data, with no environment interaction."

Researcher Affiliation: Collaboration
LLM response:
- Mikael Henaff: Courant Institute, New York University; Microsoft Research, NYC (mbh305@nyu.edu)
- Alfredo Canziani: Courant Institute, New York University (canziani@nyu.edu)
- Yann LeCun: Courant Institute, New York University; Facebook AI Research (yann@cs.nyu.edu)

Pseudocode: No
LLM response: The paper describes its algorithms and training steps in prose and diagrams (Figures 2, 3, and 10), but contains no formal pseudocode blocks or explicitly labeled algorithms.

Open Source Code: Yes
LLM response: "Code and additional video results for the model predictions and learned policies can be found at the following URL: https://sites.google.com/view/model-predictive-driving/home."

Open Datasets: Yes
LLM response: "The Next Generation Simulation program's Interstate 80 (NGSIM I-80) dataset (Halkias & Colyar, 2006) consists of 45 minutes of recordings from traffic cameras mounted over a stretch of highway."

Dataset Splits: Yes
LLM response: "This yields a total of 5596 car trajectories, which we split into training (80%), validation (10%) and testing (10%) sets."

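For reference, the quoted 80/10/10 split is straightforward to reproduce in code. The sketch below assumes the 5596 trajectories can be indexed by ID; the shuffle, seed, and variable names are illustrative and not taken from the paper.

```python
import random

# Hypothetical list of the 5596 NGSIM I-80 trajectory IDs; the paper does not
# say how trajectories are indexed or whether the split was seeded.
trajectory_ids = list(range(5596))

random.seed(0)  # illustrative seed, not from the paper
random.shuffle(trajectory_ids)

n = len(trajectory_ids)
n_train = int(0.8 * n)  # 4476 trajectories
n_valid = int(0.1 * n)  # 559 trajectories

train_ids = trajectory_ids[:n_train]
valid_ids = trajectory_ids[n_train:n_train + n_valid]
test_ids = trajectory_ids[n_train + n_valid:]  # remaining ~10%

print(len(train_ids), len(valid_ids), len(test_ids))  # 4476 559 561
```
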
Hardware Specification: No
LLM response: No specific hardware details (e.g., CPU or GPU model, memory) are given in the paper.

Software Dependencies: No
LLM response: The paper mentions OpenAI Gym (Brockman et al., 2016), Adam (Kingma & Ba, 2014), Proximal Policy Optimization (PPO) (Schulman et al., 2017), and OpenAI Baselines, but it provides no version numbers for any of these, nor for Python or PyTorch.

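Because neither hardware nor library versions are reported, a reproduction attempt would need to pin and record its own environment. A minimal sketch of logging this information at run time, assuming the PyTorch stack implied by the response above:

```python
import platform

import torch  # assumed framework; the paper does not pin a version

# Record the software versions and hardware an experiment actually ran on,
# since the paper itself reports neither.
print("python :", platform.python_version())
print("torch  :", torch.__version__)
print("cuda   :", torch.version.cuda)
if torch.cuda.is_available():
    print("gpu    :", torch.cuda.get_device_name(0))
```
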
Experiment Setup: Yes
LLM response: "Our model was trained using Adam (Kingma & Ba, 2014) with learning rate 0.0001 and minibatches of size 64, unrolled for 20 time steps, and with dropout (p_dropout = 0.1) at every layer, which was necessary for computing the epistemic uncertainty cost when training the policy network."

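The quoted hyperparameters translate directly into an optimizer and model configuration. Below is a minimal sketch of such a setup in PyTorch, with the epistemic uncertainty cost approximated as the variance across stochastic forward passes with dropout left active (MC dropout). The architecture, loss, data, and sample count are placeholders, not the paper's model.

```python
import torch
import torch.nn as nn

# Placeholder forward model; the paper's architecture is far more elaborate.
model = nn.Sequential(
    nn.Linear(64, 256), nn.Dropout(p=0.1), nn.ReLU(),  # dropout at every layer, p = 0.1
    nn.Linear(256, 64), nn.Dropout(p=0.1),
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # Adam, learning rate 0.0001

def epistemic_uncertainty(x, n_samples=10):
    """Variance across dropout-perturbed forward passes (MC dropout).

    The number of samples is an assumption; the quoted setup does not state it.
    """
    model.train()  # keep dropout active while sampling
    preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.var(dim=0).mean()

# One illustrative training step.
x = torch.randn(64, 64)  # placeholder minibatch of size 64
loss = torch.zeros(())
for t in range(20):  # unrolled for 20 time steps
    x = model(x)
    loss = loss + x.pow(2).mean()  # placeholder prediction loss
loss = loss + epistemic_uncertainty(x)  # epistemic uncertainty cost as regularizer
optimizer.zero_grad()
loss.backward()
optimizer.step()
```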