Deep Inverse Q-learning with Constraints

Authors: Gabriel Kalweit, Maria Huegle, Moritz Werling, Joschka Boedecker

NeurIPS 2020

Reproducibility assessment (Variable, Result, LLM Response):

Research Type: Experimental. "We evaluate the resulting algorithms, called Inverse Action-value Iteration, Inverse Q-learning and Deep Inverse Q-learning, on the Objectworld benchmark, showing a speedup of up to several orders of magnitude compared to (Deep) Max-Entropy algorithms. We further apply Deep Constrained Inverse Q-learning on the task of learning autonomous lane-changes in the open-source simulator SUMO, achieving competent driving after training on data corresponding to 30 minutes of demonstrations."

Researcher Affiliation: Collaboration. Gabriel Kalweit, Neurorobotics Lab, University of Freiburg (kalweitg@cs.uni-freiburg.de); Maria Huegle, Neurorobotics Lab, University of Freiburg (hueglem@cs.uni-freiburg.de); Moritz Werling, BMW Group, Germany (Moritz.Werling@bmw.de); Joschka Boedecker, Neurorobotics Lab and BrainLinks-BrainTools, University of Freiburg (jboedeck@cs.uni-freiburg.de).

Pseudocode: Yes. Algorithm 1: Tabular Inverse Q-learning; Algorithm 2: Fixed Batch Deep Inverse Q-learning.

Open Source Code: No. The paper does not provide concrete access to its own source code for the described methodology, e.g. a specific repository link or an explicit code-release statement.

Open Datasets: Yes. "We evaluate the performance of IAVI, IQL and DIQL on the common IRL Objectworld benchmark (Figure 2a) and compare to Max-Ent IRL [26]... The Objectworld environment [16] is an N × N map, where an agent chooses between going up, down, left or right, or staying in place, per time step."

Dataset Splits: No. The paper mentions training runs and evaluation scenarios, but gives no specific details on how the data was split into training, validation and test sets (e.g. percentages or exact counts), so the data partitioning cannot be reproduced.

Hardware Specification: Yes. "Resulting expected value difference and time needed until convergence, mean and standard deviation over 5 training runs on a 3.00 GHz CPU (bottom)."

Software Dependencies: No. The paper mentions using the "open-source traffic simulator SUMO [14]" but does not provide version numbers for SUMO or for any other key software libraries or dependencies used in the implementation.

Experiment Setup: Yes. "Architectures and hyperparameters are shown in the appendix. We train on highway scenarios with a 1000 m three-lane highway and random numbers of vehicles and driver types. We trained DCIQL on 5 * 10^4 samples of the expert for 10^5 iterations."
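The inverse Q-learning algorithms named above build on a Boltzmann (softmax) model of the demonstrator's action distribution, under which the probability of an action grows exponentially with its Q-value. The following is an illustrative sketch of that action model only, not the authors' inverse algorithm; the function name and example values are assumptions for illustration.

```python
import numpy as np

def boltzmann_policy(q_values: np.ndarray) -> np.ndarray:
    """Softmax (Boltzmann) distribution over actions given Q-values."""
    z = q_values - q_values.max()  # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Hypothetical Q-values for five actions; the highest-valued action
# receives the largest (but not all) probability mass.
q = np.array([1.0, 0.0, 0.0, 0.0, -1.0])
pi = boltzmann_policy(q)
```

Inverse methods of this kind run the mapping in the other direction: from observed action frequencies back to the Q-values (and rewards) that would make the demonstrations Boltzmann-consistent.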
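The quoted Objectworld description specifies an N × N grid with five actions (up, down, left, right, stay). A minimal sketch of such an environment's transition function is below; the deterministic dynamics and boundary handling are assumptions for illustration, since the quoted text does not specify them.

```python
# Illustrative N x N gridworld in the spirit of Objectworld: five actions,
# moves that would leave the grid keep the agent in place (assumed behavior).
ACTIONS = {0: (-1, 0),  # up
           1: (1, 0),   # down
           2: (0, -1),  # left
           3: (0, 1),   # right
           4: (0, 0)}   # stay

def step(state, action, n):
    """Deterministic transition on an n x n grid; state is (row, col)."""
    dr, dc = ACTIONS[action]
    row = min(max(state[0] + dr, 0), n - 1)
    col = min(max(state[1] + dc, 0), n - 1)
    return (row, col)
```

Rolling out a demonstrator policy in such an environment yields the state-action trajectories that IAVI, IQL and DIQL consume as input.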