Causal Confusion in Imitation Learning
Authors: Pim de Haan, Dinesh Jayaraman, Sergey Levine
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that causal misidentification occurs in several benchmark control domains as well as realistic driving settings, and validate our solution against DAgger and other baselines and ablations. |
| Researcher Affiliation | Collaboration | Pim de Haan (Qualcomm AI Research, University of Amsterdam), Dinesh Jayaraman (Berkeley AI Research, Facebook AI Research), Sergey Levine (Berkeley AI Research) |
| Pseudocode | Yes | Algorithm 1: Expert query intervention; Algorithm 2: Policy execution intervention (see the intervention-loop sketch after the table) |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code related to the methodology described. |
| Open Datasets | Yes | We study three kinds of tasks: (i) Mountain Car (continuous states, discrete actions), (ii) MuJoCo Hopper (continuous states and actions), (iii) Atari games: Pong, Enduro and UpNDown (states: two stacked consecutive frames, discrete actions). (A Gym setup sketch follows the table.) |
| Dataset Splits | No | All policies are trained to near-zero validation error on held-out expert state-action tuples. The paper mentions using "held-out" data for validation but does not provide specific details on the split percentages or sizes. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments. |
| Software Dependencies | No | The paper mentions software components like neural networks, β-VAE, and CoordConv, but it does not specify any version numbers for these or other software dependencies. |
| Experiment Setup | No | In all cases, we use neural networks with identical architectures to represent the policies, and we train them on the same demonstrations. ... φ are neural network parameters, trained through gradient descent to minimize $\mathbb{E}_G\left[\ell(f_\phi([X_i \odot G, G]), A_i)\right]$ (Eq. 1), where $G$ is drawn uniformly at random over all $2^n$ graphs and $\ell$ is a mean squared error loss for the continuous-action environments and a cross-entropy loss for the discrete-action environments. The paper describes the model architecture, training objective, and loss functions but does not specify concrete hyperparameter values such as learning rate, batch size, or number of epochs. (A training-step sketch of this objective follows the table.) |
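For reference, the benchmark environments listed above are all available through OpenAI Gym. The paper does not pin environment IDs or versions, so the choices below are assumptions; treat this as a minimal setup sketch rather than the authors' configuration.

```python
import gym

# Assumed Gym IDs for the three task families; versions are illustrative,
# not taken from the paper.
envs = {
    "mountain_car": gym.make("MountainCar-v0"),       # continuous states, discrete actions
    "hopper": gym.make("Hopper-v2"),                  # MuJoCo: continuous states and actions
    "pong": gym.make("PongNoFrameskip-v4"),           # Atari: frames as states, discrete actions
    "enduro": gym.make("EnduroNoFrameskip-v4"),
    "upndown": gym.make("UpNDownNoFrameskip-v4"),
}
```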
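A minimal PyTorch sketch of the training objective in Eq. 1, assuming a graph-conditioned policy network `f_phi` that takes the masked observation concatenated with the graph (the name and signature are hypothetical; the paper does not release code):

```python
import torch
import torch.nn.functional as F

def graph_conditioned_loss(f_phi, X, A, discrete_actions=True):
    """One training step for Eq. 1: draw a random binary graph G per sample,
    mask the observation elementwise, append the graph, and score the
    policy's prediction against the expert action."""
    # Uniform over all 2^n graphs = independent Bernoulli(0.5) per variable.
    G = torch.randint(0, 2, X.shape, dtype=X.dtype)
    inp = torch.cat([X * G, G], dim=-1)      # [X ⊙ G, G]
    pred = f_phi(inp)
    if discrete_actions:
        return F.cross_entropy(pred, A)      # discrete-action environments
    return F.mse_loss(pred, A)               # continuous-action environments
```

Note that drawing $G$ uniformly over all $2^n$ graphs is equivalent to sampling each mask entry as an independent Bernoulli(0.5), which is what the per-element `torch.randint(0, 2, ...)` call does.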
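And a rough sketch of a policy-execution intervention loop in the spirit of Algorithm 2. The paper fits an energy-based distribution over graphs to observed episode returns; the REINFORCE-style update and the `run_episode` helper below (which should roll out the trained policy $f_\phi([X \odot G, G])$ under mask $G$ and return the episode return) are simplifications introduced here for illustration:

```python
import numpy as np

def policy_execution_intervention(run_episode, n_vars, n_rounds=100, lr=0.1):
    """Maintain a factored distribution p(G) over binary graphs and shift it
    toward graphs whose induced policies earn high episode returns."""
    w = np.zeros(n_vars)                                # per-variable log-odds
    for _ in range(n_rounds):
        p = 1.0 / (1.0 + np.exp(-w))                    # Bernoulli probabilities
        G = (np.random.rand(n_vars) < p).astype(float)  # sample a candidate graph
        R = run_episode(G)                              # roll out f_phi([X ⊙ G, G])
        w += lr * R * (G - p)                           # REINFORCE-style update
    return (w > 0).astype(float)                        # most probable graph
```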