Causal Confusion in Imitation Learning

Authors: Pim de Haan, Dinesh Jayaraman, Sergey Levine

NeurIPS 2019

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | "We show that causal misidentification occurs in several benchmark control domains as well as realistic driving settings, and validate our solution against DAgger and other baselines and ablations."

Researcher Affiliation | Collaboration | Pim de Haan, Dinesh Jayaraman, Sergey Levine; affiliations listed: Qualcomm AI Research, University of Amsterdam, Berkeley AI Research, Facebook AI Research

Pseudocode | Yes | Algorithm 1 (expert query intervention); Algorithm 2 (policy execution intervention)

Open Source Code | No | The paper provides no explicit statement of, or link to, open-source code for the described methodology.

Open Datasets | Yes | "We study three kinds of tasks: (i) MountainCar (continuous states, discrete actions), (ii) MuJoCo Hopper (continuous states and actions), (iii) Atari games: Pong, Enduro, and UpNDown (states: two stacked consecutive frames, discrete actions)."

Dataset Splits | No | "All policies are trained to near-zero validation error on held-out expert state-action tuples." The paper mentions using held-out data for validation but gives no split percentages or sizes.

Hardware Specification | No | The paper gives no details about the hardware used to run the experiments.

Software Dependencies | No | The paper mentions software components such as neural networks, β-VAE, and CoordConv, but specifies no version numbers for these or any other dependencies.

Experiment Setup | No | "In all cases, we use neural networks with identical architectures to represent the policies, and we train them on the same demonstrations." The parameters φ of the graph-conditioned policy are trained by gradient descent to minimize E_G[ℓ(f_φ([X_i ⊙ G, G]), A_i)] (Eq. 1), where G is drawn uniformly at random over all 2^n graphs and ℓ is a mean squared error loss for the continuous-action environments and a cross-entropy loss for the discrete-action environments. The paper describes the model architecture, training objective, and loss functions, but does not specify concrete hyperparameter values such as learning rate, batch size, or number of epochs.
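The masked training objective quoted above (Eq. 1) can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: it substitutes a linear model for the neural-network policy f_φ, uses the MSE loss (the continuous-action case), and the data, dimensions, and the helper `masked_input` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 4                 # number of state dimensions (candidate causal parents)
batch, act_dim = 32, 2

# Toy stand-ins for expert demonstration data: states X_i and actions A_i.
X = rng.normal(size=(batch, n))
A = rng.normal(size=(batch, act_dim))

# Linear stand-in for the graph-conditioned policy f_phi([X*G, G]).
W = np.zeros((2 * n, act_dim))

def masked_input(X, G):
    """Build [X*G, G]: the elementwise-masked state concatenated with the mask."""
    return np.concatenate([X * G, np.broadcast_to(G, X.shape)], axis=1)

lr = 0.1
for step in range(200):
    # Draw a graph G uniformly at random from all 2^n binary masks.
    G = rng.integers(0, 2, size=n).astype(float)
    Z = masked_input(X, G)            # shape (batch, 2n)
    err = Z @ W - A                   # residual of the MSE loss
    W -= lr * Z.T @ err / batch       # gradient-descent step on phi (here W)

# The trained policy can now be queried under any hypothesized graph G.
G_all_on = np.ones(n)
loss = np.mean((masked_input(X, G_all_on) @ W - A) ** 2)
```

Because the mask G is resampled each step and fed in alongside the masked state, a single set of parameters learns to imitate the expert under every candidate causal graph at once; the paper's intervention procedures then search over G.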