Causal Confusion in Imitation Learning
Authors: Pim de Haan, Dinesh Jayaraman, Sergey Levine
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that causal misidentification occurs in several benchmark control domains as well as realistic driving settings, and validate our solution against DAgger and other baselines and ablations. |
| Researcher Affiliation | Collaboration | Pim de Haan (Qualcomm AI Research, University of Amsterdam), Dinesh Jayaraman (Berkeley AI Research, Facebook AI Research), Sergey Levine (Berkeley AI Research) |
| Pseudocode | Yes | Algorithm 1: Expert query intervention; Algorithm 2: Policy execution intervention (see the intervention-loop sketch after the table) |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code related to the methodology described. |
| Open Datasets | Yes | We study three kinds of tasks: (i) Mountain Car (continuous states, discrete actions), (ii) MuJoCo Hopper (continuous states and actions), (iii) Atari games: Pong, Enduro and UpNDown (states: two stacked consecutive frames, discrete actions). (A Gym setup sketch follows the table.) |
| Dataset Splits | No | All policies are trained to near-zero validation error on held-out expert state-action tuples. The paper mentions using "held-out" data for validation but does not provide specific details on the split percentages or sizes. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments. |
| Software Dependencies | No | The paper mentions software components like neural networks, β-VAE, and CoordConv, but it does not specify any version numbers for these or other software dependencies. |
| Experiment Setup | No | In all cases, we use neural networks with identical architectures to represent the policies, and we train them on the same demonstrations. ... φ are neural network parameters, trained through gradient descent to minimize $\mathbb{E}_G\left[\ell(f_\phi([X_i \odot G, G]), A_i)\right]$ (Eq. 1), where $G$ is drawn uniformly at random over all $2^n$ graphs and $\ell$ is a mean squared error loss for the continuous-action environments and a cross-entropy loss for the discrete-action environments. The paper describes the model architecture, training objective, and loss functions but does not specify concrete hyperparameter values such as learning rate, batch size, or number of epochs. (A training-step sketch of this objective follows the table.) |
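For reference, the benchmark environments listed above are all available through OpenAI Gym. The paper does not pin environment IDs or versions, so the choices below are assumptions; treat this as a minimal setup sketch rather than the authors' configuration.

```python
import gym

# Assumed Gym IDs for the three task families; versions are illustrative,
# not taken from the paper.
envs = {
    "mountain_car": gym.make("MountainCar-v0"),       # continuous states, discrete actions
    "hopper": gym.make("Hopper-v2"),                  # MuJoCo: continuous states and actions
    "pong": gym.make("PongNoFrameskip-v4"),           # Atari: frames as states, discrete actions
    "enduro": gym.make("EnduroNoFrameskip-v4"),
    "upndown": gym.make("UpNDownNoFrameskip-v4"),
}
```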
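A minimal PyTorch sketch of the training objective in Eq. 1, assuming a graph-conditioned policy network `f_phi` that takes the masked observation concatenated with the graph (the name and signature are hypothetical; the paper does not release code):

```python
import torch
import torch.nn.functional as F

def graph_conditioned_loss(f_phi, X, A, discrete_actions=True):
    """One training step for Eq. 1: draw a random binary graph G per sample,
    mask the observation elementwise, append the graph, and score the
    policy's prediction against the expert action."""
    # Uniform over all 2^n graphs = independent Bernoulli(0.5) per variable.
    G = torch.randint(0, 2, X.shape, dtype=X.dtype)
    inp = torch.cat([X * G, G], dim=-1)      # [X ⊙ G, G]
    pred = f_phi(inp)
    if discrete_actions:
        return F.cross_entropy(pred, A)      # discrete-action environments
    return F.mse_loss(pred, A)               # continuous-action environments
```

Note that drawing $G$ uniformly over all $2^n$ graphs is equivalent to sampling each mask entry as an independent Bernoulli(0.5), which is what the per-element `torch.randint(0, 2, ...)` call does.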
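And a rough sketch of a policy-execution intervention loop in the spirit of Algorithm 2. The paper fits an energy-based distribution over graphs to observed episode returns; the REINFORCE-style update and the `run_episode` helper below (which should roll out the trained policy $f_\phi([X \odot G, G])$ under mask $G$ and return the episode return) are simplifications introduced here for illustration:

```python
import numpy as np

def policy_execution_intervention(run_episode, n_vars, n_rounds=100, lr=0.1):
    """Maintain a factored distribution p(G) over binary graphs and shift it
    toward graphs whose induced policies earn high episode returns."""
    w = np.zeros(n_vars)                                # per-variable log-odds
    for _ in range(n_rounds):
        p = 1.0 / (1.0 + np.exp(-w))                    # Bernoulli probabilities
        G = (np.random.rand(n_vars) < p).astype(float)  # sample a candidate graph
        R = run_episode(G)                              # roll out f_phi([X ⊙ G, G])
        w += lr * R * (G - p)                           # REINFORCE-style update
    return (w > 0).astype(float)                        # most probable graph
```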