Object-Aware Regularization for Addressing Causal Confusion in Imitation Learning

Authors: Jongjin Park, Younggyo Seo, Chang Liu, Li Zhao, Tao Qin, Jinwoo Shin, Tie-Yan Liu

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experiments demonstrate that OREO significantly improves the performance of behavioral cloning, outperforming various other regularization and causality-based methods on a variety of Atari environments and a self-driving CARLA environment."
Researcher Affiliation | Collaboration | Jongjin Park (1), Younggyo Seo (1), Chang Liu (2), Li Zhao (2), Tao Qin (2), Jinwoo Shin (1), Tie-Yan Liu (2); (1) Korea Advanced Institute of Science and Technology, (2) Microsoft Research Asia
Pseudocode | Yes | "see Figure 2 and Algorithm 1 for the overview and pseudocode of OREO, respectively."
Open Source Code | Yes | "Our source code is available at https://github.com/microsoft/causal-imitation-learning."
Open Datasets | Yes | "For expert demonstrations, we utilize DQN Replay dataset [1]. As this dataset consists of 50M transitions of each environment collected during the training of a DQN agent [32], we use the last N trajectories as expert demonstrations."
Dataset Splits | Yes | "To see how this works in our setup, we first introduce a validation dataset consisting of 5 expert demonstrations on confounded Pong environment... We evaluate the performance of OREO with a varying number of expert demonstrations N ∈ {5, 10, 20, 35, 50}."
Hardware Specification | Yes | "We use a single Nvidia P100 GPU and 8 CPU cores for each training run."
Software Dependencies | No | The paper mentions using the "Dopamine library [9]" but does not provide version numbers for it or for other key software components, such as Python or the deep learning framework.
Experiment Setup | Yes | "As for hyperparameter selection, we use the default hyperparameters from previous or similar works [35, 50], i.e., a drop probability of p = 0.5, a codebook size of K = 512, and a commitment cost of β = 0.25."
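To make the hyperparameters concrete: OREO learns a VQ-VAE over encoder features and then, instead of dropping individual units, drops together all spatial cells assigned to the same discrete codebook entry, using the drop probability p = 0.5 and codebook size K = 512 quoted above. The following NumPy sketch illustrates that masking step only; the function name, array shapes, and API are illustrative assumptions, not the authors' implementation (see their released source code for the real one).

```python
import numpy as np

def oreo_dropout(features, code_indices, num_codes=512, p=0.5, rng=None):
    """Object-aware dropout (illustrative sketch): drop every feature cell
    assigned to a dropped VQ-VAE codebook entry together, rather than
    dropping cells independently as in standard dropout.

    features:     (B, C, H, W) encoder feature map
    code_indices: (B, H, W) index of the nearest codebook entry per cell
    """
    rng = rng or np.random.default_rng()
    B = features.shape[0]
    # One Bernoulli keep/drop decision per codebook entry, per sample.
    keep_code = (rng.random((B, num_codes)) > p).astype(features.dtype)
    # Look up each cell's decision through its code assignment: (B, 1, H, W).
    mask = np.take_along_axis(
        keep_code, code_indices.reshape(B, -1), axis=1
    ).reshape(B, 1, *code_indices.shape[1:])
    # Inverted-dropout rescaling keeps the expected activation unchanged.
    return features * mask / (1.0 - p)
```

Because the mask is indexed by discrete code rather than by position, all cells that the VQ-VAE groups together (e.g., cells covering one object) are kept or removed as a unit, which is the mechanism the paper credits for reducing causal confusion.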