Object-Aware Regularization for Addressing Causal Confusion in Imitation Learning
Authors: Jongjin Park, Younggyo Seo, Chang Liu, Li Zhao, Tao Qin, Jinwoo Shin, Tie-Yan Liu
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that OREO significantly improves the performance of behavioral cloning, outperforming various other regularization and causality-based methods on a variety of Atari environments and a self-driving CARLA environment. |
| Researcher Affiliation | Collaboration | Jongjin Park1 Younggyo Seo1 Chang Liu2 Li Zhao2 Tao Qin2 Jinwoo Shin1 Tie-Yan Liu2 1Korea Advanced Institute of Science and Technology 2Microsoft Research Asia |
| Pseudocode | Yes | see Figure 2 and Algorithm 1 for the overview and pseudocode of OREO, respectively. |
| Open Source Code | Yes | Our source code is available at https://github.com/microsoft/causal-imitation-learning. |
| Open Datasets | Yes | For expert demonstrations, we utilize DQN Replay dataset [1]. As this dataset consists of 50M transitions of each environment collected during the training of a DQN agent [32], we use the last N trajectories as expert demonstrations. |
| Dataset Splits | Yes | To see how this works in our setup, we first introduce a validation dataset consisting of 5 expert demonstrations on the confounded Pong environment... We evaluate the performance of OREO with a varying number of expert demonstrations N ∈ {5, 10, 20, 35, 50}. |
| Hardware Specification | Yes | We use a single Nvidia P100 GPU and 8 CPU cores for each training run. |
| Software Dependencies | No | The paper mentions using "Dopamine library [9]" but does not provide specific version numbers for it or other key software components like Python or deep learning frameworks. |
| Experiment Setup | Yes | As for hyperparameter selection, we use the default hyperparameters from previous or similar works [35, 50], i.e., a drop probability of p = 0.5, a codebook size of K = 512, and a commitment cost of β = 0.25. |
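The hyperparameters quoted above (drop probability p = 0.5, codebook size K = 512) parameterize OREO's object-aware dropout, which drops every spatial position whose VQ-VAE code was selected for removal, rather than dropping units independently. A minimal NumPy sketch of that masking step, assuming a `(C, H, W)` feature map and a per-position code-index array (both names and shapes are illustrative, not taken from the authors' code), might look like:

```python
import numpy as np

def oreo_mask(features, codes, num_codes=512, drop_p=0.5, rng=None):
    """Sketch of object-aware dropout: drop all spatial positions that
    share a randomly selected discrete VQ-VAE code (assumed behavior)."""
    if rng is None:
        rng = np.random.default_rng()
    # Each of the K codebook entries is dropped independently with prob. drop_p.
    keep_code = rng.random(num_codes) >= drop_p   # (K,) bool, per-code decision
    keep_mask = keep_code[codes]                  # (H, W) bool, per-position mask
    # Broadcast the spatial mask across all channels.
    return features * keep_mask[None, :, :]

# Toy usage with a dummy feature map and random code assignments.
rng = np.random.default_rng(0)
feats = np.ones((4, 8, 8))                 # (C, H, W) encoder features
codes = rng.integers(0, 512, size=(8, 8))  # VQ-VAE code index per position
masked = oreo_mask(feats, codes, rng=rng)
```

The key design point is that the keep/drop decision is made per codebook entry, so all positions belonging to the same discrete code (roughly, the same object) are masked together.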