Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].
Object-Aware Regularization for Addressing Causal Confusion in Imitation Learning
Authors: Jongjin Park, Younggyo Seo, Chang Liu, Li Zhao, Tao Qin, Jinwoo Shin, Tie-Yan Liu
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that OREO significantly improves the performance of behavioral cloning, outperforming various other regularization and causality-based methods on a variety of Atari environments and a self-driving CARLA environment. |
| Researcher Affiliation | Collaboration | Jongjin Park¹, Younggyo Seo¹, Chang Liu², Li Zhao², Tao Qin², Jinwoo Shin¹, Tie-Yan Liu²; ¹Korea Advanced Institute of Science and Technology, ²Microsoft Research Asia |
| Pseudocode | Yes | see Figure 2 and Algorithm 1 for the overview and pseudocode of OREO, respectively. |
| Open Source Code | Yes | Our source code is available at https://github.com/microsoft/causal-imitation-learning. |
| Open Datasets | Yes | For expert demonstrations, we utilize DQN Replay dataset [1]. As this dataset consists of 50M transitions of each environment collected during the training of a DQN agent [32], we use the last N trajectories as expert demonstrations. |
| Dataset Splits | Yes | To see how this works in our setup, we first introduce a validation dataset consisting of 5 expert demonstrations on confounded Pong environment... We evaluate the performance of OREO with a varying number of expert demonstrations N ∈ {5, 10, 20, 35, 50}. |
| Hardware Specification | Yes | We use a single Nvidia P100 GPU and 8 CPU cores for each training run. |
| Software Dependencies | No | The paper mentions using "Dopamine library [9]" but does not provide specific version numbers for it or other key software components like Python or deep learning frameworks. |
| Experiment Setup | Yes | As for hyperparameter selection, we use the default hyperparameters from previous or similar works [35, 50], i.e., a drop probability of p = 0.5, a codebook size of K = 512, and a commitment cost of β = 0.25. |
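
For readers who want to record the reported setup in one place, the values quoted in the table can be collected into a small configuration object. This is only a sketch: the class name `ReportedSetup`, its field names, and the helper `last_n_trajectories` are hypothetical and are not taken from the authors' repository; only the numeric values (drop probability 0.5, codebook size 512, commitment cost 0.25, one P100 GPU, 8 CPU cores, and the demonstration counts) come from the quoted text.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class ReportedSetup:
    """Values quoted in the table; field names are hypothetical."""

    drop_prob: float = 0.5          # drop probability p
    codebook_size: int = 512        # codebook size K
    commitment_cost: float = 0.25   # commitment cost (beta)
    num_gpus: int = 1               # a single Nvidia P100 per training run
    num_cpu_cores: int = 8          # CPU cores per training run
    # Numbers of expert demonstrations N evaluated in the paper.
    demo_counts: Tuple[int, ...] = (5, 10, 20, 35, 50)


def last_n_trajectories(trajectories: Sequence[T], n: int) -> List[T]:
    """Keep the last n trajectories, mirroring the quoted description
    "we use the last N trajectories as expert demonstrations".

    Hypothetical helper: how trajectories are actually loaded from the
    DQN Replay dataset is not specified here.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return list(trajectories[-n:])


if __name__ == "__main__":
    setup = ReportedSetup()
    dummy = [f"traj_{i}" for i in range(100)]  # placeholder trajectories
    for n in setup.demo_counts:
        demos = last_n_trajectories(dummy, n)
        print(n, demos[0], demos[-1])
```

Keeping the reported values in a frozen dataclass makes it harder to change them by accident during a reproduction attempt; any deviation from the quoted defaults then has to be an explicit override.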