Causal Imitation Learning via Inverse Reinforcement Learning
Authors: Kangrui Ruan, Junzhe Zhang, Xuan Di, Elias Bareinboim
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we demonstrate our framework on various imitation learning tasks, ranging from synthetic causal models to real-world datasets, including highway driving (Krajewski et al., 2018) and images (LeCun, 1998). We find that our approach is able to incorporate parametric knowledge about the reward function and achieve effective imitating policies across different causal diagrams. For all experiments, we evaluate our proposed Causal-IRL based on the canonical equation formulation in Eq. (3). As baselines, we also include: (1) standard BC mimicking the expert's nominal behavior policy; (2) standard IRL utilizing all observed covariates preceding every Xi ∈ X while being blind to causal relationships in the underlying model; and (3) Causal-BC (Zhang et al., 2020; Kumor et al., 2021), which learns an imitating policy with the sequential π-backdoor criterion. We refer readers to (Ruan et al., 2023, Appendix D) for additional experiments and more discussions on the experimental setup. ... Simulation results, shown in Fig. 3a, reveal that Causal-IRL consistently outperforms the expert's policy and other imitation strategies by exploiting additional parametric knowledge about the expected reward E[Y | X1, X2, Z2]; Causal-BC is able to achieve the expert's performance. |
| Researcher Affiliation | Academia | Kangrui Ruan, Junzhe Zhang, Xuan Di, and Elias Bareinboim, Columbia University, New York, NY 10027, USA {kr2910,junzhez,sharon.di,eliasb}@columbia.edu |
| Pseudocode | Yes | We refer readers to Algs. 3 and 4 in (Ruan et al., 2023, Appendix C) for more discussions on the pseudo-code and implementation details. |
| Open Source Code | Yes | Source codes for all experiments and simulations are released in the complete technical report (Ruan et al., 2023). |
| Open Datasets | Yes | We refer readers to (Ruan et al., 2023, Appendix D) for additional experiments and more discussions on the experimental setup. ... We provided references to all existing datasets used in experiments, including HIGHD (Krajewski et al., 2018) and MNIST (LeCun, 1998). |
| Dataset Splits | No | The paper mentions using datasets for experiments but does not provide specific details on how these datasets are split into training, validation, and test sets. It references Appendix D for experimental setup details, but those details are not in the provided text. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU models, or memory specifications used for running the experiments. It only mentions 'simulations'. |
| Software Dependencies | No | The paper refers to using MWAL and GAIL algorithms but does not specify any software names with version numbers for implementation, programming languages, or libraries. |
| Experiment Setup | No | The paper mentions experimental setup and refers to Appendix D for more discussions on it. However, the provided text does not contain specific hyperparameter values or detailed training configurations. |
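For readers unfamiliar with the baselines quoted in the Research Type row, the standard BC baseline is ordinary supervised imitation: fit a policy to the expert's (covariate, action) pairs while ignoring the causal diagram. The hypothetical toy sketch below (not the authors' released code; all variable names and the noise rate are illustrative assumptions) shows this on a synthetic binary example:

```python
# Hypothetical minimal sketch of a standard behavioral-cloning (BC) baseline:
# fit a policy by supervised learning on the expert's (covariate, action)
# pairs, blind to any causal structure. Names and parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Toy expert demonstrations: binary covariate Z, binary expert action X.
n = 5000
Z = rng.integers(0, 2, size=n)
X = Z ^ rng.binomial(1, 0.1, size=n)  # expert matches Z, with 10% noise

# BC policy: the empirical conditional P(X = 1 | Z = z).
policy = {z: X[Z == z].mean() for z in (0, 1)}

def bc_action(z, rng):
    """Sample an imitating action from the cloned policy."""
    return int(rng.random() < policy[z])

print(policy)  # roughly {0: ~0.1, 1: ~0.9}, mirroring the expert's noise rate
```

As the paper's quoted results note, such a BC policy can only mimic the expert's nominal behavior; the Causal-IRL approach additionally exploits parametric knowledge of the expected reward.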