Causal Imitation Learning under Temporally Correlated Noise
Authors: Gokul Swamy, Sanjiban Choudhury, Drew Bagnell, Steven Wu
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We find both of our algorithms compare favorably to behavioral cloning on simulated control tasks. We also empirically investigate how the persistence of the confounder impacts policy performance. We test DoubIL and ResiduIL on a slightly modified version of the OpenAI Gym (Brockman et al., 2016) LunarLander-v2 environment against a behavioral cloning baseline. We next consider the HalfCheetahBulletEnv and AntBulletEnv environments (Coumans & Bai, 2016–2019). |
| Researcher Affiliation | Collaboration | Carnegie Mellon University, Cornell University, Aurora Innovation. |
| Pseudocode | Yes | Algorithm 1: DoubIL; Algorithm 2: ResiduIL |
| Open Source Code | Yes | We release our code at https://github.com/gkswamy98/causal_il. |
| Open Datasets | Yes | We test DoubIL and ResiduIL on a slightly modified version of the OpenAI Gym (Brockman et al., 2016) LunarLander-v2 environment against a behavioral cloning baseline. We next consider the HalfCheetahBulletEnv and AntBulletEnv environments (Coumans & Bai, 2016–2019). |
| Dataset Splits | No | The paper mentions training data and test-time evaluation, but it does not explicitly specify validation dataset splits or how data was partitioned into train/validation/test sets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software like "OpenAI Gym", "PPO", "SAC", "Adam optimizer", and "PyBullet" with citations, but it does not specify exact version numbers for these dependencies, which would be needed for a fully reproducible setup. |
| Experiment Setup | Yes | For all learned functions, we use two-layer ReLU MLPs with 256 hidden units. We use the Adam optimizer (Kingma & Ba, 2014) for behavioral cloning and DoubIL, and the optimistic variant for ResiduIL. We apply a weight decay of 1e-3 to all. We train all methods for 50k steps. Tables 2, 3, 4, and 5 provide specific values for 'LEARNING RATE', 'BATCH SIZE', 'NUM. SAMPLES FOR E', 'BC REGULARIZER WEIGHT', 'f NORM PENALTY', and 'ADAM βs'. |
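
The "Experiment Setup" row pins down the architecture and optimizer but defers per-task hyperparameters to the paper's Tables 2-5. Below is a minimal sketch of that setup, assuming PyTorch; the two-hidden-layer reading of "two-layer MLP", the input/output dimensions, and the learning rate are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn as nn

def make_mlp(in_dim: int, out_dim: int) -> nn.Module:
    # "Two-layer ReLU MLPs with 256 hidden units" -- read here as two hidden
    # layers of width 256 (an assumption; the paper does not disambiguate).
    return nn.Sequential(
        nn.Linear(in_dim, 256), nn.ReLU(),
        nn.Linear(256, 256), nn.ReLU(),
        nn.Linear(256, out_dim),
    )

# Placeholder dimensions; the real values depend on the environment.
policy = make_mlp(in_dim=8, out_dim=2)

# Adam with weight decay 1e-3, as reported for behavioral cloning and DoubIL;
# the learning rate here is a placeholder (exact per-task values are in the
# paper's Tables 2-5). ResiduIL instead uses an optimistic Adam variant,
# which is not shown here.
opt = torch.optim.Adam(policy.parameters(), lr=3e-4, weight_decay=1e-3)

for step in range(50_000):  # "We train all methods for 50k steps."
    ...  # one gradient step on the imitation objective (omitted)
```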
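
The environments named in the "Research Type" and "Open Datasets" rows are standard Gym and PyBullet releases. A hedged sketch of constructing the unmodified bases, assuming the classic `gym` API and the `pybullet_envs` package; the paper's LunarLander-v2 is "slightly modified", and that modification is not reproduced here.

```python
import gym
import pybullet_envs  # noqa: F401 -- importing registers the Bullet envs

# Unmodified base environments used in the paper's experiments.
lander = gym.make("LunarLander-v2")
cheetah = gym.make("HalfCheetahBulletEnv-v0")
ant = gym.make("AntBulletEnv-v0")
```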