Causal Imitation Learning under Temporally Correlated Noise

Authors: Gokul Swamy, Sanjiban Choudhury, Drew Bagnell, Steven Wu

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We find both of our algorithms compare favorably to behavioral cloning on simulated control tasks. We then validate their performance on simulated control tasks. We also empirically investigate how the persistence of the confounder impacts policy performance. We test DoubIL and ResiduIL on a slightly modified version of the OpenAI Gym (Brockman et al., 2016) LunarLander-v2 environment against a behavioral cloning baseline. We next consider the HalfCheetahBulletEnv and AntBulletEnv environments (Coumans & Bai, 2016–2019).
Researcher Affiliation | Collaboration | 1Carnegie Mellon University, 2Cornell University, 3Aurora Innovation.
Pseudocode | Yes | Algorithm 1 (DoubIL); Algorithm 2 (ResiduIL).
Open Source Code | Yes | We release our code at https://github.com/gkswamy98/causal_il.
Open Datasets | Yes | We test DoubIL and ResiduIL on a slightly modified version of the OpenAI Gym (Brockman et al., 2016) LunarLander-v2 environment against a behavioral cloning baseline. We next consider the HalfCheetahBulletEnv and AntBulletEnv environments (Coumans & Bai, 2016–2019).
Dataset Splits | No | The paper mentions training data and test-time evaluation, but it does not explicitly specify validation splits or how the data was partitioned into train/validation/test sets.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions software such as OpenAI Gym, PPO, SAC, the Adam optimizer, and PyBullet with citations, but it does not specify exact version numbers for these dependencies, which are required for a reproducible description.
Experiment Setup | Yes | For all learned functions, we use two-layer ReLU MLPs with 256 hidden units. We use the Adam optimizer (Kingma & Ba, 2014) for behavioral cloning and DoubIL, and use the optimistic variant for ResiduIL. We apply a weight decay of 1e-3 to all. We train all methods for 50k steps. Tables 2, 3, 4, and 5 provide specific values for LEARNING RATE, BATCH SIZE, NUM. SAMPLES FOR E, BC REGULARIZER WEIGHT, f NORM PENALTY, and ADAM βs.
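The reported setup (two-layer ReLU MLPs with 256 hidden units, weight decay 1e-3, 50k training steps) can be sketched as a minimal, dependency-free forward pass. This is an illustrative reconstruction, not the authors' code: the function names (`mlp_forward`, `init_layer`), the initialization scheme, and the observation/action dimensions are assumptions.

```python
# Hypothetical sketch of the reported policy network: a two-layer ReLU MLP
# with 256 hidden units and a linear output head. Hyperparameter constants
# mirror the values quoted in the row above; everything else is illustrative.
import math
import random

HIDDEN = 256          # hidden units per layer, as reported
WEIGHT_DECAY = 1e-3   # weight decay applied to all methods, as reported
TRAIN_STEPS = 50_000  # training steps, as reported

def relu(x):
    # Element-wise rectified linear unit
    return [max(0.0, v) for v in x]

def linear(x, w, b):
    # w: out_dim x in_dim weight matrix, b: out_dim bias vector
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(w, b)]

def init_layer(in_dim, out_dim, rng):
    # Small uniform init (assumed; the paper does not specify the scheme)
    scale = 1.0 / math.sqrt(in_dim)
    w = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
         for _ in range(out_dim)]
    b = [0.0] * out_dim
    return w, b

def mlp_forward(x, params):
    # Two hidden ReLU layers followed by a linear output head
    (w1, b1), (w2, b2), (w3, b3) = params
    h1 = relu(linear(x, w1, b1))
    h2 = relu(linear(h1, w2, b2))
    return linear(h2, w3, b3)

rng = random.Random(0)
obs_dim, act_dim = 8, 2  # LunarLander-style dimensions, for illustration only
params = [init_layer(obs_dim, HIDDEN, rng),
          init_layer(HIDDEN, HIDDEN, rng),
          init_layer(HIDDEN, act_dim, rng)]
action = mlp_forward([0.1] * obs_dim, params)
```

In practice the paper trains such networks with Adam (optimistic Adam for ResiduIL); the per-method learning rates and batch sizes live in the paper's Tables 2 through 5.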