Causal Imitation Learning under Temporally Correlated Noise
Authors: Gokul Swamy, Sanjiban Choudhury, Drew Bagnell, Steven Wu
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We find both of our algorithms compare favorably to behavioral cloning on simulated control tasks. We also empirically investigate how the persistence of the confounder impacts policy performance. We test DoubIL and ResiduIL on a slightly modified version of the OpenAI Gym (Brockman et al., 2016) LunarLander-v2 environment against a behavioral cloning baseline. We next consider the HalfCheetahBulletEnv and AntBulletEnv environments (Coumans & Bai, 2016–2019). |
| Researcher Affiliation | Collaboration | Carnegie Mellon University, Cornell University, Aurora Innovation. |
| Pseudocode | Yes | Algorithm 1: DoubIL; Algorithm 2: ResiduIL |
| Open Source Code | Yes | We release our code at https://github.com/gkswamy98/causal_il. |
| Open Datasets | Yes | We test DoubIL and ResiduIL on a slightly modified version of the OpenAI Gym (Brockman et al., 2016) LunarLander-v2 environment against a behavioral cloning baseline. We next consider the HalfCheetahBulletEnv and AntBulletEnv environments (Coumans & Bai, 2016–2019). |
| Dataset Splits | No | The paper mentions training data and test-time evaluation, but it does not explicitly specify validation dataset splits or how data was partitioned into train/validation/test sets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software like "OpenAI Gym", "PPO", "SAC", "Adam optimizer", and "PyBullet" with citations, but it does not specify exact version numbers for these dependencies, which would be needed for a fully reproducible setup. |
| Experiment Setup | Yes | For all learned functions, we use two-layer ReLU MLPs with 256 hidden units. We use the Adam optimizer (Kingma & Ba, 2014) for behavioral cloning and DoubIL, and the optimistic variant for ResiduIL. We apply a weight decay of 1e-3 to all. We train all methods for 50k steps. Tables 2, 3, 4, and 5 provide specific values for 'LEARNING RATE', 'BATCH SIZE', 'NUM. SAMPLES FOR E', 'BC REGULARIZER WEIGHT', 'f NORM PENALTY', and 'ADAM βs'. |
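
The "Experiment Setup" row pins down the architecture and optimizer but defers per-task hyperparameters to the paper's Tables 2-5. Below is a minimal sketch of that setup, assuming PyTorch; the two-hidden-layer reading of "two-layer MLP", the input/output dimensions, and the learning rate are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn as nn

def make_mlp(in_dim: int, out_dim: int) -> nn.Module:
    # "Two-layer ReLU MLPs with 256 hidden units" -- read here as two hidden
    # layers of width 256 (an assumption; the paper does not disambiguate).
    return nn.Sequential(
        nn.Linear(in_dim, 256), nn.ReLU(),
        nn.Linear(256, 256), nn.ReLU(),
        nn.Linear(256, out_dim),
    )

# Placeholder dimensions; the real values depend on the environment.
policy = make_mlp(in_dim=8, out_dim=2)

# Adam with weight decay 1e-3, as reported for behavioral cloning and DoubIL;
# the learning rate here is a placeholder (exact per-task values are in the
# paper's Tables 2-5). ResiduIL instead uses an optimistic Adam variant,
# which is not shown here.
opt = torch.optim.Adam(policy.parameters(), lr=3e-4, weight_decay=1e-3)

for step in range(50_000):  # "We train all methods for 50k steps."
    ...  # one gradient step on the imitation objective (omitted)
```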
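
The environments named in the "Research Type" and "Open Datasets" rows are standard Gym and PyBullet releases. A hedged sketch of constructing the unmodified bases, assuming the classic `gym` API and the `pybullet_envs` package; the paper's LunarLander-v2 is "slightly modified", and that modification is not reproduced here.

```python
import gym
import pybullet_envs  # noqa: F401 -- importing registers the Bullet envs

# Unmodified base environments used in the paper's experiments.
lander = gym.make("LunarLander-v2")
cheetah = gym.make("HalfCheetahBulletEnv-v0")
ant = gym.make("AntBulletEnv-v0")
```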