Imitation Learning from Observations under Transition Model Disparity

Authors: Tanmay Gangwani, Yuan Zhou, Jian Peng

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the efficacy of our ILO algorithm using five locomotion control tasks from OpenAI Gym, where we introduce a mismatch between the dynamics of the expert and the learner by changing different configuration parameters. We demonstrate that our approach compares favorably to the baseline ILO algorithms in many of the considered scenarios. (A sketch of such a dynamics mismatch appears after this table.)
Researcher Affiliation | Academia | "Anonymous authors. Paper under double-blind review." The paper states the authors are anonymous; therefore, no institutional affiliations are provided in the text to classify the affiliation type.
Pseudocode | Yes | Algorithm 1: AILO (Advisor-augmented Imitation Learning from Observations)
Open Source Code | No | The authors pledge to make the source code for reproducing all the experiments public upon de-anonymization of the paper.
Open Datasets | No | The paper mentions 'continuous-control locomotion environments from OpenAI Gym' and the 'MuJoCo physics simulator', and describes 'expert data collection' by training an expert policy and generating rollouts. However, it does not provide a specific link, DOI, repository, or formal citation for accessing a publicly available dataset used for training; it only describes collecting the expert dataset internally.
Dataset Splits | No | The paper mentions collecting 'expert data' and 'roll out trajectories τ using π' for training, but it does not specify any training/validation/test dataset splits, percentages, or absolute sample counts for reproducibility, nor does it reference predefined splits with citations.
Hardware Specification | No | The paper describes the software environment (OpenAI Gym, MuJoCo) and details of the models and hyperparameters, but it does not specify the hardware used to run the experiments (e.g., specific GPU models, CPU models, or cloud computing instance types).
Software Dependencies | No | Table 2 lists 'Hyperparameters for AILO and the baselines', naming components such as 'Adam (lr = 1e-4)' and 'PPO clipping 0.2'. While it names optimizers and algorithms, it does not provide version numbers for software dependencies or libraries (e.g., TensorFlow 2.x, PyTorch 1.x, gym 0.x.x).
Experiment Setup | Yes | Table 2: Hyperparameters for AILO and the baselines, including the discriminator/reward network (fω) architecture (3 layers, 128 hidden units, tanh), its optimizer (Adam, lr = 1e-4), and the RL agent settings (policy architecture, critic architecture, PPO clipping, PPO epochs per iteration, discount factor, GAE factor, entropy target). (A sketch of the reward network with these hyperparameters appears after this table.)
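
The paper introduces the expert–learner mismatch by 'changing different configuration parameters' of the environments, but this report does not restate which ones. Below is a minimal sketch of how such a mismatch could be set up in OpenAI Gym with a MuJoCo backend (via mujoco-py); the choice of gravity and body mass as the perturbed parameters, the environment id, and the scale values are all illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch: build two copies of a locomotion task whose transition
# dynamics differ, so the expert and the learner act under different physics.
# Which parameters the paper actually perturbs is not stated here; gravity
# and body mass are hypothetical choices for illustration.
import gym

def make_env(env_id="HalfCheetah-v2", gravity_scale=1.0, mass_scale=1.0):
    """Build a Gym environment with scaled MuJoCo physics parameters."""
    env = gym.make(env_id)
    model = env.unwrapped.model  # underlying mujoco-py model
    model.opt.gravity[:] = model.opt.gravity * gravity_scale
    model.body_mass[:] = model.body_mass * mass_scale
    return env

# The expert acts in the nominal environment ...
expert_env = make_env()
# ... while the learner faces perturbed dynamics (hypothetical scales).
learner_env = make_env(gravity_scale=1.5, mass_scale=0.8)
```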
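Table 2's reported values are enough to sketch the discriminator/reward network fω in PyTorch. The block below is a minimal sketch under stated assumptions, not the authors' implementation: the state–next-state input pairing and the example observation dimension are assumptions (ILO learns from observations only, but the paper's exact input format is not restated here); the layer count, hidden width, activation, and optimizer follow Table 2.

```python
# Sketch of the discriminator/reward network f_ω per Table 2:
# 3 layers, 128 hidden units, tanh activations, Adam with lr = 1e-4.
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    def __init__(self, obs_dim):
        super().__init__()
        # Input as a concatenated (s, s') pair is an assumption based on
        # the observation-only setting; the paper may format inputs differently.
        self.net = nn.Sequential(
            nn.Linear(2 * obs_dim, 128), nn.Tanh(),
            nn.Linear(128, 128), nn.Tanh(),
            nn.Linear(128, 1),
        )

    def forward(self, s, s_next):
        # Scalar score/reward for a state transition.
        return self.net(torch.cat([s, s_next], dim=-1))

f_omega = RewardNet(obs_dim=17)  # e.g., HalfCheetah-v2 observation size
optimizer = torch.optim.Adam(f_omega.parameters(), lr=1e-4)
```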