Imitation Learning from Observations under Transition Model Disparity
Authors: Tanmay Gangwani, Yuan Zhou, Jian Peng
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the efficacy of our ILO algorithm using five locomotion control tasks from OpenAI Gym, where we introduce a mismatch between the dynamics of the expert and the learner by changing different configuration parameters. We demonstrate that our approach compares favorably to the baseline ILO algorithms in many of the considered scenarios. (A hedged example of such a dynamics perturbation is sketched after the table.) |
| Researcher Affiliation | Academia | The reviewed PDF states "Anonymous authors. Paper under double-blind review." Since the authors are anonymized, the submission provides no institutional affiliations from which to classify the type. |
| Pseudocode | Yes | Algorithm 1: AILO (Advisor-augmented Imitation Learning from Observations). (A hedged, stub-based skeleton of such a loop is sketched after the table.) |
| Open Source Code | No | The authors pledge to make the source code for reproducing all the experiments of this paper public upon the de-anonymization of the paper. |
| Open Datasets | No | The paper mentions 'continuous-control locomotion environments from OpenAI Gym' and the 'MuJoCo physics simulator' and refers to 'Expert data collection' by training an expert policy and generating rollouts. However, it does not provide a specific link, DOI, repository, or formal citation for accessing a publicly available dataset used for training. It only describes collecting the expert dataset internally. |
| Dataset Splits | No | The paper mentions collecting 'expert data' and 'roll out trajectories τ using π' for training but does not specify any training/validation/test dataset splits, percentages, or absolute sample counts for reproducibility. It also doesn't reference predefined splits with citations. |
| Hardware Specification | No | The paper describes the software environment (OpenAI Gym, MuJoCo) and details about the models and hyperparameters. However, it does not specify any particular hardware used for running the experiments (e.g., specific GPU models, CPU models, or cloud computing instance types). |
| Software Dependencies | No | Table 2 lists 'Hyperparameters for AILO and the baselines', mentioning components like 'Adam (lr = 1e-4)' and 'PPO clipping 0.2'. While it names optimizers and algorithms, it does not provide specific version numbers for software dependencies or libraries (e.g., TensorFlow 2.x, PyTorch 1.x, gym 0.x.x). |
| Experiment Setup | Yes | Table 2: Hyperparameters for AILO and the baselines, including: discriminator/reward network (fω) architecture (3 layers, 128 hidden-dim, tanh), optimizer (Adam, lr = 1e-4), and RL agent settings (policy architecture, critic architecture, PPO clipping, PPO epochs per iteration, discount factor, GAE factor, entropy target). (A PyTorch rendering of the discriminator settings is sketched after the table.) |
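To make the quoted "mismatch between the dynamics of the expert and the learner" concrete, here is a minimal sketch of one common way to perturb a MuJoCo task's configuration parameters through OpenAI Gym. The environment name, the attribute touched, and the scaling factor are illustrative assumptions, not the paper's published recipe.

```python
import gym

# Minimal sketch (assumes the gym + mujoco_py era API and HalfCheetah-v2).
# The expert acts under default physics; the learner's copy gets altered
# physics, creating the expert/learner transition-model disparity.
expert_env = gym.make("HalfCheetah-v2")
learner_env = gym.make("HalfCheetah-v2")

# Scale the z-component of gravity in the learner's simulator only.
# The 1.5 factor is illustrative; the paper varies several such parameters.
learner_env.unwrapped.model.opt.gravity[2] *= 1.5
```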
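Since the source code is not yet public, the pseudocode row above can only be fleshed out speculatively. Below is a hypothetical, stub-based skeleton of an AILO-style iteration, reconstructed purely from the names this report mentions (observation-only expert data, a discriminator reward fω, an advisor, a PPO learner); every helper is a placeholder of my own naming, not the authors' implementation.

```python
def collect_rollouts(policy):                    # stand-in: "roll out trajectories tau using pi"
    return []

def update_discriminator(expert_obs, rollouts):  # stand-in: adversarial reward learning
    pass

def label_rewards(rollouts):                     # stand-in: score transitions with f_omega
    return [0.0 for _ in rollouts]

def update_advisor(rollouts):                    # stand-in: AILO's advisor-augmented step
    pass

def ppo_update(rollouts, rewards):               # stand-in: PPO update (clipping 0.2, Table 2)
    pass

def ailo_skeleton(policy, expert_obs, n_iterations: int) -> None:
    """One plausible outer loop; the paper's Algorithm 1 may order steps differently."""
    for _ in range(n_iterations):
        rollouts = collect_rollouts(policy)
        update_discriminator(expert_obs, rollouts)
        rewards = label_rewards(rollouts)
        update_advisor(rollouts)
        ppo_update(rollouts, rewards)
```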
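The Table 2 settings quoted in the last two rows translate directly into code. Here is a minimal PyTorch sketch of the discriminator/reward network fω (3 layers, 128 hidden units, tanh) with its Adam optimizer at lr = 1e-4; the input dimension is an assumption (a HalfCheetah-v2 observation), since the excerpt does not state what the discriminator consumes.

```python
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """Discriminator / reward network f_omega per Table 2: 3 layers, 128 hidden, tanh."""

    def __init__(self, in_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),  # scalar logit used as the learned reward signal
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# in_dim = 17 assumes a HalfCheetah-v2 observation; adjust per task.
f_omega = RewardNet(in_dim=17)
optimizer = torch.optim.Adam(f_omega.parameters(), lr=1e-4)
```

The remaining Table 2 values (PPO epochs per iteration, discount factor, GAE factor, entropy target) are only named, not quantified, in this report, so they are omitted here rather than guessed.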