Imitation Learning from Observations under Transition Model Disparity
Authors: Tanmay Gangwani, Yuan Zhou, Jian Peng
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the efficacy of our ILO algorithm using five locomotion control tasks from OpenAI Gym, where we introduce a mismatch between the dynamics of the expert and the learner by changing different configuration parameters. We demonstrate that our approach compares favorably to the baseline ILO algorithms in many of the considered scenarios. (A hedged example of such a dynamics perturbation is sketched after the table.) |
| Researcher Affiliation | Academia | The reviewed PDF states "Anonymous authors. Paper under double-blind review." Since the authors are anonymized, the submission provides no institutional affiliations from which to classify the type. |
| Pseudocode | Yes | Algorithm 1: AILO (Advisor-augmented Imitation Learning from Observations). (A hedged, stub-based skeleton of such a loop is sketched after the table.) |
| Open Source Code | No | The authors pledge to make the source code for reproducing all the experiments of this paper public upon the de-anonymization of the paper. |
| Open Datasets | No | The paper mentions 'continuous-control locomotion environments from OpenAI Gym' and the 'MuJoCo physics simulator' and refers to 'Expert data collection' by training an expert policy and generating rollouts. However, it does not provide a specific link, DOI, repository, or formal citation for accessing a publicly available dataset used for training. It only describes collecting the expert dataset internally. |
| Dataset Splits | No | The paper mentions collecting 'expert data' and 'roll out trajectories τ using π' for training but does not specify any training/validation/test dataset splits, percentages, or absolute sample counts for reproducibility. It also doesn't reference predefined splits with citations. |
| Hardware Specification | No | The paper describes the software environment (OpenAI Gym, MuJoCo) and details about the models and hyperparameters. However, it does not specify any particular hardware used for running the experiments (e.g., specific GPU models, CPU models, or cloud computing instance types). |
| Software Dependencies | No | Table 2 lists 'Hyperparameters for AILO and the baselines', mentioning components like 'Adam (lr = 1e-4)' and 'PPO clipping 0.2'. While it names optimizers and algorithms, it does not provide specific version numbers for software dependencies or libraries (e.g., TensorFlow 2.x, PyTorch 1.x, gym 0.x.x). |
| Experiment Setup | Yes | Table 2: Hyperparameters for AILO and the baselines, including: discriminator/reward network (fω) architecture (3 layers, 128 hidden-dim, tanh), optimizer (Adam, lr = 1e-4), and RL agent settings (policy architecture, critic architecture, PPO clipping, PPO epochs per iteration, discount factor, GAE factor, entropy target). (A PyTorch rendering of the discriminator settings is sketched after the table.) |
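To make the quoted "mismatch between the dynamics of the expert and the learner" concrete, here is a minimal sketch of one common way to perturb a MuJoCo task's configuration parameters through OpenAI Gym. The environment name, the attribute touched, and the scaling factor are illustrative assumptions, not the paper's published recipe.

```python
import gym

# Minimal sketch (assumes the gym + mujoco_py era API and HalfCheetah-v2).
# The expert acts under default physics; the learner's copy gets altered
# physics, creating the expert/learner transition-model disparity.
expert_env = gym.make("HalfCheetah-v2")
learner_env = gym.make("HalfCheetah-v2")

# Scale the z-component of gravity in the learner's simulator only.
# The 1.5 factor is illustrative; the paper varies several such parameters.
learner_env.unwrapped.model.opt.gravity[2] *= 1.5
```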
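Since the source code is not yet public, the pseudocode row above can only be fleshed out speculatively. Below is a hypothetical, stub-based skeleton of an AILO-style iteration, reconstructed purely from the names this report mentions (observation-only expert data, a discriminator reward fω, an advisor, a PPO learner); every helper is a placeholder of my own naming, not the authors' implementation.

```python
def collect_rollouts(policy):                    # stand-in: "roll out trajectories tau using pi"
    return []

def update_discriminator(expert_obs, rollouts):  # stand-in: adversarial reward learning
    pass

def label_rewards(rollouts):                     # stand-in: score transitions with f_omega
    return [0.0 for _ in rollouts]

def update_advisor(rollouts):                    # stand-in: AILO's advisor-augmented step
    pass

def ppo_update(rollouts, rewards):               # stand-in: PPO update (clipping 0.2, Table 2)
    pass

def ailo_skeleton(policy, expert_obs, n_iterations: int) -> None:
    """One plausible outer loop; the paper's Algorithm 1 may order steps differently."""
    for _ in range(n_iterations):
        rollouts = collect_rollouts(policy)
        update_discriminator(expert_obs, rollouts)
        rewards = label_rewards(rollouts)
        update_advisor(rollouts)
        ppo_update(rollouts, rewards)
```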
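The Table 2 settings quoted in the last two rows translate directly into code. Here is a minimal PyTorch sketch of the discriminator/reward network fω (3 layers, 128 hidden units, tanh) with its Adam optimizer at lr = 1e-4; the input dimension is an assumption (a HalfCheetah-v2 observation), since the excerpt does not state what the discriminator consumes.

```python
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """Discriminator / reward network f_omega per Table 2: 3 layers, 128 hidden, tanh."""

    def __init__(self, in_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),  # scalar logit used as the learned reward signal
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# in_dim = 17 assumes a HalfCheetah-v2 observation; adjust per task.
f_omega = RewardNet(in_dim=17)
optimizer = torch.optim.Adam(f_omega.parameters(), lr=1e-4)
```

The remaining Table 2 values (PPO epochs per iteration, discount factor, GAE factor, entropy target) are only named, not quantified, in this report, so they are omitted here rather than guessed.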