Imitating Latent Policies from Observation

Authors: Ashley Edwards, Himanshu Sahni, Yannick Schroecker, Charles Isbell

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach within classic control environments and a platform game and demonstrate that it performs better than standard approaches. Code for this work is available at https://github.com/ashedwards/ILPO. We evaluate our approach in four environments: classic control with cartpole, acrobot, and mountain car, and a recent platform game by Open AI, Coin Run (Cobbe et al., 2018). We show that our approach is able to perform as well as the expert after just a few steps of interacting with the environment, and performs better than a recent approach for imitating from observations, Behavioral Cloning from Observation (Torabi et al., 2018a). 4. Experiments and results: In this section, we discuss the experiments used to evaluate ILPO.
Researcher Affiliation | Academia | Georgia Institute of Technology, Atlanta, GA, USA. Correspondence to: Ashley D. Edwards <aedwards8@gatech.edu>.
Pseudocode | Yes | Algorithm 1: Imitating Latent Policies from Observation (an illustrative sketch of the algorithm's two-stage structure appears after this table).
Open Source Code | Yes | Code for this work is available at https://github.com/ashedwards/ILPO.
Open Datasets | Yes | We used Open AI Baselines (Dhariwal et al., 2017) to obtain expert policies and generate demonstrations for each environment. We used 50,000 expert state observations to train ILPO and BCO, and the corresponding actions to train Behavioral Cloning (BC). Coin Run (Cobbe et al., 2018). (A sketch of how such demonstrations could be collected appears after this table.)
Dataset Splits | No | The paper mentions using 50,000 expert state observations for training but does not provide specific details on how these observations are split into training, validation, or test sets (e.g., percentages, sample counts, or explicit references to standard splits for reproduction).
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | We used Open AI Baselines (Dhariwal et al., 2017) to obtain expert policies and generate demonstrations for each environment. The paper mentions Open AI Baselines but does not specify version numbers or list the remaining software dependencies needed to reproduce the experiments.
Experiment Setup | No | We used the same network structure and hyperparameters across both domains, as described in the appendix. The paper states that hyperparameters are described in the appendix but does not provide specific values or detailed training configurations (e.g., learning rates, batch sizes, number of epochs) in the main text itself. (A placeholder configuration listing the kinds of settings a reproduction would need appears after this table.)
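
Since the paper's pseudocode (Algorithm 1) is only referenced above, the following is a minimal sketch of ILPO's two-stage structure: a latent forward-dynamics model and latent policy learned from observation-only demonstrations, followed by remapping latent actions to real actions through limited environment interaction. The PyTorch framing, network sizes, and names such as LatentDynamics, LatentPolicy, stage1_loss, and infer_latent are illustrative assumptions, not the authors' released implementation (see https://github.com/ashedwards/ILPO for the actual code).

```python
# Illustrative sketch only; placeholder dimensions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, NUM_LATENT = 4, 2  # placeholder sizes (e.g. CartPole-like state)


class LatentDynamics(nn.Module):
    """Predicts a next state for every discrete latent action z."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, NUM_LATENT * STATE_DIM))

    def forward(self, s):                                        # s: [B, STATE_DIM]
        return self.net(s).view(-1, NUM_LATENT, STATE_DIM)       # [B, Z, STATE_DIM]


class LatentPolicy(nn.Module):
    """Distribution over latent actions given the current state."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, NUM_LATENT))

    def forward(self, s):
        return F.softmax(self.net(s), dim=-1)                    # [B, Z]


def stage1_loss(dynamics, policy, s, s_next):
    """Observation-only objective: best-latent prediction error plus
    the error of the policy-weighted expected next state."""
    preds = dynamics(s)                                          # [B, Z, D]
    err = ((preds - s_next.unsqueeze(1)) ** 2).mean(dim=-1)      # [B, Z]
    min_loss = err.min(dim=1).values.mean()                      # best latent action
    expected = (policy(s).unsqueeze(-1) * preds).sum(dim=1)      # expectation under policy
    return min_loss + F.mse_loss(expected, s_next)


def infer_latent(dynamics, s, s_next):
    """Stage 2 helper: pick the latent action whose predicted next state
    best explains an observed environment transition, so that a remapping
    network can be trained to output the corresponding real action."""
    with torch.no_grad():
        err = ((dynamics(s) - s_next.unsqueeze(1)) ** 2).mean(dim=-1)
        return err.argmin(dim=1)                                 # [B]
```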
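The Open Datasets row notes that expert policies trained with Open AI Baselines were used to generate 50,000 expert state observations. Below is a minimal sketch of how such demonstrations might be collected, assuming the classic Gym reset/step API and a placeholder expert_policy(obs) -> action callable; it is not the authors' data pipeline.

```python
# Illustrative demonstration-collection sketch; `expert_policy` is a placeholder
# for a pre-trained expert (e.g. obtained with OpenAI Baselines).
import gym


def collect_demonstrations(env_name, expert_policy, num_observations=50_000):
    """Roll out the expert and record (state, action, next_state) transitions.

    ILPO and BCO use only the states; behavioral cloning (BC) also uses the actions.
    """
    env = gym.make(env_name)
    states, actions, next_states = [], [], []
    obs = env.reset()
    while len(states) < num_observations:
        action = expert_policy(obs)
        next_obs, reward, done, info = env.step(action)
        states.append(obs)
        actions.append(action)
        next_states.append(next_obs)
        obs = env.reset() if done else next_obs
    return states, actions, next_states
```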
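To make concrete what the Experiment Setup row says is missing from the main text, here is a placeholder configuration; every value is a hypothetical stand-in except the 50,000 demonstration states quoted above.

```python
# Placeholder configuration illustrating the settings a reproduction would need
# from the appendix. These numbers are NOT reported by the paper.
EXAMPLE_CONFIG = {
    "learning_rate": 1e-4,           # placeholder
    "batch_size": 32,                # placeholder
    "num_latent_actions": 2,         # placeholder
    "training_steps": 100_000,       # placeholder
    "demonstration_states": 50_000,  # the one quantity the paper does state
}
```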