Imitating Latent Policies from Observation

Authors: Ashley Edwards, Himanshu Sahni, Yannick Schroecker, Charles Isbell

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach within classic control environments and a platform game and demonstrate that it performs better than standard approaches. Code for this work is available at https://github.com/ashedwards/ILPO. We evaluate our approach in four environments: classic control with cartpole, acrobot, and mountain car, and a recent platform game by Open AI, Coin Run (Cobbe et al., 2018). We show that our approach is able to perform as well as the expert after just a few steps of interacting with the environment, and performs better than a recent approach for imitating from observations, Behavioral Cloning from Observation (Torabi et al., 2018a). 4. Experiments and results: In this section, we discuss the experiments used to evaluate ILPO.
Researcher Affiliation | Academia | Georgia Institute of Technology, Atlanta, GA, USA. Correspondence to: Ashley D. Edwards <aedwards8@gatech.edu>.
Pseudocode | Yes | Algorithm 1: Imitating Latent Policies from Observation (an illustrative sketch of the algorithm's two-stage structure appears after this table).
Open Source Code | Yes | Code for this work is available at https://github.com/ashedwards/ILPO.
Open Datasets | Yes | We used Open AI Baselines (Dhariwal et al., 2017) to obtain expert policies and generate demonstrations for each environment. We used 50,000 expert state observations to train ILPO and BCO, and the corresponding actions to train Behavioral Cloning (BC). Coin Run (Cobbe et al., 2018). (A sketch of how such demonstrations could be collected appears after this table.)
Dataset Splits | No | The paper mentions using 50,000 expert state observations for training but does not provide specific details on how these observations are split into training, validation, or test sets (e.g., percentages, sample counts, or explicit references to standard splits for reproduction).
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | We used Open AI Baselines (Dhariwal et al., 2017) to obtain expert policies and generate demonstrations for each environment. The paper mentions Open AI Baselines but does not specify version numbers or list the remaining software dependencies needed to reproduce the experiments.
Experiment Setup | No | We used the same network structure and hyperparameters across both domains, as described in the appendix. The paper states that hyperparameters are described in the appendix but does not provide specific values or detailed training configurations (e.g., learning rates, batch sizes, number of epochs) in the main text itself. (A placeholder configuration listing the kinds of settings a reproduction would need appears after this table.)
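
Since the paper's pseudocode (Algorithm 1) is only referenced above, the following is a minimal sketch of ILPO's two-stage structure: a latent forward-dynamics model and latent policy learned from observation-only demonstrations, followed by remapping latent actions to real actions through limited environment interaction. The PyTorch framing, network sizes, and names such as LatentDynamics, LatentPolicy, stage1_loss, and infer_latent are illustrative assumptions, not the authors' released implementation (see https://github.com/ashedwards/ILPO for the actual code).

```python
# Illustrative sketch only; placeholder dimensions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, NUM_LATENT = 4, 2  # placeholder sizes (e.g. CartPole-like state)


class LatentDynamics(nn.Module):
    """Predicts a next state for every discrete latent action z."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, NUM_LATENT * STATE_DIM))

    def forward(self, s):                                        # s: [B, STATE_DIM]
        return self.net(s).view(-1, NUM_LATENT, STATE_DIM)       # [B, Z, STATE_DIM]


class LatentPolicy(nn.Module):
    """Distribution over latent actions given the current state."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, NUM_LATENT))

    def forward(self, s):
        return F.softmax(self.net(s), dim=-1)                    # [B, Z]


def stage1_loss(dynamics, policy, s, s_next):
    """Observation-only objective: best-latent prediction error plus
    the error of the policy-weighted expected next state."""
    preds = dynamics(s)                                          # [B, Z, D]
    err = ((preds - s_next.unsqueeze(1)) ** 2).mean(dim=-1)      # [B, Z]
    min_loss = err.min(dim=1).values.mean()                      # best latent action
    expected = (policy(s).unsqueeze(-1) * preds).sum(dim=1)      # expectation under policy
    return min_loss + F.mse_loss(expected, s_next)


def infer_latent(dynamics, s, s_next):
    """Stage 2 helper: pick the latent action whose predicted next state
    best explains an observed environment transition, so that a remapping
    network can be trained to output the corresponding real action."""
    with torch.no_grad():
        err = ((dynamics(s) - s_next.unsqueeze(1)) ** 2).mean(dim=-1)
        return err.argmin(dim=1)                                 # [B]
```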
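The Open Datasets row notes that expert policies trained with Open AI Baselines were used to generate 50,000 expert state observations. Below is a minimal sketch of how such demonstrations might be collected, assuming the classic Gym reset/step API and a placeholder expert_policy(obs) -> action callable; it is not the authors' data pipeline.

```python
# Illustrative demonstration-collection sketch; `expert_policy` is a placeholder
# for a pre-trained expert (e.g. obtained with OpenAI Baselines).
import gym


def collect_demonstrations(env_name, expert_policy, num_observations=50_000):
    """Roll out the expert and record (state, action, next_state) transitions.

    ILPO and BCO use only the states; behavioral cloning (BC) also uses the actions.
    """
    env = gym.make(env_name)
    states, actions, next_states = [], [], []
    obs = env.reset()
    while len(states) < num_observations:
        action = expert_policy(obs)
        next_obs, reward, done, info = env.step(action)
        states.append(obs)
        actions.append(action)
        next_states.append(next_obs)
        obs = env.reset() if done else next_obs
    return states, actions, next_states
```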
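To make concrete what the Experiment Setup row says is missing from the main text, here is a placeholder configuration; every value is a hypothetical stand-in except the 50,000 demonstration states quoted above.

```python
# Placeholder configuration illustrating the settings a reproduction would need
# from the appendix. These numbers are NOT reported by the paper.
EXAMPLE_CONFIG = {
    "learning_rate": 1e-4,           # placeholder
    "batch_size": 32,                # placeholder
    "num_latent_actions": 2,         # placeholder
    "training_steps": 100_000,       # placeholder
    "demonstration_states": 50_000,  # the one quantity the paper does state
}
```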