Model-Free Imitation Learning with Policy Optimization

Authors: Jonathan Ho, Jayesh Gupta, Stefano Ermon

ICML 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluated our approach in a variety of scenarios: finite gridworlds of varying sizes, the continuous planar navigation task of Levine and Koltun (2012), a family of continuous environments of varying numbers of observation features (Karpathy, 2015), and a variation of Levine and Koltun's highway driving simulation, in which the agent receives high-dimensional egocentric observation features."
Researcher Affiliation | Academia | Jonathan Ho (HOJ@CS.STANFORD.EDU), Jayesh K. Gupta (JKG@CS.STANFORD.EDU), Stefano Ermon (ERMON@CS.STANFORD.EDU), Stanford University
Pseudocode | Yes | Algorithm 1 (IM-REINFORCE) and Algorithm 2 (IM-TRPO)
Open Source Code | No | The paper provides no statements or links regarding the availability of its source code.
Open Datasets | Yes | "We evaluated our approach in a variety of scenarios: finite gridworlds of varying sizes, the continuous planar navigation task of Levine and Koltun (2012), a family of continuous environments of varying numbers of observation features (Karpathy, 2015)..." Karpathy, Andrej. Reinforcejs: Waterworld demo, 2015. URL http://cs.stanford.edu/people/karpathy/reinforcejs/waterworld.html.
Dataset Splits | No | The paper does not provide training/validation/test dataset splits; it only describes how expert data was generated and used in the experiments.
Hardware Specification | No | The paper mentions "On our system" but gives no specific hardware details, such as CPU or GPU models or memory capacity, that would be needed for reproducibility.
Software Dependencies | No | The paper refers to algorithms and model types but does not name specific software libraries or version numbers.
Experiment Setup | No | The paper states that "Details on the environments and training methodology are in the supplement." The main text describes the general policy construction (Gaussian action distributions parameterized by a multi-layer perceptron) but does not provide specific hyperparameters such as the learning rate, batch size, or optimizer settings.
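The policy class described above (a multi-layer perceptron outputting the mean of a Gaussian action distribution, updated with a REINFORCE-style score-function gradient as in the paper's Algorithm 1) can be sketched generically as follows. This is an illustrative sketch only: the layer sizes, learning rate, and the toy reward signal are assumptions standing in for the paper's imitation-learning cost, not the authors' actual settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes -- not the paper's architecture.
OBS_DIM, HIDDEN, ACT_DIM = 3, 16, 2

# MLP parameters: one tanh hidden layer, linear output for the Gaussian mean.
W1 = rng.normal(0.0, 0.1, (HIDDEN, OBS_DIM))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0.0, 0.1, (ACT_DIM, HIDDEN))
b2 = np.zeros(ACT_DIM)
log_std = np.zeros(ACT_DIM)  # state-independent log standard deviation

def policy_mean(obs):
    """Forward pass: MLP maps an observation to the Gaussian mean."""
    h = np.tanh(W1 @ obs + b1)
    return W2 @ h + b2, h

def sample_action(obs):
    """Sample an action from the diagonal-Gaussian policy."""
    mean, h = policy_mean(obs)
    std = np.exp(log_std)
    action = mean + std * rng.normal(size=ACT_DIM)
    return action, mean, h, std

def log_prob(action, mean, std):
    """Log density of a diagonal Gaussian at `action`."""
    z = (action - mean) / std
    return (-0.5 * np.sum(z ** 2) - np.sum(np.log(std))
            - 0.5 * ACT_DIM * np.log(2.0 * np.pi))

def reinforce_step(obs, lr=0.05):
    """One score-function (REINFORCE) ascent step on a toy reward."""
    global W1, b1, W2, b2
    action, mean, h, std = sample_action(obs)
    # Toy reward preferring small actions -- a stand-in for the
    # imitation-learning cost signal, which this sketch does not model.
    R = -np.sum(action ** 2)
    # d log N(a; mean, std) / d mean, then backprop through the MLP.
    g_mean = (action - mean) / std ** 2
    gW2, gb2 = np.outer(g_mean, h), g_mean
    dh = (W2.T @ g_mean) * (1.0 - h ** 2)
    gW1, gb1 = np.outer(dh, obs), dh
    # Ascend the estimated gradient E[R * grad log pi].
    W1 += lr * R * gW1; b1 += lr * R * gb1
    W2 += lr * R * gW2; b2 += lr * R * gb2
    return R
```

The same sampling and log-probability machinery underlies both policy-gradient variants named in the table; IM-TRPO would replace the plain ascent step with a trust-region constrained update.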