Model-Free Imitation Learning with Policy Optimization
Authors: Jonathan Ho, Jayesh Gupta, Stefano Ermon
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated our approach in a variety of scenarios: finite gridworlds of varying sizes, the continuous planar navigation task of Levine and Koltun (2012), a family of continuous environments of varying numbers of observation features (Karpathy, 2015), and a variation of Levine & Koltun's highway driving simulation, in which the agent receives high-dimensional egocentric observation features. |
| Researcher Affiliation | Academia | Jonathan Ho HOJ@CS.STANFORD.EDU Jayesh K. Gupta JKG@CS.STANFORD.EDU Stefano Ermon ERMON@CS.STANFORD.EDU Stanford University |
| Pseudocode | Yes | Algorithm 1 IM-REINFORCE, Algorithm 2 IM-TRPO |
| Open Source Code | No | The paper does not provide any specific statements or links regarding the availability of its source code. |
| Open Datasets | Yes | We evaluated our approach in a variety of scenarios: finite gridworlds of varying sizes, the continuous planar navigation task of Levine and Koltun (2012), a family of continuous environments of varying numbers of observation features (Karpathy, 2015)... Karpathy, Andrej. Reinforcejs: Waterworld demo, 2015. URL http://cs.stanford.edu/people/karpathy/reinforcejs/waterworld.html. |
| Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits; it only describes how expert data was generated and used in the experiments. |
| Hardware Specification | No | The paper mentions 'On our system' but provides no specific hardware details such as CPU or GPU models, or memory specifications, which are necessary for reproducibility. |
| Software Dependencies | No | The paper refers to algorithms and model types but does not name the specific software libraries or versions used to implement them. |
| Experiment Setup | No | The paper states that 'Details on the environments and training methodology are in the supplement' and describes general policy construction (Gaussian action distributions, multi-layer perceptron) but does not provide specific hyperparameters like learning rate, batch size, or optimizer settings within the main text. |
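
Because the authors' code is not available (see the Open Source Code row) and the hyperparameters are deferred to the supplement, the snippet below is only a minimal, hypothetical sketch of the kind of Gaussian-MLP policy construction the Experiment Setup row describes. The class name, layer sizes, and fixed log standard deviation are illustrative assumptions, not values reported in the paper.

```python
import numpy as np


class GaussianMLPPolicy:
    """Sketch of a Gaussian policy whose mean is a one-hidden-layer MLP.

    The hidden width and state-independent log-std are placeholder choices
    for illustration only; the paper's actual settings are in its supplement.
    """

    def __init__(self, obs_dim, act_dim, hidden=64, log_std=-0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(obs_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(scale=0.1, size=(hidden, act_dim))
        self.b2 = np.zeros(act_dim)
        self.log_std = np.full(act_dim, log_std)  # state-independent std

    def mean(self, obs):
        h = np.tanh(obs @ self.W1 + self.b1)  # hidden layer
        return h @ self.W2 + self.b2          # Gaussian mean

    def act(self, obs, rng=np.random.default_rng()):
        mu, std = self.mean(obs), np.exp(self.log_std)
        return mu + std * rng.standard_normal(mu.shape)  # sample an action

    def log_prob(self, obs, act):
        # Diagonal-Gaussian log-density, used by policy-gradient updates.
        mu, std = self.mean(obs), np.exp(self.log_std)
        z = (act - mu) / std
        return -0.5 * np.sum(z ** 2 + 2.0 * self.log_std + np.log(2.0 * np.pi))


# Usage example with made-up dimensions (10-D observations, 2-D actions).
policy = GaussianMLPPolicy(obs_dim=10, act_dim=2)
obs = np.zeros(10)
action = policy.act(obs)
print(action, policy.log_prob(obs, action))
```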