Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Model-Free Imitation Learning with Policy Optimization
Authors: Jonathan Ho, Jayesh Gupta, Stefano Ermon
ICML 2016 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated our approach in a variety of scenarios: finite gridworlds of varying sizes, the continuous planar navigation task of Levine and Koltun (2012), a family of continuous environments of varying numbers of observation features (Karpathy, 2015), and a variation of Levine & Koltun's highway driving simulation, in which the agent receives high-dimensional egocentric observation features. |
| Researcher Affiliation | Academia | Jonathan Ho, Jayesh K. Gupta, Stefano Ermon (Stanford University) |
| Pseudocode | Yes | Algorithm 1 IM-REINFORCE, Algorithm 2 IM-TRPO |
| Open Source Code | No | The paper does not provide any specific statements or links regarding the availability of its source code. |
| Open Datasets | Yes | We evaluated our approach in a variety of scenarios: finite gridworlds of varying sizes, the continuous planar navigation task of Levine and Koltun (2012), a family of continuous environments of varying numbers of observation features (Karpathy, 2015)... Karpathy, Andrej. Reinforcejs: Waterworld demo, 2015. URL http://cs.stanford.edu/people/karpathy/reinforcejs/waterworld.html. |
| Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits, only describing how expert data was generated and used in experiments (e.g., |
| Hardware Specification | No | The paper mentions 'On our system' but provides no specific hardware details such as CPU or GPU models, or memory specifications, which are necessary for reproducibility. |
| Software Dependencies | No | The paper refers to algorithms and model types (e.g., |
| Experiment Setup | No | The paper states that 'Details on the environments and training methodology are in the supplement' and describes general policy construction (Gaussian action distributions, multi-layer perceptron) but does not provide specific hyperparameters like learning rate, batch size, or optimizer settings within the main text. |