Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Imitation-Projected Programmatic Reinforcement Learning
Authors: Abhinav Verma, Hoang Le, Yisong Yue, Swarat Chaudhuri
NeurIPS 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present theoretical convergence results for PROPEL and empirically evaluate the approach in three continuous control domains. The experiments show that PROPEL can significantly outperform state-of-the-art approaches for learning programmatic policies. |
| Researcher Affiliation | Academia | Abhinav Verma Rice University EMAIL Hoang M. Le Caltech EMAIL Yisong Yue Caltech EMAIL Swarat Chaudhuri Rice University EMAIL |
| Pseudocode | Yes | Algorithm 1 Imitation-Projected Programmatic Reinforcement Learning (PROPEL); Algorithm 2 UPDATEF: neural policy gradient for mixed policies; Algorithm 3 PROJECTΠ: program synthesis via imitation learning |
| Open Source Code | Yes | The code for the TORCS experiments can be found at: https://bitbucket.org/averma8053/propel |
| Open Datasets | Yes | We evaluate over five distinct tracks in the TORCS simulator. Empirical results on two additional classic control tasks, Mountain-Car and Pendulum, are provided in Appendix B |
| Dataset Splits | No | The paper mentions running experiments with "twenty-five random seeds" and "training for 600 episodes", but it does not specify explicit training/validation/test dataset splits (e.g., percentages or sample counts) as would be typical for static datasets. Since it's a simulation environment, data is generated dynamically. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4) needed to replicate the experiment. |
| Experiment Setup | Yes | We perform the experiments with twenty-five random seeds and report the median lap time over these twenty-five trials. ... DDPG, a neural policy learned using the Deep Deterministic Policy Gradients [36] algorithm, with 600 episodes of training for each track. |