Imitation-Projected Programmatic Reinforcement Learning

Authors: Abhinav Verma, Hoang Le, Yisong Yue, Swarat Chaudhuri

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present theoretical convergence results for PROPEL and empirically evaluate the approach in three continuous control domains. The experiments show that PROPEL can significantly outperform state-of-the-art approaches for learning programmatic policies.
Researcher Affiliation | Academia | Abhinav Verma (Rice University, averma@rice.edu); Hoang M. Le (Caltech, hmle@caltech.edu); Yisong Yue (Caltech, yyue@caltech.edu); Swarat Chaudhuri (Rice University, swarat@rice.edu)
Pseudocode | Yes | Algorithm 1: Imitation-Projected Programmatic Reinforcement Learning (PROPEL); Algorithm 2: UPDATEF, neural policy gradient for mixed policies; Algorithm 3: PROJECTΠ, program synthesis via imitation learning (see the sketch of this loop after the table).
Open Source Code | Yes | The code for the TORCS experiments can be found at: https://bitbucket.org/averma8053/propel
Open Datasets | Yes | We evaluate over five distinct tracks in the TORCS simulator. Empirical results on two additional classic control tasks, Mountain-Car and Pendulum, are provided in Appendix B.
Dataset Splits | No | The paper mentions running experiments with "twenty-five random seeds" and "training for 600 episodes", but it does not specify explicit training/validation/test dataset splits (e.g., percentages or sample counts) as would be typical for static datasets. Because the experiments run in a simulation environment, data is generated dynamically rather than drawn from a fixed dataset.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers, such as Python 3.8 or CPLEX 12.4) needed to replicate the experiment.
Experiment Setup | Yes | We perform the experiments with twenty-five random seeds and report the median lap time over these twenty-five trials. ... DDPG, a neural policy learned using the Deep Deterministic Policy Gradients [36] algorithm, with 600 episodes of training for each track. (A sketch of this evaluation protocol follows the PROPEL sketch below.)
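
The three algorithms named in the Pseudocode row fit together as a lift-update-project loop: the current programmatic policy is combined with a neural correction, improved with policy gradients (UPDATEF), and then projected back onto the programmatic policy class by imitating the improved mixed policy (PROJECTΠ). The following is a minimal sketch of that loop, not the authors' implementation; the helpers update_f, collect_trajectories, and synthesize_program_by_imitation are hypothetical placeholders for the code in the linked repository.

```python
def propel(env, init_program, n_iterations, n_rollouts):
    """Hedged sketch of the PROPEL outer loop (Algorithm 1).

    The helper functions used below are hypothetical stand-ins for the
    authors' TORCS/classic-control implementation.
    """
    program = init_program  # current programmatic policy
    for _ in range(n_iterations):
        # UPDATEF (Algorithm 2): keep the program fixed and train a neural
        # correction so that the mixed policy (program + correction) improves,
        # using an off-the-shelf policy-gradient method such as DDPG.
        neural_correction = update_f(env, program)
        mixed_policy = lambda state: program(state) + neural_correction(state)

        # PROJECTΠ (Algorithm 3): project the improved mixed policy back onto
        # the programmatic policy class by imitating it on fresh rollouts.
        demos = collect_trajectories(env, mixed_policy, n_rollouts)
        program = synthesize_program_by_imitation(demos)
    return program
```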
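
The report quotes the evaluation protocol (twenty-five random seeds, 600 training episodes per track, median lap time) but the harness itself is not reproduced here. The sketch below shows one way that protocol could be scripted, under the assumption of two hypothetical helpers, train_ddpg and measure_lap_time, standing in for the authors' TORCS training and evaluation code.

```python
import statistics

N_SEEDS = 25       # "twenty-five random seeds"
N_EPISODES = 600   # "600 episodes of training for each track"

def evaluate_track(track_name):
    """Hedged sketch of the per-track evaluation protocol described in the paper."""
    lap_times = []
    for seed in range(N_SEEDS):
        # train_ddpg and measure_lap_time are hypothetical placeholders.
        policy = train_ddpg(track_name, episodes=N_EPISODES, seed=seed)
        lap_times.append(measure_lap_time(track_name, policy))
    # The paper reports the median lap time over the twenty-five trials.
    return statistics.median(lap_times)
```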