Hierarchical Programmatic Reinforcement Learning via Learning to Compose Programs

Authors: Guan-Ting Liu, En-Pei Hu, Pu-Jen Cheng, Hung-Yi Lee, Shao-Hua Sun

ICML 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We design and conduct experiments to compare our proposed framework (HPRL) to its variants and baselines. [...] The experimental results in Table 1 show that HPRL-PPO outperforms all other approaches on all tasks. |
| Researcher Affiliation | Academia | National Taiwan University, Taipei, Taiwan. Correspondence to: Shao-Hua Sun <shaohuas@ntu.edu.tw>. |
| Pseudocode | Yes | Algorithm 1 HPRL: Learning Latent Program Embedding Space [...] Algorithm 2 HPRL: Meta-Policy Training (see the rollout sketch below the table). |
| Open Source Code | No | Project page: https://nturobotlearninglab.github.io/hprl |
| Open Datasets | No | The Karel program dataset used in this work includes one million programs. All the programs are generated based on the syntax rules of the Karel DSL with a maximum length of 40 program tokens (see the generator sketch below the table). |
| Dataset Splits | Yes | The Karel program dataset used in this work includes 1 million program sequences, with 85% as the training dataset and 15% as the evaluation dataset (see the split sketch below the table). |
| Hardware Specification | No | No specific hardware details (GPU/CPU models, memory, etc.) were provided for running the experiments. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., 'PyTorch 1.9') were provided. |
| Experiment Setup | Yes | Table 9. Hyperparameters of VAE Pretraining [...] Table 10. Hyperparameters of HPRL-PPO and HPRL-SAC Training |
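
For readers checking the pseudocode claim, a minimal sketch of how the two algorithms fit together is given below: Algorithm 1 pretrains a program embedding space whose decoder maps latent vectors to executable programs, and Algorithm 2 trains a meta-policy that composes a task-solving program by emitting a sequence of such latent vectors. The names `meta_policy`, `decode`, and `env.execute_program` are hypothetical placeholders, not the authors' actual interfaces.

```python
# Hedged sketch of HPRL's meta-policy rollout (Algorithm 2), assuming a frozen
# decoder from the pretrained program embedding space (Algorithm 1).
# `meta_policy`, `decode`, and `env.execute_program` are placeholders; the
# paper's actual interfaces may differ.

def rollout_meta_policy(meta_policy, decode, env, max_subprograms=5):
    """Compose a task-solving program as a sequence of decoded subprograms."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_subprograms):
        # The meta-policy predicts a latent vector in the learned embedding space.
        z = meta_policy.act(state)
        # The frozen decoder turns the latent vector into a Karel DSL program.
        program = decode(z)
        # Execute the decoded subprogram; the accumulated environment reward is
        # what trains the meta-policy (with PPO or SAC in the paper's variants).
        state, reward, done = env.execute_program(program)
        total_reward += reward
        if done:
            break
    return total_reward
```

Composing several decoded subprograms in sequence is what lets HPRL express behaviors longer than the 40-token cap on any single program in the dataset.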
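
The dataset itself is described only as one million programs generated from the syntax rules of the Karel DSL with at most 40 tokens each. A toy generator in that spirit might look as follows; the grammar here is a heavily simplified subset of the Karel DSL, and the sampling probabilities and token layout are arbitrary assumptions, not the authors' actual generator.

```python
import random

# Simplified subset of Karel DSL primitives; the real DSL also has IFELSE,
# WHILE, and paired delimiter tokens (e.g., "m(" / "m)").
ACTIONS = ["move", "turnLeft", "turnRight", "putMarker", "pickMarker"]
CONDITIONS = ["frontIsClear", "leftIsClear", "rightIsClear", "markersPresent"]

def sample_statement(rng, depth=0):
    """Sample one statement; nesting depth is capped to keep programs short."""
    choice = rng.random()
    if depth >= 2 or choice < 0.6:
        return [rng.choice(ACTIONS)]
    if choice < 0.8:
        body = sample_statement(rng, depth + 1)
        return ["REPEAT", f"R={rng.randint(2, 5)}", "("] + body + [")"]
    body = sample_statement(rng, depth + 1)
    return ["IF", "(", rng.choice(CONDITIONS), ")", "("] + body + [")"]

def sample_program(rng, max_tokens=40):
    """Rejection-sample programs until one fits the 40-token budget."""
    while True:
        tokens = ["DEF", "run", "("]
        for _ in range(rng.randint(1, 6)):
            tokens += sample_statement(rng)
        tokens.append(")")
        if len(tokens) <= max_tokens:
            return tokens

rng = random.Random(0)
print(" ".join(sample_program(rng)))
```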
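
Finally, the stated 85%/15% split is straightforward to reproduce; the shuffling and seed below are assumptions, since the paper reports only the ratio.

```python
import random

def split_programs(programs, train_frac=0.85, seed=0):
    """Shuffle and split program sequences into train/eval sets."""
    rng = random.Random(seed)
    shuffled = list(programs)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

train_set, eval_set = split_programs(range(1_000_000))
print(len(train_set), len(eval_set))  # 850000 150000
```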