Hierarchical Programmatic Reinforcement Learning via Learning to Compose Programs

Authors: Guan-Ting Liu, En-Pei Hu, Pu-Jen Cheng, Hung-Yi Lee, Shao-Hua Sun

ICML 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We design and conduct experiments to compare our proposed framework (HPRL) to its variants and baselines. [...] The experimental results in Table 1 show that HPRL-PPO outperforms all other approaches on all tasks. |
| Researcher Affiliation | Academia | National Taiwan University, Taipei, Taiwan. Correspondence to: Shao-Hua Sun <shaohuas@ntu.edu.tw>. |
| Pseudocode | Yes | Algorithm 1 HPRL: Learning Latent Program Embedding Space [...] Algorithm 2 HPRL: Meta-Policy Training (see the rollout sketch below the table). |
| Open Source Code | No | Project page: https://nturobotlearninglab.github.io/hprl |
| Open Datasets | No | The Karel program dataset used in this work includes one million programs. All the programs are generated based on the syntax rules of the Karel DSL with a maximum length of 40 program tokens (see the generator sketch below the table). |
| Dataset Splits | Yes | The Karel program dataset used in this work includes 1 million program sequences, with 85% as the training dataset and 15% as the evaluation dataset (see the split sketch below the table). |
| Hardware Specification | No | No specific hardware details (GPU/CPU models, memory, etc.) were provided for running the experiments. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., 'PyTorch 1.9') were provided. |
| Experiment Setup | Yes | Table 9. Hyperparameters of VAE Pretraining [...] Table 10. Hyperparameters of HPRL-PPO and HPRL-SAC Training |
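
For readers checking the pseudocode claim, a minimal sketch of how the two algorithms fit together is given below: Algorithm 1 pretrains a program embedding space whose decoder maps latent vectors to executable programs, and Algorithm 2 trains a meta-policy that composes a task-solving program by emitting a sequence of such latent vectors. The names `meta_policy`, `decode`, and `env.execute_program` are hypothetical placeholders, not the authors' actual interfaces.

```python
# Hedged sketch of HPRL's meta-policy rollout (Algorithm 2), assuming a frozen
# decoder from the pretrained program embedding space (Algorithm 1).
# `meta_policy`, `decode`, and `env.execute_program` are placeholders; the
# paper's actual interfaces may differ.

def rollout_meta_policy(meta_policy, decode, env, max_subprograms=5):
    """Compose a task-solving program as a sequence of decoded subprograms."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_subprograms):
        # The meta-policy predicts a latent vector in the learned embedding space.
        z = meta_policy.act(state)
        # The frozen decoder turns the latent vector into a Karel DSL program.
        program = decode(z)
        # Execute the decoded subprogram; the accumulated environment reward is
        # what trains the meta-policy (with PPO or SAC in the paper's variants).
        state, reward, done = env.execute_program(program)
        total_reward += reward
        if done:
            break
    return total_reward
```

Composing several decoded subprograms in sequence is what lets HPRL express behaviors longer than the 40-token cap on any single program in the dataset.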
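
The dataset itself is described only as one million programs generated from the syntax rules of the Karel DSL with at most 40 tokens each. A toy generator in that spirit might look as follows; the grammar here is a heavily simplified subset of the Karel DSL, and the sampling probabilities and token layout are arbitrary assumptions, not the authors' actual generator.

```python
import random

# Simplified subset of Karel DSL primitives; the real DSL also has IFELSE,
# WHILE, and paired delimiter tokens (e.g., "m(" / "m)").
ACTIONS = ["move", "turnLeft", "turnRight", "putMarker", "pickMarker"]
CONDITIONS = ["frontIsClear", "leftIsClear", "rightIsClear", "markersPresent"]

def sample_statement(rng, depth=0):
    """Sample one statement; nesting depth is capped to keep programs short."""
    choice = rng.random()
    if depth >= 2 or choice < 0.6:
        return [rng.choice(ACTIONS)]
    if choice < 0.8:
        body = sample_statement(rng, depth + 1)
        return ["REPEAT", f"R={rng.randint(2, 5)}", "("] + body + [")"]
    body = sample_statement(rng, depth + 1)
    return ["IF", "(", rng.choice(CONDITIONS), ")", "("] + body + [")"]

def sample_program(rng, max_tokens=40):
    """Rejection-sample programs until one fits the 40-token budget."""
    while True:
        tokens = ["DEF", "run", "("]
        for _ in range(rng.randint(1, 6)):
            tokens += sample_statement(rng)
        tokens.append(")")
        if len(tokens) <= max_tokens:
            return tokens

rng = random.Random(0)
print(" ".join(sample_program(rng)))
```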
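
Finally, the stated 85%/15% split is straightforward to reproduce; the shuffling and seed below are assumptions, since the paper reports only the ratio.

```python
import random

def split_programs(programs, train_frac=0.85, seed=0):
    """Shuffle and split program sequences into train/eval sets."""
    rng = random.Random(seed)
    shuffled = list(programs)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

train_set, eval_set = split_programs(range(1_000_000))
print(len(train_set), len(eval_set))  # 850000 150000
```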