Hierarchical Programmatic Reinforcement Learning via Learning to Compose Programs
Authors: Guan-Ting Liu, En-Pei Hu, Pu-Jen Cheng, Hung-Yi Lee, Shao-Hua Sun
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We design and conduct experiments to compare our proposed framework (HPRL) to its variants and baselines. [...] The experimental results in Table 1 show that HPRL-PPO outperforms all other approaches on all tasks. |
| Researcher Affiliation | Academia | 1National Taiwan University, Taipei, Taiwan. Correspondence to: Shao-Hua Sun <shaohuas@ntu.edu.tw>. |
| Pseudocode | Yes | Algorithm 1 HPRL: Learning Latent Program Embedding Space [...] Algorithm 2 HPRL: Meta-Policy Training (a hedged sketch of this two-stage recipe follows the table) |
| Open Source Code | No | Project page: https://nturobotlearninglab.github.io/hprl |
| Open Datasets | No | The Karel program dataset used in this work includes one million programs. All the programs are generated based on syntax rules of the Karel DSL with a maximum length of 40 program tokens. |
| Dataset Splits | Yes | The Karel program dataset used in this work includes 1 million program sequences, with 85% as the training dataset and 15% as the evaluation dataset. (A generation-and-split sketch follows the table.) |
| Hardware Specification | No | No specific hardware details (GPU/CPU models, memory, etc.) were provided for running experiments. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., 'PyTorch 1.9') were provided. |
| Experiment Setup | Yes | Table 9. Hyperparameters of VAE Pretraining [...] Table 10. Hyperparameters of HPRL-PPO and HPRL-SAC Training |
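
The Pseudocode row quotes a two-stage recipe: Algorithm 1 pretrains a program VAE to obtain a latent program embedding space, and Algorithm 2 trains a meta-policy that composes programs by emitting latent vectors for the frozen decoder. The sketch below illustrates stage 1 only; it is a minimal sketch, not the authors' implementation, and all module names, layer sizes, and the KL weight are assumptions.

```python
# Minimal sketch of Stage 1 (Algorithm 1): pretrain a program VAE whose latent
# space later serves as the meta-policy's action space. All sizes are assumed.
import torch
import torch.nn as nn

VOCAB, HIDDEN, LATENT, MAX_LEN = 64, 256, 64, 40  # 40 = max program tokens

class ProgramVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.encoder = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.to_mu = nn.Linear(HIDDEN, LATENT)
        self.to_logvar = nn.Linear(HIDDEN, LATENT)
        self.from_z = nn.Linear(LATENT, HIDDEN)
        self.decoder = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, VOCAB)

    def forward(self, tokens):                    # tokens: (B, T) int64
        _, last = self.encoder(self.embed(tokens))
        mu, logvar = self.to_mu(last[-1]), self.to_logvar(last[-1])
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        dec_in = self.embed(tokens[:, :-1])       # teacher forcing
        dec_out, _ = self.decoder(dec_in, self.from_z(z).unsqueeze(0))
        return self.out(dec_out), mu, logvar      # logits predict tokens[:, 1:]

vae = ProgramVAE()
opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
batch = torch.randint(0, VOCAB, (8, MAX_LEN))     # stand-in for real programs
logits, mu, logvar = vae(batch)
recon = nn.functional.cross_entropy(logits.reshape(-1, VOCAB),
                                    batch[:, 1:].reshape(-1))
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
opt.zero_grad()
(recon + 0.1 * kl).backward()                     # 0.1 KL weight is assumed
opt.step()
# Stage 2 (Algorithm 2, not shown): a meta-policy (e.g., PPO) emits latent
# vectors z; the frozen decoder maps each z to a program executed in Karel.
```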
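
The Open Datasets and Dataset Splits rows fully specify the dataset construction: one million programs sampled from the Karel DSL syntax rules, at most 40 tokens each, split 85%/15% into training and evaluation sets. Below is a minimal sketch of that construction; `sample_karel_program` is a hypothetical toy stand-in, since the paper's generator samples from the full Karel DSL grammar (loops, conditionals, perception tokens).

```python
# Minimal sketch of the reported dataset construction and 85%/15% split.
import random

random.seed(0)
ACTIONS = ["move", "turnLeft", "turnRight", "putMarker", "pickMarker"]

def sample_karel_program(max_tokens: int = 40) -> str:
    """Toy stand-in: emits action-only programs; the real generator follows
    the full Karel DSL syntax rules."""
    # Programs are wrapped as "DEF run m( <body> m)" (4 wrapper tokens).
    body = random.choices(ACTIONS, k=random.randint(1, max_tokens - 4))
    return "DEF run m( " + " ".join(body) + " m)"

N = 1_000_000                         # "one million programs"
programs = [sample_karel_program() for _ in range(N)]
random.shuffle(programs)

cut = int(0.85 * N)                   # 850,000 train / 150,000 eval
train_set, eval_set = programs[:cut], programs[cut:]
```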