Hierarchical Programmatic Option Framework
Authors: Yu-An Lin, Chen-Tao Lee, Chih-Han Yang, Guan-Ting Liu, Shao-Hua Sun
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our proposed framework outperforms programmatic RL and deep RL baselines on various tasks. Ablation studies justify the effectiveness of our proposed search algorithm for retrieving a set of programmatic options. |
| Researcher Affiliation | Academia | Yu-An Lin, Chen-Tao Lee, Chih-Han Yang, Guan-Ting Liu, Shao-Hua Sun (National Taiwan University) {b06204039, b06703027, b10902069, f07944014, shaohuas}@ntu.edu.tw |
| Pseudocode | Yes | Algorithm 1 Cross Entropy Method (a generic CEM sketch is provided after the table) |
| Open Source Code | No | [No] Justification: We will release the code as soon as possible. |
| Open Datasets | Yes | To evaluate our proposed HIPO framework, we adopt the Karel domain [56], which characterizes an agent that navigates a grid world and interacts with objects. HIPO outperforms prior programmatic reinforcement learning and deep RL baselines on existing benchmarks [46, 74]. |
| Dataset Splits | Yes | The program dataset used to train qψ and pθ consists of 35,000 programs for training and 7,500 programs for validation and testing. |
| Hardware Specification | Yes | For our experiments, we utilized the following workstation: 20-core Intel(R) Xeon(R) W-2255 CPU @ 3.70GHz, with 2x NVIDIA GeForce RTX 4070 Ti GPUs |
| Software Dependencies | No | The paper mentions algorithms like GRU [15] and PPO [61], and optimizers like Adam, but does not provide specific version numbers for software libraries or dependencies (e.g., PyTorch, TensorFlow, scikit-learn versions). |
| Experiment Setup | Yes | Maximum program number: 1000; Batch size: 32; Clipping: 0.05; α: 0.99; γ: 0.99; GAE λ: 0.95; Value function coefficient: 0.5; Entropy coefficient: 0.1; Number of updates per training iteration: 4; Number of environment steps per set of training iterations: 32; Number of parallel actors: 32; Optimizer: Adam; Learning rate: {0.1, 0.01, 0.001, 0.0001, 0.00001} (see the configuration sketch after the table) |
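
The Pseudocode row cites Algorithm 1, the Cross Entropy Method. For reference, below is a minimal, generic CEM sketch in NumPy using the usual Gaussian search-distribution formulation; it is not the authors' implementation, and `score_fn`, the population size, and the elite fraction are illustrative assumptions.

```python
import numpy as np

def cross_entropy_method(score_fn, dim, n_iters=50, pop_size=64, elite_frac=0.1, seed=0):
    """Generic CEM: repeatedly sample candidates from a Gaussian search
    distribution and refit it to the highest-scoring (elite) samples."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.ones(dim)
    n_elite = max(1, int(pop_size * elite_frac))
    for _ in range(n_iters):
        # Sample a population of candidate solutions.
        samples = rng.normal(mean, std, size=(pop_size, dim))
        scores = np.array([score_fn(s) for s in samples])
        # Select the elite set with the highest scores.
        elite = samples[np.argsort(scores)[-n_elite:]]
        # Refit the search distribution to the elites.
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean

if __name__ == "__main__":
    # Toy usage: maximize -||x||^2, whose optimum is the zero vector.
    print(cross_entropy_method(lambda x: -np.sum(x ** 2), dim=5))
```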
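
The values in the Experiment Setup row read like a PPO-style training configuration. The snippet below simply collects the reported values into a hypothetical Python dictionary for readability; the key names are assumptions, not identifiers from the (not yet released) code.

```python
# Hypothetical dictionary collecting the hyperparameters reported above;
# key names are assumptions, values are copied from the table.
hparams = {
    "max_program_number": 1000,
    "batch_size": 32,
    "clip_range": 0.05,          # PPO clipping parameter
    "alpha": 0.99,               # reported as α (role not specified above)
    "gamma": 0.99,               # discount factor
    "gae_lambda": 0.95,
    "value_loss_coef": 0.5,
    "entropy_coef": 0.1,
    "updates_per_iteration": 4,
    "env_steps_per_iteration": 32,
    "num_parallel_actors": 32,
    "optimizer": "Adam",
    "learning_rates": [1e-1, 1e-2, 1e-3, 1e-4, 1e-5],  # swept values
}
```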