Hierarchical Programmatic Option Framework

Authors: Yu-An Lin, Chen-Tao Lee, Chih-Han Yang, Guan-Ting Liu, Shao-Hua Sun

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our proposed framework outperforms programmatic RL and deep RL baselines on various tasks. Ablation studies justify the effectiveness of our proposed search algorithm for retrieving a set of programmatic options.
Researcher Affiliation | Academia | Yu-An Lin, Chen-Tao Lee, Chih-Han Yang, Guan-Ting Liu, Shao-Hua Sun, National Taiwan University, {b06204039, b06703027, b10902069, f07944014, shaohuas}@ntu.edu.tw
Pseudocode | Yes | Algorithm 1: Cross Entropy Method (a minimal CEM sketch is given after the table)
Open Source Code | No | Justification: We will release the code as soon as possible.
Open Datasets | Yes | To evaluate our proposed HIPO framework, we adopt the Karel domain [56], which characterizes an agent that navigates a grid world and interacts with objects. HIPO outperforms prior programmatic reinforcement learning and deep RL baselines on existing benchmarks [46, 74].
Dataset Splits | Yes | The program dataset used to train qψ and pθ consists of 35,000 programs for training and 7,500 programs for validation and testing.
Hardware Specification | Yes | For our experiments, we utilized the following workstation: 20-core Intel(R) Xeon(R) W-2255 CPU @ 3.70GHz with 2x NVIDIA GeForce RTX 4070 Ti GPUs.
Software Dependencies | No | The paper mentions algorithms like GRU [15] and PPO [61], and the Adam optimizer, but does not provide specific version numbers for software libraries or dependencies (e.g., PyTorch, TensorFlow, scikit-learn versions).
Experiment Setup | Yes | Maximum program number: 1000; Batch size: 32; Clipping: 0.05; α: 0.99; γ: 0.99; GAE lambda: 0.95; Value function coefficient: 0.5; Entropy coefficient: 0.1; Number of updates per training iteration: 4; Number of environment steps per set of training iterations: 32; Number of parallel actors: 32; Optimizer: Adam; Learning rate: {0.1, 0.01, 0.001, 0.0001, 0.00001} (see the configuration sketch after the table)
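
The Pseudocode row refers to the paper's Algorithm 1, the Cross Entropy Method used to search a learned program embedding space for programmatic options. The following is a minimal, generic CEM sketch, not the paper's exact algorithm: the objective function, population size, elite fraction, and other settings are illustrative assumptions.

import numpy as np

def cross_entropy_method(objective, dim, n_iters=50, pop_size=64,
                         elite_frac=0.1, init_std=1.0, seed=0):
    """Generic Cross Entropy Method over a continuous search space.

    `objective` maps a candidate vector (e.g., a latent program embedding)
    to a scalar score to be maximized. All hyperparameters here are
    illustrative, not the paper's settings.
    """
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)
    std = np.full(dim, init_std)
    n_elite = max(1, int(pop_size * elite_frac))

    for _ in range(n_iters):
        # Sample candidates from the current Gaussian search distribution.
        samples = rng.normal(mean, std, size=(pop_size, dim))
        scores = np.array([objective(x) for x in samples])
        # Keep the top-scoring (elite) candidates.
        elite = samples[np.argsort(scores)[-n_elite:]]
        # Refit the search distribution to the elites.
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean

# Usage: maximize a toy objective (a stand-in for decoding a candidate
# embedding into a program and evaluating its return in the environment).
best = cross_entropy_method(lambda x: -np.sum((x - 1.0) ** 2), dim=8)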
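
The Experiment Setup row lists PPO-style training hyperparameters. Below is a minimal sketch collecting them into a single configuration object; the class and field names are illustrative assumptions, and only the values come from the table above.

from dataclasses import dataclass, field
from typing import List

@dataclass
class TrainingConfig:
    """Hyperparameters from the Experiment Setup row (field names assumed)."""
    max_program_number: int = 1000
    batch_size: int = 32
    ppo_clip: float = 0.05
    alpha: float = 0.99
    gamma: float = 0.99              # discount factor
    gae_lambda: float = 0.95
    value_coef: float = 0.5
    entropy_coef: float = 0.1
    updates_per_iteration: int = 4
    env_steps_per_iteration: int = 32
    num_parallel_actors: int = 32
    optimizer: str = "Adam"
    # Learning rates listed in the table, presumably swept during tuning.
    learning_rates: List[float] = field(
        default_factory=lambda: [1e-1, 1e-2, 1e-3, 1e-4, 1e-5])

config = TrainingConfig()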