Hierarchical Programmatic Option Framework

Authors: Yu-An Lin, Chen-Tao Lee, Chih-Han Yang, Guan-Ting Liu, Shao-Hua Sun

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our proposed framework outperforms programmatic RL and deep RL baselines on various tasks. Ablation studies justify the effectiveness of our proposed search algorithm for retrieving a set of programmatic options.
Researcher Affiliation | Academia | Yu-An Lin, Chen-Tao Lee, Chih-Han Yang, Guan-Ting Liu, Shao-Hua Sun, National Taiwan University, {b06204039, b06703027, b10902069, f07944014, shaohuas}@ntu.edu.tw
Pseudocode | Yes | Algorithm 1: Cross Entropy Method (a minimal CEM sketch is given after the table)
Open Source Code | No | Justification: We will release the code as soon as possible.
Open Datasets | Yes | To evaluate our proposed HIPO framework, we adopt the Karel domain [56], which characterizes an agent that navigates a grid world and interacts with objects. HIPO outperforms prior programmatic reinforcement learning and deep RL baselines on existing benchmarks [46, 74].
Dataset Splits | Yes | The program dataset used to train qψ and pθ consists of 35,000 programs for training and 7,500 programs for validation and testing.
Hardware Specification | Yes | For our experiments, we utilized the following workstation: 20-core Intel(R) Xeon(R) W-2255 CPU @ 3.70GHz with 2x NVIDIA GeForce RTX 4070 Ti GPUs.
Software Dependencies | No | The paper mentions algorithms like GRU [15] and PPO [61], and the Adam optimizer, but does not provide specific version numbers for software libraries or dependencies (e.g., PyTorch, TensorFlow, scikit-learn versions).
Experiment Setup | Yes | Maximum program number: 1000; Batch size: 32; Clipping: 0.05; α: 0.99; γ: 0.99; GAE lambda: 0.95; Value function coefficient: 0.5; Entropy coefficient: 0.1; Number of updates per training iteration: 4; Number of environment steps per set of training iterations: 32; Number of parallel actors: 32; Optimizer: Adam; Learning rate: {0.1, 0.01, 0.001, 0.0001, 0.00001} (see the configuration sketch after the table)
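
The Pseudocode row refers to the paper's Algorithm 1, the Cross Entropy Method used to search a learned program embedding space for programmatic options. The following is a minimal, generic CEM sketch, not the paper's exact algorithm: the objective function, population size, elite fraction, and other settings are illustrative assumptions.

import numpy as np

def cross_entropy_method(objective, dim, n_iters=50, pop_size=64,
                         elite_frac=0.1, init_std=1.0, seed=0):
    """Generic Cross Entropy Method over a continuous search space.

    `objective` maps a candidate vector (e.g., a latent program embedding)
    to a scalar score to be maximized. All hyperparameters here are
    illustrative, not the paper's settings.
    """
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)
    std = np.full(dim, init_std)
    n_elite = max(1, int(pop_size * elite_frac))

    for _ in range(n_iters):
        # Sample candidates from the current Gaussian search distribution.
        samples = rng.normal(mean, std, size=(pop_size, dim))
        scores = np.array([objective(x) for x in samples])
        # Keep the top-scoring (elite) candidates.
        elite = samples[np.argsort(scores)[-n_elite:]]
        # Refit the search distribution to the elites.
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean

# Usage: maximize a toy objective (a stand-in for decoding a candidate
# embedding into a program and evaluating its return in the environment).
best = cross_entropy_method(lambda x: -np.sum((x - 1.0) ** 2), dim=8)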
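
The Experiment Setup row lists PPO-style training hyperparameters. Below is a minimal sketch collecting them into a single configuration object; the class and field names are illustrative assumptions, and only the values come from the table above.

from dataclasses import dataclass, field
from typing import List

@dataclass
class TrainingConfig:
    """Hyperparameters from the Experiment Setup row (field names assumed)."""
    max_program_number: int = 1000
    batch_size: int = 32
    ppo_clip: float = 0.05
    alpha: float = 0.99
    gamma: float = 0.99              # discount factor
    gae_lambda: float = 0.95
    value_coef: float = 0.5
    entropy_coef: float = 0.1
    updates_per_iteration: int = 4
    env_steps_per_iteration: int = 32
    num_parallel_actors: int = 32
    optimizer: str = "Adam"
    # Learning rates listed in the table, presumably swept during tuning.
    learning_rates: List[float] = field(
        default_factory=lambda: [1e-1, 1e-2, 1e-3, 1e-4, 1e-5])

config = TrainingConfig()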