Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Hierarchical Programmatic Option Framework
Authors: Yu-An Lin, Chen-Tao Lee, Chih-Han Yang, Guan-Ting Liu, Shao-Hua Sun
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our proposed framework outperforms programmatic RL and deep RL baselines on various tasks. Ablation studies justify the effectiveness of our proposed search algorithm for retrieving a set of programmatic options. |
| Researcher Affiliation | Academia | Yu-An Lin, Chen-Tao Lee, Chih-Han Yang, Guan-Ting Liu, Shao-Hua Sun; National Taiwan University; EMAIL |
| Pseudocode | Yes | Algorithm 1 Cross Entropy Method |
| Open Source Code | No | [No] Justification: We will release the code as soon as possible. |
| Open Datasets | Yes | To evaluate our proposed HIPO framework, we adopt the Karel domain [56], which characterizes an agent that navigates a grid world and interacts with objects. HIPO outperforms prior programmatic reinforcement learning and deep RL baselines on existing benchmarks [46, 74]. |
| Dataset Splits | Yes | The program dataset used to train q_ψ and p_θ consists of 35,000 programs for training and 7,500 programs for validation and testing. |
| Hardware Specification | Yes | For our experiments, we utilized the following workstation: 20-core Intel(R) Xeon(R) W-2255 CPU @ 3.70GHz, with 2x NVIDIA GeForce RTX 4070 Ti GPU |
| Software Dependencies | No | The paper mentions algorithms like GRU [15] and PPO [61], and optimizers like Adam, but does not provide specific version numbers for software libraries or dependencies (e.g., PyTorch, TensorFlow, scikit-learn versions). |
| Experiment Setup | Yes | Maximum program number: 1000; Batch size: 32; Clipping: 0.05; α: 0.99; γ: 0.99; GAE lambda: 0.95; Value function coefficient: 0.5; Entropy coefficient: 0.1; Number of updates per training iteration: 4; Number of environment steps per set of training iterations: 32; Number of parallel actors: 32; Optimizer: Adam; Learning rate: {0.1, 0.01, 0.001, 0.0001, 0.00001} |
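
For anyone attempting a re-run, the values in the Experiment Setup row can be collected into a single configuration object. The following is a minimal sketch, not the authors' code (which was not released at the time of classification). The class name `ExperimentSetup`, all field names, and the helper `learning_rate_sweep` are hypothetical; only the numeric values and the optimizer name come from the table. The row does not say which learning rate was finally selected, so the sketch treats the listed set as a grid to sweep.

```python
from dataclasses import asdict, dataclass

# Learning rates listed in the Experiment Setup row. The row does not state
# which one was selected, so they are treated here as a search grid.
LEARNING_RATE_GRID = (0.1, 0.01, 0.001, 0.0001, 0.00001)


@dataclass(frozen=True)
class ExperimentSetup:
    """Hypothetical container; field names are illustrative assumptions."""

    learning_rate: float  # one value from LEARNING_RATE_GRID
    max_program_number: int = 1000
    batch_size: int = 32
    clipping: float = 0.05
    alpha: float = 0.99
    gamma: float = 0.99
    gae_lambda: float = 0.95
    value_function_coefficient: float = 0.5
    entropy_coefficient: float = 0.1
    updates_per_training_iteration: int = 4
    env_steps_per_training_set: int = 32
    parallel_actors: int = 32
    optimizer: str = "Adam"


def learning_rate_sweep():
    """Return one configuration per learning rate in the reported grid."""
    return [ExperimentSetup(learning_rate=lr) for lr in LEARNING_RATE_GRID]


if __name__ == "__main__":
    for config in learning_rate_sweep():
        print(asdict(config))
```

Keeping the learning rate as the only required field makes the one unresolved choice explicit, while every other value defaults to what the table reports.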