Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Subgoal-Guided Policy Heuristic Search with Learned Subgoals
Authors: Jake Tuero, Michael Buro, Levi Lelis
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, we demonstrate the sample efficiency our method enables in that it requires substantially fewer node expansions to learn effective policies than other search algorithms trained with the Bootstrap algorithm in a variety of problem domains. We also show that policy tree search algorithms using our subgoal-based policy can learn how to solve problems from domains that HIPS-ε cannot solve. |
| Researcher Affiliation | Academia | ¹Department of Computing Science, University of Alberta, Edmonton, Canada ²Alberta Machine Intelligence Institute (Amii), Edmonton, Canada. Correspondence to: Jake Tuero <EMAIL>. |
| Pseudocode | Yes | See Appendix C for its pseudocode. |
| Open Source Code | Yes | The codebase² is compiled using the GNU Compiler Collection version 13.3.0, and uses the PyTorch 2.4 C++ frontend (Paszke et al., 2019). ²https://github.com/tuero/subgoal-guided-policy-search |
| Open Datasets | Yes | Craft World: A 14 × 14 room with various raw materials and workbenches (Andreas et al., 2017). We generate problems with the open-source level generator¹ of the procedure detailed by Andreas et al. (2017). ¹https://github.com/jacobandreas/psketch/tree/master ... Sokoban: ... We use the Boxoban training and test problems (Guez et al., 2018). ... Sokoban uses the Boxoban⁴ problems. ⁴https://github.com/deepmind/boxoban-levels/ |
| Dataset Splits | Yes | Every domain has a disjoint set of 10,000 problem instances to train, 1,000 as validation, and 100 in the test set. |
| Hardware Specification | Yes | All experiments were conducted on an Intel i9-7960X and Nvidia 3090, with 128GB of system memory running Ubuntu 24.04. |
| Software Dependencies | Yes | The codebase is compiled using the GNU Compiler Collection version 13.3.0, and uses the PyTorch 2.4 C++ frontend (Paszke et al., 2019). |
| Experiment Setup | Yes | We use the Adam optimizer (Kingma, 2014), with learning rate of 3E-4 and L2-regularization of 1E-4. The policy and heuristic networks for PHS*(π), LevinTS(π), PHS*(πSG), and LevinTS(πSG) both use 128 ResNet channels, with PHS*(πSG) and LevinTS(πSG) using half the number of blocks (4 versus 8) due to the fact that they both have both a low-level and high-level policy. The VQVAE subgoal generator uses a codebook size of 4, a codebook dimension of size 128, and β = 0.25. |
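
The hyperparameters and split sizes quoted in the Experiment Setup and Dataset Splits rows can be collected into a single configuration object, which may help when attempting a reproduction. The following is a minimal sketch in Python, not the authors' code (their codebase uses the PyTorch C++ frontend). The class name `ExperimentConfig`, its field names, and the helper `split_problems` are hypothetical. The shuffle-based partition is also an assumption, since the quoted text states only the split sizes and that the sets are disjoint, not how they were produced.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical container; field names are illustrative, values come from the table."""
    learning_rate: float = 3e-4       # Adam learning rate
    l2_regularization: float = 1e-4   # L2-regularization strength
    resnet_channels: int = 128        # shared by all four policy/heuristic networks
    resnet_blocks_baseline: int = 8   # PHS*(π) and LevinTS(π)
    resnet_blocks_subgoal: int = 4    # PHS*(πSG) and LevinTS(πSG): half the blocks
    codebook_size: int = 4            # VQVAE subgoal generator
    codebook_dim: int = 128
    vqvae_beta: float = 0.25
    n_train: int = 10_000
    n_validation: int = 1_000
    n_test: int = 100


def split_problems(problem_ids, config, seed=0):
    """Partition problem ids into disjoint train/validation/test lists.

    Assumption: a seeded shuffle followed by slicing. The source does not
    describe the actual splitting procedure.
    """
    ids = list(problem_ids)
    needed = config.n_train + config.n_validation + config.n_test
    if len(ids) < needed:
        raise ValueError(f"need at least {needed} problems, got {len(ids)}")
    random.Random(seed).shuffle(ids)
    train_end = config.n_train
    val_end = train_end + config.n_validation
    return ids[:train_end], ids[train_end:val_end], ids[val_end:needed]
```

For example, `split_problems(range(11_100), ExperimentConfig())` returns three lists of 10,000, 1,000, and 100 ids with no overlap. For Sokoban, the quoted text indicates that the Boxoban training and test problems are used, so a reproduction would presumably load those sets rather than reshuffle.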