Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Subgoal-Guided Policy Heuristic Search with Learned Subgoals

Authors: Jake Tuero, Michael Buro, Levi Lelis

ICML 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In experiments, we demonstrate the sample efficiency our method enables in that it requires substantially fewer node expansions to learn effective policies than other search algorithms trained with the Bootstrap algorithm in a variety of problem domains. We also show that policy tree search algorithms using our subgoal-based policy can learn how to solve problems from domains that HIPS-ε cannot solve.
Researcher Affiliation	Academia	¹Department of Computing Science, University of Alberta, Edmonton, Canada; ²Alberta Machine Intelligence Institute (Amii), Edmonton, Canada. Correspondence to: Jake Tuero <EMAIL>.
Pseudocode	Yes	See Appendix C for its pseudocode.
Open Source Code	Yes	The codebase² is compiled using the GNU Compiler Collection version 13.3.0, and uses the PyTorch 2.4 C++ frontend (Paszke et al., 2019). ²https://github.com/tuero/subgoal-guided-policy-search
Open Datasets	Yes	Craft World: A 14 × 14 room with various raw materials and workbenches (Andreas et al., 2017). We generate problems with the open-source level generator¹ of the procedure detailed by Andreas et al. (2017). ¹https://github.com/jacobandreas/psketch/tree/master ... Sokoban: ... We use the Boxoban training and test problems (Guez et al., 2018). ... Sokoban uses the Boxoban⁴ problems. ⁴https://github.com/deepmind/boxoban-levels/
Dataset Splits	Yes	Every domain has a disjoint set of 10,000 problem instances to train, 1,000 as validation, and 100 in the test set.
Hardware Specification	Yes	All experiments were conducted on an Intel i9-7960X and Nvidia 3090, with 128GB of system memory running Ubuntu 24.04.
Software Dependencies	Yes	The codebase² is compiled using the GNU Compiler Collection version 13.3.0, and uses the PyTorch 2.4 C++ frontend (Paszke et al., 2019).
Experiment Setup	Yes	We use the Adam optimizer (Kingma, 2014), with learning rate of 3E-4 and L2-regularization of 1E-4. The policy and heuristic networks for PHS*(π), LevinTS(π), PHS*(πSG), and LevinTS(πSG) both use 128 ResNet channels, with PHS*(πSG) and LevinTS(πSG) using half the number of blocks (4 versus 8) due to the fact that they both have both a low-level and high-level policy. The VQVAE subgoal generator uses a codebook size of 4, a codebook dimension of size 128, and β = 0.25.
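
For readers who want the reported settings in one place, the following is a minimal Python sketch that collects the values quoted in the Dataset Splits and Experiment Setup rows. It is not taken from the authors' codebase, which uses the PyTorch C++ frontend. The class name, field names, the `disjoint_splits` helper, and the seed are all hypothetical. The quoted text does not describe how the splits were actually drawn, so the shuffle-and-slice procedure here is only one way to obtain disjoint sets of the stated sizes.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Values quoted in the table above; all field names are hypothetical."""
    learning_rate: float = 3e-4          # Adam learning rate
    l2_regularization: float = 1e-4      # L2-regularization strength
    resnet_channels: int = 128           # shared by all four search variants
    resnet_blocks_flat: int = 8          # PHS*(pi), LevinTS(pi)
    resnet_blocks_subgoal: int = 4       # PHS*(pi_SG), LevinTS(pi_SG)
    vqvae_codebook_size: int = 4
    vqvae_codebook_dim: int = 128
    vqvae_beta: float = 0.25
    n_train: int = 10_000
    n_validation: int = 1_000
    n_test: int = 100


def disjoint_splits(problem_ids, setup, seed=0):
    """Shuffle unique ids once, then slice, so no id lands in two splits.

    The seed and the shuffle-and-slice scheme are assumptions made for
    illustration, not the paper's documented procedure.
    """
    ids = sorted(set(problem_ids))
    needed = setup.n_train + setup.n_validation + setup.n_test
    if len(ids) < needed:
        raise ValueError(f"need {needed} unique problems, got {len(ids)}")
    random.Random(seed).shuffle(ids)
    train_end = setup.n_train
    val_end = train_end + setup.n_validation
    return ids[:train_end], ids[train_end:val_end], ids[val_end:needed]


if __name__ == "__main__":
    setup = ReportedSetup()
    train, validation, test = disjoint_splits(range(12_000), setup)
    print(len(train), len(validation), len(test))
```

Slicing a single shuffled list makes disjointness hold by construction, which is easier to audit than sampling each split independently.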