Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Subgoal Search For Complex Reasoning Tasks

Authors: Konrad Czechowski, Tomasz Odrzygóźdź, Marek Zbysiński, Michał Zawalski, Krzysztof Olejnik, Yuhuai Wu, Łukasz Kuciński, Piotr Miłoś

NeurIPS 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we empirically demonstrate the efficiency of MCTS-kSubS and BF-kSubS. In particular, we show that they vastly outperform their standard (non-subgoal) counterparts. As a testing ground, we consider three challenging domains: Sokoban, Rubik's Cube, and INT. All of them require non-trivial reasoning.
Researcher Affiliation	Collaboration	Konrad Czechowski University of Warsaw EMAIL Tomasz Odrzygóźdź University of Warsaw EMAIL Marek Zbysiński University of Warsaw m.zbysinski@students.mimuw.edu.pl Michał Zawalski University of Warsaw EMAIL Krzysztof Olejnik University of Warsaw k.olejnik3@student.uw.edu.pl Yuhuai Wu University of Toronto, Vector Institute EMAIL Łukasz Kuciński Polish Academy of Sciences EMAIL Piotr Miłoś Polish Academy of Sciences, University of Oxford, deepsense.ai EMAIL
Pseudocode	Yes	Algorithm 1 Best-First Subgoal Search (BF-kSubS) [...] Algorithm 2 Low-level conditional policy [...] Algorithm 3 Subgoal generator
Open Source Code	Yes	We provide the code of our method and experiment settings at https://github.com/subgoal-search/subgoal-search, and a dedicated website https://sites.google.com/view/subgoal-search.
Open Datasets	Yes	The dataset for INT or Sokoban can be easily generated or are publicly available. For the Rubik's Cube, we use random data or simple heuristic (random data are often sufficient for robotic tasks and navigation.) ... INT [55]
Dataset Splits	No	The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning.
Hardware Specification	Yes	The results were obtained on a server with an Intel Xeon E5-2630 v4 CPU and eight NVIDIA Tesla V100 GPUs.
Software Dependencies	No	The paper mentions software components like 'transformer architecture' and 'convolutional network' but does not specify their version numbers or the versions of any other software dependencies.
Experiment Setup	Yes	Table 1: BF-kSubS hyperparameters. [...] In Table 1, we provide the values of the hyperparameters used in all experiments.